LOGML 2024

Marco Fumero

Marco Fumero is an ELLIS Ph.D. student at Sapienza University of Rome (moving to IST Austria as PostDoc in prof. Locatello group) in the GLADIA research group led by Professor Emanuele Rodolà. His previous experiences includes a research internship at Autodesk AI LAB and Amazon AWS AI working on geometric deep earning and causal representation learning topics. His research stands at the intersection of geometry and deep learning with a focus on representation learning, disentanglement and out-of-distribution generalization. He has been recently focusing on the direction of latent space communication, and, more broadly, on the question of when how and why distinct learning processes yield similar representation, organizing also the UniReps workshop at NeurIPS 2023 on these topics.

Project

Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations. Ideally, the distribution of the data points in the latent space should depend only on the task, the data, the loss, and other architecture-specific constraints. However, factors such as the random weights initialization, training hyperparameters, or other sources of randomness in the training phase may induce incoherent latent spaces that hinder any form of reuse. In a recent work [1] , it has been empirically observed that, under the same data and modeling choices, the angles between the encodings within distinct latent spaces do not change.To exploit this observation,we can compute a representation based on the latent similarity between each sample and a fixed set of training points, denoted as anchors. Given a specific choice of anchors and similarity function, the representation will retain different properties: choosing cosine similarity in the latent spaces enforces the desired invariance to angle preserving transformation of the latent space, without any additional training procedures. Neural architectures can leverage these relative representations to guarantee, in practice, invariance to latent isometries and rescalings, effectively enabling latent space communication: from zero-shot model stitching to latent space comparison between diverse settings, on different datasets, spanning various modalities (images, text, graphs), tasks (e.g., classification, reconstruction) and architectures (e.g., CNNs, GCNs, transformers).

The choice of similarity function should not be limited to capture invariances to angle preserving transformations. As shown in [2], other choices are good as well, and there’s not a clear best choice for capturing transformation across distinct latent spaces, depending on the setting and nuisance factors that affect the representations and cause the spaces to differ.

In this project the objective is to follow this direction by extending the notion of similarity function in two different ways aiming at enhancing the expressivity of this framework using differential geometry and topological analysis tools.

Higher Order Relative Representations: The first direction aims to enhance the scope of the relative representation function by considering a similarity function that takes three or more arguments in input, switching from considering binary relations between data points to n-way relations. From a geometrical perspective, the standard relative representation corresponds to constructing a bipartite graph between anchors and sample: considering n-way relations is analogous to constructing a simplicial complex in the latent space. For instance, three-way relationships correspond to triangles, four-way to tetrahedra, and so on. The objective here is to assess whether this extension enriches the framework’s expressiveness from both practical and theoretical standpoints.

Relative Geodesic Representations: Our second area of exploration involves utilizing geodesic distance as a similarity function, calculated within latent spaces, as a metric for relative representation. This approach ensures that the relative space remains invariant to the isometries of the data’s manifold, defined by geodesic distance. The primary objective of this project is to experimentally determine whether this set of transformations results in a more expressive and efficient relative space, one that more accurately encapsulates the relationships and transformations across different latent spaces. A secondary goal is to devise effective approximation techniques for computing geodesics, thereby enhancing the practical efficiency of the method.

[1] Moschella, L., Maiorca, V., Fumero, M., Norelli, A., Locatello, F., & Rodola, E. (2022). Relative representations enable zero-shot latent space communication.ICLR 2023

[2] Cannistraci, I., Moschella, L., Fumero, M., Maiorca, V., & Rodolà, E. (2023). From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication. arXiv

[3] Shao, H., Kumar, A., & Thomas Fletcher, P. (2018). The Riemannian geometry of deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 315-323).