Projects

Data reductions for graph attention variants
blankphoto.png
blankphoto.png
blankphoto.png

There are a lot of data reduction techniques that have been applied over general transformers like Linformer (JL-Lemma), Reformer (using LSH), Nymstromformer (using Nymstrom method), etc. However, many of these approaches which have sped up attention computation have not been explored for speeding up graph attention variants. I am vouching for performing a baseline set of experiments to test some of these data reduction approaches to approximate GAT/GATv2's attention and measure how much drop on some downstream task is seen.

Machine learning the fine interior
blankphoto.png
blankphoto.png
blankphoto.png

First described in 1983, the Fine interior is a key combinatorial tool in Mirror Symmetry. Broadly speaking, a convex lattice polytope P corresponds to a hypersurface Z in a toric variety. Associate to P is the Fine interior F(P): the rational polytope given by moving all supporting hyperplanes of P in by lattice distance 1. Many geometric properties of Z can be deduced from combinatorial properties of F(P). For example, there exists a unique canonical model of Z if F(P) is non-empty, and the Kodaira dimension is determined by the dimension of F(P). Computing the Fine interior F(P) is computationally challenging and, despite being so important, almost nothing is known about how the combinatorics of P determines the dimension of F(P). This is an area perfect for investigation via Machine Learning.


In this project we will explore the classification of certain four-dimension lattice simplices -- those containing a single interior lattice point. Each of these 338,752 simplices can be described uniquely by an integer-valued vector (a_0,...,a_4), and in nearly every case we know the Fine interior as a result of brute-force computations totalling many decades of CPU time. We will ask whether Machine Learning can predict the dimension of F(P) directly from the vector (a_0,...,a_4) and, if successful, attempt to understand how the machine is performing this calculation. This should present us with unique insights into the combinatorics of the Fine interior, which in turn will generate a richer understanding of the underlying geometry.

Group invariant machine learning for Calabi-Yau polyhedra
blankphoto.png
blankphoto.png
blankphoto.png

There are many group invariant machine learning models, i.e. learnable functions that give the same output if the input is acted on by a group. One novel approach is using fundamental domain projections - an approach which is particularly useful if the group which acts is very large [1]. There are many large string theory datasets with large symmetry groups, which make good benchmarks for group invariant machine learning models. One such example is the Kreuzer-Skarke dataset of Calabi-Yau three-folds coming from reflexive polyhedra and their Hodge numbers [2].


In this project, we will apply invariant machine learning via fundamental domain projections to the Kreuzer-Skarke dataset and compare this with other group invariant machine learning techniques (e.g. data augmentation and deep sets), as well models that are not group invariant.


References

[1] Group invariant machine learning by fundamental domain projections, Benjamin Aslan, Daniel Platt, David Sheard, arXiv 2022

[2] Calabi-Yau data

Distilling large GNNs for molecules
blankphoto.png
blankphoto.png
blankphoto.png

Predicting the energy and forces of an atomic system is a central task for multiple fields of science, such as computational chemistry and material science. Large directional GNNs like GemNet [1] currently set the state of the art for this task on diverse datasets such as COLL and OC20. Unfortunately, these models are computationally expensive since they are centered around directed edge embeddings and their message passing is based on triplets or even quadruplets of nodes. This makes them prohibitively expensive for long simulations and large systems such as proteins. This project aims at making these models substantially faster, while retaining a similar level of accuracy. In order to achieve this, we will explore a combination of (i) cheap models that circumvent edge-based representations either partially or altogether such as PaiNN [2] and (ii) distilling large directional GNNs into cheaper regular GNNs.


There are three aspects that make distillation of atomic GNNs especially promising. First, the cheaper student GNN can actually have more learnable parameters than the teacher model. This is offset by using the parameters in a more efficient way that circumvents higher-order graph representations. Second, energy-and-force models allow us to generate an arbitrary amount of additional data without requiring a transfer set. Third, previous evidence suggests that models trained on small atomic systems generalize well to large systems. We thus do not need to train the expensive teacher GNN on large systems. Together, these properties suggest that an accurate and cheap atomic GNN might be within reach.


References

[1] Gasteiger, Becker, Günnemann. GemNet: Universal Directional Graph Neural Networks for Molecules. NeurIPS 2021

[2] Schütt, Unke, Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. ICML 2021

Equivariant machine learning for vegetation dynamics
blankphoto.png
blankphoto.png
blankphoto.png

Predicting vegetation dynamics is a fundamental problem in ecology, especially in the context of climate change. In this project we aim to learn the equations that rule the vegetation dynamics from data with machine learning.


The input features to the learning problem include observables such as rainfall, vegetation density, waterabsorbed into the soil, etc. Each of these observables have precise units (liters per day per square meter, grams per square meter, and  liters per square meter,  respectively for the  examples  above); therefore  the learning should be units-equivariant. This means that if – for instance – we do a transformation where all the input features with units of meters are rescaled to inches, the predictions should transform accordingly. This symmetry is known as unit equivariance and it corresponds to group equivariance by an action by the (non-compact) group of scalings (see Section 3 of [3]).


A few methods, based  on classical dimensional analysis, have been recently proposed to model units-equivariant machine learning problems (see [3, 1]). In this project we propose to combine these ideas with symbolic regression and PDE integrators to learn the equations that produce the dynamics from data. For more information about the prediction problem and how the data is generated refer to Section 7 of [3] or contact Prof. Villar at soledad.villar@jhu.edu.


References

[1] Joseph  Bakarji, Jared  Callaham, Steven L Brunton, and J Nathan Kutz. Dimensionally consistent learning with buckingham Pi.arXiv:2202.04643, 2022.

[2] Max Rietkerk, Maarten C Boerlijst, Frank van Langevelde, Reinier HilleRisLambers, Johan van de Kop-pel, Lalit Kumar, Herbert HT Prins, and Andr ́e M de Roos. Self-organization of vegetation in aridecosystems. The American Naturalist, 160(4):524–530, 2002.

[3]  Soledad Villar, Weichi Yao, David W Hogg, Ben Blum-Smith, and Bianca Dumitrascu. Dimensionless machine learning: Imposing exact units equivariance. arXiv preprint arXiv:2204.00887, 2022.

Equivariant poset representations
blankphoto.png
blankphoto.png
blankphoto.png

Partially-ordered data is pervasive across a wide range of domains: from online user forums, to natural language understanding, to computer programs and, to bioinformatics. To develop successful applications in these domains, we need to learn representations mapping a hierarchy to a meaningful vector space. Partially ordered sets (posets) are combinatorial objects encoding such hierarchies. Despite the recent plethora of works on equivariant representations for other combinatorial structures, such as graphs and sets, posets have been consistently neglected in this context. In this project we will first discuss learning equivariant representations over combinatorial structures in general and then posets in particular. As a first model we will consider representing the poset as a Directed Acyclic Graph (DAG) and applying a Graph Neural Network (GNN) over it. We will then show how the DAG view does not explicitly encode all known aspects of a poset. Instead, we will consider applying a GNN over the poset zeta matrix (the analogous of the adjacency matrix for a poset). Finally, we will explore non-GNN-based equivariant architectures for representing posets. We will take inspiration from related notions of convolution over powersets to motivate the development of such a machinery.

Towards training GNNs using explanation feedbacks
blankphoto.png
blankphoto.png
blankphoto.png

Introduction. Graph Neural Networks (GNNs) are increasingly used as powerful tools for representing graph-structured data, such as social, information, chemical, and biological networks. As these models are deployed in critical applications (e.g., drug repurposing and crime forecasting), it becomes essential to ensure that the relevant stakeholders understand and trust their decisions. To this end, several approaches have been proposed in recent literature to explain the predictions of GNNs. Depending on the employed techniques, there are three broad categories: perturbation-based, gradient-based, and surrogate-based methods. While several classes of GNN explanation methods have been proposed, there is little to no work done on showing how to use these explanations to improve the GNN performance. In particular, there is no framework that leverages these explanations on the fly and aids the training of a GNN. This lack of understanding mainly stems from the fact that there is very little work on systematically analyzing the use of explanations generated by state-of-the-art GNN explanation methods 


Proposal. Previous research in GraphXAI has focused on developing post-hoc explanation methods. In this work, we propose in-hoc GNN explanations that act as feedbacks, on the fly, during the training phase of a GNN model. In addition, we aim to use the generated explanations to improve the GNN training. Using explanations, we plan to define local neighborhoods for neural message passing, e.g., for a correctly classified node u, we can generate the most important nodes in its local neighborhood N_u and then use it as a prior or generate augmented samples for guiding the message-passing of similar nodes in the subsequent training stages. To this extent, we propose to have an explanation layer after every message-passing layer which acts as a unit buffer that passes all the information to the upper layers during the forward pass, but propagates the explanation information back to the graph representations.

Learning graph rewiring using RL
blankphoto.png
blankphoto.png
blankphoto.png

Most GNNs are based on the concept of message passing, which is by itself based on information diffusion. In diffusion dynamics, key information lies in closer objects, and distant nodes’ effect is decimated [1]. However, it is not clear that the topological graph structure must dictate the information transfer on the graph. In fact, in many cases, such as combinatorial optimization problems, nodes and edges that are distant from a node may have a major impact on the node’s value or class. To that end, graph rewiring allows adding edges, nodes, or other structures in order to assist information transfer. In practice, it decouples the information graph from the topological (input) graph. In this project, we will investigate how we can (meta) learn to build better information graphs using RL. Specifically, our agent will learn how to modify (add/remove) edges, i.e., perform graph rewiring, to improve learning. Our goal is to publish the results of this project in a top ML conference.


References

[1] Understanding over-squashing and bottlenecks on graphs via curvature, Topping et. al., 2022.

[2] On the bottleneck of graph neural networks and its practical implications, Alon et. al., 2020

Contrastive learning
blankphoto.png
blankphoto.png
blankphoto.png

Contrastive learning seeks to train a representation function that encodes the similarity structure in a data set based on pairs of positive samples (similar data points) and negative samples (dissimilar data points). This project will investigate ways of incorporating geometric information, such as equivariances or symmetries, into Contrastive Learning approaches. Depending on the interests and expertise of the group, both computational and theoretical avenues of investigation may be pursued.

Graph-rewiring for GNNs from a geometric perspective
blankphoto.png
blankphoto.png
blankphoto.png

In this project we will investigate possible ways to formalize and understand the notion of graph-rewiring in the context of Graph Neural Networks from a geometric perspective. In certain regimes - existence of long-range dependencies that are crucial for the classification task or heterophily of the label (feature) distribution - the graph structure is known not to be beneficial and, in some cases, even harmful. However, a clear understanding of how classical GNNs behave with respect to specific topological transformations is still missing. Providing a clearer picture in this regard is intimately connected with the problem of stability of GNNs with respect to topological perturbations and might also shed some light on tackling expressivity from a different angle. The main goal of the project consists in studying notions of distances among graphs and associated classes of transformations that would allow us to better tackle the problem of graph-rewiring and analyse theoretically its effects on GNNs.

Line bundle cohomology formulae on Calabi-Yau threefolds
blankphoto.png
blankphoto.png
blankphoto.png

Cohomology is a universal tool in mathematics: from topology and geometry to algebra and number theory, cohomology is used to quantify the possible ways in which a problem fails to meet the naive expectations of the problem solver. As such, estimates of cohomology represent a vital key to problem solving. In areas of theoretical physics such as string theory, cohomology is linked to the low-energy quantum excitations of fields and strings that can be experimentally observed.


In practice, cohomology computations are notoriously difficult to carry out in general. However, it has been recently shown that line bundle cohomology dimensions on several classes of two-dimensional and three-dimensional complex manifolds, including compact toric surfaces, generalised del Pezzo surfaces, K3 surfaces and Calabi-Yau threefolds, are described by closed-form formulae. This new technique for the computation of cohomology uses a combination of algebro-geometric methods and exploratory machine learning methods. The goal of the project will be to go one step further and discover exact generating functions for line bundle cohomology on Calabi-Yau threefolds.


References:

[1] C. Brodie, A. Constantin, J. Gray, A. Lukas, F. Ruehle, Recent Developments in Line Bundle Cohomology and Applications to String Phenomenology, Nankai Symposium on Mathematical Dialogues 2021 Conference Proceedings, arXiv: 2112.12107.

[2] C. Brodie, A. Constantin, R. Deen, A. Lukas, Machine Learning Line Bundle Cohomology, Fortsch.Phys. 68 (2020) 1, 1900087, arXiv: 1906.08730.

[3] C. Brodie, A. Constantin, A. Lukas, Flops, Gromov-Witten invariants and symmetries of line bundle cohomology on Calabi-Yau three-folds, J.Geom.Phys. 171 (2022), arXiv: 2010.06597.

[4] C. Brodie, A. Constantin, Cohomology Chambers on Complex Surfaces and Elliptically Fibered Calabi-Yau Three-folds, arXiv: 2009.01275.

Deep functional map
blankphoto.png
blankphoto.png
blankphoto.png

3D Shape matching is a fundamental problem in computer vision and graphics with significant applications on biological data. In this project, we will take a closer look at Deep Functional Map (DFM) [1,2] paradigm for 3D shape matching.  You will become familiar with unsupervised DFM frameworks [2] and investigate a couple of directions less explored in the DFM literature. First, most DFM literature strongly relies on precomputed Laplacian Beltrami (LB) eigenbasis. There have been some recent attempts to learn an embedding [3,4] instead. However, it is not entirely clear in which circumstances learned embedding is more robust and useful than LB eigenbasis. Secondly, we will investigate cycle consistency constraints in DFM that provide a strong regularization by jointly optimizing the maps over a collection of shapes. By the end of the project, you should better understand these topics from both theoretical and practical perspectives.


References

[1] Litany et al., Deep Functional Map: Structure prediction for dense shape correspondence, ICCV 2017

[2] Roufosse et al., Unsupervised Deep Learning for 3D shape Matching, ICCV 2019

[3] Marin et al., Correspondence Learning via Linearly Invariant Embedding, Neurips 2020

[4] Sharma & Ovsjanikov, Joint Symmetry Detection and Shape Matching for Non-rigid Point Cloud, arXiv

Exploiting domain structure for music ML tasks
blankphoto.png
blankphoto.png
blankphoto.png

Learning to represent and generate music is a highly relevant task for the machine learning field. This data domain is ideal for density estimation tasks and exhibits many interesting properties, such as long-term dependencies and patterns residing at various scales. More crucially, music generation is a main pillar of creative AI, which complements the ML community efforts in pushing scientific progress and gives us the amazing opportunity to assist artists in their creative process [0]!


Existing state-of-the-art approaches for symbolic music generation [1, 2] operate on and produce sequences of tokens. However, the music domain contains structure at several scales: local (e.g. chords, arpeggios, multiple voices being played together in a fugue) and long-term/global (e.g. ABA form [3], the key that a piece is written in, repeating patterns/motifs). In that sense, there have been relatively few studies (e.g. [4, 5, 6, 7, 8]) that investigate explicit representations of structure for modelling and generating music. Each of these works has certain limitations—for instance, each piece modelled in [4] (see Fig. 2) is turned into a single graph by connecting all notes with temporal links, but does not capture the fundamental music-theoretical dependencies and insights that composers most likely used when writing the piece. Alternatively, the work in [7] and a few others look at rhythm-specific encodings only. Perhaps the closest modelling strategy to the one intended for this project is [8], where the authors choose to encode various musical relationships between bars of the score; however, it would be easier to work with more general graph representations of music, building up from first principles such as the circle of fifths / the tonality of a piece (e.g. use a sliding window and compute relationships / a graph within each local window and between windows at discrete points in the score - there are endless ways to think about this!)


This project aims to study the effects of leveraging music-theoretical graph representations in ML models. The goal is to encode symbolic music sequences in a principled manner that reflects the composition process and underlying structure up to a greater extent that previously. This encoding would then be passed as (additional) input to a model [1, 2, 9]. Finally, the effects on model performance would be studied in classification and/or generation scenarios (TBD based on time constraints).


Steps:

1. Download dataset(s), choose task(s)

2. Become familiar with basic music theory concepts and design 1 or more ways of encoding the structure in symbolic music (see Appendix A.2 [1] for a description of event-based MIDI representations; see Chordify in Resources; see Wiki)

3. Set up model codebase and obtain baseline performance on chosen tasks

4. Add relational structure to the model and find suitable ways to encode it

5. Interpret the new results and investigate changes in model processing (e.g. visualising attention in layers, emerging patterns)

6. Open-source code, to allow researchers to preprocess their own music data and build more graph-based ML models for music tasks!


Resources

- Music Transformer GitHub codebase

- MusicVAE GitHub codebase

- Perceiver AR codebase (to be released soon)

- MAESTRO dataset

- MetaMIDIDataset

- Lakh MIDI dataset

- MusPy - A toolkit for symbolic music generation

- Chordify - music21 Documentation

- Symbolic Representations

- Fundamentals of music

- A structural way to encode music

(https://en.wikipedia.org/wiki/Tonnetz) studied by https://ojs.aaai.org/index.php/AAAI/article/view/11880 (here, the resulting encoding was an image)


References

[0] Music 2020 Archives - AI Art Gallery

(http://www.aiartonline.com/category/music-2020/)

[1] Music Transformer

[2] General-purpose, long-context autoregressive modeling with Perceiver AR

[3] Ternary form

[4] Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance

[5] StructureNet: Inducing Structure in Generated Melodies - IR Anthology

[6] Neurosymbolic Deep Generative Models for Sequence Data with Relational Constraints

[7] Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions

[8] MELONS: generating melody with long-term structure using transformers and structure graph

[9] A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Implicit neural filters for steerable CNNs
blankphoto.png
blankphoto.png
blankphoto.png

Steerable CNNs [1] and, in particular, Euclidean Steerable CNNs [2, 3] provide a general framework for building neural networks which are equivariant to groups beyond translations, for example reflections and rotations. Euclidean Steerable CNNs rely on standard convolution with steerable filters, i.e. filters satisfying a steerability constraint [2, 3, 4]. Filters are typically parameterized by linear combination of a steerable filter basis [4], obtained by analytically solving the steerability constraint. However, we note that if the filter is parameterized by an implicit MLP [5] (e.g. for point cloud data), the steerability constraint just requires the MLP to be equivariant. This suggests a simpler way to build steerable CNNs, which does not rely on the polar decomposition of the filters and spherical harmonics. Moreover, this strategy enjoys the flexibility of the implicit neural function, supporting wider filters with limited parameter cost. Since no steerable basis is required, this approach can also be easily extended to different equivariance groups with minor changes, provided an equivariant MLP can be built. We will explore applications for point cloud data (with full rotational symmetries or with only azimuthal symmetries) and molecular data.


References

[1] A General Theory of Equivariant CNNs on Homogeneous Spaces, Taco Cohen and Mario Geiger and Maurice Weiler

[2] 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data, Maurice Weiler and Mario Geiger and Max Welling and Wouter Boomsma and Taco Cohen

[3] Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds, Nathaniel Thomas and Tess Smidt and Steven Kearnes and Lusann Yang and Li Li and Kai Kohlhoff and Patrick Riley

[4] A Program to Build E(N)-Equivariant Steerable CNNs, Gabriele Cesa and Leon Lang and Maurice Weiler

[5] PointConv: Deep Convolutional Networks on 3D Point Clouds, Wenxuan Wu and Zhongang Qi and Li Fuxin

Differential geometry for representation learning
blankphoto.png
blankphoto.png
blankphoto.png

A common hypothesis in machine learning is that the data lie near a low dimensional manifold which is embedded in a high dimensional ambient space. This implies that shortest paths between points should respect the underlying geometric structure. In practice, we can capture the geometry of a data manifold through a Riemannian metric in the latent space of a stochastic generative model, relying on meaningful uncertainty estimation for the generative process. This enables us to compute identifiable distances, since the length of the shortest path remains invariant under re-parametrizations of the latent space. Consequently, we are able to study the learned latent representations beyond the classic Euclidean perspective.


In this project you will develop methods to learn Riemannian metrics in the latent space of deep generative models. You will then use the learned metrics for computing shortest paths in the representation space and for fitting a statistical model that respects the learned nonlinear geometry of the data.


References

(*) Latent Space Oddity: on the Curvature of Deep Generative Models, Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg, International Conference on Learning Representations (ICLR), 2018.

(*) A prior-based approximate latent Riemannian metric, Georgios Arvanitidis, Bogdan Georgiev, Bernhard Schölkopf, International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.

(*) Fast and robust shortest paths on manifolds learned from data, Georgios Arvanitidis, Søren Hauberg, Philipp Hennig, Michael Tiemann, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

(*) A locally adaptive normal distribution, Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg, Advances in Neural Information Processing Systems (NeurIPS), 2016.

Helmhotlz-Hodge Laplacians: edge flows and simplicial learning
blankphoto.png
blankphoto.png
blankphoto.png

The Laplacian operator is a ubiquitous tool in signal processing. Generalizations of this operator to manifolds (i.e., the Laplace-Beltrami operator) and graphs (i.e., the graph Laplacian) have proved to be very useful for learning on non-euclidean domains. Helmholtz-Hodge theory can further generalize these operators to higher-order differential forms.  For example, the continuous 1-Laplacian can be used to analyze vector-fields on manifolds [1] and edge flows on directed or undirected graphs can be analyzed via wavelet-like bases generated by the Hodge-Laplacian [3]. Similarly, the dth-Laplacian can be used to define convolutional operations on d-degree complexes [2]. This project will use these operators to develop methods for solving signal processing problems such as denoising and in-painting on simplicial domains focussing on edge-based and face-based  signals. We do so using both traditional variational methods and neural network-based approaches.


References

[1] Y.-C. Chen, M. Meila and I. G. Kevrekidis, Helmholtzian eigenmap: Topological feature discovery and edge flow learning from point cloud data, arXiv preprint arXiv:2103.07626, (2021).

[2] S. Ebli, M. Defferrard, and G. Spreemann, Simplicial neural networks, arXiv preprint arXiv:2010.03633, (2020).

[3] T. M. Roddenberry, F. Frantzen, M. T. Schaub, and S. Segarra, Hodgelets: Localized spectral representations of flows on simplicial complexes, arXiv preprint arXiv:2109.08728, (2021).

Learning non-geodesic submanifolds
blankphoto.png
blankphoto.png
blankphoto.png

Representation learning aims to transform data x into a lower-dimensional variable z designed to be more efficient for any downstream machine learning task, such as classification. In this project, you will tackle representation learning for manifold-valued data x, and specifically delve into non-geodesic submanifold learning with algorithms of curve fitting and variational autoencoders (rVAE) on manifolds. You will contribute to the open-source package Geomstats by implementing a representation learning module which will unify and contrast the aforementioned methods.


References

(*) Miolane et al. Geomstats: A Python Package for Riemannian Geometry in Machine Learning (JMLR 2020).

(*) Miolane, Holmes. Learning Weighted Submanifolds with Variational Autoencoders and Riemannian Variational Autoencoders (CVPR 2020).

Exploring network medicine principles encoded by graph neural networks
blankphoto.png
blankphoto.png
blankphoto.png

Graph neural networks (GNNs) have enabled many successful biomedical applications when applying to biological networks - from prioritization of treatments, detection of side effects, to identification of protein function. Because of the impressive performance, it is hypothesized that GNN must capture fundamental biological principles [1] but it is unclear what they are and how GNNs encode these principles. There are decades of research in the field of network medicine [2] where researchers study the organizing principles of biological networks and described hypotheses of governing laws. In this project, we will explore what are these important network medicine principles, how they are manifested in the GNN embedding space. If GNN fails to encode some network medicine principles, then we can motivate a new method to tackle them.


References

[1] Li, Michelle M., Kexin Huang, and Marinka Zitnik. Graph Representation Learning in Biomedicine (2021). arXiv: 2104.04883

[2] Barabási, Albert-László, Natali Gulbahce, and Joseph Loscalzo. Network medicine: a network-based approach to human disease. Nature reviews genetics 12.1 (2011): 56-68.

Geometric tools for investigating loss landscapes of deep neural networks
blankphoto.png
blankphoto.png
blankphoto.png

Neural networks have achieved extraordinary success, but this achievement is limited to only those networks that are trainable with stochastic gradient descent (SGD). What is special about these networks? At the very least, SGD can effectively traverse their loss landscapes. In this project, we will train neural networks and investigate their loss landscapes using tools from geometry.


Shocking phenomena have been discovered in relation to loss landscape geometry. For example, the existence of sparse subnetworks within larger networks that can be trained in isolation to achieve comparable performance [1]. Or linear paths within loss landscapes that exhibit monotonically decreasing loss [2, 3] (or that connect together several minima [4, 5]). Explaining these phenomena is challenging because loss landscapes in deep learning are extremely high dimensional and come with minimal guarantees on their geometric structure. However, in practice, we observe a surprisingly rich structure in the loss landscapes of neural networks.


We will review existing work that has successfully utilized geometric tools to better understand deep neural networks (for example, [3,6]). While this project will have its roots in deep learning theory, there will be a significant element of empirical analysis as we implement and evaluate standard deep learning architectures. Some general questions that we may hope to address include: how does the geometry around initialization differ from that at local minima? How does the geometry at initialization shape optimization? Can we identify global structure in the loss landscapes of neural networks?


References

[1] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Frankle & Carbin. ICLR 2019

[2] Qualitatively characterizing neural network optimization problems, Goodfellow et al. ICLR 2015

[3] Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes, Lucas et al. ICML 2021

[4] Topology and geometry of half-rectified network optimization, Freeman & Bruna. ICLR 2017

[5] Linear mode connectivity and the lottery ticket hypothesis, Frankle et al. ICML 2020

[6] Exponential expressivity in deep neural networks through transient chaos, Poole et al. NeurIPS 2016

Characterizing generalization and adversarial robustness for set networks
blankphoto.png
blankphoto.png
blankphoto.png

Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters. Hence, a new generation of learning theory has emerged to explain the characteristics of deep neural networks, such as generalization, overfitting or robustness. Empirical or theoretical, most of the works which prosper in bringing insights to the learning phenomenon, focus on convolutional networks, operating on the image domain. However, a vast majority of the computer vision problems involve either graphs or point clouds which live in unstructured domains. The goal of this project is to first empirically understand the generalization character of point cloud networks. To this end, we will deploy a series of state of the art measures. Guided by this empirical study, we aim to theorize how and why deep sets generalize. In particular, we will focus on topological data analysis as a unifying framework.

Adaptive frame averaging for invariant and equivariant representations
blankphoto.png
blankphoto.png
blankphoto.png

Many machine learning tasks require learning functions that are invariant or equivariant to known symmetries of the input data. Unfortunately, there is a significant challenge in the design of neural network architectures that simultaneously (a) take into account the symmetries, (b) are expressive enough to have small generalization error, and (c) are computationally efficient. Murphy et al. [1, 2] have shown that it is possible to sacrifice (c) --computational efficiency-- for (a) and (b) with the use of symmetrization (Reynolds operator). Recently, Puny et al. [3] have proposed a method (Frame Averaging) to improve the efficiency of symmetrization. However, in some tasks, Frame Averaging can lead to large generalization errors. This is because of a fundamental trade-off between computationally efficient and generalization error in symmetrization. This project will study this trade-off and propose efficient solutions to achieve both computational efficiency and better generalization error in the affected tasks.


References

[1] Murphy, Ryan L., Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. ICLR 2019.

[2] Murphy, Ryan, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Relational pooling for graph representations. In International Conference on Machine Learning, pp. 4663-4673. PMLR, 2019.

[3] Puny, Omri, Matan Atzmon, Heli Ben-Hamu, Edward J. Smith, Ishan Misra, Aditya Grover, and Yaron Lipman. Frame Averaging for Invariant and Equivariant Network Design. ICLR 2022.

Generalized Laplacian positional encoding for graph learning
blankphoto.png
blankphoto.png
blankphoto.png

Laplacian positional encoding (LPE), i.e. using the graph laplacian eigenvectors as input node features to GNNs [1,2,3], has emerged as a successful and popular positional encoding scheme in the past few years. Nevertheless, the space of positional encodings on graphs is largely unexplored. Following the observation that laplacian eigenvectors are a solution to a specific problem: minimizing the L2 distance between node embeddings, weighted by their affinity score (see [1], section 3.1), our intention is to generalize LPEs to a family of positional encodings defined by the minimisers of the same function with general Lp norms (for p!=2).

The PE formulation presented above raises several interesting challenges and questions, which we aim to explore in this project.

  1. How to calculate these positional encodings? While the L2 based functional can be easily minimized by solving a generalized eigenvalue problem, it is not clear how to approach the minimization of the generalized functionals above. Is there an efficient way to calculate/approximate these minimizers?

  2. What graph/node properties do these embeddings capture? Which graph learning tasks can benefit from them? Where do we expect to see them outperform L2 embeddings?

  3. Expressive power. What is the expressive power of GNNs that use these positional encodings?

  4. Symmetries. Is it possible to devise GNN architectures that are invariant to the natural symmetries of these embeddings? (see [4] for a solution to the L2 case)

The plan is to explore these questions in the week we have and then continue the project with a focus on a specific challenge, aiming for a submission to ICLR/ICML. Candidates with some prior knowledge in optimization theory / GNN expressive power / equivariant deep learning are the most suitable for this project.


References

[1] Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Belkin and Niyogi, Neural Computation 2003

[2] Rethinking graph transformers with spectral attention, Kreuzer et al., NeurIPS 2021

[3] A generalization of transformer networks to graphs, Dwivedi et al., AAAI WS 2021

[4] Sign and Basis Invariant Networks for Spectral Graph Representation Learning, Lim et al., 2022

PDE-inspired sheaf neural networks
blankphoto.png
blankphoto.png
blankphoto.png

Cellular sheaves [1, 2] equip graphs with a "geometrical" structure by attaching vector spaces and linear maps to their nodes and edges. It turns out that this additional structure can help mitigate some of the well-known problems of Graph Neural Networks. In a recent paper [3], we studied neural diffusion PDEs on sheaves and proved they have many desirable properties, such as better performance in heterophilic settings than GNNs and robust behaviour in the infinite-time limit (i.e. infinitely many layers).


In this project, we aim to extend this approach to other sheaf-based PDEs that behave differently from diffusion, leading to novel Sheaf Neural Network models. We will be studying the theoretical properties of these neural PDEs (e.g. Do they minimise some energy? Are the dynamics stable?) with the ultimate goal of building practical models for node classification and regression tasks.


References

[1] Jakob Hansen, Robert Ghrist, Toward a Spectral Theory of Cellular Sheaves, Journal of Applied and Computational Topology volume 3, pages 315–358 (2019)

[2] Jakob Hansen, Thomas Gebhart, Sheaf Neural Networks, NeurIPS 2020 Workshop on TDA and Beyond

[3] Bodnar et al., Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs, Preprint 2022

Latent graph learning for multivariate time series
blankphoto.png
blankphoto.png
blankphoto.png

Multivariate time series are prevalent in a variety of domains, including healthcare, transportation, space science, biology, and finance. Recently, it has attracted increasing attention to leverage the structural relationships among channels to learn stronger representation. Previous studies demonstrated that inter-sensor correlations bring rich information in modelling time series. In this project, we will learn how to use a graph neural network model to learn the latent relationship between different time series sensors (such as different leads in ECG signals). The learned graph patterns will correspond to the task. The model will be evaluated by the PhysioNet benchmark.