My research broadly falls into three groups:
1. Proving large data limits of variational problems on graphs.
2. Exploiting connections between neural networks and PDE's.
3. Applications of optimal transport type distances to signal and image analysis.
I am also a part of the Cambridge led project on AI assisted diagnosis and prognostication in Covid-19, and the European Union's Horizon 2020 program NoMADS.
Below you can read more, or visit my publication page for references.
- Large Data Limits of Variational Problems on Graphs
Several problems from machine learning can be posed as variational problems on graphs, for example balanced cuts and Laplace learning (also known as Laplacian regularisation or Kriging). My interest is in establishing large data limits of the minimisers. Essentially this is the question of when a empirical expectation can be replaced by the population average. The first problem one must address is how to compare a discrete function on a graph with a continuum function on the support of a probability measure. To avoid imposing stringent regularity conditions this is done via an optimal transport framework that allows one to measure Lp type distances via a optimal matching between the discrete and continuum domains. Armed with such a topology one can then seek to use tools from the calculus of variations, such as Gamma-convergence, and PDE's to establish convergence (or lack-of) of the minimisers.
In semi-supervised learning one has a set of feature vectors and a subset of labels. Laplace learning aims to find the smoothest interpolation between the known labels. The notion of smoothest is defined with respect to the geometry of the graph. In this area I am particularly interested in results that establish bounds on the graph-scaling and number of labels for well-posed and ill-posed limits. Furthermore, I am interested in methods that correct for the bias that dominates in the ill-posed regimes (for example, this lead to a new method named Poisson learning that is particularly effective at low labelling regimes).
Clustering can be thought of as a special case of semi-supervised learning when the set of labels is empty. In this case one seeks to partition the feature vectors based only on geometric information. The balanced cut problem, in particular Cheeger cuts, can be related to isoperimetric inequalities. This connection allows one access to a range of tools to understand the behaviour of the graph-based variational problem.
2. From PDE's to Neural Networks
Neural networks are in fashion for a large range of machine learning problems, however theoretical understanding has lagged behind application success. One particularly interesting (for me!) theoretical framework is to compare neural networks with differential equations via a deep layer / infinite width limit. In this approach I am interested in proving (in a variational framework) the connection between neural networks and differential equations and using insights to design new neural architectures. Particularly of interest are graph neural networks where theory from my graph-based learning work, for example, can be applied to design networks which scale better with the network depth (i.e. overcome the oversmoothing problem).
3. Applications of Optimal Transport Distances
Optimal transport distances, such as the Wasserstein distance, have found applications in many areas of mathematics from probability to geometry, and partial differential equations to the calculus of variations. For example, the heat equation arises as the gradient flow of entropy with respect to the Wasserstein distance, and the isoperimetric inequality can be proved via optimal transport maps.
In data science optimal transport distances have found use as a good measure of distance between images / signals. The Lagrangian nature makes optimal transport distances a more physically justifiable modelling assumption compared to Eularian type distances (such as Lp distances).
Two major disadvantages of optimal transport distances are (1) their computational cost and (2) the lack of off-the-shelf data analysis tools. Linear optimal transport attempts to (partially) overcome these issues by defining a projection (into the tangent plane) so that the Euclidean distance of projections is approximately the optimal transport distance. This goes some way to reducing the computational cost, as effectively the numerical expense becomes linear in the size of the dataset, and standard tools, such as principal component analysis, are applicable in the space of projections. One can map between the space of projections and the optimal transport manifold allowing, for example, modes of variation to be easily visualised.
My interest in linear optimal transport distances centres around the Hellinger--Kantorovich distance and the TLp distance (the latter being the metric used to establish discrete-to-continuum limits in graph-based variational problems). I'm also interested in applications such as to medical/material fingerprinting.