Many familiar data analysis techniques can be viewed as attempts to approximate a cloud of data points with a lower dimensional shape with a simple description. For example clustering can be viewed as an approximation of the dataset by a small collection of discrete points, while Principal Component Analysis (when taking only two components) is in essence the approximation of the dataset by a plane. These simple shapes are examples from a much more general set of shapes known as manifolds, which are characterized by the fact that they locally look like points, lines, planes, or hyperplanes. Other examples of manifolds include smooth curves (which locally look like lines) and surfaces (which locally look like planes). The success of more sophisticated machine learning techniques like deep neural networks has been attributed to their ability to find the shape of the data from among more general classes of manifold (Brahma et al. 2016).
Higher-order Networks and the Topology of Data
The manifolds learned in the case of neural networks, however, are hidden within the potentially billions of matrices describing the calculations performed by each of the nodes of the neural net. However, there is another approach to finding the latent manifolds, one which connects higher geometry, network science, and visualisation techniques. We use mathematical objects called simplicial complexes, which were originally developed as computational representations of manifolds by pure mathematicians, and repurposed into the representations of objects in three-dimensional visualization. Concretely, a simplicial complex is a set of nodes, edges connecting two nodes, and higher-order edges called 'simplices' connecting three or more nodes each. Simplicial complexes can be constructed from point set data by various well-established techniques, which are designed such that, if the original dataset lies on a manifold , and enough data points are sampled from each portion of the manifold, the original manifold can be reconstructed. Hence, simplicial complexes form a bridge between data analysis and geometry. Simplicial complexes can also form a bridge to network theory, since simplicial complexes are a generalization of networks, and can be reduced to an ordinary network merely by leaving out the simplices, leaving what is called the ‘skeleton’ of the simplicial complex. In addition, many standard techniques for analysing the structure of networks have already been extended to simplicial complexes.
The members in our lab come from a range of different technical backgrounds, including network theory, mathematics, three-dimensional visualization, and data science so working with simplicial complexes allows for cross-pollination between these subjects. One result of this is a technique due to M. Youssef, that determines whether a network has a large scale ring structure in a way that is independent of observation of any particular layout. This is done by constructing a simplicial complex from the network, then applying techniques from pure mathematics to what extent the simplicial complex resembles a circle. An extension of this idea, due to C. Hütter, is to determine which manifolds a network resembles, and use this knowledge to make a better choice of layout to capture its structure. A final method in this vein, due to J. Hancock (me) involves re-purposing tools from dimensionality reduction to generate a simplicial complex for a simple classic dataset, the iris dataset, which is both visually appealing and allows for useful downstream analysis.