Foundations of adjacency spectral embedding

Daniel Lewis Sussman

January 2014

Abstract

The eigendecomposition of an adjacency matrix provides a way to embed a graph as points in finite-dimensional Euclidean space. This embedding allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for graph inference. Our work analyzes this embedding, a graph version of principal component analysis, in the context of various random graph models, with a focus on the impact on subsequent inference. For the stochastic blockmodel, with a finite number of blocks of stochastically equivalent vertices, Sussman et al. (2012), Fishkind et al. (2013), and Lyzinski et al. (2013) show that clustering the embedded points using k-means accurately partitions the vertices into the correct blocks, even when the embedding dimension is misspecified or the number of blocks is unknown. For the more general random dot product graph model, an example of a latent position model, Sussman et al. (2013) show that the latent positions are consistently estimated by the embedding, which in turn allows for accurate learning in a supervised vertex classification framework. Tang et al. (2012) extend these results to more general latent position models. Athreya et al. (2013) provide distributional results, akin to a central limit theorem, for the residuals between the estimated and true latent positions, which offer the potential for a deeper understanding of these methods. In summary, these papers demonstrate that, for a broad class of graph models and inference tasks, adjacency spectral embedding allows for accurate graph inference via standard multivariate methodology.
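
The pipeline summarized in the abstract (embed the graph via the eigendecomposition of its adjacency matrix, then cluster the embedded points with k-means) can be illustrated with a minimal sketch. This code is not taken from the thesis. The two-block probabilities, the block sizes, the scaling of eigenvectors by the square root of the eigenvalue magnitudes, and the small hand-written k-means with farthest-point initialization are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def sample_sbm(block_sizes, B, rng):
    """Sample a symmetric, hollow adjacency matrix from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[labels][:, labels]                    # edge probability for every vertex pair
    upper = np.triu(rng.random(P.shape) < P, k=1)  # independent edges above the diagonal
    A = (upper | upper.T).astype(float)         # symmetrize; diagonal stays zero
    return A, labels


def adjacency_spectral_embedding(A, d):
    """Embed vertices using the d largest-magnitude eigenpairs of A (assumed scaling)."""
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(evals))[::-1][:d]
    return evecs[:, top] * np.sqrt(np.abs(evals[top]))


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1],
              [0.1, 0.5]])                      # hypothetical block probability matrix
A, labels = sample_sbm([100, 100], B, rng)
X_hat = adjacency_spectral_embedding(A, d=2)    # one 2-dimensional point per vertex
assign = kmeans(X_hat, k=2)

# Cluster labels are only defined up to permutation, so score both labelings.
accuracy = max(np.mean(assign == labels), np.mean(assign != labels))
print(f"fraction of vertices assigned to the correct block: {accuracy:.3f}")
```

In this sketch the embedding dimension and the number of clusters are both set to the true number of blocks; the abstract's claims about misspecified dimension or an unknown number of blocks are not exercised here.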

Type

Thesis