Statistical Analysis of Network Data

Methods and Models

SOFTWARE

At present there does not appear to be any single software package containing pre-developed tools for all of the types of network analyses covered in the book. In writing the book, I have drawn on various resources. Most network graph visualization was done using the graph drawing package Pajek, while most of the network-oriented computations (e.g., simulations, model fitting, etc.) were done using the statistical software package R.

Below is a more detailed description of some of the software resources used in constructing the examples for this book, as well as certain other related resources that may be of interest.

Network Analysis

Good network analysis packages allow for efficient input and manipulation of network graph data. At a minimum, they include tools for common graph-theoretic operations (e.g., shortest path calculations, flow analysis, etc.) and basic descriptive analysis (e.g., degree distributions, centrality, partitioning, etc.). In addition, they may include tools for simulation of different classes of random graphs and, in some cases, network graph modeling. Network visualization capabilities vary across these packages, but there are dedicated software tools for that purpose (see below).
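To make the "minimum" concrete, here is a minimal sketch in plain Python of two of the operations named above: a degree distribution and shortest path lengths found by breadth-first search. It is not taken from any of the packages discussed on this page. The toy edge list and the function names are hypothetical and chosen only for illustration, and a real package would handle far larger graphs far more efficiently.

```python
from collections import Counter, deque

# Hypothetical toy graph: an undirected edge list on vertices 0..4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

def build_adjacency(edge_list):
    """Return a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_distribution(adj):
    """Count how many vertices have each degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = build_adjacency(edges)
print(degree_distribution(adj))
print(shortest_path_lengths(adj, 0))
```

Storing the graph as a dictionary of neighbour sets keeps both operations linear in the number of vertices and edges, which is the kind of efficiency one would expect a dedicated package to provide out of the box.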

R packages

R is an open-source software environment for statistical computing and graphics. There are a number of contributed packages relating to the statistical analysis of networks and network data. I have used two of these with some regularity in the book.

igraph is a package for generating, manipulating, analyzing, and visualizing network graphs, of sizes up to millions of vertices and edges. (This package is also implemented as a C library and a Python extension module.)
statnet is a suite of software packages for network analysis and modeling that allows for the estimation, evaluation, and simulation of network models, as well as network analysis and visualization. The network models include exponential random graph models (ERGMs) and latent variable models. Model fitting and evaluation are driven by a core of appropriate MCMC algorithms.

Matlab toolboxes

Matlab is a commercial software environment for technical computing. Some members of the user community have developed toolboxes that allow one to conduct network analysis to varying extents. The most comprehensive appears to be the MatlabBGL toolbox, which offers a combination of tools for graph-theoretic calculations, network analysis, network graph generation, and visualization. See MatlabCentral for more information on this and other related packages.

Other

Many collections of network analysis tools may be found as part of larger special-purpose software packages. See, for example, the popular Bioconductor project for R or the Bioinformatics Toolbox in Matlab. Conversely, certain tools are implemented in stand-alone form. For example, I used the Windows-based mfinder package for motif detection.

Network Visualization

While most of the network analysis packages mentioned above offer the capability of visualizing network graphs, the task of visualization is challenging enough that dedicated software is often required to obtain high-quality results. There is a relatively large body of such software available. The list below is meant to be illustrative, not exhaustive.

Pajek

Pajek is a freely available (for non-commercial use) Windows-based package for the visualization of large networks. It also has a suite of network analysis tools, mainly oriented towards social network analysis. A non-trivial up-front time investment is needed to acclimate oneself to the unique input format and the GUI. However, the software is capable of producing high-quality network visualizations allowing for a great deal of fine tuning, and was used to produce the majority of the visualizations in this book.
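As a rough illustration of the input format, the sketch below writes a small undirected graph in the basic layout of a Pajek .net file as I understand it: a `*Vertices n` header, one numbered and quoted label per vertex, then an `*Edges` section of vertex-number pairs. The function name `to_pajek` and the example labels are hypothetical, and the format supports many further options (coordinates, weights, directed arcs) that are not shown here, so the Pajek manual remains the authority.

```python
def to_pajek(labels, edge_list):
    """Format an undirected graph as basic Pajek .net text.

    labels: list of vertex labels; vertex i (1-based) gets labels[i - 1].
    edge_list: pairs of 1-based vertex numbers.
    """
    lines = [f"*Vertices {len(labels)}"]
    for i, label in enumerate(labels, start=1):
        lines.append(f'{i} "{label}"')
    lines.append("*Edges")
    for u, v in edge_list:
        lines.append(f"{u} {v}")
    return "\n".join(lines) + "\n"

# Hypothetical three-vertex path graph.
print(to_pajek(["a", "b", "c"], [(1, 2), (2, 3)]), end="")
```

Writing the returned string to a file with a .net extension should give something Pajek can read, under the assumptions stated above.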

Graphviz

Graphviz is open-source software for graph visualization, developed by researchers at AT&T. Like Pajek, it allows for a variety of high-quality layouts. Graphviz has been used by other packages as the muscle behind their own graph visualization capabilities. For example, the Bioconductor project for R mentioned above includes a visualization package called Rgraphviz built on top of the basic Graphviz package.
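Graphviz reads graph descriptions written in its DOT language, which is one reason other packages can delegate layout to it so easily. The following minimal sketch generates DOT text for an undirected graph from an edge list. The helper name `to_dot` and the example edges are hypothetical, and only the simplest DOT constructs are used.

```python
def to_dot(edge_list, name="G"):
    """Return DOT text for an undirected graph given pairs of vertex names."""
    lines = [f"graph {name} {{"]
    for u, v in edge_list:
        # Quote vertex names so that labels containing spaces remain valid.
        lines.append(f'    "{u}" -- "{v}";')
    lines.append("}")
    return "\n".join(lines) + "\n"

# Hypothetical triangle graph.
print(to_dot([("a", "b"), ("b", "c"), ("c", "a")]), end="")
```

Saving the output as graph.dot and running `dot -Tpng graph.dot -o graph.png` should produce an image, assuming Graphviz is installed.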

Other

For some tasks, special-purpose drawing software may be useful. For example, I used the software yEd for drawing most of the tree diagrams in the book. Alternatively, one may have certain platform requirements or programming language requirements or preferences. Appendix A of the book Drawing Graphs: Methods and Models, edited by Kaufmann and Wagner, provides a useful list of additional resources for graph drawing.

STATISTICAL ANALYSIS OF NETWORK DATA: METHODS AND MODELS
Eric D. Kolaczyk
