Statistical Analysis of Network Data
Methods and Models
Below is a more detailed description of some of the software resources used in constructing the examples for this book, as well as certain other related resources that may be of interest.
Good network analysis packages allow for efficient input and manipulation of network graph data. At a minimum, they include tools for common graph-theoretic operations (e.g., shortest path calculations and flow analysis) and basic descriptive analysis (e.g., degree distributions, centrality, and partitioning). In addition, they may include tools for simulating different classes of random graphs and, in some cases, for network graph modeling. Visualization capabilities vary considerably from package to package, and dedicated software tools exist for that purpose (see below).
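To make the "at a minimum" operations concrete, here is a minimal sketch in plain Python of two of them: shortest path lengths (by breadth-first search on an unweighted graph) and a degree distribution. It is not taken from any of the packages discussed here; the function names and the toy edge list are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_distribution(adj):
    """Map each degree value to the number of vertices with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy graph: the path a-b-c-d plus the extra edge b-d.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
adj = build_adjacency(edges)
dist = shortest_path_lengths(adj, "a")
degrees = degree_distribution(adj)
```

A full package would offer the same operations through optimized routines, along with weighted and directed variants that this sketch leaves out.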
R is an open-source software environment for statistical computing and graphics. There are a number of contributed packages relating to the statistical analysis of networks and network data. I have used two of these with some regularity in the book.
Matlab is a commercial software environment for technical computing. Some members of the user community have developed toolboxes that allow one to conduct network analysis to varying extents. The most comprehensive appears to be the MatlabBGL toolbox, which offers a combination of tools for graph-theoretic calculations, network analysis, network graph generation, and visualization. See MatlabCentral for more information on this and other related packages.
Many collections of network analysis tools may be found as part of larger special-purpose software packages. See, for example, the popular Bioconductor package in R or the Bioinformatics Toolbox in Matlab. Conversely, certain tools are implemented in stand-alone form. For example, I used the Windows-based mfinder package for motif detection.
While most of the network analysis packages mentioned above can visualize network graphs, the task is challenging enough that dedicated software is often required to obtain high-quality results. A relatively large body of such software is available. The list below is meant to be illustrative, not exhaustive.
Pajek is a freely available (for non-commercial use) Windows-based package for the visualization of large networks. It also has a suite of network analysis tools, mainly oriented towards social network analysis. Acclimating oneself to its unique input format and its GUI requires a non-trivial investment of time up front. However, the software is capable of producing high-quality network visualizations that allow for a great deal of fine tuning, and it was used to produce the majority of the visualizations in this book.
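As a rough illustration of the input format, the sketch below writes an undirected edge list in the basic layout of a Pajek .net file as I understand it: a "*Vertices n" header, one numbered and quoted label per vertex, then an "*Edges" section listing pairs of vertex numbers. The function name and the toy data are hypothetical, and the format supports many further options (arcs, weights, coordinates) that are omitted here, so the Pajek documentation should be consulted before relying on this.

```python
def to_pajek_net(labels, edges):
    """Return the text of a basic Pajek .net file for an undirected graph.

    labels: list of vertex labels, in the order they should be numbered.
    edges:  list of (u, v) pairs whose entries appear in labels.
    """
    # Pajek numbers vertices starting from 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    for label in labels:
        lines.append(f'{index[label]} "{label}"')
    lines.append("*Edges")
    for u, v in edges:
        lines.append(f"{index[u]} {index[v]}")
    return "\n".join(lines) + "\n"


# Hypothetical toy graph: the path a-b-c.
net_text = to_pajek_net(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

Writing net_text to a file with a .net extension gives something that can be opened from Pajek's network menu, assuming the layout above matches the installed version.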
Graphviz is open-source software for graph visualization, developed by researchers at AT&T. Like Pajek, it allows for a variety of high-quality layouts. Other packages have used Graphviz as the muscle behind their own graph visualization capabilities. For example, the Bioconductor package in R mentioned above has a visualization sub-package called Rgraphviz built on top of the basic Graphviz package.
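Graphviz reads graphs written in its DOT language, which is simple enough to generate by hand. The following sketch produces the DOT text for a small undirected graph; the function name and the toy edge list are hypothetical, and only the most basic DOT syntax (a "graph" block with "--" edges) is used.

```python
def to_dot(edges, name="G"):
    """Return DOT source for an undirected graph given (u, v) pairs."""
    lines = [f"graph {name} {{"]
    for u, v in edges:
        # Quoting node names keeps labels with spaces or punctuation valid.
        lines.append(f'    "{u}" -- "{v}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical toy graph: the path a-b-c.
dot_text = to_dot([("a", "b"), ("b", "c")])
```

If dot_text is saved as toy.dot, a command such as "dot -Tpng toy.dot -o toy.png" should render it with the hierarchical layout engine; other Graphviz engines, such as neato, accept the same input.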
For some tasks, special-purpose drawing software may be useful. For example, I used the software yEd for drawing most of the tree diagrams in the book. In other cases, the choice of tool may be dictated by platform or programming language requirements or preferences. Appendix A of the book Drawing Graphs: Methods and Models, by Kaufmann and Wagner (Eds.), provides a useful list of additional resources for graph drawing.