Methods and Models  




A list of datasets used in this book may be found below. In most cases, I am able to make the data themselves available. In a few cases, I am not, due to proprietary or privacy issues. For each of the datasets made available here, I have combined one or more data files with a README file in a compressed ZIP archive. The README file gives a description of the data, a brief characterization of the context in which they arise, and relevant information on their source.
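As a minimal sketch of how one of these archives might be inspected, the Python snippet below reads the README and lists the remaining files using only the standard `zipfile` module. The member names (`README.txt`, `edges.txt`) and the in-memory archive are hypothetical stand-ins, since the exact layout of each archive is described only in its own README; the helper names `read_readme` and `list_data_files` are likewise illustrative.

```python
import io
import zipfile


def is_readme(member_name):
    """True if the member's base name starts with 'README' (any case)."""
    base = member_name.rsplit("/", 1)[-1]
    return base.upper().startswith("README")


def read_readme(archive):
    """Return the text of the first README-like member in a ZIP archive.

    `archive` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if is_readme(name):
                return zf.read(name).decode("utf-8", errors="replace")
    raise FileNotFoundError("no README member found in archive")


def list_data_files(archive):
    """List every member that is neither a README nor a directory entry."""
    with zipfile.ZipFile(archive) as zf:
        return [
            name for name in zf.namelist()
            if not name.endswith("/") and not is_readme(name)
        ]


# Build a tiny in-memory archive to stand in for a downloaded dataset.
# Both file names and contents are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("README.txt", "Hypothetical description of the data.")
    zf.writestr("edges.txt", "1 2\n2 3\n")

readme_text = read_readme(buf)
data_files = list_data_files(buf)
print(readme_text)
print(data_files)
```

With a real download, the path to the ZIP file would be passed in place of `buf`; reading the README first is worthwhile because it documents the source that should be acknowledged.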

Please note that, for the majority of these datasets, it is only due to the generosity of several of my colleagues that they are being made available for general use. In using any of these datasets, please acknowledge the sources appropriately.

  • Abilene datasets
    • Delay data
    • Origin-destination traffic flows
    • Aggregate flow volume data based on measurements of origin-destination flows on the Abilene network, taken continuously over a seven-day period, starting December 22, 2003.

  • AIDS blogs
  • Network of citations among blogs related to AIDS, patients, and their support networks, collected by Gopal over a three-day period in August 2005.

  • Austrian telephone calls
  • Epileptic seizures
  • ECoG time series data corresponding to two periods (so-called 'pre-ictal' and 'ictal') of a seizure in an epilepsy patient, for eight separate seizures. Measurements are taken at each of 76 electrodes in the brain of the patient, allowing for the construction of association-based networks in studying functional connectivity.

  • Karate club
  • Zachary's well-known 'Karate Club' social network.

  • Lawyer collaboration
  • Lazega's data on the collaborative working relationships among lawyers in a New England law firm. I am unable to make these data available, due to privacy constraints associated with the original study.

  • Microarray experiments in E. coli
  • A subset of the microarray data for E. coli available from the Many Microbe Microarrays Database, as well as a subset of the known regulatory interactions for E. coli listed in the RegulonDB database.

  • Packet delay
  • Packet delay data from Coates et al. resulting from an Internet packet probing experiment designed for conducting network topology inference.

  • Protein interactions in S. cerevisiae
  • A network of interactions among 5151 proteins in S. cerevisiae (i.e., baker's yeast), culled from the January 2007 BioGRID database.

  • Protein function in S. cerevisiae
  • A sub-network of the above protein interaction network, induced by those proteins annotated with the function 'Cellular Communication' in the January 2007 version of the Gene Ontology (GO) database, as well as labels indicating which of those proteins are further annotated with 'Intracellular Signaling Cascade' (i.e., a more specific form of cellular communication).

  • Router-level Internet
  • A network representation of a portion of the router-level Internet, based on topology discovery measurements collected between April 21 and May 8, 2003 by the skitter measurement system at CAIDA.

  • Scientific citations
  • These data were collected as part of a project at Sandia Labs and are proprietary and unavailable. Figures 3.5 and 3.6 corresponding to these data were furnished directly to me by Kevin Boyack.


Eric D. Kolaczyk

Image of the cover for 'Statistical Analysis of Network Data'