NETWORK REPOSITORY
A SCIENTIFIC NETWORK DATA REPOSITORY WITH
INTERACTIVE VISUALIZATION and MINING TOOLS

Code and Libraries for Network Analysis and Graph-based Learning and Other FAQ

Code for reading and analyzing graphs in mtx format (as well as many other graph formats):

The graph readers in PMC and PGD are robust and can, in many cases, infer the format of a graph file on their own. The mtx readers in those libraries can also read mtx files that violate one or more of the MTX file characteristics.

Comments are denoted by at least one %. The header line is usually the first line that begins with at least one % (in many cases two, %%), followed by MatrixMarket and, in most cases, four fields that describe the data stored. In general, the first line without a leading % is the size line N M K, where N is the number of rows, M is the number of columns, and K is the number of nonzeros in the matrix. For undirected graphs, this line reads N N M, where N is the number of nodes and M is the number of edges. For instance, a size line of 4 4 6 indicates that the number of nodes is N=4 and the number of edges is M=6.
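The rule for locating the size line can be sketched in a few lines of Python. This is only an illustration and not the PMC or PGD reader; the function name read_size_line and the sample lines are made up for this example.

```python
def read_size_line(lines):
    """Return (rows, cols, nonzeros) from the first line that does not start with '%'."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and anything starting with '%' (header or comment).
        if not stripped or stripped.startswith("%"):
            continue
        rows, cols, nonzeros = (int(tok) for tok in stripped.split()[:3])
        return rows, cols, nonzeros
    raise ValueError("no size line found")


# Hypothetical file contents for an undirected graph on 4 nodes.
example = [
    "%%MatrixMarket matrix coordinate pattern symmetric",
    "% a made-up comment line",
    "4 4 6",
    "2 1",
]
n_rows, n_cols, n_edges = read_size_line(example)
```

For an undirected graph the first two numbers are equal, so n_rows doubles as the node count.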

A graph file with the extension .mtx is read (by PGD and PMC above) using this (somewhat) strict mtx graph reader. Thus, if the graph file does not strictly follow the above mtx format (e.g., if the graph is an edge list without the header line or without the line that encodes the number of rows, columns, and nonzeros), then the file extension should be changed so that the file is read by the more flexible graph reader discussed below.

    Edge list: These codes are designed to be as flexible as possible and accept many variations of edge lists. They may be slightly slower than the mtx reader, because the reader must perform many checks to determine the exact format of the input file and then carry out any preprocessing that is required.

    • Delimiters: The reader accepts comma, space, or tab delimited edge lists.

    • Comments: Comments are allowed and are denoted by # or % as the first character of a line.

    • Weights: If an edge list contains weights in the third column, they are ignored by default. A user may request that the weights be read by setting the wt parameter or by indicating that the graph is a temporal graph.

    • Multigraph: When an edge list contains multiple edges, we simply remove the duplicate edges.

    • The edge list may also contain gaps in the vertex ids (non-sequential vertex ids), and the ids may start from any positive integer. Self-loops are removed.

    • The edge list is assumed to be undirected. If a directed graph is given, it is simply treated as undirected.
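The behaviour described in the list above can be approximated as follows. This is a minimal sketch under stated assumptions, not the actual PMC or PGD code: the name read_edge_list is hypothetical, weights are always ignored (there is no wt option here), and a vertex that appears only in self-loops is dropped along with them.

```python
import re


def read_edge_list(lines):
    """Read a comma-, space-, or tab-delimited edge list as a simple undirected graph.

    Returns (number_of_vertices, sorted list of edges) with vertex ids
    remapped to 0..n-1 in order of first appearance.
    """
    id_of = {}     # original vertex id -> compact 0-based id
    edges = set()  # a set removes duplicate (multi) edges
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines starting with '#' or '%'.
        if not line or line[0] in "#%":
            continue
        tokens = re.split(r"[,\s]+", line)
        u, v = int(tokens[0]), int(tokens[1])  # a third column (weight) is ignored
        if u == v:
            continue  # remove self-loops
        for x in (u, v):
            id_of.setdefault(x, len(id_of))
        a, b = id_of[u], id_of[v]
        # Store each edge with the smaller id first, so (u, v) and (v, u) coincide.
        edges.add((min(a, b), max(a, b)))
    return len(id_of), sorted(edges)


sample = ["# comment", "% another comment", "10,20", "20 10 0.5", "30\t30", "20\t40"]
num_vertices, edge_list = read_edge_list(sample)
```

In the sample, the second data line repeats the first edge in the opposite direction and with a weight, so it collapses into the same undirected edge.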

The Matrix Market File Format

  • ASCII format;
  • allow comment lines, which begin with a percent sign;
  • use a "coordinate" format for sparse matrices;
  • use an "array" format for general dense matrices;

A file in the Matrix Market format comprises four parts:

  1. Header line: contains an identifier, and four text fields;
  2. Comment lines: allow a user to store information and comments;
  3. Size line: specifies the number of rows and columns, and the number of nonzero elements;
  4. Data lines: specify the location of the matrix entries (implicitly or explicitly) and their values.

The header line has the form

%MatrixMarket object format field symmetry

or

%%MatrixMarket object format field symmetry

The header line must be the first line of the file, and the header line must begin with the string %MatrixMarket or %%MatrixMarket. The four fields that follow that string are

  • object is usually matrix, and that is the case we will consider here. Another legal value is vector, whose format is similar, but with some obvious simplifications.
  • format is either coordinate or array;
  • field is either real, double, complex, integer or pattern.
  • symmetry is either general (legal for real, complex, integer or pattern fields), symmetric (real, complex, integer or pattern), skew-symmetric (real, complex or integer), or hermitian (complex only).

If the field of a matrix is pattern, then only the locations of the nonzeros will be listed. This presumes, obviously, that we are using the coordinate format!
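The header rules above can be checked mechanically. The following is a rough sketch, assuming a hypothetical helper named parse_header. Since the text does not say which symmetries are legal for the double field, the sketch treats double like real; that is an assumption of this example only.

```python
# Fields permitted for each symmetry, following the list above.
# Treating "double" like "real" is an assumption made for this sketch.
ALLOWED_FIELDS = {
    "general": {"real", "double", "complex", "integer", "pattern"},
    "symmetric": {"real", "double", "complex", "integer", "pattern"},
    "skew-symmetric": {"real", "double", "complex", "integer"},
    "hermitian": {"complex"},
}


def parse_header(line):
    """Split a header line into (object, format, field, symmetry) and validate it."""
    tokens = line.split()
    if len(tokens) != 5 or tokens[0] not in ("%MatrixMarket", "%%MatrixMarket"):
        raise ValueError("not a Matrix Market header line")
    obj, fmt, field, symmetry = tokens[1:]
    if obj not in ("matrix", "vector"):
        raise ValueError(f"unknown object: {obj}")
    if fmt not in ("coordinate", "array"):
        raise ValueError(f"unknown format: {fmt}")
    if symmetry not in ALLOWED_FIELDS:
        raise ValueError(f"unknown symmetry: {symmetry}")
    if field not in ALLOWED_FIELDS[symmetry]:
        raise ValueError(f"field {field} is not legal with symmetry {symmetry}")
    if field == "pattern" and fmt != "coordinate":
        raise ValueError("pattern matrices require the coordinate format")
    return obj, fmt, field, symmetry


header = parse_header("%%MatrixMarket matrix coordinate pattern symmetric")
```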

If the symmetry of a matrix is symmetric or hermitian, then only the entries on or below the main diagonal are to be listed. If the symmetry is skew-symmetric, then only the entries strictly below the main diagonal are to be listed.
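A reader that wants the full matrix has to restore the entries that the symmetry rule leaves out. The sketch below shows one way to do this for entries given as (i, j, value) triples; the function name expand_entries and the dictionary representation are choices made for this example.

```python
def expand_entries(entries, symmetry):
    """Return {(i, j): value} for the full matrix implied by the listed entries."""
    if symmetry not in ("general", "symmetric", "skew-symmetric", "hermitian"):
        raise ValueError(f"unknown symmetry: {symmetry}")
    full = {}
    for i, j, value in entries:
        full[(i, j)] = value
        # Diagonal entries and general matrices have no mirrored counterpart.
        if i == j or symmetry == "general":
            continue
        if symmetry == "symmetric":
            full[(j, i)] = value
        elif symmetry == "skew-symmetric":
            full[(j, i)] = -value
        else:  # hermitian: the mirrored entry is the complex conjugate
            full[(j, i)] = complex(value).conjugate()
    return full


sym = expand_entries([(1, 1, 2.0), (2, 1, 3.0)], "symmetric")
skew = expand_entries([(2, 1, 3.0)], "skew-symmetric")
herm = expand_entries([(2, 1, 1 + 2j)], "hermitian")
```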

The comment lines, if any, should follow the header line. The only requirement is that each comment line begin with a percent sign.

If format was specified as array, then the size line has the form:

m n
where
  • m is the number of rows in the matrix;
  • n is the number of columns in the matrix;

If format was specified as coordinate, then the size line has the form:

m n nonzeros
where
  • m is the number of rows in the matrix;
  • n is the number of columns in the matrix;
  • nonzeros is the number of nonzero entries in the matrix (for general symmetry), or the number of nonzero entries on or below the diagonal (for symmetric or Hermitian symmetry), or the number of nonzero entries below the diagonal (for skew-symmetric symmetry).

If format was specified as array, there must follow exactly m * n data lines, one for each entry, listed by columns, having the form

value
where
  • value is the value of the entry. If the field is complex, a pair of real numbers is required.
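Because the array format lists entries by columns, the k-th value (counting from zero) belongs to row k mod m and column k div m. A minimal sketch of rebuilding the rows, with a hypothetical function name and values that are assumed to be parsed into numbers already:

```python
def array_values_to_rows(m, n, values):
    """Rebuild an m-by-n matrix (list of rows) from values listed column by column."""
    if len(values) != m * n:
        raise ValueError(f"expected {m * n} values, got {len(values)}")
    # Entry (row, col) sits at position col * m + row in the column-ordered list.
    return [[values[col * m + row] for col in range(n)] for row in range(m)]


rows = array_values_to_rows(2, 3, [1, 2, 3, 4, 5, 6])
# rows is [[1, 3, 5], [2, 4, 6]]
```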

If format was specified as coordinate, there must follow exactly nonzeros data lines, one for each matrix entry that is to be listed, having the form

i j value
where
  • i is the row of the entry;
  • j is the column of the entry;
  • value is the value of the entry. If the field is complex, a pair of real numbers is required. If the field is pattern, then no value is listed here; only the values of i and j occur.
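How a coordinate data line is interpreted depends on the field. The following sketch parses a single data line accordingly. The name parse_coordinate_line is hypothetical, and reading double the same way as real is an assumption of this example.

```python
def parse_coordinate_line(line, field):
    """Parse one 'i j value' data line; value is None for pattern matrices."""
    tokens = line.split()
    i, j = int(tokens[0]), int(tokens[1])
    if field == "pattern":
        return i, j, None  # only the location is listed
    if field == "complex":
        # A complex value is written as a pair of real numbers.
        return i, j, complex(float(tokens[2]), float(tokens[3]))
    if field == "integer":
        return i, j, int(tokens[2])
    return i, j, float(tokens[2])  # real (and, by assumption, double)


real_entry = parse_coordinate_line("3 1 2.5", "real")
pattern_entry = parse_coordinate_line("3 1", "pattern")
complex_entry = parse_coordinate_line("1 2 1.0 -2.0", "complex")
```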