Graph9 Program

Definition
How it works
User Input
Program Output
Download
Authors




What the program does

This Graph9 program does clustering based on adjacency matrix. An adjacency matrix representation of a graph is a V-by-V array of Boolean values, with the entry in row v and column w defined to be 1 if there is an edge connecting vertex v and vertex w in the graph, and to be 0 otherwise.


Definition

Connected graph -- A graph which each node can reach all other nodes via edges.

Cluster of genes -- A biological term refering to a connected graph of genes.

Bridges -- In Computer Science sense it should be called articulation point, but in the field of biology we use the term "bridge" to refer to the same thing. Bridges are nodes that if removed, would generate disconnected graphs.

Type of bridges X, Y and Z -- Removing a "X" bridge leads to formation of two or more connected graphs. Removing a "Y" bridge leads to formation of one connected graph and one or more single nodes. Removing a "Z" bridge leads to more than one connected graphs and more than one single nodes. ( Notice: definitions here are not onto)
Bridge type XBridge type YBridge type Z



How the program works

The Graph9 program first reads in a node file and a matrix file. The node file contains a list of all the node names. The matrix file contains the edges' information. After analyzing these two files, connected graphs as well as single nodes will be identified. For all the connected graphs, corresponding node files and matrix files will be generated with the file names subgroup_N and submatrix_N. (N is the index of each graph)


User input

Right after calling Graph9 from the command line, it will ask you for two input file, which are the node file and the matrix file described above. Then it will ask for a cutoff value. The cutoff value determines whether an edge is strong enough to be considered an edge in graphs. An edge with strength value less than the cutoff value will be considered non-existent.

The format of each line in the node file is: NodeName

The format of each line in the matrix file is: NodeA NodeB strengthValue

Then Graph9 will ask if the user wants to perform a bridge analysis on the graphs. If so, all possible bridges X, Y and Z will be identified in all graphs. This bridge analysis consumes relatively more time and more system resources, thus can significantly slow down the system especially when the subgroups are large. For example, clustering the whole arabidopsis genome with the bridge analysis on a 1 GHz machine takes about one week to finish, but only a few minutes without the bridge analysis.


Program output

nodeFileName.group_degree_info

  • First column -- Node (gene) ID
  • Second column -- Number of other genes connected to the current gene
  • Third column -- Graph size (number of genes in the graph)
  • Forth column -- Group number
  • Fifth column -- Type of bridge (X, Y or Z). If this node is not a bridge, "-" is assigned.
  • Sixth column -- Either "****" or nothing. The "****" indicates the beginning of a new group. This is only for visual purpose.

subgroup/subgroup_N (there are N files in this directory)

  • Node names belonging to subgroup N will be listed line by line.

submatrix/submatrix_N (there are N files in this directory)

  • The edges' info belonging to subgroup N will be listed line by line in the format "NodeA NodeB strengthValue".



Download source files

sample node input file
sample matrix input file
sample output file

Graph9 source file
Graph9 win executable file

In UNIX, type "make" at the command line.

cover


Graph9 is under the GNU General Public License

Copyright © 2002 University of California at Davis




email: Alexander Kozik
email: Brian Chan

last modified: November 12 2003