Community Detection Algorithms
On this page, the authors' C++ implementation of the two algorithms introduced in P. Schütz and A. Caflisch, Efficient Modularity Optimization by Multi-Step Greedy Algorithm and Vertex Mover Refinement are available. All files are contained in the following zip archive available for download:
Multi-Step Greedy algorithm
The implementations used for benchmarking can be found below. For the users' convenience a wrapped-up version is presented first.
Source
The C++-source is called "MSG_complete.cc" and implements the multistep greedy algorithm.
Compilation
This programs has been tested with the GCC compiler suite.
g++ -m32 -O3 MSG_complete.cc -o MSG_complete
The usage of the 32-bit version is recommended to lower numerical instabilities.
Usage
The edge information is expected in the format
Start_Vertex End_Vertex Weight
Please be aware that the program expects a directed edge file. This implies that for an undirected network the links
a b weight
and
b a weight
have to be present. The MSG algorithm is performed by
./MSG_complete edge_file level_parameter_value
The program outputs the file "partition-MSG-"edge_file"-l-eq-"level_parameter_value"-with-Q-"final_modularity. This output can directly be used by the VM procedure below.
Vertex-Mover Procedure
Source
The file "VM.cc" in the same zip archive provides the implementation of the vertex mover procedure.
Compilation
This program has been tested with the GCC compiler suite.
g++ -O3 VM.cc -o VM
Usage
The output of the MSG algorithm has to be fed-in as follows
./VM input_partition edge_file_de_symmetrized
This procedure automatically generates a file named "NO-boosted-"input_partition"-boosted-to-"final_modularity.
It is important to notice that the current implementation assumes unweighted networks. Therefore the redundant information of the edge file used for the MSG has to be removed. This means in the MSG file
a b weight
b a weight
in the VM edge file only
a b weight
For users of the program gawk this transformation can readily be done by
awk '($1<$2)' edge_file > edge_file_de_symmetrized
Network of Words in Titles of Publications (co)authored by Martin Karplus
The edges are listed in the file "karplus-edges" (see zip archive). Each line corresponds to one edge. The first and second column indicate the indices of the vertices spanning the edge. The weight of the edge is listed in the third column. The words corresponding to a vertex index are given in the file called "karplus-nodes".
Awk-scripts to create Random Networks to benchmark community-detection algorithms
The following bash-scripts (calling awk) create computer-generated network which can be used to benchmark community-detection algorithms. The following three flavours exist:
- "create-SED-network": Small networks with an exponential distribution imposed on the degree distribution
- "create-SLD-network": Small networks with a linear distribution imposed on the degree distribution
- "create-LLD-network": Same as SLD but with at least 300 vertices
Implementations for Benchmarking
Below, we provide the original C++-Implementation of the two algorithms introduced in the reference paper.
Multi-Step Greedy algorithm
Source
This C++-source file is called "MSG.cc," and it is also found in the aforementioned zip archive.
.
Compilation
The program has been tested with the GCC compiler suite.
g++ -m32 -O3 MSG.cc -o MSG To lower the effect of numerical instabilities it is recommended to compile the program with 32-bit.
Usage
The edge information is expected in the format
Start_Vertex End_Vertex Weight
The program assumes a directed network. Therefore, for undirected networks both links
a b weight
and
b a weight have to be present. The MSG algorithm is performed by
./MSG edge_file level_parameter_value
The program outputs per default no partition.
Extraction of Partition
The source file "transform.cc" contains a tiny C++-program that can be used to transform the MSG output into a partition. The compilation is straightforward
g++ -O3 transform.cc
It can be used as follows
- ./MSG edge_file level_parameter_value | grep merging | awk '{print $4 "\t" $5}' > temp_file
- ./transform temp_file partition-MSG
The output-format of partition-MSG is
Vertex_Number Community_Index 1
Vertex-Mover Procedure
Source
The vertex mover procedure is unchanged and found in "VM.cc" in the zip archive.
Compilation
This programs has been tested with the GCC compiler suite.
g++ -O3 VM.cc -o VM
Usage
The output of the MSG algorithm has to be fed-in as follows
./VM input_partition edge_file_de_symmetrised
This procedure automatically generates a file named "NO-boosted-"input_partition"-boosted-to-"final_modularity.
It is important to notice that the current implementation assumes unweighted networks. Therefore the redundant information of the edge file used for the MSG has to be removed. This means in the MSG file
a b weight
b a weight
in the VM edge file only
a b weight