A high-performance graph domain specific language

This is a tutorial for getting started with GraphIt and using the python bindings to compile GraphIt programs as a library and call it from python. If you are looking for a tutorial on building GraphIt programs as standalone application, you can visit the Standalone Getting Started tutorial.

In this tutorial we will write the PagerankDelta algorithm in GraphIt and run it from python on a graph loaded in python. We will also measure the performance of this algorithm from python. We will learn how to separate initialization from actual execution to get meaningful timing numbers. This tutorial is broadly devided in the following sections -

- Overview
- Downloading software
- Cloning GraphIt
- Basic Variables, Constructs, and Functions
- PageRankDelta Example in GraphIt language
- Scheduling Explanatation
- Using GraphIt code from Python.
- Code examples

Make sure you have all the correct Open Source Software installed. First follow the README file here to clone and install graphIt. You will need either CILK or OPENMP to allow you to run the C++ code in parallel. If you dont have either you can get both by simply downloading GCC. Alternatively if you already have CILK or OPENMP you can use those too. This tutorial will go through how to use GraphIt via both CILK and OPENMP.

Clone graphit by going to GraphIt
*Something to note for the following tutorial.*
*Everything will be done graphit/build/bin*

If you have not yet already please read the basic information on the GraphIt Language.

```
element Vertex end
element Edge end
const edges : edgeset{Edge}(Vertex,Vertex);
const vertices : vertexset{Vertex};
const cur_rank : vector{Vertex}(float);
const ngh_sum : vector{Vertex}(float);
const delta : vector{Vertex}(float);
const out_degree : vector{Vertex}(int);
const damp : float = 0.85;
const beta_score : float;
const epsilon2 : float = 0.1;
const epsilon : float = 0.0000001;
const init_delta: float;
func updateEdge(src : Vertex, dst : Vertex)
ngh_sum[dst] += delta[src]/out_degree[src];
end
func updateVertexFirstRound(v : Vertex) -> output : bool
delta[v] = damp*(ngh_sum[v]) + beta_score;
cur_rank[v] += delta[v];
delta[v] = delta[v]-1.0/edges.getVertices();
output = (fabs(delta[v]) > epsilon2*cur_rank[v]);
ngh_sum[v] = 0;
end
func updateVertex(v : Vertex) -> output : bool
delta[v] = ngh_sum[v]*damp;
cur_rank[v] += delta[v];
ngh_sum[v] = 0;
output = fabs(delta[v]) > epsilon2*cur_rank[v];
end
func initVectors(v : Vertex)
cur_rank[v] = 0.0;
ngh_sum[v] = 0.0;
delta[v] = init_delta;
end
export func set_graph(edges_args: edgeset{Edge}(Vertex, Vertex))
edges = edges_args;
vertices = edges.getVertices();
cur_rank = new vector{Vertex}(float)();
ngh_sum = new vector{Vertex}(float)();
delta = new vector{Vertex}(float)();
out_degree = edges.getOutDegrees();
beta_score = (1.0 - damp) / edges.getVertices();
init_delta = 1.0 / edges.getVertices();
vertices.apply(initVectors);
end
export func do_pagerank_delta() -> final_ranks: vector{Vertex} (float)
var n : int = edges.getVertices();
var frontier : vertexset{Vertex} = new vertexset{Vertex}(n);
for i in 1:10
#s1# edges.from(frontier).apply(updateEdge);
var output : vertexset{Vertex};
if i == 1
output = vertices.filter(updateVertexFirstRound);
else
output = vertices.filter(updateVertex);
end
delete frontier;
frontier = output;
end
delete frontier;
final_ranks = cur_rank;
end
```

*Page Rank Delta in Graphit*

Here we will go through an example of writing GraphIt code for the Page Rank Delta application and then calling it from python using the python bindings. You can find the code used in this example along with a few other applicaitons under graphit/apps directory here.

Additionally here is a link to the GraphIt OOPSLA18 paper or the arxiv report here. Sections 4 and 5 give the complete breakdown of the PageRankDelta code. Please look here if you want a more detailed breakdown of the functionality of GraphIt.

```
element Vertex end
element Edge end
```

*element definitions*

Here we construct the basic Elements, `Vertex`

and `Edge`

with the `element`

keyword that will be used by graphit. Note these basic Elements do not have to be named “Vertex” or “Edge”. Most Graph Algorithms will require that you have both of these. GraphIt supports multiple types of user-defined vertices and edges, which is important for algorithms that work on multiple graphs.

A quick refresher on Variables

```
const edges : edgeset{Edge}(Vertex,Vertex);
const vertices : vertexset{Vertex};
```

*vertexset and edgeset definitions*
Once we have declared the basic element types, we can use these to construct the vertexsets and edgesets. These two lines declare these edgeset, `edges`

and the vertexset `vertices`

. Each element of the edgeset is of `Edge`

type (specified between “{ }”), and the source and destination of the edge is type `Vertex`

. The source and destinations can also be of different types (for instance in case of a bipartite graph). Here we have just declared these two sets but haven’t initialized them to anything. This is because these sets will be created using the graph passed to GraphIt from the python code.
We have added the `const`

keyword before the declarations to indicate these vertexsets and edgesets are globally accessible.

```
const cur_rank : vector{Vertex}(float);
const ngh_sum : vector{Vertex}(float);
const delta : vector{Vertex}(float);
const out_degree : vector{Vertex}(int);
```

*vector definitions*

Data for vertices and edges are defined as vectors associated with the element type denoted using the `{ }`

syntax. For example, `cur_rank`

is associated with `Vertex`

, and is of type `float`

(specified in with the `()`

syntax). This is similar to fields of a struct in C or C++, but is stored as a separate vector.

Notice again, that we have only delcared these vectors and haven’t allocated them or initialized their data because the size and values of these will depend on the Graph passed in. We will be explicity initializing them in the in the `set_graph`

function.

```
const damp : float = 0.85;
const beta_score : float;
const epsilon2 : float = 0.1;
const epsilon : float = 0.0000001;
const init_delta: float;
```

*scalar constants declaration*

Next we declare the constants and scalar variables required for the algorithm. We also initialize those scalars that have values independent of the graph (for example the `damp`

float). We will initialize the variables that depend on the graph in the `set_graph`

function.

Next we move on to take a look at the functions used in the program.

A quick refresher on Functions

The algorithm uses three user-defined functions, `updateEdge`

, `updateVertexFirstRound`

, and `updateVertex`

.

```
func updateEdge(src : Vertex, dst : Vertex)
ngh_sum[dst] += delta[src]/out_degree[src];
end
```

`updateEdge`

takes in `src`

and `dst`

of an edge as arguments. The function adds to the current **ngh_sum** of the destination vertex `dst`

, the `delta`

divided by the `out_degree`

of the source vertex `src`

. This function is later used in the main function and applied to every edge in the edgeset.

```
func updateVertexFirstRound(v : Vertex) -> output : bool
delta[v] = damp*(ngh_sum[v]) + beta_score;
cur_rank[v] += delta[v];
delta[v] = delta[v]-1.0/edges.getVertices();
output = (fabs(delta[v]) > epsilon2*cur_rank[v]);
ngh_sum[v] = 0;
end
```

`updateVertexFirstRound`

takes in a vertex, `v`

, and returns a boolean. It does this by multiplying the `ngh_sum`

with the damping factor and adding the basescore. From this it computes the rank and using the delta it computes whether or not it exceeds a certain threshold. If this threshold is exceeded than it returns a boolean True and if not a boolean False. Then it sets the `ngh_sum`

back to 0.

```
func updateVertex(v : Vertex) -> output : bool
delta[v] = ngh_sum[v]*damp;
cur_rank[v] += delta[v];
ngh_sum[v] = 0;
output = fabs(delta[v]) > epsilon2*cur_rank[v];
end
```

`updateVertex`

also takes in a vertex and returning a boolean. However in this case it does not add the base score to `delta`

when determining Delta. Similarly then by comparing if the delta exceeded the threshold of epilson times the rank it outputs a True or False.

`updateVertexFirstRound`

and `updateVertex`

will be used later on to filter out the “active vertices”. These are the vertices that will used in the next iteration of the algorithm. These active vertices are also known as the frontier. The reason for two functions is that the first time we update the vertexs some additional computation needs to be done as described above that isn’t needed later on. Therefore the second function is run only once in the beginning of the algorithm.

```
func initVectors(v : Vertex)
cur_rank[v] = 0.0;
ngh_sum[v] = 0.0;
delta[v] = init_delta;
end
export func set_graph(edges_args: edgeset{Edge}(Vertex, Vertex))
edges = edges_args;
vertices = edges.getVertices();
cur_rank = new vector{Vertex}(float)();
ngh_sum = new vector{Vertex}(float)();
delta = new vector{Vertex}(float)();
out_degree = edges.getOutDegrees();
beta_score = (1.0 - damp) / edges.getVertices();
init_delta = 1.0 / edges.getVertices();
vertices.apply(initVectors);
end
```

*initialization functions*

We now create the functions that will initialize the vectors and other variables that depend on the graph. Firstly, the `set_graph`

function. Notice this function is marked with the `export`

keyword, which means that this function is to be called from python and should be a part of the module loaded into python. This function takes 1 argument, which is the edgeset passed from the python code (as `scipy.sparse.csr_matrix`

) and does not return anything.
You can read more about the `export`

keyword and how the GraphIt types interface with the python types in the export section of the language manual.

The first thing we initialize is the edgeset `edges`

by assigning the passed in argument to it. We then initialize the vertexset `vertices`

by using the `getVertices()`

function from `edges`

. This function returns the vertices from the graph passed in as argument.

We then allocate memory for the vectors associated with the vertices with `new Vector{Vertex}(float)()`

. This is okay to do now because the number of elements is known. We do the same for the variables `beta_score`

and `init_delta`

. The vector `out_degree`

is initialized by calling the `getOutDegrees()`

function from `edges`

.

We initialize the values for the vectors by using a user defined function `initVectors`

. This function takes in an argument the vertex to initialize and sets the initial value of the vectors for that vertex. We invoke this function on all the vertices in the `set_graph`

function by using the `vertexset.apply`

operator.

You should note here that we have created a separate `set_graph`

function to isolate it from the actual algorithm in the `do_pagerank_delta`

function. We do this so we can measure the performance of the actual algorithm and not the cost of initializing the graph related datastructures. If you are not timing the execution of the algorith, it is perfectly fine to add an arguement to the main `do_pagerank_delta`

function and do all the initializations there.

Now, all the data structures are initialized and we are ready to perfom the pagerank_delta operation.

```
export func do_pagerank_delta() -> final_ranks: vector{Vertex} (float)
var n : int = edges.getVertices();
var frontier : vertexset{Vertex} = new vertexset{Vertex}(n);
for i in 1:10
#s1# edges.from(frontier).apply(updateEdge);
var output : vertexset{Vertex};
if i == 1
output = vertices.filter(updateVertexFirstRound);
else
output = vertices.filter(updateVertex);
end
delete frontier;
frontier = output;
end
delete frontier;
final_ranks = cur_rank;
end
```

*main function of PageRankDelta*

The `main`

function is where your program comes together and runs together with all the user-defined functions. Similar to C and C++, you have to explicitly name the function `main`

.

GraphIt is designed to separate edge processing logic from edge traversal, edge filtering (from, to, srcFilter, and dstFilter), atomic synchronization, and modified vertex deduplication and tracking logic (apply and applyModified). This separation enables the compiler to represent the algorithm from a high level, exposing opportunities for edge traversal and vertex data layout optimizations. Moreover, it frees the programmer from specifying low-level implementation details, such as synchronization and deduplication logic.

The algorithm iterates for 10 iterations to update each vertex’s `cur_rank`

value. The `cur_rank`

is assumed to converge after 10 iterations, and represents the importance of each vertex based on its topological structure. In each iteration, the algorithm maintains the set of vertices whose rank has changed greatly from previous iterations, which is known as the `frontier`

. We start with having all vertices in the frontier with frontier initalization ( `var frontier : vertexset{Vertex} = new vertexset{Vertex}(n);`

). The frontier generated by

`output = vertices.filter(updateVertexFirstRound)`

in the first iteration, which applies the `updateVertexFirstRound`

function to all the `vertices`

. In later iterations the frontier is generated by

`output = vertices.filter(updateVertex)`

.

The user has to explicitly manage the memory and swapping of the frontier with `delete frontier; `

and `frontier = output;`

In each iteration we update the `delta`

values using the `updateEdge`

function in `#s1# edges.from(frontier).apply(updateEdge); `

. We use the operator `from`

to obtain the set of edges that comes out from the `frontier`

. Then we use `apply `

to apply the `updateEdge`

function on these edges. The label `#s1`

is used as a way to identify the edgeset operator for performance tuning using the scheduling.

The vector `cur_rank`

is returned by assigning it to the `final_ranks`

variable.

To learn how to use the scheduling functions with GraphIt, please visit the scheduling section of the main getting-started document.

Before you can run the code above you need to first follow these steps and build GraphIt

Before we start building GraphIt we need to make sure we have all the dependencies installed. Currently GraphIt requires the following packages to be installed -

- cmake (version >= 3.5)
- g++ (version >= 5.4)
- python 2.7 or Python 3.x (for running basic test cases)
- python 3.x (x >= 5, for python bindings)
- pip3 (for installing the required packages)
- pybind11
- can be installed using the command
`pip3 install pybind11`

- can be installed using the command
- scipy
- can be installed using the command
`pip3 install scipy`

- can be installed using the command

To perform an out-of-tree build of Graphit:

First clone the repository.

```
git clone git@github.com:GraphIt-DSL/graphit.git
```

After you have cloned the directory:

```
cd graphit
mkdir build
cd build
cmake ..
make
```

To run the C++ test suite do (all tests should pass):

```
cd build/bin
./graphit_test
```

To run the Python end-to-end test suite:

start at the top level graphit directory cloned from Github, NOT the build directory (All tests would pass, but some would generate error messages from the g++ compiler. This is expected.) Currently the project supports Python 2.x and Python 3 for the basic tests (test.py and test_with_schedules.py) and Python 3.x (x >= 5) for the pybind test cases (pybind_test.py), which use scipy.

```
cd build
python python_tests/test.py
python python_tests/test_with_schedules.py
export PYTHONPATH=.
python3 python_tests/pybind_test.py
```

We are now ready to invoke the GraphIt program we wrote above from python 3.x (x >= 5). We start by writing the following python program.

```
import graphit
import scipy.io
from scipy.sparse import csr_matrix
import sys
import time
module = graphit.compile_and_load("pagerank_delta_export.gt")
graph = csr_matrix(scipy.io.mmread(sys.argv[1]))
module.set_graph(graph)
start_time = time.perf_counter()
ranks = module.do_pagerank_delta()
end_time = time.perf_counter()
print ("Time elapsed = " + str(end_time - start_time) + " seconds")
```

This file `pagerank_delta.py`

can be saved anywhere including your home directory. This is because the GraphIt compiler can be used anywhere as long as the `PYTHONPATH`

environment variable points to the right directory.

We start by importing the `graphit`

module. This module has helper functions to compile and invoke Graphit functions. We also import the supporting libraries required to load the graph and time the computation.

We then ask the graphit module to load the GraphIt program we wrote using the `compile_and_load`

function. We pass to it the path of the file we wrote above. This function returns a module that has all the functions we had exported form the GraphIt file. Notice that for the path, we have just provided `"pagerank_delta_export.gt"`

because the file is in the same directory. But you should provide the full path here if it is in a different directory.

Next we load the graph in the `scipy.sparse.csr_matrix`

format. Assuming the graph is stored in the MatrixMarket format, we use the `scipy.io.mmread`

function which returns the graph in the `scipy.sparse.coo_matrix`

format. We convert it to the `csr_matrix`

format and store it in the `graph`

variable. We will be specifying the input graph file name at run time and hence we have used the `sys.argv[1]`

as input file name.

We then pass this graph to the pagerank_delta program by invoking the `set_graph`

function we had defined. Finally we invoke the `do_pagerank_delta`

function and save its return value in the ranks. We also time the call to this function by calling the `time.perf_counter()`

function before and after the function call and substracting the time.

Save the above python program in a file named pagerank_delta.py

Before we can run the program we have to make sure that environment variable `PYTHONPATH`

points to GraphIt’s build directory.

This can be done using the command -

```
export PYTHONPATH=<path/to/graphit/build>
```

We are ready to run this program using python3. We can invoke the command -

```
python3 pagerank_delta.py <path/to/graph.mtx>
```

Running the code outputs the time elapsed to run the computation.

The build directory of Graphit is added to your PYTHONPATH already, but just in case the above command gives an error messsage like -

```
ImportError: No module named 'graphit'
```

It means that python3 cannot find the `graphit`

module. You can point to it using the command -

```
export PYTHONPATH=<path/to/graphit/build/directory>
```

and then running the command again -

```
python3 pagerank_delta.py <path/to/graph.mtx>
```

All the code used in this tutorial is available at the following urls

To reproduce these examples, you can just navigate to the apps directory and run the python command given below. Some example graphs are also provided in the test/graphs directory.

```
python3 pagerank_delta.py ../test/graphs/4.mtx
```