Group # 27
students: Giovanni Tangredi (s276086), Francesco Xia (s276509)
* this project has been tested on a Linux machine.
To compile and run the program:
make
./bin/main <graph_file_path> <num_intervals> <query_file_path> [-l] [-q]
OPTIONS
-l saves generated labels to "test/output/labels_out.txt"
-q saves query results to "test/output/queries_out.txt"
To run the test suites:
make test_node
./bin/test
The other rules available rules are: test_label, test_graph, test_query, test_bitmap
NOTE: the TEST flag must be set to 0 when the program is executed in normal mode. Otherwise, the labels are not generated randomly. Set TEST to 1 when running the test suites.
SDPProjectQ2
├── README.md
├── bin
├── gen_query.c
├── include
│ ├── bitmap.h
│ ├── constants.h
│ ├── graph.h
│ ├── label.h
│ ├── menu.h
│ ├── query.h
│ └── time_tracker.h
├── makefile
├── src
│ ├── bitmap.c
│ ├── graph.c
│ ├── label.c
│ ├── main.c
│ ├── menu.c
│ ├── query.c
│ └── time_tracker.c
└── test
├── input
│ ├── grafo20.gra
│ ├── grafo20.png
│ ├── grafo20_25.que
│ └── grafo_con_rango.png
│
│
├── output
│ ├── labels_out.txt
│ └── queries_out.txt
└── tests
├── bitmap_test.c
├── graph_test.c
├── label_test.c
├── node_test.c
└── query_test.c
This repository has been divided as follows:
- the bin folder contains the executables.
- the src folder contains the source files.
- the test folder is further divided in:
- the input folder contains an example of a graph and a set of queries.
- the output folder contains the output generated by the program (i.e., the labels, the query results, and copy of the graph).
- the tests folder contains the test source files.
- the include folder contains the header files.
The user can customize the program's behavior by modifying the values of the flags found in constant.h:
- DEBUG: verbose mode.
- TEST: if set to 1, the labels are not randomly generated.
- MAX_THREADS_GRAPH: number of running threads used during the graph generation phase. MUST be > 0.
- MAX_THREADS_QUERY: number of running threads used to solve the queries. MUST be > 0.
- ALL_NODES: prints the graph nodes on standard output.
The tests were created using the Check.h framework.
If the option -q is passed, the program creates a file called queries_out.txt, where each line represents the result of a query: source_id destination_id {0/1} (unreachable/reachable).
If the option -q is missing, the user can interactively query the reachability of two nodes.
Similarly passing the option -l, the program creates a file named labels_out.txt, with the following format: node_id: [l1_l, l1_r] [l2_l, l2_r] ...
The graph is represented using a data structure similar to an adjacency list. Specifically, we used a vector of Node pointers to depict the graph, where each node can be indexed through its node_id. Furthermore, each node stores:
- a list of its outgoing edges.
- a vector of intervals of size num_intervals
The queries are stored in a static struct called queries in query.c. Specifically:
- a single query is represented as a route (route.src and route.dst);
- The result of a particular query is stored in a fixed position of a bitmap, whose position is determined by the location of the query inside the file query_file_path. Each bit indicates either non-reachability/reachability between two nodes (0/1). The i-th bit of the bitmap represents the state of i-th query (i.e., the query found at i-th lines of the file <query_file>);
The generation of the graph follows these main ideas:
- the generation is done concurrently by running MAX_THREADS_GRAPH number of threads
- at each iteration, each thread
- reads x lines
- increment shared counter of read lines (see p_curr_iteration)
- parse those lines (e.g. find the node_id of the current node and of its children)
- allocates a single chunk of memory, big enough to store x Nodes (see node_create_multiple)
- save each created Node at graph->nodes[<node_id>] = node;
repeat until end of file.
The intervals are generated concurrently by running n threads, where n is equal to the parameter num_intervals. The interval generated by i-th thread is saved in the corresponding Node.intervals[i-th];
The label generator is an implementation of the algorithm 1 found in [1].
Regarding the queries, they are equally divided in n blocks, where n is empirically set to 4 (see constants.h). Each block is then assigned to a thread, which task is to find the non-reachability/reachability of every query in its block. The query solver is an implementation of algorithm 2 found in [1].
References:
[1]: Yıldırım, H., Chaoji, V. & Zaki, M.J. GRAIL: a scalable index for reachability queries in very large graphs. The VLDB Journal 21, 509–534 (2012). https://doi.org/10.1007/s00778-011-0256-4