KdTree

This repository contains the k-d tree implementation as a part of third assignment in COL362 (Intro. to Database Management Systems) being held at IIT Delhi

Program Structure

The file parent.py is provided to you (you are not supposed to edit it). It is the program that will time your submission. You are expected to submit your code along with a run.sh which compiles and runs your program (Specifications below).

In order for parent.py to measure the time taken by your program to just answer the query (excluding the k-d tree construction time), the following mechanism has been adopted.

parent.py will execute sh run.sh <dataset_file> as a child process. (run.sh should in turn compile and run your program) (Here <dataset_file> is the name of the file containing the data points - Format specified below)
Your program (child) should construct the k-d tree using <dataset_file> and output "0" on the standard output (stdout) when done. This would be read by the parent.
parent.py would then write the name/path of the <query_file> (which contains the query point for kNN - Format specified below) and the value of k (k in kNN) on your standard input (stdin). It also starts the timer now.
Your program should now read the name of the <query_file> and the value of k from stdin and process the query using the k-d tree. The output (the k nearest neighbors) should then be written to results.txt. (Format specified below)
Your program should then output "1" on stdout so that the parent can stop the timer and check your results.txt.

File Formats

A point is represented as a space separated list of values along each dimension.
<dataset_file>:

First line is "D N" where D = numer of dimensions and N = number of points.
N lines follow where each line is a D-dimesnional point.

<query_file>:

First line is "D" where D = number of dimensions
Next line is the D-dimensional query point.

results.txt: k lines where each line should be a point.

Example: Suppose the points are 2-dimensional and <dataset_file> contains 3 points, it would look something like

2 3
0.0 1.0
1.0 0.0
0.0 0.0

Now if the <query_file> is:

2
1.0 1.0

Then for k = 2, results.txt should look like:

0.0 1.0
1.0 0.0

NOTE: The results.txt file must be sorted in ascending order of distance to the query point. For a tie-breaker (2 points equi-distant from query point), use lexicographical ordering between the 2 points i.e. if two d-dimensional points A and B are equi-distant from the query point, then A < B if for some 0 <= m < d, A[m] < B[m] and A[i]=B[i] for 0 <= i < m. (where A[i] represents value of A along i-th dimension) (Eg. [1.0 2.0] < [1.0 3.0])

Command to run

python parent.py <dataset_file> <query_file> <k>

parent.py is compatible with Python 2.x and Python 3.x and with both Linux and Windows

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
utils		utils
.DS_Store		.DS_Store
README.md		README.md
a.out		a.out
file_generator.py		file_generator.py
main.cpp		main.cpp
parent.py		parent.py
query.txt		query.txt
read.cpp		read.cpp
results.txt		results.txt
run.sh		run.sh
sample.py		sample.py
xy.cpp		xy.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KdTree

Program Structure

File Formats

Command to run

About

Releases

Packages

Languages

navreeetkaur/kdtree

Folders and files

Latest commit

History

Repository files navigation

KdTree

Program Structure

File Formats

Command to run

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages