Skip to content

multiple vantage point distance-based tree data structure

License

Notifications You must be signed in to change notification settings

starkdg/mvptree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MVPTree (Multiple Vantage Point Tree Data structure)

A distance-based data structure for the indexing of multi-dimensional data. The MVPTree indexes vector objects according to its distance from select vantage points chosen directly from the indexed data.

Performance Tests

The following tests measure the build and query times for various index sizes. Data points are 16-dimensional real-valued vector object (uniform independantly distributed). Build operations reflect the number of distance operations needed to build the tree. Likewise, query operations are the average number of distance operations to query the index for a constant radius of 0.04. Both build and query operations are normalized to the size of the tree and expressed as a percentage. Each line in the chart is the average of 5 trial runs, where the tree structure is built up, queried for 10 clusters of 10 points, and then taken down.

N MEM Build opers Build time Query opers Query time
100K 122MB 2554% 21.4μs 0.0087% 42.4μs
200K 289MB 2693% 26.3μs 0.0048% 63.4μs
400K 548MB 2792% 36.8μs 0.0022% 70.2μs
800K 963MB 2872% 59.4μs 0.0012% 86.0μs
1M 1.1GB 2899% 62.9μs 0.0010% 107μs
2M 2.7GB 3021% 79.6μs 0.0005% 163μs
4M 5.5GB 3139% 101μs 0.0002% 242μs

Here are the query stats for various radius sizes and a constant index size of N = 1,000,000:

radius Query Opers. Query Time
0.02 0.00094% 30.9μs
0.04 0.00092% 121μs
0.06 0.0012% 359μs
0.08 0.00041% 971μs
0.10 0.0241% 2.67μs

To try it for yourself, run the test program, runmvptree2. You can fiddle with the test parameters in the file, tests/run_mvptree2.cpp.

Programming

Use the template header files. First, create a custom data type. Examples are provided under key.hpp:

struct VectorKeyObject {
	double key[16];
	VectorKeyObject(){};
	VectorKeyObject(const double otherkey[]){
		for (int i=0;i < 16;i++) key[i] = otherkey[i];
	}
	VectorKeyObject(const VectorKeyObject &other){
		for (int i=0;i < 16;i++) key[i] = other.key[i];
	}
	const VectorKeyObject& operator=(const VectorKeyObject &other){
		for (int i=0;i < 16;i++) key[i] = other.key[i];
		return *this;
	}
	const double distance(const VectorKeyObject &other)const {
		double sum = 0;
		for (int i=0;i < 16;i++){
			sum += pow(key[i] - other.key[i], 2.0);
		}
		return sqrt(sum/16.0);
	}
	
};

The custom data type must provide the copy contructor, assignment operator and distance function implementation.

Next, use the templated class:

#include "mvptree/mvptree.hpp"


MVPTree<VectorKeyObject> mvptree;

vector<VectorKeyObject> points;

mvptree.Add(points);
mvptree.Sync();

int sz = mvptree.Size();
assert(sz == points.size());


VectorKeyObject target;
double radius = ...  //  value for data set 
vector<item_t<VectorKeyObject>> results = mvptree.Query(target, radius);


mvptree.Clear();
sz = mvptree.Size();
assert(sz == 0);

For more details, see the programs in the tests directory.

Install

cmake .
make
make test
make install