This is a console application for classifying input vectors using the KNN classification algorithm.
KNN (K - nearest neighbor) is an algorithm for classifying an unclassified input vector of x elements based on a given database of at least K other classified vectors of the same length. The pseudocode for the algorithm:
KNN(Vector v, int k):
- find K closest vectors in database by some distance metric.
- define type to be the most frequent type of all k closest vectors.
- return type.
The program is split to two processes. the server proccess and the client proccess. The client handles the requeses from the user. With all server-client communication implemented using TCP. The possible requests are:
- uploading the database of classified and of unclassified vectors to the server.
- choosing the K number and the distance metric.
- Classifying the vectors.
- Showing the classifications to the user in the console.
- Downloading the classified vectors to the user's computer.
- To handle multiple clients, the server uses multithreading in order to handle all of the clients at once, with at most 40 clients in the queue.
- The client process also uses multithreading to let the user continue the communication with the server while the classified vectors are downloaded in the background in another thread.
- The 5 possible user requests are implemented using a Command Line Interface (CLI) Design Pattern.
- The possible Distances are:
- "AUC" - Euclidian (more info at https://en.wikipedia.org/wiki/Euclidean_distance)
- "MAN" - Manhatten (more info at https://en.wikipedia.org/wiki/Taxicab_geometry)
- "CHB" - Chebyshev (more info at https://en.wikipedia.org/wiki/Chebyshev_distance)
- "CAN" - Canberra (more info at https://en.wikipedia.org/wiki/Canberra_distance)
- "MIN" - Minkowski (more info at https://en.wikipedia.org/wiki/Minkowski_distance)
implementation Note - Since the Euclidian and the Manhattan distance are subsets of the Minkowski distance, they also use its implementation. The default input for the Minkowski distance being simply the Euclidian distance (when P = 2. It's defined as a const).
for n = the number of vectors in the given database, each with dimension m. For each new input, the program runs at time complexity of O(nm)*. O(m) for calculating each vector's distance from the input, for a total of n times, and then O(n) for the select algorithm, and anther O(k) for finding the most common classification.
By developing this project, i have gained experience in the following areas:
- KNN classification algorithm.
- TCP server and client implementation in C++.
- multithreading programming.
- Implementing a Command Line Interface (CLI) design pattern.
Compilation can be done either using terminal command: linux> make
This will create two run files: client.out, server.out.
To compile only one of the files instead, use: linux> make client.out
or linux> make server.out
Run server using command: linux> ./server.out [port-number]
And client using: linux > ./client.out [server-ip] [server-port-number]
Optionally, could also clean up compiled files using command: linux> make clean
server:
./server.out 60000
client:
./client.out 127.0.0.1 60000
Welcome to the KNN Classifier Server. Please choose an option:
1. upload an unclassified csv data file
2. algorithm settings
3. classify data
4. display results
5. download results
8. exit
1
Please upload your local train csv file.
datasets/iris/iris_classified.csv
Please upload your local test csv file.
datasets/iris/iris_Unclassified.csv
Welcome to the KNN Classifier Server. Please choose an option:
1. upload an unclassified csv data file
2. algorithm settings
3. classify data
4. display results
5. download results
8. exit
2
The current KNN parameters are: K = 5, distance metric = AUC
19 CHB
Welcome to the KNN Classifier Server. Please choose an option:
1. upload an unclassified csv data file
2. algorithm settings
3. classify data
4. display results
5. download results
8. exit
3
classifying data complete
Welcome to the KNN Classifier Server. Please choose an option:
1. upload an unclassified csv data file
2. algorithm settings
3. classify data
4. display results
5. download results
8. exit
4
1 Iris-setosa
2 Iris-virginica
3 Iris-virginica
4 Iris-versicolor
5 Iris-versicolor
6 Iris-setosa
7 Iris-virginica
8 Iris-setosa
9 Iris-virginica
10 Iris-setosa
11 Iris-versicolor
12 Iris-virginica
13 Iris-virginica
14 Iris-versicolor
15 Iris-setosa
Done.
Welcome to the KNN Classifier Server. Please choose an option:
1. upload an unclassified csv data file
2. algorithm settings
3. classify data
4. display results
5. download results
8. exit
8