Added Decision tree model #32 issue #108

Open · wants to merge 16 commits into base: main
43 changes: 43 additions & 0 deletions docs/methods/neighbors/decision_tree/decision_tree.md
@@ -0,0 +1,43 @@
# Decision Tree

A decision tree classifier.

The tree is grown by choosing, at each node, the split that yields the best information gain. The resulting tree is then used to predict output values for test data.


## Attributes

| Name | Definition | Shape |
| ------------ | --------------------------------------------------------- | ---------- |
| entropy | Entropy of the remaining target values | 1 |
| max depth | Maximum depth of the decision tree | 1 |
| min samples leaf | Minimum number of samples required to split a node further | 1 |
| target | Output class | 1 |
| max features | Maximum number of features considered when forming further branches | 1 |
| features | All the features/columns in the data | n_features |
| fkey | Index of the column used to split at the current node | 1 |
| fval | Threshold value at which column `fkey` is split | 1 |
| left | Left child of the current node | 1 |
| right | Right child of the current node | 1 |

## Methods

| Name | Definition | Return value |
| ------------------------------- | ----------------------------------------------------- | ----------------- |
| `entropy(vector<double>)` | Compute entropy | double |
| `divideData(vector<vector<double>>,int,int)` | Splits the data on the chosen feature and threshold | `vector<vector<vector<double>>>`|
| `infoGain(vector<vector<double>>,int,int)` | Computes the information gain of a split | double |
| `train(vector<vector<double>>)` | Train the model on training values | void |
| `predict(vector<double>)` | Predict the output for testing value | int |

## Example

```cpp
std::vector<std::vector<double>> x_data{{0, 23.76, 3, 76.56, 1}, {1, 12.76, 2, 87.45, 0},
                                        {1, 21.86, 1, 79.98, 1}, {0, 32.64, 1, 76.87, 1},
                                        {0, 22.76, 3, 89.90, 0}, {1, 28.64, 0, 73.87, 1},
                                        {0, 12.87, 3, 82.86, 0}};
DecisionTree<double> dt(9, 0, 4);
dt.train(x_data);
std::vector<double> test{1, 38.19, 2, 81.65};
std::cout << dt.predict(test);
```
89 changes: 89 additions & 0 deletions examples/methods/neighbors/decision_tree.cpp
@@ -0,0 +1,89 @@
// #include "../../src/slowmokit/methods/neighbors/decision_tree.hpp"
// #include "../../src/slowmokit/core.hpp"

// signed main(){
// std::vector<std::vector<double>> x_data{
// {0,23.76,3,76.56,1},
// {1,12.76,2,87.45,0},
// {1,21.86,1,79.98,1},
// {0,32.64,1,76.87,1},
// {0,22.76,3,89.90,0},
// {1,28.64,0,73.87,1},
// {0,12.87,3,82.86,0},
// {0,33.87,2,80.97,1},
// {1,39.64,1,70.87,1},
// {0,28.90,2,89.86,1},
// {0,13.76,3,72.56,0},
// {1,19.76,2,88.45,1},
// {0,16.86,1,78.98,0},
// {0,32.44,1,73.87,1},
// {1,22.76,3,80.93,1},
// {0,28.64,0,78.87,0},
// {1,8.87,2,81.96,0},
// {0,31.87,2,75.97,0},
// {1,27.64,1,71.89,1},
// {0,20.90,2,80.86,0},
// {0,23.76,3,76.56,1},
// {1,12.76,2,87.45,1},
// {1,21.86,1,79.98,1},
// {0,32.64,1,76.87,1},
// {0,22.76,3,89.90,0},
// {1,28.64,0,73.87,1},
// {0,12.87,3,82.86,0},
// {0,33.87,2,80.97,1},
// {1,39.64,1,70.87,1},
// {0,28.90,2,89.86,0},
// {0,13.76,3,72.56,0},
// {1,19.76,2,88.45,1},
// {0,16.86,1,78.98,0},
// {0,32.44,1,73.87,1},
// {1,22.76,3,80.93,1},
// {0,28.64,0,78.87,0},
// {1,8.87,2,81.96,1},
// {0,31.87,2,75.97,0},
// {1,27.64,1,71.89,0},
// {0,20.90,2,80.86,1},
// {0,32.64,1,76.87,1},
// {0,22.76,3,89.90,0},
// {1,28.64,0,73.87,1},
// {0,12.87,3,82.86,0},
// {0,33.87,2,80.97,1},
// {1,39.64,1,70.87,1},
// {0,28.90,2,89.86,1},
// {0,13.76,3,72.56,0},
// {1,19.76,2,88.45,1},
// {0,16.86,1,78.98,0},
// {0,32.44,1,73.87,1},
// {1,22.76,3,80.93,1},
// {0,28.64,0,78.87,0},
// {1,8.87,2,81.96,0},
// {0,31.87,2,75.97,0},
// {1,27.64,1,71.89,1},
// {0,20.90,2,80.86,1},
// {0,23.76,3,76.56,1},
// {1,12.76,2,87.45,0},
// {1,21.86,1,79.98,1},
// {0,32.64,1,76.87,1},
// {0,22.76,3,89.90,0},
// {1,28.64,0,73.87,1},
// {0,12.87,3,82.86,0},
// {0,33.87,2,80.97,0},
// {1,39.64,1,70.87,0},
// {0,28.90,2,89.86,1},
// {0,13.76,3,72.56,1},
// {1,19.76,2,88.45,0},
// {0,16.86,1,78.98,1},
// {0,32.44,1,73.87,1},
// {1,22.76,3,80.93,0},
// {0,28.64,0,78.87,0},
// {1,8.87,2,81.96,1},
// {0,31.87,2,75.97,0},
// {1,27.64,1,71.89,1},
// {0,20.90,2,80.86,0},
// };
//   DecisionTree<double> dt(9, 0, 4);
//   dt.train(x_data);
//   std::vector<double> test{1, 38.19, 2, 81.65};
//   std::cout << dt.predict(test);
// return 0;
// }
1 change: 1 addition & 0 deletions src/slowmokit.hpp
@@ -18,6 +18,7 @@
#include "slowmokit/methods/linear_model/linear_regression.hpp"
#include "slowmokit/methods/linear_model/logistic_regression.hpp"
#include "slowmokit/methods/neighbors/bernoulli_nb.hpp"
#include "slowmokit/methods/neighbors/decision_tree.hpp"
#include "slowmokit/methods/neighbors/gaussian_nb.hpp"
#include "slowmokit/methods/neighbors/knn.hpp"

180 changes: 180 additions & 0 deletions src/slowmokit/ducks/matrix/matrix.cpp
@@ -0,0 +1,180 @@
/**
* @file ducks/matrix/matrix.cpp
*
* Implementation of the matrix main program
*/

#include "matrix.hpp"

template<class T> Matrix<T>::Matrix(int n, int m) : n(n), m(m)
{
if (n <= 0 or m <= 0)
throw std::out_of_range("\nCannot have non-positive dimension.");

mat.resize(n, std::vector<T>(m, T(0)));
}

template<class T> Matrix<T>::Matrix(const std::vector<std::vector<T>> in)
{
if (std::size(in) <= 0 or std::size(in[0]) <= 0)
throw std::out_of_range("\nCannot have non-positive dimension.");

n = std::size(in);
m = std::size(in[0]);
mat.resize(n, std::vector<T>(m));

for (int i = 0; i < n; i++)
{
if (std::size(in[i]) != m)
throw std::invalid_argument("\nAll rows must have same dimension");

for (int j = 0; j < m; j++)
this->mat[i][j] = in[i][j];
}
}

template<class T> Matrix<T> &Matrix<T>::operator*=(const T &scalar)
{
for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
mat[i][j] *= scalar;
}

return *this;
}

template<class T> Matrix<T> &Matrix<T>::operator*=(const Matrix<T> &rhs)
{
auto [n2, m2] = rhs.getShape();

if (n2 <= 0 or m2 <= 0)
throw std::out_of_range("\nCannot have non-positive dimension.");

if (m != n2)
throw std::invalid_argument("\nColumn dimension of matrix-1 must be equal "
"to row dimension of matrix-2");

auto lhs = this->mat;
std::vector res(n, std::vector<T>(m2, T(0)));

for (int i = 0; i < n; i++)
{
for (int j = 0; j < m2; j++)
{
for (int k = 0; k < m; k++)
res[i][j] += lhs[i][k] * rhs[k][j];
}
}

this->mat = res;
updateShape();

return *this;
}

template<class T> Matrix<T> &Matrix<T>::operator+=(const Matrix<T> &rhs)
{
auto [n2, m2] = rhs.getShape();

if (n2 <= 0 or m2 <= 0)
throw std::out_of_range("\nCannot have non-positive dimension.");

if (n != n2 or m != m2)
throw std::invalid_argument(
"\nBoth Dimension of matrix-1 must be equal to that of matrix-2");

for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
this->mat[i][j] += rhs[i][j];
}

return *this;
}

template<class T> Matrix<T> &Matrix<T>::operator-=(const Matrix<T> &rhs)
{
auto [n2, m2] = rhs.getShape();

if (n2 <= 0 or m2 <= 0)
throw std::out_of_range("\nCannot have non-positive dimension.");

if (n != n2 or m != m2)
throw std::invalid_argument(
"\nBoth Dimension of matrix-1 must be equal to that of matrix-2");

for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
this->mat[i][j] -= rhs[i][j];
}

return *this;
}

template<class T> std::array<int, 2> Matrix<T>::getShape() const
{
return {this->n, this->m};
}

template<class T> T &Matrix<T>::operator()(int i, int j)
{
if (i >= n or i < 0)
throw std::out_of_range("\ni should be between 0 and " +
std::to_string(n - 1) + " inclusive");
if (j >= m or j < 0)
throw std::out_of_range("\nj should be between 0 and " +
std::to_string(m - 1) + " inclusive");

return mat[i][j];
}

template<class T> const std::vector<T> &Matrix<T>::operator[](int i) const
{
if (i >= n or i < 0)
throw std::out_of_range("\ni should be between 0 and " +
std::to_string(n - 1) + " inclusive");

return this->mat[i];
}

template<class T>
std::ostream &operator<<(std::ostream &os, const Matrix<T> &matrix)
{
int n = std::size(matrix);
int m = std::size(matrix[0]);

for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
{
if (j > 0)
os << " ";
os << matrix[i][j];
}

if (i != n - 1)
os << "\n";
}

return os;
}

template<class T> Matrix<T> operator*(Matrix<T> lhs, const Matrix<T> &rhs)
{
lhs *= rhs;
return lhs;
}

template<class T> Matrix<T> operator+(Matrix<T> lhs, const Matrix<T> &rhs)
{
lhs += rhs;
return lhs;
}

template<class T> Matrix<T> operator-(Matrix<T> lhs, const Matrix<T> &rhs)
{
lhs -= rhs;
return lhs;
}
13 changes: 13 additions & 0 deletions src/slowmokit/methods/neighbors/decision_tree.hpp
@@ -0,0 +1,13 @@
/**
* @file methods/neighbors/decision_tree.hpp
*
* Easy include for Decision Tree algorithm
*/


#ifndef SLOWMOKIT_DECISION_TREE_HPP
#define SLOWMOKIT_DECISION_TREE_HPP

#include "decision_tree/decision_tree.hpp"

#endif // SLOWMOKIT_DECISION_TREE_HPP