Table of contents

Features
Code and screenshots
Project overview
Detailed design
- misc.hpp
- regr.hpp
- intp.hpp
- menu.hpp
Mathematical design
Reference

Features

The program is divided into three parts, with error handling across all of them:
- Statistical analysis
  - Users can load CSV files when prompted
  - Users can save CSV files
  - The CSV file is encapsulated in a misc::Table object
  - Viewing the content of CSV files
    - Showing all rows
    - Showing a certain number of rows
    - Showing a certain column
    - Showing a certain row
  - Statistical analysis of all columns in the table
    - Standard deviation and variance
    - Mean and Summation
    - Maximum value and minimum value
    - 25th percentile, median and 75th percentile
    - Interquartile range (IQR)
- Regression
  - Correlation coefficient
  - Plots the original data and the predicted data
  - Plots the fitted equation at 500 points between the minimum and maximum of the loaded x-data
  - Shows a table containing the x-values, y-values and predicted data
  - Linear regression shows the slope and y-intercept in equation form
  - Polynomial regression shows the coefficients of the polynomial in an equation printed in a LaTeX-style view
- Interpolation
  - Plots original data
  - Plots the interpolated values at 500 points between the minimum and maximum of the loaded x-data
  - Takes the user's x-input and returns the interpolated output; at the same time, the output is stored in a vector that can be viewed as a table at any time
  - Shows a table of the x-values, y-values and interpolated data
  - Applies either linear interpolation or polynomial interpolation
- Error warnings and crash prevention against:
  - Incorrectly loaded data
  - Using features before any CSV file has been read
  - Out-of-bounds input
  - Wrong data type input

Code and screenshots

Screenshots
- Main menu
- Option-1
- Option-2
  - Sub-option-1 (it keeps going)
  - Sub-option-2
  - Sub-option-3
  - Sub-option-4
- Option-3 (statistics.csv is loaded)
- Option-4
  - Sub-option-1
    - Linear
    - Polynomial
  - Sub-option-2
    - Linear
    - Polynomial
  - Sub-option-3
- Option-5
  - Sub-option-1
  - Sub-option-2
  - Sub-option-3
    - Linear
    - Polynomial
  - Sub-option-4
- Option-6
This is the code as a .zip file:

Project overview

The software design of this project is similar to a library design approach, with the following features:
- It is header-based, as all files are .hpp
- It uses a build system (CMake)
- It uses a testing framework (Google Test, GTest)
- A structured, multi-level directory layout
- Full and detailed code documentation
- Use of namespaces and classes
- Use of object-oriented programming features such as inheritance and collaboration (composition)
- Use of version control (Git)

Directory structure

Purpose of each directory
- doc/ documentation directory for any documentation material
- out/ output directory to store compiled outputs
- include/ include directory for all of the source code
- lib/ library directory to store any external libraries
- test/ testing directory to store any test scripts
The current structure

include
├── graphs.hpp
├── intp.hpp
├── menu.hpp
├── misc.hpp
└── regr.hpp
test
├── int_test.cpp
├── menu_test.cpp
├── misc_test.cpp
└── reg_test.cpp
out
├── interpolation.csv
├── regression.csv
├── statistics.csv
├── test_int
├── test_menu
└── test_reg
build
└──...
lib
└── googletest
doc
└──... 

CMakeLists.txt

Libraries and dependencies used
- Libraries
  - Google test for testing (https://github.com/google/googletest)
  - Tables and Graphs for terminal graphing (https://github.com/tdulcet/Tables-and-Graphs)
  - Standard library
    - iterator
    - utility
    - fstream
    - string
    - vector
    - algorithm
    - numeric
    - iostream
- Test script dependency flow chart
- Main program (menu.hpp) dependency flow chart
- Both figures show how the dependencies are used; graphs.hpp and gtest.h are the external libraries
- The CMake build script

######################## project details ##########################
cmake_minimum_required(VERSION 3.13)

# Enable debug symbols by default
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Debug CACHE STRING "Choose the type of build (Debug or Release)" FORCE)
endif()

# project detail
project(mcpp VERSION 1.0 DESCRIPTION "Statistical model")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ../out)

# cpp standard (c++14)
set(CMAKE_CXX_STANDARD 14)

include_directories(include)
include_directories(test/)

add_subdirectory(lib/googletest EXCLUDE_FROM_ALL)

# reg
add_executable(test_reg test/reg_test.cpp)

# misc
add_executable(test_misc test/misc_test.cpp)

# int 
add_executable(test_int test/int_test.cpp)

# menu 
add_executable(test_menu test/menu_test.cpp)

target_link_libraries(test_reg gtest pthread) 
target_link_libraries(test_misc gtest pthread) 
target_link_libraries(test_int gtest pthread)

Git log (https://github.com/harith-alsafi/statistic-model-cpp)

24377de6de2ff6535fe0331048c3e2a44be12497 changed documentation
97e947ebb93475b9911620b25386f6b2aefdede8 fixed documentation
2102f546e5618bb7d12afa2a3aca7f1f662cf350 added files
d8e4267113b8564147917e188d65b233dbad4de6 fixed csv
894dbd2c04d21ef5c0ff833c8cd67316ce10ad7a finished everything
b1b2b3d3cc40686fbf29664e20f354c6992fef26 finalized documentation
9189429c6c3ec6a8e6aa0a4061bad4650ab9e2cd finishde intp.hpp
7aa0493f0d984c3813ad9df98d6726bfb5756075 finished documenting misc.hpp
5e1332c1a10f44cc0fc26127f16d02b4dfcdbef4 finalized everything
fba32bf5ea45edd10d2ce9f0d75d18a06a86a9b4 finished polynomial
cdd7008e3e8d408ec72baf549d28d5f9790cff81 finished poly (almost)
87aa4050372b44b8de3a9502c8d5c5293ea9fa8e polynomial isnt working for real life data
3f618b0cba6d9f9cd34b9480b092f24e2093cb42 finished up the menu [almost]
fcd0e588c52bed6efa3485e33d4a4fcc6bc9b52d finishing up regression
f0ce1ee27deba868628dc62eec2ffb11954af76e almost finished option for show data
a803eda7b8e6fa0cfb092c5de6567a80e878c19a finishing up the menu
d6c26c236c12c9a5d265af2f3ea6e6b0de470edd started with menu options
80dbf36da54d500560f2f4b1492b82cecb7a8279 finished polyinterp
2dfd42dfc03d2d2dcd709ecc180c73894330eda5 finished testing linerinterp
b3314b4eaebacaefaab56b3abb97a14f683b758b cleaning up a bit of the code
e960e121c87c22ff5bbd99a2104d852d9c9e68cc started testing intp
477724c717b7dee948440978bf46980b38316eae almost finishing up linearinterpolation
5949f356d03305b0e93b80fc5802363935d06abb finishing up LinearRegression
6c546ce0997cf6932022c547d324afd9eff81966 fixed naming scheme
86aab0d1a023a7f10bfa6546b0ed69a17bf3c059 adding more features
308bea5874622c6cb103dbe3aafed3eb92cadb51 cleaning up the enviroment
8f81d1924d29eb0fa8eb0c24f049bd55e3ed92a2 finished table
bee2a6b73fe160d4e2498fb3e331bdc8fa23274c did changes to plot and added to table
01508cdae75b4463239d7b975086f111f60ca834 directory structure and code
17a548d5ef8e836059f2244d431d049daa45c018 directory structure
c0ac4a63a52e070a85824966f170432f58349423 Initial commit

Detailed design

Namespaces and classes
- We have multiple namespaces (N) and multiple classes (C)
- The figure below summarises the existing namespaces with their corresponding classes
- The main class that is used to run the menu is Menu
- It wraps all of the other classes and methods in a command-line interface (CLI)

misc.hpp

Holds the misc namespace, which stands for miscellaneous
Dependencies

Global functions
- misc.hpp has only two global functions, both outside the namespace, as shown below:
- They are global because one of them overloads the existing round function so that the number of decimal places can be specified
- The other function implements a multiplication operator for std::vector
- Both functions are templates, meaning they can take any data type
- I wanted to use templates in the other classes as well; however, graphs.hpp uses long double, so I was forced to use long double
- Another option would have been to convert every data type into long double before calling graphs.hpp; however, this is impractical because it would repeat many lines of code
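The two global helpers can be sketched as follows. This is a minimal illustration, not the code in misc.hpp: the signature `round(value, places)` is assumed, and the multiplication operator is assumed to be element-wise (x_i * y_i), which is what the correlation coefficient needs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: overload of round() that keeps `places` decimal places.
// The two-argument signature is an assumption.
template <typename T>
T round(T value, int places)
{
    assert(places >= 0);
    const long double factor = std::pow(10.0L, places);
    return static_cast<T>(std::round(static_cast<long double>(value) * factor) / factor);
}

// Sketch: element-wise product of two equally sized vectors (assumed semantics).
template <typename T>
std::vector<T> operator*(const std::vector<T>& a, const std::vector<T>& b)
{
    assert(a.size() == b.size());
    std::vector<T> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        result[i] = a[i] * b[i];
    return result;
}
```

Because both are templates, the same code serves `double` and `long double`, so `round(3.14159, 2)` gives 3.14.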
Functions inside the namespace
- This is the only function that doesn't belong to any class but is within the namespace
- It generates a std::vector of any type given the minimum value, the maximum value and the number of points
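A function like this is what produces the 500 evenly spaced x-values used for plotting. Below is a minimal sketch; the name `linspace` and its parameter order are hypothetical, not taken from misc.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name: n evenly spaced values from min to max (both included).
template <typename T>
std::vector<T> linspace(T min, T max, std::size_t n)
{
    assert(n >= 2);
    std::vector<T> points(n);
    const long double step = (static_cast<long double>(max) - min) / (n - 1);
    for (std::size_t i = 0; i < n; ++i)
        points[i] = static_cast<T>(min + step * i);
    points[n - 1] = max; // avoid round-off drift at the end point
    return points;
}
```

For example, `linspace(x_min, x_max, 500)` would cover the range of the loaded x-data.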
Plot class
- Collaboration diagram
- The collaboration diagram shows every class that Plot uses in its attributes, such as:
  - graphoptions from graphs.hpp, stored in the attribute aoptions
  - std::string from <string>, stored in the attribute title
- Collaborating attributes are drawn as arrows pointing towards the class
- The diagram lists all methods and attributes of the class, where:
  - Private methods and attributes are denoted by - (ex: height)
  - Public methods and attributes are denoted by + (ex: Plot())
- It also contains an enum that holds the colours used by graphs.hpp
- The following is a description of the attributes
Table class
- Collaboration diagram
- This diagram differs slightly from the previous one: there is an arrow going out from the class itself
- This is because the class uses inheritance: it inherits from the standard library type std::vector<std::vector<long double>>
- There is a QR struct within this class, shown below:
- This struct stores the quartiles of a given vector
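The inheritance and the nested struct can be sketched like this. It is a simplified illustration only: `get_column` and the QR member names are hypothetical, and the real class has many more methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace misc
{
    // A table is a list of rows, each row a list of numeric cells.
    class Table : public std::vector<std::vector<long double>>
    {
    public:
        // Reuse all of std::vector's constructors (C++11 inheriting constructors).
        using std::vector<std::vector<long double>>::vector;

        // Quartile values of a column (member names are assumptions).
        struct QR
        {
            long double q1;
            long double q2; // median
            long double q3;
        };

        // Hypothetical helper: copy column `col` out of every row.
        std::vector<long double> get_column(std::size_t col) const
        {
            std::vector<long double> column;
            column.reserve(size());
            for (const auto& row : *this)
            {
                assert(col < row.size());
                column.push_back(row[col]);
            }
            return column;
        }
    };
}
```

Inheriting from std::vector gives the table indexing, iteration and size() for free. Since std::vector has no virtual destructor, a Table should not be deleted through a pointer to its base.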
What files use misc.hpp

regr.hpp

Holds the regr namespace, which stands for regression
Dependencies

This file doesn't contain any global functions inside or outside the namespace
LinearRegression class
- Collaboration diagram
- This class also uses a misc::Plot attribute, which is why it appears in the diagram
- misc::Table is also used, but mainly as a return type, as shown below:
- All of the heavy mathematical algorithms are done in fit_data()
- It also has virtual functions that are overridden by the derived class PolyRegression
- This class has the following friend class, which makes integrating both classes easier:
- The attributes of the class store the main variables used in the regression
PolyRegression class
- Collaboration diagram
- This class inherits from LinearRegression and overrides the virtual methods
- Following the same approach, most of the mathematical algorithms are implemented in fit_data()
- This class also makes a few methods of LinearRegression private, since they are not suited for polynomial regression, such as:
- The new attributes are as follows:
- The newly added functions are specific to this class:
  - set_degree() sets the degree
  - get_power() returns a std::string of the x superscript based on given power
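The relationship between the two classes can be sketched as below. This is a simplified skeleton, not the code in regr.hpp. `get_slope()` is a hypothetical stand-in for the LinearRegression methods that are hidden, and the LaTeX-style output of `get_power()` is an assumed format. Hiding uses a private using-declaration.

```cpp
#include <cassert>
#include <string>

namespace regr
{
    class LinearRegression
    {
    public:
        virtual ~LinearRegression() = default;

        // Virtual so that PolyRegression can replace the fitting step.
        virtual void fit_data() { /* least-squares line fit omitted in this sketch */ }

        // Hypothetical accessor that only makes sense for a straight line.
        long double get_slope() const { return m; }

    protected:
        long double m = 0.0L; // slope
        long double c = 0.0L; // y-intercept
    };

    class PolyRegression : public LinearRegression
    {
    public:
        void fit_data() override { /* polynomial fit omitted in this sketch */ }

        void set_degree(int d)
        {
            assert(d >= 1);
            degree = d;
        }

        // "x" raised to `power`, written LaTeX-style (assumed format).
        static std::string get_power(int power)
        {
            assert(power >= 0);
            if (power == 0) return "";
            if (power == 1) return "x";
            return "x^{" + std::to_string(power) + "}";
        }

    private:
        // A using-declaration in the private section makes the inherited
        // get_slope() inaccessible through a PolyRegression object.
        using LinearRegression::get_slope;

        int degree = 2;
    };
}
```

Hiding a method this way only restricts access through the derived type. A call through a LinearRegression reference still compiles, so this documents intent rather than enforcing it completely.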
What files use regr.hpp

intp.hpp

Holds the intp namespace, which stands for interpolation
Dependencies

This file doesn't contain any global functions inside or outside the namespace
LinearInterp class
- Collaboration diagram
- This class follows the same design concept as LinearRegression
- Most of the numerical algorithms are in find_value()
- plot_all_interpolation() is used to showcase the true intent of this program
- This class also has virtual methods which are:
- This class also has a friend class which is as follows:
- The following is the description of the attributes
PolyInterp class
- Collaboration diagram
- This class inherits from LinearInterp
- It only overrides the virtual functions without adding any new methods or attributes
What files use intp.hpp

menu.hpp

Holds the CLI implementation of the program within the Menu class
The dependency diagram is the same as the one for the whole program
Menu class
- Collaboration diagram
- Since this class uses all of the previous classes, its diagram looks complex
- The class also has an enum, as follows:
- This variable controls the flow of the program: since the program has sub-menus, it must keep the user in a sub-menu until they choose to go back
- The following is a description of the attributes
- The only function meant to be called is run(), which starts the menu
- This design makes the menu easy to use from other files
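The idea of an enum controlling the flow through sub-menus can be sketched as a small state machine. The enum values, menu text and `run_script` helper are all hypothetical; the real Menu has more options and prints tables and plots. The sketch also shows one way to guard against wrong-type and out-of-range input.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical states; the real enum's values are not shown here.
enum class State { Main, Statistics, Regression, Exit };

class Menu
{
public:
    Menu(std::istream& input, std::ostream& output) : in(input), out(output) {}

    // The single entry point: loop until the user asks to exit.
    void run()
    {
        while (state != State::Exit)
        {
            switch (state)
            {
            case State::Main:       main_menu(); break;
            case State::Statistics: sub_menu("Statistics"); break;
            case State::Regression: sub_menu("Regression"); break;
            case State::Exit:       break;
            }
        }
        assert(state == State::Exit);
    }

private:
    std::istream& in;
    std::ostream& out;
    State state = State::Main;

    // Returns the option number, -1 for non-numeric input, 0 at end of input.
    int read_choice()
    {
        int choice = 0;
        if (in >> choice) return choice;
        if (in.eof()) return 0; // no more input: behave like "back"/"exit"
        in.clear();             // wrong data type: reset the stream
        std::string junk;
        in >> junk;             // and discard the offending token
        return -1;
    }

    void main_menu()
    {
        out << "1) Statistics  2) Regression  0) Exit\n";
        switch (read_choice())
        {
        case 1: state = State::Statistics; break;
        case 2: state = State::Regression; break;
        case 0: state = State::Exit; break;
        default: out << "Invalid option\n"; // out of range or not a number
        }
    }

    // The state stays unchanged, so the user remains here until choosing 0.
    void sub_menu(const std::string& name)
    {
        out << name << ": 1) Run  0) Back\n";
        switch (read_choice())
        {
        case 1: out << "Running " << name << "\n"; break;
        case 0: state = State::Main; break;
        default: out << "Invalid option\n";
        }
    }
};

// Drives the menu from a string instead of the keyboard, which is handy for tests.
std::string run_script(const std::string& script)
{
    std::istringstream input(script);
    std::ostringstream output;
    Menu menu(input, output);
    menu.run();
    return output.str();
}
```

In the program itself, the menu would be constructed with `std::cin` and `std::cout`.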

Mathematical design

Representation of the arrays

Statistical analysis
- Mean
  - Given an array of n values, we loop through the array and sum all of the values
  - Then we divide by the number of elements in the array
- Standard deviation and variance
  - We loop through the elements
  - Subtract the mean from each element and square the result
  - Add up these squared differences and divide by the total number of elements of the array; this gives the variance
  - The standard deviation is the square root of the variance
- Interquartile range
  - The array must be sorted before applying the equation
  - The quartile equation gives an index into the sorted array
  - The IQR is the 75th percentile minus the 25th percentile
- Correlation coefficient
  - This uses the sum function that I wrote
  - It also uses the multiplication operator for vectors
  - Having those two methods made implementing this equation much simpler
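The steps above can be sketched as free functions on a vector of long double. All names are hypothetical. The variance divides by n (the population variance), as described above. The quartile index convention p·(n−1), with linear interpolation between neighbouring elements, is an assumption, since other conventions exist. The correlation uses the sum form r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)). Here `std::inner_product` stands in for summing the element-wise product.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Vec = std::vector<long double>;

long double sum(const Vec& v) { return std::accumulate(v.begin(), v.end(), 0.0L); }

long double mean(const Vec& v)
{
    assert(!v.empty());
    return sum(v) / v.size();
}

// Population variance: mean of the squared differences from the mean.
long double variance(const Vec& v)
{
    const long double mu = mean(v);
    long double acc = 0.0L;
    for (long double x : v) acc += (x - mu) * (x - mu);
    return acc / v.size();
}

long double std_dev(const Vec& v) { return std::sqrt(variance(v)); }

// p-th quantile (0 <= p <= 1): sort a copy, compute the index p * (n - 1),
// and interpolate linearly between the two neighbouring elements (assumed convention).
long double quantile(Vec v, long double p)
{
    assert(!v.empty() && p >= 0 && p <= 1);
    std::sort(v.begin(), v.end());
    const long double pos = p * (v.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, v.size() - 1);
    return v[lo] + (pos - lo) * (v[hi] - v[lo]);
}

long double iqr(const Vec& v) { return quantile(v, 0.75L) - quantile(v, 0.25L); }

// Pearson correlation coefficient built from sums.
long double correlation(const Vec& x, const Vec& y)
{
    assert(x.size() == y.size() && !x.empty());
    const long double n = x.size();
    const long double sx = sum(x), sy = sum(y);
    const long double sxy = std::inner_product(x.begin(), x.end(), y.begin(), 0.0L);
    const long double sxx = std::inner_product(x.begin(), x.end(), x.begin(), 0.0L);
    const long double syy = std::inner_product(y.begin(), y.end(), y.begin(), 0.0L);
    return (n * sxy - sx * sy) / std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
}
```

For {2, 4, 4, 4, 5, 5, 7, 9}, the mean is 5, the variance is 4 and the standard deviation is 2.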
Linear regression
- This uses the same concept as the correlation coefficient
- I calculate the numerator and denominator using the previous functions
- Then I calculate the corresponding slope (m) and y-intercept (c)
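The least-squares line can be computed from the same sums as the correlation coefficient: m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and c = (Σy − mΣx) / n. The sketch below uses a hypothetical free function `fit_line` rather than the class's fit_data().

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct Line { long double m, c; }; // y = m x + c

// Hypothetical name: least-squares straight line through (x_i, y_i).
Line fit_line(const std::vector<long double>& x, const std::vector<long double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const long double n = x.size();
    const long double sx = std::accumulate(x.begin(), x.end(), 0.0L);
    const long double sy = std::accumulate(y.begin(), y.end(), 0.0L);
    const long double sxy = std::inner_product(x.begin(), x.end(), y.begin(), 0.0L);
    const long double sxx = std::inner_product(x.begin(), x.end(), x.begin(), 0.0L);
    const long double denom = n * sxx - sx * sx; // zero when all x-values are equal
    assert(denom != 0);
    const long double m = (n * sxy - sx * sy) / denom;
    return {m, (sy - m * sx) / n};
}
```

The predicted value for any x is then `line.m * x + line.c`. For x = {0, 1, 2, 3} and y = {1, 3, 5, 7}, this gives m = 2 and c = 1.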
Polynomial regression (m is the degree)
- Polynomial regression is more complex than linear regression
- In code, it simplifies to a few loops that calculate the powers of each x-value and place them in the correct entries of the matrix X
- The first equation shows the representation of each row
- This means that we have a system of linear equations
- Using linear algebra, specifically Gaussian elimination, we solve for the coefficient vector a
- Since one of my teammates was implementing such algorithms, I borrowed some of his code to solve the linear system
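The sketch below shows one standard way to carry out these steps. It is not the teammate's solver. It assumes the usual least-squares formulation: each row of X is [1, x_i, x_i², …, x_iᵐ], and the normal equations (XᵀX)a = Xᵀy are solved by Gaussian elimination with partial pivoting. The names `poly_fit` and `gauss_solve` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long double>;
using Mat = std::vector<Vec>;

// Solve A a = b (A square) by Gaussian elimination with partial pivoting.
Vec gauss_solve(Mat A, Vec b)
{
    const std::size_t n = b.size();
    for (std::size_t k = 0; k < n; ++k)
    {
        // Use the row with the largest pivot to limit round-off error.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
        assert(A[p][k] != 0 && "singular system");
        std::swap(A[k], A[p]);
        std::swap(b[k], b[p]);
        // Eliminate column k below the pivot.
        for (std::size_t i = k + 1; i < n; ++i)
        {
            const long double f = A[i][k] / A[k][k];
            for (std::size_t j = k; j < n; ++j) A[i][j] -= f * A[k][j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution on the upper-triangular system.
    Vec a(n);
    for (std::size_t k = n; k-- > 0;)
    {
        long double s = b[k];
        for (std::size_t j = k + 1; j < n; ++j) s -= A[k][j] * a[j];
        a[k] = s / A[k][k];
    }
    return a;
}

// Fit y = a0 + a1 x + ... + am x^m; returns {a0, ..., am}.
Vec poly_fit(const Vec& x, const Vec& y, std::size_t m)
{
    assert(x.size() == y.size() && x.size() > m);
    const std::size_t cols = m + 1;
    // Row i of the design matrix X is [1, x_i, x_i^2, ..., x_i^m].
    Mat X(x.size(), Vec(cols));
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        long double power = 1.0L;
        for (std::size_t j = 0; j < cols; ++j) { X[i][j] = power; power *= x[i]; }
    }
    // Accumulate the normal equations: A = X^T X, b = X^T y.
    Mat A(cols, Vec(cols, 0.0L));
    Vec b(cols, 0.0L);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t r = 0; r < cols; ++r)
        {
            b[r] += X[i][r] * y[i];
            for (std::size_t c = 0; c < cols; ++c) A[r][c] += X[i][r] * X[i][c];
        }
    return gauss_solve(A, b);
}
```

For high degrees or widely spread x-values, the normal equations can become ill-conditioned; a QR-based least-squares solver is more robust in that case.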
Linear interpolation
- First we need to sort the array in ascending order
- Then we find the lower and upper bounds between which the x-input lies
- We take the y-values corresponding to these lower and upper bounds
- Then we calculate the interpolated y-value
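These steps correspond to the formula y = y₀ + (x − x₀)(y₁ − y₀)/(x₁ − x₀), where (x₀, y₀) and (x₁, y₁) are the bracketing points. The sketch below is illustrative; it assumes a hypothetical function `linear_interp` that stores the data as (x, y) pairs. Sorting the pairs keeps each y-value aligned with its x-value.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Point = std::pair<long double, long double>; // (x, y)

// Hypothetical name: linear interpolation at x_in, which must lie within the data range.
long double linear_interp(std::vector<Point> pts, long double x_in)
{
    assert(pts.size() >= 2);
    // 1. Sort the points by x in ascending order.
    std::sort(pts.begin(), pts.end());
    assert(x_in >= pts.front().first && x_in <= pts.back().first);
    // 2. Upper bound: the first point whose x is not less than x_in.
    auto upper = std::lower_bound(pts.begin(), pts.end(), x_in,
        [](const Point& p, long double v) { return p.first < v; });
    if (upper->first == x_in) return upper->second; // exact data point
    auto lower = upper - 1;                          // lower bound
    // 3-4. Straight line between the two bracketing points.
    const long double x0 = lower->first, y0 = lower->second;
    const long double x1 = upper->first, y1 = upper->second;
    return y0 + (x_in - x0) * (y1 - y0) / (x1 - x0);
}
```

Using `std::lower_bound` on the sorted data finds the bounds in O(log n) time instead of a linear scan.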
Polynomial interpolation (Lagrange interpolation)
- We use a nested for loop: for each data point, the inner loop computes the difference terms and multiplies each one by the previous product until we get the value of that point's basis polynomial
- Then we calculate the interpolated y as the sum of each y-value multiplied by the value obtained from the previous equation
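In formula form, y(x) = Σᵢ yᵢ·Lᵢ(x), where Lᵢ(x) = Π_{j≠i} (x − xⱼ)/(xᵢ − xⱼ). The nested loop above can be sketched as follows. The function name `lagrange` is hypothetical, and the x-values must be distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name: Lagrange interpolation of the points (xs[i], ys[i]) at x.
long double lagrange(const std::vector<long double>& xs,
                     const std::vector<long double>& ys, long double x)
{
    assert(xs.size() == ys.size() && !xs.empty());
    long double result = 0.0L;
    for (std::size_t i = 0; i < xs.size(); ++i)
    {
        long double basis = 1.0L; // running product that becomes L_i(x)
        for (std::size_t j = 0; j < xs.size(); ++j)
            if (j != i)
                basis *= (x - xs[j]) / (xs[i] - xs[j]); // needs distinct x-values
        result += ys[i] * basis; // add y_i * L_i(x)
    }
    return result;
}
```

With the points (0, 0), (1, 1) and (2, 4) taken from y = x², this returns 9 at x = 3 and 2.25 at x = 1.5. The method costs O(n²) per query, which is fine for small data sets.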
