ML-Elections

Machine Learning 101

Steps of work

Data Cleansing
1. Missing Data
  1. fill missing values in each column with a relevant value
    - median for numeric fields
    - most common label for categorical fields
  2. (optional) look for linear correlation with other features.
  3. (optional) create boolean feature for missing values.
  4. (optional) closest fit (WHAT??)
2. Noisy Data
  1. Outlier Detection
    1. Nearest Neighbours - identify outliers and remove them.
3. Data transformation.
  1. Change categories to boolean (0/1) columns.
  2. Change boolean columns to binary.
  3. (Optional) Create categories by grouping (linear bins for age, logarithmic bins for number of employees).
  4. Scaling
    1. linear ( Xi / max(X) )
  5. (Optional) balance data
    1. if one label has far more occurrences than the others, we should rebalance the data.
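
The cleansing steps above could be sketched roughly as follows. This is a minimal illustration using pandas, not the project's actual code: the helper names (`fill_missing`, `scale_by_max`), the `_was_missing` suffix, and the toy `age` / `party` columns are all made up for the example. It covers filling missing values, the optional missing-value flag, one-hot encoding, and linear scaling by the column maximum.

```python
import pandas as pd


def fill_missing(df):
    """Fill gaps: median for numeric columns, most common label otherwise."""
    out = df.copy()
    for col in list(df.columns):
        missing = out[col].isna()
        if not missing.any():
            continue
        # Optional step: keep a 0/1 flag marking where the value was missing.
        out[col + "_was_missing"] = missing.astype(int)
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def scale_by_max(df, columns):
    """Linear scaling Xi / max(X) for the given numeric columns."""
    out = df.copy()
    for col in columns:
        col_max = out[col].max()
        if col_max != 0:  # avoid dividing by zero
            out[col] = out[col] / col_max
    return out


# Toy data (hypothetical columns, for illustration only).
raw = pd.DataFrame({
    "age": [20.0, None, 40.0, 60.0],
    "party": ["A", "B", None, "B"],
})

clean = fill_missing(raw)
# Change categories to 0/1 columns.
encoded = pd.get_dummies(clean, columns=["party"], dtype=int)
ready = scale_by_max(encoded, ["age"])
print(ready)
```

Outlier removal and class balancing are left out here to keep the sketch short.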
Feature selection
1. variance filter - remove features with low variance
2. filter methods
  1. select top 25% features with f_classif
  2. select top 25% features with mutual_information
3. wrapper method
  1. run RFECV with fold = 3
  2. run RFECV with StratifiedKFold(2)
4. Embedded methods
  1. decision tree - pick the 25% of features with the highest weights
5. Combine the features selected by all the methods above.
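
One way the selection steps above might look with scikit-learn is sketched below. Several details are assumptions rather than the project's actual choices: the synthetic data from `make_classification`, the zero variance threshold, the use of `mutual_info_classif` for the mutual-information filter, a decision tree as the RFECV estimator, and combining the results by counting votes and taking the union.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    RFECV,
    SelectPercentile,
    VarianceThreshold,
    f_classif,
    mutual_info_classif,
)
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data set (assumption).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)

# 1. Variance filter: drop features whose variance is not above the threshold.
keep = VarianceThreshold(threshold=0.0).fit(X).get_support()
X = X[:, keep]
n_features = X.shape[1]

selected = {}

# 2. Filter methods: top 25% of features by score.
selected["f_classif"] = (
    SelectPercentile(f_classif, percentile=25).fit(X, y).get_support()
)
mi_score = partial(mutual_info_classif, random_state=0)
selected["mutual_info"] = (
    SelectPercentile(mi_score, percentile=25).fit(X, y).get_support()
)

# 3. Wrapper method: recursive feature elimination with cross-validation.
tree = DecisionTreeClassifier(random_state=0)
selected["rfecv_3fold"] = RFECV(tree, cv=3).fit(X, y).get_support()
selected["rfecv_stratified"] = (
    RFECV(tree, cv=StratifiedKFold(2)).fit(X, y).get_support()
)

# 4. Embedded method: top 25% of features by decision tree importance.
importances = DecisionTreeClassifier(random_state=0).fit(X, y).feature_importances_
top_k = max(1, int(np.ceil(0.25 * n_features)))
tree_mask = np.zeros(n_features, dtype=bool)
tree_mask[np.argsort(importances)[-top_k:]] = True
selected["tree"] = tree_mask

# 5. Combine: count how many methods chose each feature, then take the union.
votes = np.sum(list(selected.values()), axis=0)
union = votes > 0
print("votes per feature:", votes)
print("selected feature indices:", np.flatnonzero(union))
```

Keeping the per-feature vote count makes it easy to switch from a plain union to a stricter rule, such as keeping only features chosen by at least two methods.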
Evaluating model
1. Trials - trying out different models.
2. Examination - after the trials, zooming in on the best models.
3. Training - training the chosen models.
4. Prediction - predicting based on the chosen models.
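
The four stages above could be sketched as follows. This is only an illustration under assumptions: the candidate models, the 5-fold cross-validation, the synthetic data, and the train/test split are not taken from the project, and for brevity the sketch keeps a single best model rather than several.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned, feature-selected data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical set of candidate models.
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# 1. Trials: score every candidate with cross-validation on the training set.
scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# 2. Examination: zoom in on the best-scoring candidate.
best_name = max(scores, key=scores.get)

# 3. Training: fit the chosen model on the full training set.
best_model = candidates[best_name].fit(X_train, y_train)

# 4. Prediction: predict the held-out samples and check the accuracy.
predictions = best_model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(best_name, round(scores[best_name], 3), round(accuracy, 3))
```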
