nmathias0121/ml-model-algorithms

ml-model-algorithms

Import datasets, perform exploratory data analysis and scaling, and run models such as linear or logistic regression, decision trees, random forests, K-means and support vector machines.

Import Modules
 install any missing module on the system :
  "pip3 install module-name"

Process Data
 process_data.py contains the following functions :
  get_file_names_in_dir(dir_name) : print the names of the files to process in the directory
  dataset_import(file_name, dataset_type) : import a dataset & print a description such as data size, rows, columns, unique and null values
  dataset_EDA(data, pairplot_columns) : draw a pairplot and a heatmap for exploratory data analysis
  dataset_scrubbing(data, scrub_type, data_columns, fill_operation) : clean data by removing or filling missing values, handle categorical variables with one-hot encoding, or remove entire columns
  pre_model_algorithm(df, algorithm, target_column) : transform data before modeling using principal component analysis or k-means clustering
  split_validation(dataset, features, target_column, test_split) : split the dataset into train & test sets, including the target column, with the desired split ratio
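
 The scrubbing, splitting and principal component analysis steps listed above can be sketched with pandas and scikit-learn. This is a minimal illustration and not the code in process_data.py: the helpers `scrub` and `split`, the toy DataFrame and its column names are assumptions made up for the example.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def scrub(data, categorical_columns):
    """Hypothetical helper: fill numeric gaps with the column mean, then one-hot encode."""
    filled = data.copy()
    numeric_columns = filled.select_dtypes(include="number").columns
    filled[numeric_columns] = filled[numeric_columns].fillna(filled[numeric_columns].mean())
    # dtype=float keeps the encoded columns numeric rather than boolean.
    return pd.get_dummies(filled, columns=categorical_columns, dtype=float)


def split(data, features, target_column, test_split):
    """Hypothetical helper: hold out a `test_split` fraction of the rows for testing."""
    return train_test_split(
        data[features], data[target_column], test_size=test_split, random_state=0
    )


# Toy data standing in for an imported dataset (values are invented for illustration).
df = pd.DataFrame({
    "area": [50.0, 80.0, None, 120.0, 65.0, 95.0, 70.0, 110.0],
    "rooms": [2, 3, 3, 5, 2, 4, 3, 4],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "price": [100.0, 160.0, 150.0, 260.0, 125.0, 200.0, 150.0, 230.0],
})

clean = scrub(df, categorical_columns=["city"])
features = [column for column in clean.columns if column != "price"]
X_train, X_test, y_train, y_test = split(clean, features, "price", test_split=0.25)

# Standardize, then project onto two principal components.
# Both transformers are fitted on the training rows only to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))
X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))
```

 Fitting the scaler and PCA on the training rows alone is a design choice of this sketch; whether pre_model_algorithm does the same is not stated here.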

Run Model
 run_model.py contains the following models :
  linear_regression(X_train, X_test, y_train, y_test, show_columns, target_column) : continuous predictions
  logistic_regression(X_train, X_test, y_train, y_test, show_columns, target_column) : discrete predictions
  decision_tree_classifier(X_train, X_test, y_train, y_test, show_columns, target_column) : class predictions from both continuous & discrete features
  random_forest_classifier(X_train, X_test, y_train, y_test, show_columns, target_column, num_estimators) : class predictions from both continuous & discrete features
  gradient_boosting(X_train, X_test, y_train, y_test, show_columns, target_column, gb_type) : regressor for continuous & classifier for discrete
  k_neighbors_classifier(X_train, X_test, y_train, y_test, show_columns, target_column, k, scaled_features) : class predictions from continuous, discrete, ordinal or categorical data
  support_vector_classifier(X_train, X_test, y_train, y_test, show_columns, target_column) : class predictions from continuous data
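
 The difference between continuous and discrete predictions can be sketched with scikit-learn as follows. The functions below are hypothetical stand-ins, not the ones in run_model.py: they take only the four train/test arguments, omit show_columns and target_column, and run on synthetic data generated for the example.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split


def fit_linear_regression(X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit on the train set, return the model and the test MSE."""
    model = LinearRegression().fit(X_train, y_train)
    return model, mean_squared_error(y_test, model.predict(X_test))


def fit_logistic_regression(X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit on the train set, return the model and the test accuracy."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Continuous target: synthetic regression data.
X_reg, y_reg = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
# train_test_split returns X_train, X_test, y_train, y_test, matching the argument order.
reg_model, mse = fit_linear_regression(
    *train_test_split(X_reg, y_reg, test_size=0.2, random_state=0)
)

# Discrete target: synthetic two-class data.
X_clf, y_clf = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
clf_model, accuracy = fit_logistic_regression(
    *train_test_split(X_clf, y_clf, test_size=0.2, random_state=0)
)

print(f"linear regression test MSE: {mse:.2f}")
print(f"logistic regression test accuracy: {accuracy:.2f}")
```

 Mean squared error suits the continuous target and accuracy suits the discrete one; which metrics run_model.py reports is not stated here.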
