Study projects developed during data science courses.
The function guesses the number and prints the number of attempts.
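One way such a function could work is binary search over the allowed range. The sketch below assumes that approach and is not the project's actual code; the name `guess_number` and the 1 to 100 range are illustrative.

```python
def guess_number(secret: int, low: int = 1, high: int = 100) -> int:
    """Find `secret` by binary search and return the number of attempts."""
    if not low <= secret <= high:
        raise ValueError("secret must lie within [low, high]")
    attempts = 0
    while True:
        attempts += 1
        guess = (low + high) // 2  # always try the middle of the remaining range
        if guess == secret:
            return attempts
        if guess < secret:
            low = guess + 1   # secret is in the upper half
        else:
            high = guess - 1  # secret is in the lower half


if __name__ == "__main__":
    print(f"Guessed the number in {guess_number(37)} attempts")
```

Halving the range on every step keeps the number of attempts at or below 7 for any number between 1 and 100.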
Studying the provided data using pandas.
EDA: prepare the data for machine learning
- Filter outliers
- Perform correlation analysis on quantitative data
- Perform analysis of the nominal variables
- Select columns for the machine learning step.
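The outlier filtering and correlation steps might look roughly like this in pandas. This is a sketch under assumptions: the 1.5 × IQR rule, the 0.1 correlation threshold, the helper `filter_iqr_outliers`, and the sample columns are all illustrative and not taken from the project.

```python
import pandas as pd


def filter_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [15, 16, 17, 16, 15, 18, 17, 16],
    "absences": [2, 4, 0, 6, 3, 120, 5, 1],
    "score": [55, 60, 70, 50, 65, 40, 58, 72],
})

# Drop the row with the implausible number of absences.
clean = filter_iqr_outliers(df, "absences")

# Pearson correlation between the quantitative columns.
corr = clean.corr()

# Keep features whose absolute correlation with the target exceeds a threshold.
target_corr = corr["score"].drop("score").abs()
selected = target_corr[target_corr > 0.1].index.tolist()
print(selected)
```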
Predict TripAdvisor restaurant ratings.
- Data cleaning
- Filling NA
- Outlier removal
- Feature Engineering
- EDA
- First use of ML, with default parameters
First complete data preprocessing pipeline with EDA and feature engineering.
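As an illustration of the NA-filling step, a common pattern is to fill a numeric gap with the median while keeping a flag that records where values were missing. The helper `fill_na_with_flag` and the sample columns are hypothetical; the project may have used a different strategy.

```python
import pandas as pd


def fill_na_with_flag(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Fill a numeric column with its median and record which rows were missing."""
    out = df.copy()
    out[f"{column}_was_na"] = out[column].isna().astype(int)
    out[column] = out[column].fillna(out[column].median())
    return out


# Hypothetical sample; the column names are illustrative.
restaurants = pd.DataFrame({
    "reviews_count": [120.0, None, 45.0, 300.0, None],
    "price_range": ["$$", None, "$", "$$$", "$$"],
})

prepared = fill_na_with_flag(restaurants, "reviews_count")

# Categorical gap: fill with the most frequent value.
most_frequent = prepared["price_range"].mode()[0]
prepared["price_range"] = prepared["price_range"].fillna(most_frequent)
print(prepared)
```

Keeping the `_was_na` flag lets a model learn from the fact that a value was absent, which the filled value alone would hide.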
Bank score prediction project
- Data cleaning
- Filling NA
- Outlier removal
- Feature Engineering
- EDA
- ML
- Naive model
- PCA and SVD to reduce the matrix size
- Hyperparameter tuning
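To show how PCA and SVD shrink a feature matrix, here is a minimal NumPy sketch on synthetic data. The function `reduce_with_svd` and the matrix shapes are assumptions for illustration; the project itself may have relied on ready-made library implementations.

```python
import numpy as np


def reduce_with_svd(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project centered X onto its top right singular vectors (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)
    # full_matrices=False requests the economy-size decomposition.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


rng = np.random.default_rng(0)
# Synthetic matrix: 200 rows and 50 columns built from only 5 latent factors.
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50))

X_reduced = reduce_with_svd(X, n_components=5)
print(X.shape, "->", X_reduced.shape)
```

Because the synthetic matrix has only 5 underlying factors, 5 components retain essentially all of its variance while using a tenth of the columns.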
Predict car classes from pictures using deep learning
- 6 types of augmentation
- Different image sizes, from 512 down to 224
- Different number of epochs
- Different batch sizes
- All model types available in tf.keras.applications
- Fine-tuning and transfer learning
- Learning rate (LR) was adjusted using ReduceLROnPlateau
- Different optimizers
- Batch Normalization
- Different Keras callbacks
- Test-time augmentation (TTA)
- Different head architectures
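Test-time augmentation from the list above can be sketched without a deep learning framework: predictions for several views of the same image are averaged. In this sketch `predict_with_tta` and `dummy_predict` are hypothetical names, a horizontal flip is the only augmentation, and `dummy_predict` stands in for a trained model.

```python
import numpy as np


def predict_with_tta(predict_fn, images: np.ndarray) -> np.ndarray:
    """Average class probabilities over the original batch and its horizontal flip.

    images: array of shape (batch, height, width, channels).
    predict_fn: callable mapping such a batch to (batch, n_classes) probabilities.
    """
    views = [images, images[:, :, ::-1, :]]  # original and left-right flipped
    probs = np.stack([predict_fn(view) for view in views])
    return probs.mean(axis=0)


def dummy_predict(batch: np.ndarray) -> np.ndarray:
    """Stand-in for a trained model: softmax over mean intensities of the left half."""
    left_half = batch[:, :, : batch.shape[2] // 2, :]
    logits = left_half.mean(axis=(1, 2))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
images = rng.random((4, 224, 224, 3))  # random stand-in for car pictures

averaged = predict_with_tta(dummy_predict, images)
predicted_classes = averaged.argmax(axis=1)
print(predicted_classes)
```

With a Keras model, `model.predict` could be passed as `predict_fn`; more views (crops, small rotations) can be appended to `views` in the same way.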
Analysis of vacancies from HeadHunter using SQL queries in a Jupyter notebook
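A query of this kind can be run from a notebook cell as shown below. The sketch uses an in-memory SQLite database so that it is self-contained; the `vacancies` table, its columns, and the rows are invented for illustration and do not reflect the real HeadHunter schema or the database engine used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.executescript("""
    CREATE TABLE vacancies (
        id INTEGER PRIMARY KEY,
        name TEXT,
        area TEXT,
        salary_from INTEGER
    );
    INSERT INTO vacancies (name, area, salary_from) VALUES
        ('Data Scientist', 'Moscow', 200000),
        ('Data Analyst', 'Moscow', 120000),
        ('Data Scientist', 'Kazan', 150000),
        ('ML Engineer', 'Moscow', NULL);
""")

# Vacancy count and average lower salary bound per area.
# AVG skips NULL salaries, so unspecified salaries do not distort the mean.
query = """
    SELECT area,
           COUNT(*) AS vacancies,
           AVG(salary_from) AS avg_salary_from
    FROM vacancies
    GROUP BY area
    ORDER BY vacancies DESC
"""
rows = conn.execute(query).fetchall()
for area, count, avg_salary in rows:
    print(area, count, avg_salary)
conn.close()
```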
Property price prediction
The data contain many outliers, mistakes, input errors, and slang abbreviations, so the project was split into two parts: data_cleaning.ipynb and eda_ml.ipynb
- Data cleaning
- Data Enrichment
- EDA
- Feature Engineering
- ML
- Outlier removal using different models: IsolationForest, EllipticEnvelope, LocalOutlierFactor
- Feature selection using different methods: RFE, SelectFromModel, FeatureImportance
- Testing of linear models as a baseline
- Testing of 5 different advanced models: Random Forest, CatBoost, Gradient Boosting, XGBoost, LightGBM. Bagging and stacking have also been tested.
- Hyperparameter tuning
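Model-based outlier removal, as listed above, could look like the following scikit-learn sketch with IsolationForest. The synthetic data, the planted outliers, and the `contamination=0.05` setting are assumptions for illustration, not values from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in for two numeric property features.
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# Five planted points far away from the main cloud.
planted = np.array([[8, 8], [-8, 8], [8, -8], [-8, -8], [0, 10]], dtype=float)
X = np.vstack([inliers, planted])

# contamination is the assumed share of outliers in the data.
detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(X)  # 1 = inlier, -1 = outlier

mask = labels == 1
X_clean = X[mask]
print(f"Removed {len(X) - len(X_clean)} of {len(X)} rows")
```

EllipticEnvelope and LocalOutlierFactor also provide `fit_predict` with the same 1 / -1 labels, so they can be swapped in for comparison with little extra code.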