This project implements a pipeline for activity classification using sensor data from wearable devices. The primary objective is to classify four human activities (Walking, Running, Lying Down, and Sitting) based on features extracted from accelerometers worn on the back and thigh. The project uses a Random Forest Classifier for activity recognition.
- Preprocessing pipeline for handling missing values, mapping labels, and filtering activities.
- Feature selection using `SelectKBest`.
- Random Forest Classifier for activity prediction.
- Hyperparameter tuning using `GridSearchCV`.
- Model evaluation with metrics such as accuracy, classification report, and confusion matrix.
The following libraries are used in this project:
- `os`
- `pandas`
- `sklearn`: `train_test_split`, `RandomForestClassifier`, `classification_report`, `confusion_matrix`, `accuracy_score`, `GridSearchCV`, `SelectKBest`, `f_classif`
- `matplotlib`
- `seaborn`
The dataset contains sensor readings from wearable devices. Each data entry includes the measurements `back_x`, `back_y`, `back_z`, `thigh_x`, `thigh_y`, and `thigh_z`, together with the corresponding activity label.
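As a rough illustration of how such readings might be loaded, the sketch below concatenates every CSV file found in a data directory. The per-file CSV layout, the `label` column name, and the helper `load_sensor_data` are assumptions made for this example, not details confirmed by the script.

```python
from pathlib import Path

import pandas as pd

# Sensor columns as named in the dataset description above.
SENSOR_COLUMNS = ["back_x", "back_y", "back_z", "thigh_x", "thigh_y", "thigh_z"]
# Assumption: the activity label is stored in a column called "label".
LABEL_COLUMN = "label"


def load_sensor_data(data_dir: str) -> pd.DataFrame:
    """Concatenate every CSV file in data_dir into a single DataFrame.

    Hypothetical helper: assumes one CSV per recording, each containing
    the six sensor columns plus the label column.
    """
    csv_paths = sorted(Path(data_dir).glob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"No CSV files found in {data_dir!r}")
    # Read only the columns the pipeline needs; any extra columns are skipped.
    wanted = SENSOR_COLUMNS + [LABEL_COLUMN]
    frames = [pd.read_csv(path, usecols=wanted) for path in csv_paths]
    return pd.concat(frames, ignore_index=True)
```

With the data in place, a call such as `load_sensor_data("har70plus")` would return one combined table ready for preprocessing.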
- Python 3.8 or higher.
- The required libraries listed in `requirements.txt`.
- Clone this repository: `git clone https://github.com/Emem-studio/intelligence-engineering-CW2.git`, then `cd intelligence-engineering-CW2`
- Install the dependencies: `pip install -r requirements.txt`
- Ensure your dataset files are in the appropriate directory (`har70plus`) as specified in the script.
- Run the main script: `python random_forest_pipeline.py`
- The script outputs accuracy, classification reports, and confusion matrices for the classification task.
- Best hyperparameters from `GridSearchCV` will also be displayed.
- Features:
  - `back_x`, `back_y`, `back_z`
  - `thigh_x`, `thigh_y`, `thigh_z`
- Activities:
  - Walking
  - Running
  - Lying Down
  - Sitting
- Filtered out: Standing
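The preprocessing steps (handling missing values, mapping labels, filtering activities) could look roughly like the sketch below. The integer label codes, the `label` column name, and the choice to drop incomplete rows rather than impute them are all assumptions for illustration; the actual script may differ.

```python
import pandas as pd

# Hypothetical integer codes; the real coding depends on the dataset files.
LABEL_NAMES = {1: "Walking", 2: "Running", 6: "Standing", 7: "Sitting", 8: "Lying Down"}
# Activities kept for classification; Standing is filtered out.
KEPT_ACTIVITIES = ["Walking", "Running", "Lying Down", "Sitting"]


def preprocess(df: pd.DataFrame, label_column: str = "label") -> pd.DataFrame:
    """Drop incomplete rows, map label codes to names, and keep four activities."""
    # Assumption: rows with any missing value are dropped, not imputed.
    cleaned = df.dropna().copy()
    # Codes absent from LABEL_NAMES become NaN and are removed by the filter below.
    cleaned["activity"] = cleaned[label_column].map(LABEL_NAMES)
    cleaned = cleaned[cleaned["activity"].isin(KEPT_ACTIVITIES)]
    return cleaned.reset_index(drop=True)
```

Filtering on the mapped name rather than the raw code keeps the list of retained activities readable and easy to change.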
- Classifier: Random Forest
- Hyperparameter Tuning:
  - `n_estimators`
  - `max_depth`
  - `max_features`
- Balanced class weights to handle class imbalance.
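One way to combine these pieces is shown in the sketch below, which chains `SelectKBest` and a class-balanced `RandomForestClassifier` in a scikit-learn `Pipeline` and tunes the three hyperparameters with `GridSearchCV`. The use of a `Pipeline`, the grid values, `k=4`, the 3-fold cross-validation, and the synthetic data are assumptions for illustration, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def build_search(k_features: int = 4) -> GridSearchCV:
    """Grid search over a feature-selection + random-forest pipeline."""
    pipeline = Pipeline([
        # Keep the k features with the highest ANOVA F-score.
        ("select", SelectKBest(score_func=f_classif, k=k_features)),
        # class_weight="balanced" reweights classes inversely to their frequency.
        ("forest", RandomForestClassifier(class_weight="balanced", random_state=42)),
    ])
    # Illustrative grid only; the values are assumptions.
    param_grid = {
        "forest__n_estimators": [50, 100],
        "forest__max_depth": [None, 10],
        "forest__max_features": ["sqrt", "log2"],
    }
    return GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")


# Synthetic stand-in for the six accelerometer features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
search = build_search()
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_test, y_test))
```

Placing the selector inside the pipeline means feature scores are recomputed on each training fold, so the cross-validated accuracy is not inflated by information from the validation fold.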
The final model achieves an accuracy of ~83.9%. The evaluation also reports:
- Precision, Recall, and F1-Score for each activity.
- Confusion matrix heatmap visualization.
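The evaluation step can be sketched as follows, using the scikit-learn metric functions and a seaborn heatmap. The helper names `evaluate` and `plot_confusion_matrix` and the toy predictions are hypothetical; they only illustrate how the reported metrics might be produced and do not reproduce the project's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

ACTIVITIES = ["Walking", "Running", "Lying Down", "Sitting"]


def evaluate(y_true, y_pred, labels=ACTIVITIES):
    """Return accuracy, a per-class text report, and the confusion matrix."""
    accuracy = accuracy_score(y_true, y_pred)
    # The report lists precision, recall, and F1-score for each activity.
    report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
    # Rows are true activities, columns are predicted activities.
    matrix = confusion_matrix(y_true, y_pred, labels=labels)
    return accuracy, report, matrix


def plot_confusion_matrix(matrix, labels=ACTIVITIES):
    """Draw the confusion matrix as an annotated heatmap and return the figure."""
    # Imported here so the metrics above work without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues",
                xticklabels=labels, yticklabels=labels, ax=ax)
    ax.set_xlabel("Predicted activity")
    ax.set_ylabel("True activity")
    return fig


# Toy predictions for illustration only.
y_true = ["Walking", "Walking", "Running", "Sitting", "Lying Down", "Sitting"]
y_pred = ["Walking", "Running", "Running", "Sitting", "Lying Down", "Lying Down"]
accuracy, report, matrix = evaluate(y_true, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(report)
```

Passing `labels` explicitly fixes the row and column order of the matrix, so the heatmap axes stay consistent between runs even if an activity is missing from a test split.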
- This project integrates with GitHub for CI/CD and automation.
- Set up on Azure for scalable and efficient processing.