The project aims to develop a machine learning model capable of predicting the cost per room of a hotel reservation. The model was created and trained with Amazon SageMaker, using training data stored in DynamoDB; after training, the model artifact is stored in Amazon S3. To serve the model, an API was developed in Python with the FastAPI framework that loads the trained model from S3 and performs cost-per-room inference. The service is deployed through AWS Elastic Beanstalk.
Warning
It is imperative for users to deploy their own application on AWS using their own credentials to ensure compliance and security. This ensures that users have full control over their application's environment and data, facilitating customization and enhancing security measures.
The dataset used is the Hotel Reservations Dataset, which contains various information about thousands of reservations, including the price per room, our target variable for analysis. The dataset is stored in DynamoDB and retrieved within SageMaker.
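Retrieving the training data from DynamoDB inside SageMaker could be sketched as below. The table name and the lazy `boto3` import are assumptions for illustration; DynamoDB `scan` results are paginated, so the loop follows `LastEvaluatedKey` until the table is exhausted.

```python
import pandas as pd


def items_to_dataframe(items: list[dict]) -> pd.DataFrame:
    """Convert a list of DynamoDB items (plain dicts) into a DataFrame."""
    return pd.DataFrame(items)


def load_reservations(table_name: str) -> pd.DataFrame:
    """Scan a DynamoDB table and return all of its items as a DataFrame.

    boto3 is imported lazily here so the pure helper above can be used
    without AWS credentials configured.
    """
    import boto3  # requires AWS credentials in the environment

    table = boto3.resource("dynamodb").Table(table_name)
    response = table.scan()
    items = response["Items"]
    # Scan responses are paginated; keep going until LastEvaluatedKey is gone.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
    return items_to_dataframe(items)
```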
Important
The data preprocessing included adding a new label column that classifies the price into three numerical categories (price ranges). The original price column is then removed. Additionally, exploratory data analysis was conducted to identify the most relevant correlations for training.
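The labeling step could look like the sketch below. The price boundaries and the column names are hypothetical; the actual cut-offs used in the project are not shown in this document.

```python
import pandas as pd

# Hypothetical price boundaries -- the real thresholds used by the
# project are not documented here.
LOW_MAX = 85.0
MID_MAX = 115.0


def label_price(price: float) -> int:
    """Map a room price to one of three numerical categories (0, 1, 2)."""
    if price <= LOW_MAX:
        return 0
    if price <= MID_MAX:
        return 1
    return 2


def add_label_column(df: pd.DataFrame) -> pd.DataFrame:
    """Add the label column, then drop the original price column."""
    df = df.copy()
    df["label_avg_price_per_room"] = df["avg_price_per_room"].apply(label_price)
    return df.drop(columns=["avg_price_per_room"])
```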
Note
KNN, Linear Learner (logistic regression), and XGBoost models were tested using both the original dataset and a reduced dataset containing only the selected relevant columns.
The results with the selected dataset were:
| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| KNN | | | | |
| Linear Learner | | | | |
| XGBoost | | | | |
| XGBoost Oversampling | | | | |
| XGBoost Undersampling | | | | |
The results with the original dataset were:
| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Linear Learner | | | | |
| XGBoost | | | | |
| XGBoost Oversampling | | | | |
| XGBoost Undersampling | | | | |
Based on the results, it was observed that the XGBoost model demonstrated the best performance with both datasets. Strategies such as oversampling, undersampling, and hyperparameter tuning were applied to enhance the model. It was concluded that the optimal performance of this model was achieved using the original dataset with oversampling, resulting in an accuracy of
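The oversampling strategy mentioned above could be implemented along these lines. This is a naive random-oversampling sketch in NumPy (the project may well have used a dedicated library such as imbalanced-learn instead); it duplicates minority-class rows at random until every class matches the majority class count.

```python
import numpy as np


def random_oversample(X: np.ndarray, y: np.ndarray, seed: int = 42):
    """Naive random oversampling: resample minority classes with
    replacement until all classes have the majority class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        idx = np.where(y == cls)[0]
        # Draw extra indices with replacement to reach the majority count.
        extra = rng.choice(idx, size=target - count, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Undersampling works the same way in reverse: majority classes are sampled down (without replacement) to the minority class count.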
The Hotel Reservation Prediction API was developed with the FastAPI framework, chosen for its performance and its automatically generated Swagger UI. Through Swagger, users can easily submit reservation details such as the number of adults, children, nights of stay, and lead time. The API returns the predicted class for the reservation, indicating the corresponding price range.
Method | Endpoint | Description |
---|---|---|
POST | /api/v1/predict | Submits data for prediction |
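A client call to the endpoint above might look like the following sketch. The field names are illustrative (consult the Swagger UI of the deployed service for the exact schema), and the host placeholder is intentionally left generic.

```python
import json

# Illustrative reservation details -- the exact field names are
# defined by the deployed service's schema.
payload = {
    "no_of_adults": 2,
    "no_of_children": 1,
    "no_of_week_nights": 3,
    "no_of_weekend_nights": 1,
    "lead_time": 45,
}


def build_request_body(data: dict) -> str:
    """Serialize the reservation details for the POST body."""
    return json.dumps(data)


# Calling the deployed API (hypothetical Elastic Beanstalk host):
#   import requests
#   r = requests.post("http://<eb-host>/api/v1/predict",
#                     data=build_request_body(payload),
#                     headers={"Content-Type": "application/json"})
#   print(r.json())  # predicted price-range class
```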
The inference process comprises several sequential steps:

1. Incoming parameters are received and validated with Pydantic, a Python data-validation library.
2. Categorical parameters are converted into a binary numerical format to ensure compatibility with the model.
3. The XGBoost model runs the prediction on the transformed input data.
4. Both the input parameters and the resulting prediction are recorded in the DynamoDB table, enabling traceability and further analysis.
5. The API response returns the predicted class determined by the model, completing the inference process.
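The validation and encoding steps could be sketched as below. The schema fields, the categorical vocabulary, and the stub predictor are all illustrative assumptions; the real service calls the XGBoost model loaded from S3 and writes the record to DynamoDB.

```python
from pydantic import BaseModel


class ReservationIn(BaseModel):
    """Incoming parameters (illustrative subset of the real schema)."""
    no_of_adults: int
    no_of_children: int
    lead_time: int
    type_of_meal_plan: str


# Hypothetical categorical vocabulary; categorical fields are one-hot
# encoded into a binary numerical format for the model.
MEAL_PLANS = ["Meal Plan 1", "Meal Plan 2", "Not Selected"]


def encode(reservation: ReservationIn) -> list[int]:
    """Turn the validated input into a numeric feature vector."""
    one_hot = [int(reservation.type_of_meal_plan == m) for m in MEAL_PLANS]
    return [reservation.no_of_adults,
            reservation.no_of_children,
            reservation.lead_time] + one_hot


def predict_class(features: list[int]) -> int:
    """Stub for the model call; the real service invokes the XGBoost
    model and then logs input + prediction to DynamoDB."""
    return 0  # placeholder class
```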
Caution
Credentials should remain local to your environment only. Never expose your credentials in any part of the code, such as in source files, comments, or commit history. Instead, use environment variables or secure secret management tools to manage and access your credentials securely.
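One common way to follow this advice is to read credentials from environment variables and fail loudly when they are missing. The helper below is a minimal sketch; the commented `boto3.Session` usage assumes the standard AWS SDK variable names.

```python
import os


def get_required_env(name: str) -> str:
    """Read a credential from the environment, failing loudly if unset."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return value


# Example with the standard AWS SDK variable names:
#   import boto3
#   session = boto3.Session(
#       aws_access_key_id=get_required_env("AWS_ACCESS_KEY_ID"),
#       aws_secret_access_key=get_required_env("AWS_SECRET_ACCESS_KEY"),
#   )
```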
Giovane Iwamoto | Gustavo Serra | Isabela Buzzo | Leandro Pereira
Giovane Hashinokuti Iwamoto - Computer Science student at UFMS - Brazil - MS
I am always open to receiving constructive criticism and suggestions for improvement in my developed code. I believe that feedback is an essential part of the learning and growth process, and I am eager to learn from others and make my code the best it can be. Whether it's a minor tweak or a major overhaul, I am willing to consider all suggestions and implement the changes that will benefit my code and its users.