
🏨 Hotel Reservation Prediction - Machine learning model capable of predicting the cost per room in a hotel reservation. API service developed to load the trained model from S3 and perform inference.


GiovaneIwamoto/hotel-reservation-prediction


HOTEL RESERVATION PREDICTION

OVERVIEW

This project develops a machine learning model that predicts the cost per room of a hotel reservation, expressed as one of three price classes. The model is built and trained in Amazon SageMaker on data stored in DynamoDB, and the trained model is saved to Amazon S3. An API service written in Python with the FastAPI framework loads the trained model from S3 and runs the cost-per-room inference. The service is deployed with AWS Elastic Beanstalk.



ARCHITECTURE

[Figure: architecture diagram]


Warning

Deploy the application on AWS under your own account and with your own credentials. This keeps you in full control of the application's environment and data, and lets you customize its configuration and security settings.

MODEL TRAINING

The dataset used is the Hotel Reservations Dataset, which contains various information about thousands of reservations, including the price per room, our target variable for analysis. The dataset is stored in DynamoDB and retrieved within SageMaker.

Important

The data preprocessing included adding a new label column to classify the price into three numerical categories:

| Label | Price range |
| --- | --- |
| 1 | Price ≤ 85 |
| 2 | 85 < Price < 115 |
| 3 | Price ≥ 115 |

Subsequently, the original column containing the price is removed. Additionally, exploratory data analysis was conducted to identify the most relevant correlations for training.
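The labeling step described above can be sketched in plain Python. This is a minimal illustration, not the project's actual preprocessing code: the column names `avg_price_per_room` and `label_avg_price_per_room` are assumptions, and the sample rows are made up for the example.

```python
# Thresholds taken from the three price classes described above.
LOW_MAX = 85    # class 1: price <= 85
HIGH_MIN = 115  # class 3: price >= 115

# Assumed column names; the source does not state them.
PRICE_COLUMN = "avg_price_per_room"
LABEL_COLUMN = "label_avg_price_per_room"


def price_to_label(price: float) -> int:
    """Map a room price to its numeric class (1, 2, or 3)."""
    if price <= LOW_MAX:
        return 1
    if price < HIGH_MIN:
        return 2
    return 3


def add_label(record: dict) -> dict:
    """Return a copy of the record with the label added and the raw price removed."""
    labeled = {key: value for key, value in record.items() if key != PRICE_COLUMN}
    labeled[LABEL_COLUMN] = price_to_label(record[PRICE_COLUMN])
    return labeled


# Made-up rows, for illustration only.
rows = [
    {"lead_time": 10, PRICE_COLUMN: 85.0},
    {"lead_time": 45, PRICE_COLUMN: 99.5},
    {"lead_time": 3, PRICE_COLUMN: 115.0},
]
labeled_rows = [add_label(row) for row in rows]
```

Removing the price column after labeling matters because leaving it in would leak the target into the features.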

Note

KNN, logistic regression (listed as Linear Learner in the tables below), and XGBoost models were tested on both the original dataset and a reduced dataset containing only the selected relevant columns.

The results with the selected dataset were:

| Model | Precision | Recall | F1-score | Accuracy |
| --- | --- | --- | --- | --- |
| KNN | 0.62 | 0.62 | 0.62 | 0.62 |
| Linear Learner | 0.57 | 0.59 | 0.56 | 0.57 |
| XGBoost | 0.82 | 0.82 | 0.82 | 0.82 |
| XGBoost Oversampling | 0.83 | 0.83 | 0.83 | 0.82 |
| XGBoost Undersampling | 0.83 | 0.83 | 0.83 | 0.83 |

The results with the original dataset were:

| Model | Precision | Recall | F1-score | Accuracy |
| --- | --- | --- | --- | --- |
| Linear Learner | 0.61 | 0.62 | 0.61 | 0.61 |
| XGBoost | 0.84 | 0.84 | 0.84 | 0.837 |
| XGBoost Oversampling | 0.85 | 0.85 | 0.85 | 0.84 |
| XGBoost Undersampling | 0.83 | 0.83 | 0.83 | 0.835 |

XGBoost performed best on both datasets. Oversampling, undersampling, and hyperparameter tuning were applied to improve it further. The best result came from the original dataset with oversampling, which reached an accuracy of 84%.
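The source does not say which oversampling method was used. As one possible reading, the sketch below shows plain random oversampling: minority-class rows are duplicated at random until every class matches the size of the largest one. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equally sized."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up data: class 1 has three rows, class 2 has two, class 3 has one.
rows = [[1], [2], [3], [4], [5], [6]]
labels = [1, 1, 1, 2, 2, 3]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class distribution.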



API IMPLEMENTATION

The Hotel Reservation Prediction API is built with the FastAPI framework. Through the Swagger UI, users can submit reservation details such as the number of adults, children, nights of stay, and lead time. The API returns the predicted class for the reservation, which indicates the corresponding price range.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/predict | Submits data for prediction |
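As an alternative to Swagger, the endpoint can be called from code. The sketch below only builds the request with the standard library. The base URL and the payload field names are assumptions made for illustration, since the source does not list the exact request schema.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the address of your own deployment.
BASE_URL = "http://localhost:8000"

# Hypothetical field names; the real schema is defined by the API's Pydantic model.
payload = {"adults": 2, "children": 1, "nights": 3, "lead_time": 45}


def build_predict_request(base_url: str, data: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /api/v1/predict endpoint."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(BASE_URL, payload)

# To send it against a running service:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```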

Inference runs in five sequential steps. First, the incoming parameters are validated with Pydantic, a Python data validation library. Second, categorical parameters are converted into a binary numerical format that the model can consume. Third, the XGBoost model predicts a class from the transformed input. Fourth, the input parameters and the resulting prediction are written to a DynamoDB table for traceability and later analysis. Finally, the API responds with the predicted class.
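The flow above can be sketched with the standard library only. To keep the example runnable without external services, it uses stand-ins: a dataclass replaces Pydantic for validation, a stub function replaces the trained XGBoost model, and an in-memory list replaces the DynamoDB table. All field names, the `meal_plan` categories, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical categories for a hypothetical categorical field.
MEAL_PLANS = ("meal_plan_1", "meal_plan_2", "not_selected")


@dataclass
class ReservationInput:
    """Stand-in for the Pydantic request model (field names are assumptions)."""
    adults: int
    children: int
    nights: int
    lead_time: int
    meal_plan: str

    def __post_init__(self):
        # Step 1: validate the incoming parameters.
        for name in ("adults", "children", "nights", "lead_time"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative integer")
        if self.meal_plan not in MEAL_PLANS:
            raise ValueError(f"meal_plan must be one of {MEAL_PLANS}")


def encode(reservation: ReservationInput) -> List[float]:
    """Step 2: turn the categorical field into binary indicator columns."""
    numeric = [reservation.adults, reservation.children,
               reservation.nights, reservation.lead_time]
    indicators = [1.0 if reservation.meal_plan == plan else 0.0 for plan in MEAL_PLANS]
    return [float(value) for value in numeric] + indicators


def stub_predict(features: List[float]) -> int:
    """Placeholder for the trained XGBoost model; always returns class 2."""
    return 2


# In-memory stand-in for the DynamoDB table.
prediction_log: List[dict] = []


def run_inference(raw: dict, predict: Callable[[List[float]], int] = stub_predict) -> dict:
    reservation = ReservationInput(**raw)   # step 1: validation
    features = encode(reservation)          # step 2: encoding
    predicted_class = int(predict(features))  # step 3: prediction
    prediction_log.append({"input": raw, "prediction": predicted_class})  # step 4: logging
    return {"prediction": predicted_class}  # step 5: response


response = run_inference(
    {"adults": 2, "children": 1, "nights": 3, "lead_time": 45, "meal_plan": "meal_plan_1"}
)
```

Passing the predictor as a parameter keeps the pipeline testable without loading the real model. In the actual service, that argument would wrap the model loaded from S3.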

Caution

Credentials should remain local to your environment only. Never expose your credentials in any part of the code, such as in source files, comments, or commit history. Instead, use environment variables or secure secret management tools to manage and access your credentials securely.
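One way to follow this advice is to read the credentials from environment variables at startup and fail fast when any are missing. The sketch below assumes the standard AWS variable names, which the AWS SDKs also read on their own. The helper `load_aws_settings` is hypothetical, and the values shown are placeholders, not real credentials.

```python
import os

# Standard AWS environment variable names.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def load_aws_settings(env=None) -> dict:
    """Read AWS settings from the environment, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Placeholder values passed explicitly for the example; in a real deployment
# the function is called without arguments and reads os.environ.
example_settings = load_aws_settings({
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_DEFAULT_REGION": "us-east-1",
})
```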


AUTHORS

Giovane Iwamoto | Gustavo Serra | Isabela Buzzo | Leandro Pereira

Giovane Hashinokuti Iwamoto - Computer Science student at UFMS - Brazil - MS

I am always open to receiving constructive criticism and suggestions for improvement in my developed code. I believe that feedback is an essential part of the learning and growth process, and I am eager to learn from others and make my code the best it can be. Whether it's a minor tweak or a major overhaul, I am willing to consider all suggestions and implement the changes that will benefit my code and its users.
