Amazon-ML-Challenge-2k23

This repository provides an insight into the methods used by Team PowerPuff Girls in the Amazon Machine Learning Challenge 2023.

Note: This is not a comprehensive solution, but an assortment of the key model code that our team implemented.

Team Name : PowerPuff Girls

Team members :

Akarshan Kapoor
Samvaidan Salgotra
Taraksh Sambhar
Ayush Tiwari

Leaderboard Position : Rank 50.
Find the leaderboard here.

Explanation of Approaches

Approach 1

This approach builds a Keras Sequential model. The script reads the training data, vectorises two text features, "TITLE_DES" and "TITLE_BUL", with a TF-IDF vectoriser, pads the resulting vectors to a uniform length, and trains the model. At inference time it iterates through the groups in the test DataFrame, applies the same processing to the same two features to build the model input, and predicts with the trained model. Finally, the script collects all predictions, together with their corresponding "PRODUCT_ID", in a DataFrame for the final output.
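The padding step can be illustrated with a minimal, standard-library sketch. The helper name `pad_vectors`, the choice to right-pad with zeros, and the truncation of over-long vectors are assumptions made for illustration; approach1.py may pad differently (for example with a Keras utility).

```python
from typing import List, Sequence


def pad_vectors(vectors: Sequence[Sequence[float]], length: int,
                fill: float = 0.0) -> List[List[float]]:
    """Right-pad (or truncate) every vector to exactly `length` entries.

    Hypothetical helper: the real script may use a library routine instead.
    """
    padded = []
    for vec in vectors:
        row = list(vec[:length])                  # truncate if too long
        row.extend([fill] * (length - len(row)))  # pad with `fill` if too short
        padded.append(row)
    return padded


# Toy feature vectors of unequal length (made-up values, not real TF-IDF output).
features = [[0.5, 0.1], [0.3], [0.2, 0.4, 0.9, 0.7]]
padded = pad_vectors(features, length=3)
print(padded)
```

After padding, every row has the same length, so the rows can be stacked into a single rectangular array and passed to the model as one batch.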

Approach 2

This approach preprocesses the data by cleaning the text columns: removing HTML tags, converting to lowercase, removing punctuation, and eliminating stopwords. The cleaned training data is saved to a new CSV file. The AutoKeras library is used to create a text regression model, which is trained on the preprocessed training data; the trained model is then loaded using TensorFlow. The preprocessed test data is combined into a single column, and the model is used to predict the "PRODUCT_LENGTH" for both the combined text and the title text in the test data.
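The cleaning steps listed above can be sketched with the standard library alone. The function name `clean_text`, the regular expression used to strip tags, and the tiny stopword set are assumptions for illustration; approach2.py presumably relies on a fuller stopword list.

```python
import re
import string

# Tiny illustrative stopword set (assumption); a real pipeline would use a fuller list.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "for", "with", "in", "on", "to", "is"}

TAG_RE = re.compile(r"<[^>]+>")  # matches simple HTML tags such as <b> or </p>
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Strip HTML tags, lowercase, replace punctuation with spaces, drop stopwords."""
    text = TAG_RE.sub(" ", text)        # 1. remove HTML tags
    text = text.lower()                 # 2. lowercase
    text = text.translate(PUNCT_TABLE)  # 3. punctuation -> spaces
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]  # 4. stopwords
    return " ".join(tokens)


cleaned = clean_text("<b>The Best</b> Stainless-Steel Bottle, for Travel!")
print(cleaned)
```

Tags are removed before punctuation so that the angle brackets are still present when the tag pattern runs; reversing the order would leave tag names such as "b" in the text.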

Approach 3

This approach creates a BERT-based classifier. It begins by setting a fixed seed for reproducibility, then performs preprocessing such as handling duplicate entries and missing values. The titles are encoded into numerical values using a pre-trained multilingual version of BERT, and the encoded data is split into training and validation sets. The script defines a PyTorch Dataset class and uses it to construct DataLoader instances for efficient iteration during training and validation. It builds the classification model by adding a dropout layer and a linear layer on top of the pre-trained BERT model. After the learning rate scheduler and loss function are set up, training runs in a loop: in each epoch the model is trained on the training set and then evaluated on the validation set, and the model that performs best on the validation set is saved. Finally, the model is evaluated using root mean square error (RMSE) as the metric.
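The evaluation metric and the "keep the best model on the validation set" logic can be sketched without any deep learning library. The function name `rmse`, the toy targets, and the per-epoch predictions are all made up for illustration; in approach3.py the predictions would come from the BERT model and the best model's weights would be saved rather than just its epoch index.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy validation targets and hypothetical predictions after each epoch.
targets = [2.0, 4.0, 6.0]
epoch_predictions = [
    [4.0, 4.0, 8.0],  # epoch 0
    [2.0, 5.0, 6.0],  # epoch 1
    [3.0, 5.0, 7.0],  # epoch 2
]

best_epoch, best_score = None, float("inf")
for epoch, preds in enumerate(epoch_predictions):
    score = rmse(targets, preds)
    if score < best_score:  # lower RMSE is better
        best_epoch, best_score = epoch, score
        # A real training loop would save the model weights at this point.

print(best_epoch, round(best_score, 4))
```

Because RMSE is an error measure, the best epoch is the one with the lowest score, which is why the comparison uses `<` and the running best starts at infinity.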
