This repository contains the solution code for the Nitro NLP Hackathon, a competition focused on multi-class text classification in the Romanian language. The goal of the competition is to classify text into one of five categories: Sexist Direct, Sexist Descriptive, Sexist Reporting, Non-sexist Offensive, and Non-sexist Non-offensive. For the full problem description, see the Kaggle competition page.
The task involves developing a pipeline for multi-class text classification in the Romanian language. The provided dataset consists of text collected from various sources: social media networks (such as Facebook, Twitter, and Reddit), web articles, and books. Each text must be classified into one of the following categories:
- Sexist Direct
- Sexist Descriptive
- Sexist Reporting
- Non-sexist Offensive
- Non-sexist Non-offensive
The dataset has been curated to ensure representative content and includes posts with sexist and offensive language. Please note that the dataset might contain content that could be considered upsetting or disturbing. Reader discretion is advised.
The performance of the classification models was evaluated using the weighted accuracy metric. This metric accounts for class imbalance in the dataset by averaging the per-class accuracies, each weighted by the relative frequency of its class.
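As a rough illustration of the metric described above, here is a minimal sketch that takes the description literally: compute the accuracy within each class, then average those values using each class's share of the true labels as its weight. The function name `weighted_accuracy` and the sample labels are hypothetical and are not taken from the competition code; the official scoring on the Kaggle page may be defined differently.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Average of per-class accuracies, weighted by each class's share of y_true.

    Assumption: this follows the README's wording literally; it is not
    the competition's official scoring code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = len(y_true)
    # Number of examples per true class.
    support = Counter(y_true)
    # Number of correctly predicted examples per true class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    score = 0.0
    for label, count in support.items():
        class_accuracy = correct[label] / count  # accuracy within this class
        weight = count / total                   # relative frequency of this class
        score += weight * class_accuracy
    return score


# Hypothetical labels, for illustration only.
y_true = ["direct", "direct", "offensive", "non-offensive", "non-offensive", "non-offensive"]
y_pred = ["direct", "offensive", "offensive", "non-offensive", "non-offensive", "direct"]
print(round(weighted_accuracy(y_true, y_pred), 4))
```

Note that under this literal reading each class's weight cancels against its class size, so the result equals the overall fraction of correct predictions. If the official metric uses different weights, only the `weight` line would need to change.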
We participated in this hackathon as a team of two and placed 7th out of 46 competing teams. You can find the final leaderboard on the competition results page.