News Category Predictor: Automatic Classification of Tasnim News Dataset Using Machine Learning Algorithms
This project classifies news articles from the Tasnim News dataset into predefined topic categories. It uses several machine learning algorithms: Support Vector Machine (SVM), Naive Bayes, and Random Forest.
- Data Preprocessing: Cleaning and preparing the news articles for classification.
- Model Training: Using various machine learning algorithms such as SVM, Naive Bayes, and Random Forest.
- Evaluation: Assessing the performance of the models with accuracy, precision, and recall metrics.
- Prediction: Classifying new articles into predefined categories.
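The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `clean_text` helper, the TF-IDF features, the macro-averaged metrics, the toy sentences, and the `sports`/`economy` labels are all assumptions made for the example. A real run would load the Tasnim News articles instead, and might use hazm for normalization in place of the plain-regex cleaner shown here.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_text(text: str) -> str:
    """Hypothetical cleaner (a stand-in for a hazm-based normalizer)."""
    # Map Arabic yeh/kaf to their Persian forms.
    text = text.replace("\u064a", "\u06cc").replace("\u0643", "\u06a9")
    # Drop punctuation but keep the zero-width non-joiner used inside Persian words.
    text = re.sub(r"[^\w\s\u200c]", " ", text)
    text = re.sub(r"\d+", " ", text)  # drop digits (ASCII and Persian)
    return re.sub(r"\s+", " ", text).strip()


def train_and_evaluate(texts, labels):
    """Fit one TF-IDF pipeline per classifier and score each on a held-out split."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    candidates = {
        "svm": LinearSVC(),
        "naive_bayes": MultinomialNB(),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    models, scores = {}, {}
    for name, clf in candidates.items():
        model = make_pipeline(TfidfVectorizer(preprocessor=clean_text), clf)
        model.fit(x_train, y_train)
        pred = model.predict(x_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, average="macro", zero_division=0),
            "recall": recall_score(y_test, pred, average="macro", zero_division=0),
        }
        models[name] = model
    return models, scores


# Toy stand-in corpus (assumption): six short texts per hypothetical category.
texts = [
    "تیم فوتبال بازی لیگ",
    "مربی تیم فوتبال",
    "بازی لیگ فوتبال",
    "مربی بازی تیم",
    "لیگ فوتبال مربی",
    "تیم بازی لیگ",
    "قیمت دلار بازار",
    "بورس بازار اقتصاد",
    "بانک قیمت دلار",
    "اقتصاد بازار بورس",
    "دلار بانک اقتصاد",
    "قیمت بورس بانک",
]
labels = ["sports"] * 6 + ["economy"] * 6

models, scores = train_and_evaluate(texts, labels)
for name, metrics in scores.items():
    print(name, metrics)

# Prediction: classify a new, unseen article with one of the fitted pipelines.
prediction = models["svm"].predict(["بازی فوتبال تیم"])[0]
print("predicted category:", prediction)
```

Wrapping the vectorizer and classifier in a single pipeline means the same cleaning and feature extraction are applied at training and prediction time, so new articles can be passed in as raw strings.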
The dataset used in this project is "TasnimNews Dataset (Farsi - Persian) | تسنیم".
- Python 3.x
- scikit-learn
- pandas
- numpy
- matplotlib
- hazm
The model achieved the following metrics on the test dataset:
Feel free to fork this project and submit issues or pull requests. For major changes, please open an issue first to discuss what you would like to change.