Indonglish Dataset created based on Jaksel Sociolinguistic phenomenon

laksmitawidya/indonglish-dataset

Indonglish Dataset ✨

Dataset for a semantic task (sentiment analysis) 😃 😒 😐

📋 Paper

[Paper 53] Code-Mixed Sentiment Analysis using Transformer for Twitter Social Media Data

✍️ Citation

Laksmita Widya Astuti, Yunita Sari and Suprapto, “Code-Mixed Sentiment Analysis using Transformer for Twitter Social Media Data,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(10), 2023. http://dx.doi.org/10.14569/IJACSA.2023.0141053

❓ About

This dataset was built from keywords derived from the sociolinguistic phenomenon observed among teenagers in South Jakarta. It targets the semantic task of sentiment analysis and uses three label categories: positive, negative, and neutral. The dataset was annotated by a panel of five annotators, each with expertise in language and data science.

📈 Data Generating Process

The available data spans from August 2020 to September 2022. In addition to keywords, the endpoint query also includes date-based queries. The dataset is divided into three standard subsets: training, validation, and testing. The dataset distribution and the evaluation follow the approach used for the IndoLEM dataset in the study by Koto et al., including the same F1 calculation. The split contains 3638 sentences for training, 399 for validation, and 1011 for testing.
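The split sizes above can be turned into proportions with a few lines of Python. This is only a minimal sketch for checking the arithmetic: the names `SPLIT_SIZES` and `split_proportions` are hypothetical and not part of this repository, and the counts are copied from the paragraph above.

```python
# Split sizes as reported in this README (sentences per subset).
SPLIT_SIZES = {"train": 3638, "validation": 399, "test": 1011}


def split_proportions(sizes):
    """Return each subset's share of the total number of sentences."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"total sentences: {sum(SPLIT_SIZES.values())}")
    for name, share in split_proportions(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

With these counts the dataset totals 5048 sentences, which works out to roughly 72% training, 8% validation, and 20% testing.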
