Semantic Role Labeling (SRL) for Hindi

Project Description

Semantic Role Labeling (SRL) is a critical task in Natural Language Processing (NLP) aimed at identifying the roles words or phrases play in a sentence. This project focuses on developing an SRL system for Hindi, using both statistical and neural models. The goal is to enhance the performance of SRL systems by accurately labeling arguments in sentences, which is vital for applications like question-answering, inference, and machine translation.

Why SRL?

Improves NLP Applications: Enhances tasks such as question-answering, inference, and knowledge graph creation by providing deeper insights into sentence structures and meanings.
Better Translation: Preserves intended meanings and syntactic structures across languages, leading to more accurate translations.
Accessibility: Increases linguistic accessibility for diverse communities.
Linguistic Diversity: Addresses the needs of multilingual communities, particularly in India.

Datasets

Hindi Propbank: Part of the Hindi-Urdu PropBank project, designed for SRL tasks. It provides a multi-representational and multi-layered Treebank for Hindi-Urdu.
Custom Dataset: Created from the Hindi Propbank, consisting of around 14,000 tokens of Hindi text with head POS, dependency relations, and SRL labels.
Additional Dataset: Collected 1.3k Hindi sentences with arguments, SRL labels, and dependency relations.

Models and Experiments

Statistical Models

Linear Support Vector Classifier (LinearSVC):
- Effective for high-dimensional data.
- Utilized three different sets of input features capturing various aspects of input data.

Neural Network Models

FastText Embeddings + BiLSTM Classifier:
- Combines FastText embeddings and BiLSTM for non-contextual word representations.
FastText Embeddings + Dependency Relation + BiLSTM Classifier:
- Adds dependency relations to enhance syntactic and semantic understanding.
Indic-BERT + MLP Classifier:
- Uses Indic-BERT embeddings with a Multilayer Perceptron for contextual word representations.
Indic-BERT + BiLSTM Classifier:
- Combines Indic-BERT embeddings with BiLSTM for contextual and sequential modeling.
Indic-BERT + Dependency Relation + BiLSTM Classifier:
- Integrates Indic-BERT embeddings and dependency relations for comprehensive SRL.

Evaluation

Baseline Model: Utilized fundamental features (word, POS tag, argument indicator).
Performance Metrics: Accuracy, precision, recall, and F1-score.
Results:
- FastText Embeddings + BiLSTM: 70.41% accuracy
- FastText Embeddings + Dependency Relation + BiLSTM: 86.86% accuracy
- Indic-BERT + MLP Classifier: 89.81% accuracy
- Indic-BERT + BiLSTM Classifier: 90.02% accuracy
- Indic-BERT + Dependency Relation + BiLSTM: 95.34% accuracy

Challenges

Scarcity of suitable datasets for SRL in Hindi.
Limited resources and references in this domain.
Need for deeper understanding of Hindi grammar and possibly other Indian languages.
Small dataset size impacting the reliability of metrics.

Future Work

Exploring Different BERT Layers: To improve semantic feature representation by testing various BERT layers.

Directory Structure

├── Data
│
├── Code
│   ├── Data Preparation.ipynb
│   ├── NLP_FastText_1.ipynb
│   ├── NLP_FastText_2.ipynb
│   ├── NLP_IndicBert_1.ipynb
│   ├── NLP_IndicBert_2.ipynb
│   ├── NLP_IndicBert_3.ipynb
│   └── NLP_Statistical_Models.ipynb
│
├── Models
│   ├── FastText_BiLSTM_Classifier.pt
│   ├── FastText_BiLSTM_Dep_Classifier.pt
│   ├── IndicBert_BiLSTM_Classifier.pt
│   ├── IndicBert_MLP_Classifier.pt
│   └── IndicBert_MLP_Dep_Classifier.pt
│
├── Research Papers
│   ├── (Paper) Nomani - Towards Building Semantic Role Labeler.pdf
│   └── 28_W29.pdf
│
├── LICENSE
├── Project_PPT.pdf
└── Project_Report.pdf

Directory Details

Data: Contains the datasets used for training and testing the SRL models. These datasets include the Hindi Propbank and additional custom datasets created for this project.
Code: Includes all Jupyter notebooks used for data preparation, model training, and evaluation. Each notebook is named according to its content:
- Data Preparation.ipynb: Scripts for preparing and preprocessing data.
- NLP_Statistical_Models.ipynb: Implementation of statistical models like Linear SVC.
- NLP_FastText_1.ipynb: Initial FastText embedding and model training.
- NLP_FastText_2.ipynb: Further experiments with FastText embeddings.
- NLP_IndicBert_1.ipynb, NLP_IndicBert_2.ipynb, NLP_IndicBert_3.ipynb: Various experiments using IndicBERT embeddings.
Models: Contains the saved models resulting from the training processes. Each model file is named according to the architecture used:
- FastText_BiLSTM_Classifier.pt: Model using FastText embeddings with BiLSTM.
- FastText_BiLSTM_Dep_Classifier.pt: FastText embeddings with BiLSTM and dependency relations.
- IndicBert_BiLSTM_Classifier.pt: IndicBERT embeddings with BiLSTM.
- IndicBert_MLP_Classifier.pt: IndicBERT embeddings with a Multilayer Perceptron (MLP).
- IndicBert_MLP_Dep_Classifier.pt: IndicBERT embeddings with MLP and dependency relations.
Research Papers: Contains relevant research papers and references used during the project:
- (Paper) Nomani - Towards Building Semantic Role Labeler.pdf: A referenced research paper.
- 28_W29.pdf: Another referenced research document.
LICENSE: The license file for the project.
Project_PPT.pdf: The presentation slides summarizing the project.
Project_Report.pdf: The detailed project report outlining methodologies, experiments, and results.

Code and Resources

Repository: Google Drive

References

Gupta, A., & Shrivastava, M. (2018). Enhancing Semantic Role Labeling in Hindi and Urdu. LREC 2018 Workshop on Representation Learning for NLP, 28-32.
Anwar, M., & Sharma, D. (2016). Towards Building Semantic Role Labeler for Indian Languages. Tenth International Conference on Language Resources and Evaluation (LREC'16), 4588-4595.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching Word Vectors with Subword Information. Facebook AI Research.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Semantic Role Labeling (SRL) for Hindi

Project Description

Why SRL?

Datasets

Models and Experiments

Statistical Models

Neural Network Models

Evaluation

Challenges

Future Work

Directory Structure

Directory Details

Code and Resources

References

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Code		Code
Data		Data
Models		Models
Research Papers		Research Papers
.DS_Store		.DS_Store
LICENSE		LICENSE
Project_PPT.pdf		Project_PPT.pdf
Project_Report.pdf		Project_Report.pdf
README.md		README.md

License

sharmamht19/Semantic-Role-Labeling

Folders and files

Latest commit

History

Repository files navigation

Semantic Role Labeling (SRL) for Hindi

Project Description

Why SRL?

Datasets

Models and Experiments

Statistical Models

Neural Network Models

Evaluation

Challenges

Future Work

Directory Structure

Directory Details

Code and Resources

References

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages