Named Entity Recognition using NLP #947

Open
agupta451 opened this issue Oct 24, 2024 · 7 comments
Labels
Status: Up for Grabs (Up for grabs issue.), WoC 4.0 (Winter of Code 4.0 by GDG IIITK)

Comments

@agupta451

Deep Learning Simplified Repository (Proposing new issue)

🔴 Project Title : Named Entity Recognition using NLP

🔴 Aim : Develop a Named Entity Recognition (NER) system that can automatically identify and classify entities within unstructured text into predefined categories such as person names, organizations, locations, dates, and other relevant entities.

🔴 Dataset : CoNLL-2003, OntoNotes, or ACE

🔴 Approach : Clean and preprocess the text data to handle issues such as tokenization, lowercasing, and normalization of names and dates. Extract features using TF-IDF and word embeddings (Word2Vec, GloVe). Use deep learning approaches such as bidirectional LSTM (BiLSTM), LSTM-CRF, or transformer-based models, and compare their performance.
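
The preprocessing step above could start from something like the following. This is only a minimal sketch, assuming CoNLL-2003-style input (one token per line, the NER tag in the last column, a blank line between sentences). The function names `read_conll`, `build_vocab`, and `encode` are hypothetical helpers for illustration, not part of any library, and lowercasing is left as an opt-in flag.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs


def read_conll(lines: Iterable[str], lowercase: bool = False) -> List[Sentence]:
    """Parse CoNLL-style lines into sentences of (token, tag) pairs.

    Assumes the token is the first column and the NER tag is the last.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        token = cols[0].lower() if lowercase else cols[0]
        current.append((token, cols[-1]))
    if current:
        sentences.append(current)
    return sentences


def build_vocab(sentences: List[Sentence], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for sent in sentences for tok, _ in sent)
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids (an assumption of this sketch)
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence: Sentence, vocab: Dict[str, int]) -> List[int]:
    """Turn a sentence into token ids, falling back to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok, _ in sentence]
```

The resulting ids can feed an embedding layer in a BiLSTM or similar model. Capitalization is often a useful cue for names, so it may be worth comparing results with and without `lowercase=True` rather than lowercasing unconditionally.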


📍 Follow the Guidelines to Contribute in the Project :

  • You need to create a separate folder named after the Project Title.
  • Inside that folder, there will be four main components.
    • Images - To store the required images.
    • Dataset - To store the dataset or, information/source about the dataset.
    • Model - To store the machine learning model you've created using the dataset.
    • requirements.txt - This file will contain the packages/libraries required to run the project on other machines.
  • Inside the Model folder, the README.md file must be filled out properly, with proper visualizations and conclusions.

🔴🟡 Points to Note :

  • The issues will be assigned on a first-come, first-served basis, 1 Issue == 1 PR.
  • "Issue Title" and "PR Title" should be the same. Include the issue number along with it.
  • Follow the Contributing Guidelines & Code of Conduct before you start contributing.

To be Mentioned while taking the issue :


Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎


Thank you for creating this issue! We'll look into it as soon as possible. Your contributions are highly appreciated! 😊

@abhisheks008
Owner

Assigning this issue to you @agupta451

@abhisheks008 added the Status: Up for Grabs and ieee-igdtuw (IEEE IGDTUW Open Source Week 2024) labels and removed the Status: Assigned, level 2 (Level 2 for GSSOC), hacktoberfest, and gssoc-ext labels on Nov 10, 2024
@abhisheks008 removed the ieee-igdtuw (IEEE IGDTUW Open Source Week 2024) label on Nov 19, 2024
@abhisheks008 added the WoC 4.0 (Winter of Code 4.0 by GDG IIITK) label on Jan 1, 2025
@SimranShaikh20
Contributor

@abhisheks008 can you assign me this task as part of SWOC?

@abhisheks008
Owner

Sorry @SimranShaikh20, this issue is designated for another event. You can check out the other open issues.

@SimranShaikh20
Contributor

@abhisheks008 okay np

@Anuj-k-45

Hello,
I am excited to contribute to this issue as part of WoC 4.0. For this project, I will build a Named Entity Recognition (NER) system that can automatically detect and classify entities such as person names, organizations, locations, and dates in unstructured text. The project will start with data preprocessing tasks like tokenization, lowercasing, and normalization of names and dates, followed by feature extraction using techniques like TF-IDF and word embeddings such as Word2Vec and GloVe. I will implement and compare different deep learning models, including BiLSTM, LSTM-CRF, and transformer-based architectures like BERT or RoBERTa, evaluating their performance on standard NER benchmarks.
In addition to the primary tasks, I plan to integrate some advanced evaluation metrics like precision, recall, and F1-score to assess model performance in detail. I will also explore the use of transfer learning, where pre-trained models are fine-tuned on the NER task to improve accuracy, particularly for domain-specific data. This will enhance the robustness and adaptability of the NER system.
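
For the precision, recall, and F1 evaluation mentioned above, a rough sketch of entity-level scoring might look like this. It assumes BIO-tagged sequences and counts a predicted entity as correct only when both its span and its type match the gold annotation exactly. The names `bio_spans` and `entity_prf` are hypothetical, and a library such as seqeval would likely be a better choice in the actual project.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # The trailing "O" sentinel closes a span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Lenient choice in this sketch: a stray I- tag opens a new span.
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match entities."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_spans(g_tags), bio_spans(p_tags)
        tp += len(g & p)   # entities found with the right span and type
        fp += len(p - g)   # predicted entities not in gold
        fn += len(g - p)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Scoring whole entities rather than individual tokens avoids inflating the numbers with the many `O` tags that dominate most NER datasets.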
Following the provided project structure, I will create separate folders for the dataset, images, and model components. A requirements.txt file will be included to list all necessary dependencies, and the README in the model folder will provide clear documentation, visualizations, and conclusions.

I would appreciate it if you could assign this issue to me. I look forward to contributing to this project. Thank you!

Full Name : Anuj Kaushal
GitHub Profile Link : https://github.com/Anuj-k-45
Email ID : anujkaushal1068@gmail.com

@abhisheks008
Owner

Hi @Anuj-k-45 thanks for showing interest. I think the proposal is not like the one you have given here in this comment section. I'll let you know once I get any concrete information.
