Official repository for the summer project "Lluminating Language", offered by BCS.
Project Objective: To learn about the wonderful world of Natural Language Processing (NLP) from the ground up and, finally, to build our own custom open-source RAG-infused (Retrieval-Augmented Generation) chatbot (we may explore fine-tuning as well).
To start with, just to jog your memory, here are some resources to brush up on your basics.
- Git:
- LaTeX:
- Basic Python:
NOTE: DO NOT try to finish the entire thing or master every single concept. Get comfortable with printing, conditionals, loops, functions, and importing libraries, and you should be good to go :)
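As a quick self-check, the snippet below touches each item from the note: an import, a function, a conditional, a loop, and printing. It is only an illustrative refresher, and the names in it are made up.

```python
import math  # importing a standard-library module


def describe(n: int) -> str:
    """Return a short description of n, using a conditional."""
    if n % 2 == 0:
        return f"{n} is even"
    else:
        return f"{n} is odd"


# loop over 1, 2, 3 and print each description
for n in range(1, 4):
    print(describe(n))

print(math.sqrt(16))  # prints 4.0
```

If every line here makes sense to you, you have enough Python for the first week.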
The first week is dedicated to the basics of NLP and the tools we will use throughout the project.
Task: Build a sentiment analysis model on the IMDB dataset, trying and testing different models and techniques.
Resources:
- EDA (Exploratory Data Analysis):
- Pre-Processing:
  - Tokenization
  - Stemming and Lemmatization
- Feature Extraction:
- Model Selection:
- Evaluation:
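The steps listed above can be sketched end to end in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the project's reference solution: the four reviews are made up (they are not IMDB data), `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer or lemmatizer (NLTK's `PorterStemmer` and `WordNetLemmatizer` are common choices), the features are raw bag-of-words counts, and the model is multinomial Naive Bayes with add-one smoothing, which is just one of the models worth trying.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Tokenization: lowercase and keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(token):
    """Hypothetical toy stemmer: strip one common suffix if the word is long enough."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def featurize(text):
    """Feature extraction: bag-of-words counts of stemmed tokens."""
    return Counter(crude_stem(t) for t in tokenize(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(featurize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        feats = featurize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # log prior
            for word, n in feats.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)  # log likelihood of each token
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, texts, labels):
    """Evaluation: fraction of texts whose predicted label is correct."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Made-up toy reviews, NOT taken from the IMDB dataset.
train_texts = [
    "A wonderful, moving film with brilliant acting",
    "I loved it, truly great and touching",
    "Boring plot and terrible acting",
    "I hated it, a dull and awful mess",
]
train_labels = ["pos", "pos", "neg", "neg"]

print(Counter(train_labels))  # a tiny bit of EDA: class balance
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("brilliant and touching"))  # prints pos
print(accuracy(model, ["great acting", "awful and boring"], ["pos", "neg"]))  # prints 1.0
```

Swapping the featurizer (for example TF-IDF instead of raw counts) or the classifier while keeping the same `accuracy` check is a simple way to compare techniques, which is the point of this week's task.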
In the second week, we'll learn the basics of the transformer architecture, start using the Hugging Face Hub, and repeat last week's sentiment analysis task with transformers, comparing the results against last week's models.
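The core operation of the transformer architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The pure-Python sketch below computes it for small hand-written matrices so the arithmetic is easy to follow. It is a single head with no learned projections, masking, or batching. In practice you would load pretrained models from the Hugging Face Hub (for example through the `transformers` library's `pipeline("sentiment-analysis")` helper) rather than write attention by hand.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors. Q and K must share the width d_k,
    and K and V must have the same number of rows.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # output row = attention-weighted sum of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

Because the query matches the first key more closely, the first value row receives the larger weight (about 0.67), so the output leans toward `[10, 0]`.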
Resources:
In the third week, we'll learn how to collect data from various sources on the internet, process it, and make it ready for our final RAG model.
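One possible shape for the processing step is sketched below, assuming the pages have already been downloaded: strip the HTML down to visible text with the standard library's `html.parser`, then split the text into overlapping word chunks that a retriever can index later. The helper names, the word-based chunking, and the `size`/`overlap` defaults are illustrative assumptions, not a decision the project has made.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>one two three four five six</p>"
    "<script>var x = 1;</script></body></html>"
)
text = html_to_text(page)
print(text)  # prints Title one two three four five six
print(chunk_words(text, size=4, overlap=1))
# prints ['Title one two three', 'three four five six']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side; larger chunks give the model more context per hit at the cost of less precise retrieval.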
- Udbhav Agarwal
- Arin Dhariwal
- Himanshu Shekhar
- Shreya Gupta