
Official Repository for the Summer Project Lluminating Language offered by BCS


udbhav-44/BCS-Lluminating-Language


Lluminating-Language


Project Objective: To learn about the wonderful world of Natural Language Processing (NLP) from the ground up and, finally, to build our own custom open-source chatbot infused with Retrieval-Augmented Generation (RAG). We may explore fine-tuning as well.

Week 0

To start, here are some resources to jog your memory and brush up on the basics.

  1. Building a Neural Network from Scratch using only NumPy

  2. Neural Networks from Ground Up

  3. Git:

    1. https://rogerdudler.github.io/git-guide/
    2. https://github.com/firstcontributions/first-contributions
  4. Markdown

  5. LaTeX:

    1. https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes
    2. https://latex-tutorial.com/
  6. Basic Python:

    1. https://dabeaz-course.github.io/practical-python/
    2. https://automatetheboringstuff.com/

NOTE: DO NOT try to finish everything, or even try to master every single concept. Get comfortable with printing, conditionals, loops, functions and importing libraries, and you should be good to go :)
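The first two resources build a neural network by hand. As a rough companion, here is a minimal sketch of a network with one hidden layer, trained with backpropagation and full-batch gradient descent on XOR. It deliberately uses plain Python lists instead of NumPy so it runs with no dependencies (with NumPy the inner loops become matrix products). The class name `TinyMLP`, the sigmoid activations, the mean-squared-error loss and the hyperparameters are illustrative assumptions, not taken from those resources.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Hypothetical n_in -> n_hidden -> 1 network with sigmoid activations."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then a single sigmoid output in (0, 1)
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, X, Y):
        # Mean squared error over the whole dataset
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, Y)) / len(X)

    def train_step(self, X, Y, lr):
        n_hidden = len(self.w2)
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * n_hidden
        gw2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in zip(X, Y):
            h, y = self.forward(x)
            # Chain rule: dL/dz_out for L = (y - t)^2 with y = sigmoid(z_out)
            d_out = 2 * (y - t) * y * (1 - y)
            gb2 += d_out
            for j in range(n_hidden):
                gw2[j] += d_out * h[j]
                # Backpropagate through w2[j] and the hidden sigmoid
                d_h = d_out * self.w2[j] * h[j] * (1 - h[j])
                gb1[j] += d_h
                for i, xi in enumerate(x):
                    gw1[j][i] += d_h * xi
        # Average the gradients over the batch, then take one descent step
        n = len(X)
        for j in range(n_hidden):
            self.w2[j] -= lr * gw2[j] / n
            self.b1[j] -= lr * gb1[j] / n
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * gw1[j][i] / n
        self.b2 -= lr * gb2 / n


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 0]  # XOR targets
net = TinyMLP(n_in=2, n_hidden=4, seed=42)
initial = net.loss(X, Y)
for _ in range(5000):
    net.train_step(X, Y, lr=0.5)
final = net.loss(X, Y)
print(f"loss: {initial:.3f} -> {final:.3f}")
```

The loss should fall as training proceeds; depending on the seed and learning rate, XOR can sometimes settle in a poor local minimum, which is a useful thing to experiment with.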

Week 1: Basics of NLP

The first week will be dedicated to understanding the basics of NLP and the tools that we will be using throughout the project.

Task: Build a sentiment analysis model on the IMDB dataset, trying and testing different models and techniques.

Resources:

  1. Exploratory Data Analysis (EDA):

    1. https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/
    2. https://www.youtube.com/watch?v=-o3AxdVcUtQ
  2. Pre-processing:

    1. Tokenization
    2. Stemming and Lemmatization:
      1. https://www.ibm.com/topics/stemming-lemmatization
      2. https://www.youtube.com/watch?v=HHAilAC3cXw
  3. Feature Extraction:

    1. https://www.geeksforgeeks.org/ml-one-hot-encoding/
    2. https://neptune.ai/blog/vectorization-techniques-in-nlp-guide
  4. Model Selection:

    1. https://towardsdatascience.com/top-machine-learning-algorithms-for-classification-2197870ff501
  5. Evaluation:

    1. https://www.analyticsvidhya.com/blog/2021/07/metrics-to-evaluate-your-classification-model-to-take-the-right-decisions/
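The Week 1 steps (pre-processing, feature extraction, model selection, evaluation) can be tied together in a rough, dependency-free sketch. It assumes a handful of made-up reviews in place of the IMDB dataset, a toy suffix-stripping stemmer rather than a real stemmer or lemmatizer, bag-of-words counts as features, and multinomial Naive Bayes as just one possible classifier; none of these choices come from the linked resources, and in practice you would likely reach for libraries such as NLTK or scikit-learn.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; real projects use a fuller one
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(token):
    """Toy suffix stripper for illustration only; it is NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in preprocess(d))
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, doc):
        tokens = [t for t in preprocess(doc) if t in self.vocab]  # skip unseen words
        v = len(self.vocab)

        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in tokens
            )

        return max(self.classes, key=log_posterior)


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up toy reviews (1 = positive, 0 = negative), standing in for IMDB
train = [
    ("A wonderful, moving film with great acting", 1),
    ("I loved the story and the brilliant cast", 1),
    ("Great soundtrack and a wonderful ending", 1),
    ("A boring, predictable plot with terrible acting", 0),
    ("I hated the awful dialogue", 0),
    ("Terrible pacing and a boring ending", 0),
]
test = [("Great acting and a wonderful story", 1), ("Awful, boring and terrible", 0)]

model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
preds = [model.predict(d) for d, _ in test]
print(preds, evaluate([y for _, y in test], preds))
```

On the real IMDB data you would hold out a proper test split and compare this baseline against the other feature sets and models you try.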

Week 2: Transformer Application

We'll learn the basics of the transformer architecture, start using the Hugging Face Hub, and repeat last week's task, this time with transformers, to see what difference they make.

Resources:

  1. Attention Mechanism
  2. GPT
  3. BLOG
  4. Fine Tuning Transformer Model
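The core operation of the transformer architecture is scaled dot-product attention, softmax(QKᵀ / √d_k) V. Below is a minimal sketch on plain Python lists, with an optional causal mask of the kind GPT-style decoders use so that a position cannot attend to later positions. The helper names are illustrative; real models work with tensor libraries and compute Q, K and V through learned projections, which this sketch omits.

```python
import math


def softmax(xs):
    """Numerically stable softmax; -inf entries get weight 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def scaled_dot_product_attention(Q, K, V, causal=False):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:
            # Hide future positions j > i, as in GPT-style decoders
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, V), weights


# Three toy token vectors used as queries, keys and values (self-attention)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = scaled_dot_product_attention(X, X, X, causal=True)
for row in w:
    print([round(x, 3) for x in row])
```

Each row of the weights sums to 1, and with `causal=True` the first token can only attend to itself.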

Week 3: Data Scraping and Data Collection

In the third week, we'll learn how to collect data from various sources on the internet and how to process it so that it is ready for our final RAG model.

  1. Data Sources
  2. XPaths
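To make the XPath and data-preparation ideas concrete, here is a rough sketch that pulls review text out of a page with XPath-style queries and then splits long text into overlapping word chunks, one common (but not the only) way to prepare documents for retrieval in a RAG pipeline. It assumes a small, hypothetical, well-formed page string instead of a live HTTP fetch (in practice the markup might come from `urllib.request` or the `requests` library). It uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath; real-world HTML is often not valid XML, so a library such as lxml, which implements full XPath 1.0, is usually a better fit. The function names, chunk size and overlap are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped page; real HTML is often not well-formed XML
PAGE = """
<html><body>
  <div class="review"><h2>Great film</h2><p>Loved every minute.</p></div>
  <div class="ad"><p>Buy now!</p></div>
  <div class="review"><h2>Not for me</h2><p>Too long and slow.</p></div>
</body></html>
"""


def extract_reviews(markup):
    """Select review blocks with an XPath-style query (ElementTree subset)."""
    root = ET.fromstring(markup.strip())
    reviews = []
    for div in root.findall(".//div[@class='review']"):
        reviews.append({
            "title": div.findtext("h2", default="").strip(),
            "body": div.findtext("p", default="").strip(),
        })
    return reviews


def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


for review in extract_reviews(PAGE):
    print(review)
print(chunk_words(" ".join(f"w{i}" for i in range(10)), size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks; the right size and overlap depend on the embedding model and documents you end up using.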

Mentors:

  1. Udbhav Agarwal
  2. Arin Dhariwal
  3. Himanshu Shekhar
  4. Shreya Gupta
