Lluminating-Language

Official Repository for the Summer Project Lluminating Language offered by BCS

Project Objective: To learn about the wonderful world of Natural Language Processing (NLP) from the ground up and finally build our own custom open-source RAG-infused chatbot (we may cover fine-tuning as well).

Week 0

To start with, just to jog your memory, here are some resources to brush up on your basics.

  1. Building a Neural Network from Scratch using only Numpy

  2. Neural Networks from Ground Up

  3. Git :

    1. https://rogerdudler.github.io/git-guide/
    2. https://github.com/firstcontributions/first-contributions
  4. Markdown

  5. LaTeX:

    1. https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes
    2. https://latex-tutorial.com/
  6. Basic Python:

    1. https://dabeaz-course.github.io/practical-python/
    2. https://automatetheboringstuff.com/

NOTE: DO NOT try to finish the entire thing or to master every single concept. Get comfortable with things like printing, conditionals, loops, functions and importing libraries, and you should be good to go :)
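
As a rough gauge of the Python level the note above has in mind, here is a minimal sketch that touches each item it lists: an import, a function, a conditional, a loop and a print. The function names and the threshold of 10 are made up for illustration.

```python
import math  # importing a library (here, from the standard library)


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


def describe(radius):
    """Label a circle 'small' or 'large'; the cut-off of 10 is arbitrary."""
    if circle_area(radius) < 10:
        return "small"
    return "large"


# Loop over a few radii and print the result for each one.
for r in [1, 2, 3]:
    print(r, round(circle_area(r), 2), describe(r))
```

If you can read and modify a snippet like this without trouble, you are probably ready for Week 1.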

Week 1: Basics of NLP

The first week will be dedicated to understanding the basics of NLP and the tools that we will be using throughout the project.

Task: Build a sentiment analysis model on the IMDB dataset, trying out and comparing different models and techniques.
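
To give a feel for the shape of the task, here is a minimal sketch of a sentiment classifier written with only the standard library. It is not the project's reference solution: the four training sentences are a hand-written stand-in for the IMDB reviews, and the model (multinomial Naive Bayes with add-one smoothing) is just one simple choice among the many you are expected to try.

```python
import math
import re
from collections import Counter

# Tiny hand-written stand-in for the IMDB reviews (hypothetical data).
TRAIN = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("a dull and terrible film", "neg"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(examples):
    """Count words per class and documents per class."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the class with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict("a great and moving story", model))
print(predict("terrible and boring", model))
```

On the real dataset you would replace `TRAIN` with the IMDB reviews, hold out a test split, and most likely swap the hand-rolled model for a library implementation.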

Resources:

  1. EDA :

    1. https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/
    2. https://www.youtube.com/watch?v=-o3AxdVcUtQ
  2. Pre-Processing :

    1. Tokenization
    2. Stemming and Lemmatization:
      1. https://www.ibm.com/topics/stemming-lemmatization
      2. https://www.youtube.com/watch?v=HHAilAC3cXw
  3. Feature Extraction :

    1. https://www.geeksforgeeks.org/ml-one-hot-encoding/
    2. https://neptune.ai/blog/vectorization-techniques-in-nlp-guide
  4. Model Selection :

    1. https://towardsdatascience.com/top-machine-learning-algorithms-for-classification-2197870ff501
  5. Evaluation:

    1. https://www.analyticsvidhya.com/blog/2021/07/metrics-to-evaluate-your-classification-model-to-take-the-right-decisions/
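
Two of the steps listed above, feature extraction and evaluation, can be sketched in a few lines of plain Python. This is an illustration under simple assumptions (whitespace tokenization, a bag-of-words count vector, binary labels named `pos` and `neg`); the function names are hypothetical and the linked resources cover richer techniques such as TF-IDF.

```python
def build_vocabulary(documents):
    """Collect the sorted set of lowercase whitespace-separated tokens."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def vectorize(document, vocabulary):
    """Bag-of-words: count how often each vocabulary word occurs."""
    tokens = document.lower().split()
    return [tokens.count(word) for word in vocabulary]


def classification_metrics(y_true, y_pred, positive="pos"):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


docs = ["good movie", "bad movie", "good good plot"]
vocab = build_vocabulary(docs)
print(vocab)
print([vectorize(d, vocab) for d in docs])

y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg"]
print(classification_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters most when the classes are imbalanced, since accuracy alone can look good for a model that always predicts the majority class.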

Week 2: Transformer Application

We'll learn the basics of the transformer architecture, start using the Hugging Face Hub, and repeat last week's task with transformers to compare the results.

Resources:

  1. Attention Mechanism
  2. GPT
  3. BLOG
  4. Fine Tuning Transformer Model
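
As a companion to the attention resource above, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes a single head with no masking and no learned projection matrices, so it shows only the core computation (similarity scores, softmax, weighted sum of values), not a full transformer layer. The function names are chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[0.0, 2.0], [4.0, 6.0]]
print(attention(queries, keys, values))
```

Because the two keys in the example are identical, the query attends to both equally and the output is simply the mean of the two value vectors.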

Week 3: Data Scraping and Data Collection

In the third week of the project, we'll learn about how to collect data from various sources over the internet, how to process it and make it ready for our final RAG model.

  1. Data Sources
  2. XPaths
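
To make the XPath idea concrete, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and needs well-formed markup. The page content, class names and URLs below are invented for illustration; real-world HTML is usually messier and is typically handled with a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

# A small, well-formed page invented for this example.
PAGE = """
<html>
  <body>
    <div class="article">
      <h2>First post</h2>
      <a href="/posts/1">Read more</a>
    </div>
    <div class="article">
      <h2>Second post</h2>
      <a href="/posts/2">Read more</a>
    </div>
    <div class="sidebar">
      <a href="/about">About</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# ".//div[@class='article']/h2" selects every <h2> that is a direct child
# of a <div> whose class attribute equals "article", anywhere in the tree.
titles = [h2.text for h2 in root.findall(".//div[@class='article']/h2")]

# The same predicate, but picking the links and reading their href attribute.
links = [a.get("href") for a in root.findall(".//div[@class='article']/a")]

print(titles)
print(links)
```

The predicate `[@class='article']` is what keeps the sidebar link out of the results, which is the main reason to prefer XPath over searching for every `<a>` tag.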

Mentors:

  1. Udbhav Agarwal
  2. Arin Dhariwal
  3. Himanshu Shekhar
  4. Shreya Gupta