vbainwala / Benchmarking-LLMs-Indic-Languages Public


Benchmarking Study of Bloomz-560m, mBART-large, IndicBART on the Indic Languages

Benchmarking LLMs on Indic Languages

Performed as part of the COMS6998 course at Columbia University.

Dataset

Samanantar : https://huggingface.co/datasets/ai4bharat/samanantar
IndicSentenceSummarization : https://huggingface.co/datasets/ai4bharat/IndicSentenceSummarization

Models Used

Bloomz-560m : https://huggingface.co/bigscience/bloomz-560m
mBART-large : https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt
IndicBART-XXEN : https://huggingface.co/ai4bharat/IndicBART-XXEN

Implementation

For each model, the existing code base available at https://huggingface.co/ is refactored to perform the following tasks:

Machine Translation
Summarization
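
As an illustration of the machine translation task, the following is a minimal sketch of translating Indic sentences into English with the mBART-large checkpoint listed above. It is not the repository's actual script: the function name `translate_to_english`, the `MBART50_CODES` mapping, and the generation settings are assumptions made for this example, and running it requires the `transformers` and `torch` packages plus a model download.

```python
# Hypothetical helper mapping short language tags to mBART-50 language codes.
MBART50_CODES = {"hi": "hi_IN", "bn": "bn_IN", "ta": "ta_IN", "te": "te_IN"}


def translate_to_english(sentences, src_lang="hi",
                         model_name="facebook/mbart-large-50-many-to-many-mmt"):
    """Translate a list of sentences from an Indic language into English."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
    model = MBartForConditionalGeneration.from_pretrained(model_name)

    # Tell the tokenizer which source language the input is written in.
    tokenizer.src_lang = MBART50_CODES[src_lang]
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    # Force the decoder to start with the English language token.
    english_id = tokenizer.convert_tokens_to_ids("en_XX")
    generated = model.generate(**batch, forced_bos_token_id=english_id)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate_to_english(["..."], src_lang="hi")` returns one English string per input sentence. Bloomz-560m and IndicBART-XXEN use different model classes and input conventions, so this sketch does not carry over to them unchanged.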

Each output file is then processed to compute the mean value of each of the following metrics:

METEOR
BLEU
ROUGE
BERTScore
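
The averaging step can be sketched as follows. This assumes, purely for illustration, that each output file is a CSV with one row per example and one column per metric; the column names and the `mean_metrics` function are hypothetical and may not match the repository's actual file layout.

```python
import csv
import io
from statistics import mean


def mean_metrics(csv_file, metric_columns=("meteor", "bleu", "rouge", "bertscore")):
    """Return the mean of each metric column found in an open CSV file."""
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in metric_columns}
    for row in reader:
        for name in metric_columns:
            value = row.get(name)
            # Skip columns that are absent or empty for this row.
            if value not in (None, ""):
                columns[name].append(float(value))
    # Only report metrics that had at least one score.
    return {name: mean(values) for name, values in columns.items() if values}
```

The function accepts any open text file, for example `open("scores.csv", newline="")` or an `io.StringIO` wrapping CSV text, and returns a dictionary mapping each metric name to its mean.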
