Benchmarking Study of Bloomz-560m, mBART-large, IndicBART on the Indic Languages

vbainwala/Benchmarking-LLMs-Indic-Languages

Benchmarking LLMs on Indic Languages

Performed as part of the COMS6998 course at Columbia University.

Dataset

Samanantar : https://huggingface.co/datasets/ai4bharat/samanantar
IndicSentenceSummarization : https://huggingface.co/datasets/ai4bharat/IndicSentenceSummarization

Models Used

Bloomz-560m : https://huggingface.co/bigscience/bloomz-560m
mBART-large : https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt
IndicBART-XXEN : https://huggingface.co/ai4bharat/IndicBART-XXEN

Implementation

For each model, the existing code available at https://huggingface.co/ is refactored to perform the following tasks:

  • Machine Translation
  • Summarization
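
As an illustration of the translation task, here is a minimal sketch built on the mBART-large checkpoint listed above. It is not the repository's actual code: the `translate` helper, the `MBART_LANG_CODES` table, and the `max_new_tokens` value are assumptions made for this example, and it assumes `transformers` and `torch` are installed.

```python
from typing import List

# mBART-50 language codes for English and a few Indic languages.
# This table is a hypothetical convenience for the example.
MBART_LANG_CODES = {
    "en": "en_XX",
    "hi": "hi_IN",
    "bn": "bn_IN",
    "gu": "gu_IN",
    "ml": "ml_IN",
    "mr": "mr_IN",
    "ta": "ta_IN",
    "te": "te_IN",
}

MODEL_NAME = "facebook/mbart-large-50-many-to-many-mmt"


def translate(sentences: List[str], src: str = "hi", tgt: str = "en") -> List[str]:
    """Translate a batch of sentences from `src` to `tgt` with mBART-50."""
    # Resolve language codes first so that an unsupported language fails fast.
    src_code = MBART_LANG_CODES[src]
    tgt_code = MBART_LANG_CODES[tgt]

    # Imported lazily so the module can be loaded without the heavy dependency.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    tokenizer = MBart50TokenizerFast.from_pretrained(MODEL_NAME)
    model = MBartForConditionalGeneration.from_pretrained(MODEL_NAME)

    # The tokenizer prepends the source language token to each input.
    tokenizer.src_lang = src_code
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    # Forcing the first generated token selects the target language.
    generated = model.generate(
        **batch,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_new_tokens=128,  # assumed limit, not taken from the study
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(sentences, src="hi", tgt="en")` downloads the checkpoint on first use. Bloomz-560m and IndicBART-XXEN use different prompting and tokenization conventions, so this sketch does not carry over to them unchanged.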

Each output file is then processed to compute the mean value of each of the following metrics:

  • METEOR
  • BLEU
  • ROUGE
  • BERTScore
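
The averaging step can be sketched as follows. The layout of the output files is not documented here, so this example assumes a hypothetical CSV with one row per example and one column per metric. The column names and the `mean_metrics` helper are illustrative, not taken from the repository.

```python
import csv
import io
from statistics import mean

# Assumed column names; the real output files may label metrics differently.
METRICS = ("meteor", "bleu", "rouge", "bertscore")


def mean_metrics(rows, metrics=METRICS):
    """Return the mean of each metric column, skipping missing values."""
    values = {m: [] for m in metrics}
    for row in rows:
        for m in metrics:
            cell = row.get(m)
            if cell not in (None, ""):
                values[m].append(float(cell))
    # Metrics with no recorded values are left out of the result.
    return {m: mean(v) for m, v in values.items() if v}


# Tiny in-memory stand-in for one output file (made-up numbers).
sample = io.StringIO(
    "meteor,bleu,rouge,bertscore\n"
    "0.50,0.20,0.40,0.80\n"
    "0.70,0.40,0.60,0.90\n"
)
averages = mean_metrics(csv.DictReader(sample))
for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

For a file on disk, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.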
