IndicTrans and IndicTrans2 Repository

This repository contains two separate tools for Indian languages: IndicTrans, which handles transliteration (script conversion), and IndicTrans2, which handles text-to-text translation.

Folder Structure

  • IndicTrans: Contains the transliteration tool that converts text between different scripts, preserving phonetic accuracy.
  • IndicTrans2: Contains the text-to-text translation system, optimized for translation between Indian languages.

IndicTrans (Transliterator)

IndicTrans is designed to transliterate text between various Indian language scripts and the Roman script. It ensures phonetic correctness and is useful for tasks such as cross-script search and name transliteration.

Features:

  • Transliteration between Indian scripts such as Devanagari, Tamil, Telugu, and more.
  • Lightweight and efficient for quick script conversions.

For more information on usage, refer to the IndicTrans README.
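
As a rough illustration of what script-to-script conversion involves, the sketch below shifts characters between Unicode blocks. It is a toy example, not the IndicTrans algorithm or its API: the function name `shift_script` and the constants are hypothetical, and it relies only on the fact that the Devanagari (U+0900) and Telugu (U+0C00) blocks are laid out in parallel, so many letters sit at the same offset in each block.

```python
# Toy illustration only: this is NOT how IndicTrans works internally.
# It assumes the parallel layout of Indic script blocks in Unicode.
DEVANAGARI_START = 0x0900  # first code point of the Devanagari block
TELUGU_START = 0x0C00      # first code point of the Telugu block
BLOCK_SIZE = 0x80          # each of these blocks spans 128 code points


def shift_script(text, src_start=DEVANAGARI_START, dst_start=TELUGU_START):
    """Map each character in the source block to the same offset in the target block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_start <= cp < src_start + BLOCK_SIZE:
            # Same offset, different block.
            out.append(chr(cp - src_start + dst_start))
        else:
            # Leave spaces, digits, Latin text, etc. untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(shift_script("\u0915\u092e\u0932"))  # Devanagari "kamal" to Telugu
```

Not every code point has a counterpart in the other block, and an offset shift ignores pronunciation differences between languages, which is why a dedicated transliteration tool is needed for phonetically correct output.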


IndicTrans2 (Text-to-Text Translation)

IndicTrans2 is a text translation system for translating between multiple Indian languages. It uses machine translation models and aims to preserve context, meaning, and fluency in the output.

Features:

  • Translates between Indian languages such as Hindi, Tamil, Telugu, Kannada, and others.
  • Optimized for accuracy, handling complex grammar and sentence structures.

For more details on usage and implementation, refer to the IndicTrans2 README.


Conclusion

In the Tamil dialogue evaluation summarized below, the IndicTrans2 model captures the clarity and emotional undertones of the original dialogue and retains its humour and context. There is room for improvement in idiomatic expressions and cultural relevance for Tamil speakers. Overall, the translation shows strong fidelity to meaning, grammatical correctness, and natural fluency.

IndicTrans2 - Performance Analysis [October 2024]

| Criterion | Description |
| --- | --- |
| Fidelity to Meaning | Original Dialogue: Explores favourite colours, masculinity, and personal experiences. Translation Accuracy: Captures key phrases like “yellow is my favourite colour” and “I love violence” without loss of meaning. |
| Contextual Relevance | Cultural Insights: Addresses societal expectations of masculinity and femininity. Nuanced Translation: Effectively conveys playful commentary and humour sensitive to Tamil cultural contexts. |
| Grammatical Correctness | English Standards: Adheres to proper grammatical conventions. Tamil Proficiency: Maintains grammatical integrity with correct sentence structure and punctuation. |
| Naturalness and Fluency | Authenticity: Utilizes colloquial expressions to enhance relatability for Tamil audiences. Smooth Flow: Ensures seamless conversational tone in both languages. |

Positive Performance

| Aspect | Description |
| --- | --- |
| Clarity of Meaning | Effectively captures the primary ideas and emotional undertones of the original dialogue. |
| Humour Retention | Maintains an engaging tone throughout, preserving the humour well. |
| Contextual Understanding | The narrative flows smoothly, enabling the audience to follow along without confusion. |

Areas for Improvement

| Area | Description |
| --- | --- |
| Idiomatic Expressions | Some Tamil phrases could be more idiomatic, resulting in occasional awkward phrasing. |
| Nuance Loss | Certain expressions lack subtle cultural and emotional depth in translation. |
| Cultural Relevance | Some references may not resonate effectively with Tamil speakers without additional context. |


Links and References