
Bhasha: Deep Learning Web App for Multilingual Text Detection. Detects 10 Indic languages and English from text. Trained on Azure ML Studio, deployed on Heroku using Docker. Achieves over 80% accuracy. Utilizes TensorFlow, Keras, Flask, and Docker for seamless deployment.


sonalgan/bhasha


Bhasha Web App: Indic Languages Detection from Text

Bhasha Web App is a deep learning-based web application that detects the language of a given text across multiple Indian languages. The model achieves over 80% accuracy in predicting the language of the input text. The training and testing data were sourced from the MultiIndicMT dataset, which covers 10 major Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu) along with English.

Features and Development

  • Data Encoding: The web app maps characters from the diverse Indic scripts into a single standardized encoding, ISCII (Indian Script Code for Information Interchange), so that text in every script has a uniform representation.

  • Model Development: The accuracy and loss of the initial model were analyzed, and the model was then refined with various regularization techniques and iterative testing.

  • Hyperparameter Tuning: The final model's hyperparameters were further tuned with Azure Machine Learning HyperDrive, which runs many training jobs with different hyperparameter settings and identifies the best-performing configuration.
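The README does not show how the ISCII mapping in the Data Encoding step is implemented. The following is a minimal sketch of the idea under one stated assumption: Unicode's Indic script blocks were laid out following ISCII, so a character's offset within its 128-code-point block largely lines up across scripts (Devanagari न and Bengali ন both sit at offset 0x28). The hypothetical `unify` function and `SCRIPT_BLOCKS` table use that property to produce ISCII-style, script-independent indices; they do not emit byte-exact ISCII codes.

```python
# Hypothetical sketch: ISCII-style unification via Unicode block offsets.
# Assumption: this approximates the project's ISCII encoding; it is NOT byte-exact ISCII.

# Start of each Unicode Indic script block covering the dataset's languages
# (Hindi and Marathi share Devanagari; Punjabi uses Gurmukhi).
SCRIPT_BLOCKS = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80  # each Indic block spans 128 code points


def unify(text: str) -> list[tuple[str, int]]:
    """Map each character to (script, index).

    Indic characters get their offset (0..127) within their script block, which
    is shared across scripts. Other characters (e.g. English) are kept as
    ("Other", code point).
    """
    result = []
    for ch in text:
        cp = ord(ch)
        for script, base in SCRIPT_BLOCKS.items():
            if base <= cp < base + BLOCK_SIZE:
                result.append((script, cp - base))
                break
        else:  # no Indic block matched
            result.append(("Other", cp))
    return result


print(unify("नন"))  # [('Devanagari', 40), ('Bengali', 40)]
```

Keeping the script name next to the shared offset preserves what a classifier needs to tell, say, Bengali from Hindi, while equivalent letters across scripts share one index.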
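The README does not name the regularization techniques used in the Model Development step. Early stopping is one common technique that works directly from the validation-loss curve, so it is sketched here as an illustrative assumption rather than the project's code. In Keras the same behavior is available through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`; the hypothetical `early_stopping_epoch` function below roughly mirrors that logic on a plain list of per-epoch validation losses.

```python
# Hypothetical sketch of early stopping; the README does not confirm it was used.

def early_stopping_epoch(
    val_losses: list[float], patience: int = 3, min_delta: float = 0.0
) -> tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs fail to beat the best
    loss by more than `min_delta`; the weights saved at `best_epoch` would
    then be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_losses) - 1


losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
print(early_stopping_epoch(losses, patience=3))  # (2, 5)
```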
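HyperDrive runs many training jobs, each with hyperparameters drawn from a declared search space, and reports the run with the best primary metric; random sampling (`RandomParameterSampling`) is one of its sampling methods, alongside grid and Bayesian sampling. The README does not list the search space or sampling method used for the Hyperparameter Tuning step, so the following is only a local, stdlib-only illustration of random search. `SEARCH_SPACE`, `random_search`, and `toy_objective` are hypothetical; the toy objective stands in for "train the model and return validation accuracy".

```python
import random
from typing import Callable

# Hypothetical search space: the README does not say which hyperparameters were tuned.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "dropout": [0.2, 0.3, 0.5],
    "batch_size": [32, 64, 128],
}


def random_search(
    objective: Callable[[dict], float],
    space: dict,
    n_trials: int = 20,
    seed: int = 0,
) -> tuple:
    """Sample n_trials (>= 1) configurations at random; return (best_config, best_score).

    `objective` stands in for one training run and returns a score to maximize.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, similar to a discrete choice in HyperDrive.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config: dict) -> float:
    # Hypothetical score preferring learning_rate=1e-3 and low dropout (no real training).
    return -abs(config["learning_rate"] - 1e-3) - config["dropout"]


best, score = random_search(toy_objective, SEARCH_SPACE, n_trials=30)
print(best, score)
```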

Usage

  1. Web Interface: Users can enter text in mixed Indic languages, even in an unknown combination of languages. The web app displays the percentage distribution of the languages present in the input text, so users can identify the dominant languages.

  2. Dockerized Deployment: The web app is packaged and deployed as a Docker container, which makes the deployment consistent and easy to replicate across different environments.
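The README does not describe how the percentage distribution in the Web Interface is computed. One plausible approach, sketched here as an assumption, is to split the input into words, predict a language for each word, and report each language's share of the words. `predict_language` stands in for the trained model, and `toy_predictor` is a hypothetical placeholder that guesses only from Unicode script.

```python
from collections import Counter
from typing import Callable


def language_distribution(
    text: str, predict_language: Callable[[str], str]
) -> dict[str, float]:
    """Return each predicted language's share of the words in `text`, in percent."""
    words = text.split()
    if not words:
        return {}
    counts = Counter(predict_language(w) for w in words)
    total = len(words)
    # most_common() orders languages from most to least frequent.
    return {lang: 100.0 * n / total for lang, n in counts.most_common()}


def toy_predictor(word: str) -> str:
    # Hypothetical stand-in for the trained model: looks only at the first character's script.
    cp = ord(word[0])
    if 0x0900 <= cp < 0x0980:
        return "Hindi"  # Devanagari (a real model must also handle Marathi)
    if 0x0B80 <= cp < 0x0C00:
        return "Tamil"
    return "English"


dist = language_distribution("नमस्ते hello world", toy_predictor)
print({k: round(v, 1) for k, v in dist.items()})  # {'English': 66.7, 'Hindi': 33.3}
```

Word-level voting is only one option; averaging the model's per-language softmax probabilities over chunks of text would also yield a percentage distribution.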

Deployment

The Bhasha Web App is hosted on the Heroku platform as a Docker container. Because the container bundles the app with its dependencies, the same image runs locally and on Heroku, and the deployment can be scaled and managed with Heroku's standard tooling.
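Heroku assigns the port for each web dyno at runtime through the `PORT` environment variable, and a container deployed there must listen on that port (a Dockerfile's `EXPOSE` is not used for routing). The README does not show the app's startup code, so the following is a hypothetical helper, `bind_address`, for picking the listen address, with Flask's default port 5000 as a fallback for local runs.

```python
import os


def bind_address(default_port: int = 5000) -> tuple[str, int]:
    """Return the (host, port) the web server should listen on.

    On Heroku, PORT is set at runtime; locally it is usually unset, so the
    default is used. Binding to 0.0.0.0 makes the server reachable from
    outside the container.
    """
    port = int(os.environ.get("PORT", default_port))
    return "0.0.0.0", port


# Hypothetical usage inside the Flask app module:
#   host, port = bind_address()
#   app.run(host=host, port=port)
```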

Figures

Fig. 1: Encoding of diverse multilingual characters to ISCII encoding format for uniform representation.


Fig. 2: Initial model accuracy and loss, followed by the implementation of various regularization techniques.


Figs. 3 and 4: Demonstration of the web app's functionality: entering text in an unknown mix of Indic languages and receiving the percentage distribution of languages.



License

This project is licensed under the MIT License.

Training scripts for the deep learning model can be found in the DeepLearning repository.

Feel free to contribute and collaborate to enhance the accuracy and language coverage of the Bhasha Web App!
