
Language Detection is a web based Deep Neural Network (DNN) language identification model, utilizing Python and TensorFlow. It uses character n-grams to classify text input languages efficiently. Insights into the network's language differentiation process are provided through t-SNE and PCA visualizations.


ritessshhh/LanguageDetection


Language Detection

Project Overview

Language Detection is a language identification model built on a Deep Neural Network (DNN) using Python and TensorFlow. It uses character n-grams to classify the language of a given text input: the user enters text through the web interface, and the model returns the identified language.

The project also visualizes the learned features with t-SNE and PCA to show how the network differentiates between languages. The model reached a final accuracy of 89.2%.

Dependencies

Python (>=3.7)
TensorFlow (>=2.4.0)
scikit-learn (>=0.24.0)
NumPy (>=1.19.5)
Matplotlib (>=3.3.2)

Installation

Clone the repo

git clone https://github.com/ritessshhh/languageDetection

Example

1. When the page loads, it looks like this:

Screenshot 2023-07-08 at 10 24 53 PM

2. Clicking on the languages shows examples of the languages the model can recognize.

Screenshot 2023-07-08 at 10 26 04 PM

3. In this example, a Spanish sentence has been entered in the Language box.

Screenshot 2023-07-08 at 10 27 57 PM

4. When the 'Submit' button is clicked, the page shows the identified language.

Screenshot 2023-07-08 at 10 28 56 PM

Model

The model is a Deep Neural Network (DNN) implemented using Python and TensorFlow. It utilizes character n-grams to create distinct feature sets for different languages, and these features are used to classify the language of a given text input.
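To make the idea of character n-gram features concrete, here is a minimal sketch in plain Python. It is not the repository's actual preprocessing code: the function names, the n-gram range (1 to 3), and the tiny hand-picked vocabulary are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in the text."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def to_vector(counts, vocabulary):
    """Map n-gram counts to relative frequencies over a fixed vocabulary."""
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical vocabulary, chosen by hand for illustration only.
vocabulary = ["th", "he", "el", "qu", "ue", "la"]

english = to_vector(char_ngrams("the weather is nice"), vocabulary)
spanish = to_vector(char_ngrams("el tiempo que hace"), vocabulary)
```

In this sketch the English sentence has a nonzero weight for "th" and none for "qu", while the Spanish sentence shows the opposite pattern. Fixed-length vectors of this kind are what a classifier could take as input.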

To gain insights into the feature representation, we visualized the learned features of the model using techniques such as t-SNE and PCA.
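A projection of this kind can be sketched with scikit-learn as follows. The feature matrix here is synthetic random data standing in for the model's learned features, and the sample counts, dimensionality, perplexity, and the `plot_embedding` helper are assumptions for illustration, not values or code taken from this project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for learned features: 3 "languages", 30 samples each,
# 64 dimensions, each class clustered around its own random center.
labels = np.repeat(np.arange(3), 30)
centers = rng.normal(scale=5.0, size=(3, 64))
features = centers[labels] + rng.normal(size=(90, 64))

# PCA: linear projection onto the two directions of highest variance.
pca_2d = PCA(n_components=2).fit_transform(features)

# t-SNE: nonlinear embedding that tries to preserve local neighborhoods.
# perplexity must be smaller than the number of samples.
tsne_2d = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(features)


def plot_embedding(points, point_labels, title):
    """Scatter-plot a 2D embedding, colored by class label (hypothetical helper)."""
    import matplotlib.pyplot as plt

    plt.scatter(points[:, 0], points[:, 1], c=point_labels, cmap="viridis", s=12)
    plt.title(title)
    plt.show()
```

Calling `plot_embedding(pca_2d, labels, "PCA")` or `plot_embedding(tsne_2d, labels, "t-SNE")` draws the corresponding scatter plot. With real features, well-separated clusters would suggest that the network has learned representations that distinguish the languages.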

Performance

The LinguaNet model achieved a final accuracy of 89.2%, highlighting its effectiveness in identifying and differentiating languages based on text input.

Future Work

We plan to refine the model further, aiming to increase its accuracy and expand its scope to include more languages. Contributions and feedback are welcome.

License

Copyright 2023 Ritesh Chavan

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
