SwiftKeyboard

This project was part of the Capstone Project in the Data Science Specialization offered by Johns Hopkins University on Coursera. The dataset was analyzed for its statistical properties, and Natural Language Processing techniques were applied to build a predictive text application.

In this web-based application, the user types text and, as it is typed, the predictive model recommends the next possible word(s) to append to the input. The text data was downloaded here and is provided in four different languages. The English corpora were used for this application, and the model was trained on a document corpus compiled from the following three sources of text data:

Blogs
Twitter
News

The large corpus of text documents was analyzed to discover the structure of the data and how words are put together, in order to build a predictive model. An n-gram language model was used to build a smart keyboard that predicts the next word based on the words already entered.

Roadmap to the Model

Getting and cleaning the data:
- All the provided corpora were combined into one.
- 25% of the combined corpus was selected for training the model.
Exploratory Data Analysis:
- Frequencies of words and word pairs were calculated.
Modeling:
- The quanteda package was used to tokenize the corpus.
- 1-gram to 7-gram models were built for word prediction.
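
The roadmap above can be sketched in a few lines. The project itself does this in R with quanteda; the Python below is only an illustration, and the function names (`sample_lines`, `tokenize`, `count_ngrams`), the regular-expression tokenizer, and the toy corpus are all assumptions, not the project's code.

```python
import random
import re
from collections import Counter


def sample_lines(lines, fraction=0.25, seed=42):
    """Keep each line with probability `fraction` (the README samples 25%)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(lines, max_n=7):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Toy stand-in for the combined blogs, Twitter, and news text.
corpus = [
    "I love data science",
    "I love text mining",
    "data science is fun",
]
counts = count_ngrams(corpus, max_n=3)
print(counts[2].most_common(2))
```

On the real corpus, `sample_lines` would be applied to the combined text before `count_ngrams`, and `max_n` would stay at 7.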

Algorithm and Prediction

- To improve efficiency, word pairs that appear fewer than 5 times in the corpus were removed.
- Katz's back-off model was used to predict the next word.
- The model iterates from the 7-gram down to the 1-gram, looking for n-grams whose first (n-1) words match the last (n-1) words of the input.
- It starts with the 7-gram and backs off to the 6-gram if there is no prediction.
- It continues in this way until it backs off to the 1-gram.
- When the user input is empty, the most frequent word, 'the', is returned (the default number of predictions is 1).
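
The back-off lookup described above might look roughly like the sketch below. It is a simplified count-based back-off: it returns the most frequent continuation at the longest matching context and does not apply the discounting or probability redistribution of a full Katz model. It is written in Python for illustration only; `build_model`, `predict_next`, the whitespace tokenizer, the choice to prune only n-grams of two or more words, and the toy sentences are assumptions rather than the project's R code.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=7, min_count=5):
    """Map each (n-1)-word context to a Counter of next words.

    N-grams of two or more words seen fewer than `min_count` times are dropped;
    unigrams are always kept so the final back-off step has something to return.
    """
    ngram_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram_counts[tuple(tokens[i:i + n])] += 1

    model = defaultdict(Counter)
    for ngram, count in ngram_counts.items():
        if len(ngram) == 1 or count >= min_count:
            model[ngram[:-1]][ngram[-1]] = count
    return model


def predict_next(model, text, k=1, max_n=7):
    """Try the longest context first, then back off one word at a time."""
    tokens = text.lower().split()
    for n in range(max_n, 0, -1):
        if n - 1 > len(tokens):
            continue  # not enough input words for this order
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = model.get(context)
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []


# Toy corpus; the threshold is lowered to 2 so that some n-grams survive pruning.
sentences = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat ate the fish",
]
model = build_model(sentences, max_n=3, min_count=2)
print(predict_next(model, "the cat", max_n=3))
print(predict_next(model, "ate the", max_n=3))
print(predict_next(model, "", max_n=3))
```

In this toy run, "ate the" has no surviving 3-gram, so the lookup backs off to the 2-gram context "the"; the empty input falls straight through to the unigram counts.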

The Shiny App

Here is a link to the application which provides all the necessary instructions.

