Python-based text cleaning of comments scraped from social media platforms for NLP-based brand sentiment analysis

vflawson/Text-Cleaning-for-NLP-in-Python

Text-Cleaning-for-NLP-in-Python

Gensim pre-processing and re packages

Before building NLP models or running analyses on text (brand sentiment analysis, in this case), the underlying text data needs to be cleaned properly, because noisy input degrades model performance and the quality of the results.

In particular, text data scraped from social media platforms such as Twitter, Facebook, and Reddit poses its own data-cleaning challenges.

This code keeps only English-language comments (the Gensim NLP model used was for English), removes numbers, punctuation, and stopwords, and converts the text to lowercase.
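The basic steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the function name basic_clean and the tiny STOPWORDS set are assumptions made for the example. Gensim's gensim.parsing.preprocessing module offers comparable filters (strip_numeric, strip_punctuation, remove_stopwords) with a much larger stopword list. Language filtering is left out because the description does not say how it is performed.

```python
import re
import string

# Illustrative stopword subset (an assumption for this sketch);
# a real pipeline would use a full list such as the one shipped with Gensim.
STOPWORDS = frozenset(
    {"a", "an", "and", "are", "is", "it", "of", "the", "this", "to"}
)

_DIGITS = re.compile(r"\d+")
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def basic_clean(text: str) -> str:
    """Lowercase the text, then drop numbers, punctuation, and stopwords."""
    text = text.lower()
    text = _DIGITS.sub(" ", text)  # numbers become spaces
    text = _PUNCT.sub(" ", text)   # ASCII punctuation becomes spaces
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(basic_clean("This phone is GREAT, 10/10!!!"))
```

Replacing digits and punctuation with spaces rather than deleting them keeps neighbouring words from being fused together; the final split and join collapses the extra whitespace.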

It also tackles some social media-specific challenges: removing handles, emoji, and URLs, and truncating longer comments.
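The social media-specific steps can be sketched with the re package alone. The patterns, the clean_social name, and the MAX_WORDS limit below are assumptions chosen for illustration; the description does not state the actual patterns or whether truncation is counted in words or characters. The emoji ranges cover common blocks only and are not exhaustive.

```python
import re

# URLs starting with http://, https://, or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# @handles; the lookbehind avoids matching the middle of an email address.
HANDLE_RE = re.compile(r"(?<!\w)@\w+")
# Common emoji blocks only (not exhaustive).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)

MAX_WORDS = 50  # hypothetical cap; the real limit is not given


def clean_social(text: str, max_words: int = MAX_WORDS) -> str:
    """Remove URLs, handles, and emoji, then keep the first max_words words."""
    text = URL_RE.sub(" ", text)     # URLs first, since they may contain "@"
    text = HANDLE_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    words = text.split()
    return " ".join(words[:max_words])


print(clean_social("@acme love the new phone 😍 https://t.co/abc123"))
```

Truncating by word count rather than by character count avoids cutting a word in half, which would otherwise produce tokens the downstream model has never seen.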
