In this project I apply natural language processing (NLP) to understand the sentiment in the latest news articles featuring Bitcoin and Ethereum. I also apply fundamental NLP techniques to better understand other factors involved with the coin prices, focusing on common words, phrases, organizations, and entities mentioned. The questions behind the sentiment analysis are: what are articles saying about different cryptocurrencies, and what is the current public sentiment surrounding these coins? The project uses NewsAPI and the NLTK (Natural Language Toolkit) library.
- Sentiment Analysis
- Natural Language Processing
- Named Entity Recognition
- After setting my NewsAPI Key, the first step was to fetch Bitcoin & Ethereum news articles
- The next step was to build the Bitcoin sentiment scores by looping over the articles with a for-loop
- The same process was performed for Ethereum, and the scores for each coin were converted to a DataFrame using pd.DataFrame
- Calling ".describe()" on each DataFrame shows summary statistics (such as the mean, min, and max) for the sentiment scores
- Import the NLTK (Natural Language Toolkit) modules nltk.tokenize, nltk.corpus, and nltk.stem
- Import Lemmatizer
- Import word_tokenize, sent_tokenize
- Import WordNetLemmatizer, PorterStemmer
- Now we can take a look at unique word counts
- We can then use Counter (imported from the collections module) to count the frequency of words in the articles
- The top 3 most frequently used words in the Bitcoin news articles were "Char" (95x), "Bitcoin" (49x) and "Reuters" (23x)
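The counting step above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it swaps in a regex tokenizer and a tiny hand-written stopword set for NLTK's `word_tokenize`, `stopwords`, and `WordNetLemmatizer`, and the names `STOPWORDS`, `tokenize`, and `top_words` are hypothetical. The sample sentences are made up.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; an assumption standing in for
# nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
             "on", "for", "as", "has", "its"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


def top_words(texts, n=3):
    """Return the n most common tokens across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Made-up sample text, not the project's articles.
sample = [
    "Bitcoin rallied as Reuters reported new demand for bitcoin.",
    "Reuters said the price of Bitcoin fell.",
]
print(top_words(sample, n=2))  # [('bitcoin', 3), ('reuters', 2)]
```

If the article text comes from NewsAPI's truncated content field, which typically ends with a marker such as "[+1234 chars]", that marker is a likely source of the high "Char" count, and filtering it out before counting would be a reasonable extra step.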
Word clouds are an intuitive way to visualize word frequency in a news article and quickly see which words were used most prominently. They are also easy to generate with Python.
Named entity recognition (NER) produces a visually appealing rendering of the text that makes clear which words are important within the article and which category each one belongs to: an organization, a currency, a name, and so on.
- Import spaCy
- Concatenate all Bitcoin/Ethereum text together using the ".join()" function
- Then run the NER processor on the text and render the visualization
bitcoin_doc = nlp(bitcoin_text)
displacy.render(bitcoin_doc, style='ent')
I identified three questions to ask of the sentiment scores from the Bitcoin and Ethereum articles. First, which coin had the highest mean positive score? Next, looking over the output, which coin had the highest compound score? Finally, which coin had the highest positive score?
In my research, the data showed that Bitcoin had the highest mean positive score, 0.0519600. Bitcoin also had the highest compound score, 0.8834000, and the highest positive score, 0.2740000.
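The three comparisons above come down to one mean and two maxima over the per-article scores. The sketch below is a hedged illustration rather than the project's code: `score_articles` and `summarize` are hypothetical names, articles are assumed to be dicts with a `content` key, and the analyzer is assumed to expose a `polarity_scores(text)` method returning `neg`, `neu`, `pos`, and `compound` keys, as NLTK's VADER `SentimentIntensityAnalyzer` does. A stand-in analyzer with made-up scores is used so the example runs without any downloads.

```python
from statistics import mean


def score_articles(articles, analyzer):
    """Build one row of sentiment scores per article that has text."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            continue  # skip articles with missing or empty text
        scores = analyzer.polarity_scores(text)
        rows.append({
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


def summarize(rows):
    """Return the three statistics compared in the text."""
    return {
        "mean_positive": mean(row["positive"] for row in rows),
        "max_compound": max(row["compound"] for row in rows),
        "max_positive": max(row["positive"] for row in rows),
    }


class FakeAnalyzer:
    """Stand-in with made-up scores; not a real sentiment model."""

    def polarity_scores(self, text):
        if "surge" in text:
            return {"neg": 0.0, "neu": 0.75, "pos": 0.25, "compound": 0.5}
        return {"neg": 0.25, "neu": 0.75, "pos": 0.0, "compound": -0.5}


# Made-up articles, not the project's data.
articles = [
    {"content": "Bitcoin prices surge"},
    {"content": "Bitcoin prices drop"},
    {"content": None},
]
rows = score_articles(articles, FakeAnalyzer())
print(summarize(rows))
```

With pandas, `pd.DataFrame(rows)` turns these rows into a scores DataFrame, and ".describe()" on it reports the same mean and max values per column. Running the two functions once per coin and comparing the summaries answers the three questions.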