LBSA - Lexicon-based Sentiment Analysis

Fast library for sentiment analysis, opinion mining and language detection.

Installation

Install dependencies:

$ sudo pip3 install -r requirements.txt

From the root folder of the repository (the one containing setup.py), install the library with:

$ sudo python3 setup.py install

To access the NRC lexicon, download it from: http://www.saifmohammad.com/WebDocs/Lexicons/NRC-Emotion-Lexicon.zip

Extract the archive and pass the path to the Excel file the first time you use the NRC lexicon. For example:

>>> import lbsa
>>> path = 'path/to/NRC-Emotion-Lexicon-v0.92-In105Languages-Nov2017Translations.xlsx'
>>> sa_lexicon = lbsa.get_lexicon('sa', language='english', source='nrc', path=path)

Dependencies

  • numpy >= 1.13.3
  • pandas >= 0.21.0
  • xlrd

Features

Sentiment analysis

>>> import lbsa
>>> tweet = """
... The Budget Agreement today is so important for our great Military.
... It ends the dangerous sequester and gives Secretary Mattis what he needs to keep America Great.
... Republicans and Democrats must support our troops and support this Bill!
... """
>>> sa_lexicon = lbsa.get_lexicon('sa', language='english', source='nrc')
>>> sa_lexicon.process(tweet)
{'anger': 0, 'anticipation': 0, 'disgust': 0, 'fear': 2, 'joy': 0, 'sadness': 0, 
'surprise': 0, 'trust': 3}
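
To give a rough idea of what lexicon-based counting means, here is a minimal, self-contained sketch. It is not the code used by lbsa: the toy lexicon, its word-to-emotion entries, and the `count_emotions` helper are all made up for illustration, and the tokenization is an assumption.

```python
import re

# Emotion categories, in the order they appear in the output above.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon: word -> set of associated emotions.
# These entries are invented for the example, not taken from NRC.
TOY_LEXICON = {
    "dangerous": {"fear"},
    "troops": {"fear"},
    "support": {"trust"},
    "agreement": {"trust"},
}


def count_emotions(text, lexicon=TOY_LEXICON):
    """Count, per emotion, how many tokens of `text` the lexicon links to it."""
    counts = {emotion: 0 for emotion in EMOTIONS}
    # Lowercase the text and split it into simple alphabetic tokens.
    for token in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


result = count_emotions(
    "Dangerous times, but we support our troops and support this bill."
)
print(result)
```

Each occurrence of a word counts once, so a word repeated in the text contributes to its emotions every time it appears.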

Opinion mining

>>> op_lexicon = lbsa.get_lexicon('opinion', language='english', source='nrc')
>>> op_lexicon.process(tweet)
{'positive': 2, 'negative': 1}
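
If a single number is more convenient than two counts, one possible post-processing step is a net polarity score. The `polarity` helper below is hypothetical and not part of lbsa; it only assumes a dictionary shaped like the output above.

```python
def polarity(counts):
    """Return a score in [-1, 1]; 0.0 when no opinion words were counted."""
    positive = counts.get("positive", 0)
    negative = counts.get("negative", 0)
    total = positive + negative
    if total == 0:
        # Avoid dividing by zero when the text contains no opinion words.
        return 0.0
    return (positive - negative) / total


score = polarity({"positive": 2, "negative": 1})
print(round(score, 3))
```

A score near 1 means mostly positive words, near -1 mostly negative words, and 0 either a balance or no matches at all.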

Language detection

Language detection requires the NRC lexicon:

>>> import lbsa
>>> tweet = """
... A la suite de la tempête #Eunice et à la demande du Président de la République,
... l'Etat décrétera dans les meilleurs délais l'état de catastrophe naturelle partout
... où cela s'avérera nécessaire.
... """
>>> lexicon = lbsa.get_lexicon('sa', language='auto', source='nrc')
>>> print(lexicon.process(tweet))
{'anger': 2, 'anticipation': 1, 'disgust': 1, 'fear': 2, 'joy': 0, 'sadness': 2, 'surprise': 2,
'trust': 0, 'lang': 'french'}
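
One simple way a lexicon can be used to guess the language is to count how many tokens of the text appear in each language's vocabulary and keep the best match. The sketch below illustrates that idea only; whether lbsa works this way is an assumption, and the vocabularies and the `detect_language` helper are hypothetical.

```python
import re

# Hypothetical per-language vocabularies; a real multilingual lexicon
# would provide far larger word lists.
TOY_VOCAB = {
    "english": {"the", "storm", "state", "and", "of"},
    "french": {"la", "de", "et", "tempête", "état"},
}


def detect_language(text, vocab=TOY_VOCAB):
    """Return the language whose vocabulary matches the most tokens."""
    tokens = re.findall(r"\w+", text.lower())
    hits = {
        lang: sum(token in words for token in tokens)
        for lang, words in vocab.items()
    }
    # On a tie, max() keeps the first language in `vocab`.
    return max(hits, key=hits.get)


detected = detect_language("A la suite de la tempête et à la demande du Président")
print(detected)
```

With vocabularies this small the guess is fragile; short texts or texts with few lexicon words can easily be misclassified.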

Feature extractor

>>> extractor = lbsa.FeatureExtractor(sa_lexicon, op_lexicon)
>>> extractor.process(tweet)
array([0., 0., 0., 2., 0., 0., 0., 3., 2., 1.])
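
Judging from the outputs shown in the sections above, the array appears to be the eight emotion counts followed by the two opinion counts. The sketch below reproduces that layout with plain Python lists; `to_feature_vector` is a hypothetical helper, not the library's implementation.

```python
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")
OPINIONS = ("positive", "negative")


def to_feature_vector(emotion_counts, opinion_counts):
    """Concatenate emotion and opinion counts into one fixed-order list."""
    emotion_part = [float(emotion_counts.get(key, 0)) for key in EMOTIONS]
    opinion_part = [float(opinion_counts.get(key, 0)) for key in OPINIONS]
    return emotion_part + opinion_part


# Counts copied from the sentiment analysis and opinion mining outputs above.
vector = to_feature_vector(
    {"anger": 0, "anticipation": 0, "disgust": 0, "fear": 2,
     "joy": 0, "sadness": 0, "surprise": 0, "trust": 3},
    {"positive": 2, "negative": 1},
)
print(vector)
```

Keeping the key order fixed matters: a downstream classifier expects each position of the vector to mean the same thing for every document.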

Example

Feature extractor:

feature_extraction.py

Perform sentiment analysis over time on "Thus Spoke Zarathustra":

book.py