The program crawls Moneycontrol and Economic Times to fetch data on the companies listed in the input.xlsx file. Once the data is scraped and a CSV file is built, analyse.py performs a relative analysis that classifies the companies as RED, AMBER or GREEN. Absolute scoring is done by the score.py script.
The master branch contains the source code for the application.
The dist-windows branch contains the exe files and support files, which are independent of the Python installation.
- Click Clone or Download, then Download ZIP, to download the application.
- Extract all files to a folder and run the applications as specified below.
- Make changes to input.xlsx (Configuration settings given below)
- Run News Scraper.exe
- Output is stored in Scrape Output.csv
- Run Score Generator.exe and Scrape Analyser.exe for analysing scraped content.
- Output is stored in Score Output.xlsx and Analyse Output.xlsx respectively
Application | input.xlsx columns | Output File |
---|---|---|
News Scraper.exe | A to F | Scrape Output.csv |
Score Generator.exe | H to L | Score Output.xlsx |
Scrape Analyser.exe | I to L | Analyse Output.xlsx |
Parameter | Ideal Value | Description |
---|---|---|
COMPANYNAME | String, exact match | Enter each company name as close as possible to the name listed on the stocks pages of Economic Times and Moneycontrol. |
THRESHOLD | 0.75 to 0.85 | Match ratio (0 to 1) required between an input company name and the company name listed on the website. |
DATEFROM | DD-MM-YYYY | Start date: data from this date onward is scraped. |
DATETO | DD-MM-YYYY | End date: data up to this date is scraped. |
WEBSITE | Select at least one site | Moneycontrol or Economictimes |
ROTATING_PROXIES | 0 to 100 (0 = no proxy) | Used to bypass bans while scraping. Increasing the number of proxies might slow down scraping. |
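
To give a feel for how THRESHOLD behaves, here is a minimal sketch of fuzzy name matching. It assumes a similarity ratio computed with Python's standard `difflib`; the scraper may use a different matching method, and `name_similarity` and `best_match` are hypothetical helper names used only for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(company: str, listed_names: list[str], threshold: float = 0.8):
    """Return the closest listed name, or None if nothing reaches the threshold.

    Assumption: `threshold` plays the role of THRESHOLD in input.xlsx.
    """
    best_name, best_score = None, 0.0
    for name in listed_names:
        score = name_similarity(company, name)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    listed = ["Tata Motors Ltd.", "Tata Steel Ltd."]
    print(best_match("Tata Motors", listed, threshold=0.75))
    print(best_match("Infosys", listed, threshold=0.75))
```

A lower threshold accepts looser matches at the risk of picking the wrong company; a higher one may skip companies whose listed name carries a suffix such as "Ltd.".
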
Parameter | Ideal Value | Description |
---|---|---|
POSITIVE_KEYWORDS | String | Positive Keywords list |
NEGATIVE_KEYWORDS | String | Negative Keywords list |
SEARCH OPTION | Article Title or Article Content | Select which content you want the program to analyse |
N_STDDEV_RED | Float (0.5, 1, 1.5, 2) | If the keyword count is greater than mean + (N_STDDEV_RED * stddev), the company is labelled RED. |
N_STDDEV_AMBER | Float (0.5, 1, 1.5, 2) | If the keyword count is greater than mean + (N_STDDEV_AMBER * stddev), the company is labelled AMBER. |
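
The RED and AMBER rules above can be sketched as follows. This is an illustration under stated assumptions rather than the code in analyse.py: it assumes the mean and standard deviation are taken over the keyword counts of all companies in the run, uses the population standard deviation, and labels everything that is neither RED nor AMBER as GREEN. The `classify` function and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def classify(keyword_counts: dict[str, int],
             n_stddev_red: float = 1.5,
             n_stddev_amber: float = 0.5) -> dict[str, str]:
    """Label each company RED, AMBER or GREEN relative to the group."""
    counts = list(keyword_counts.values())
    mu = mean(counts)
    sigma = pstdev(counts)  # assumption: population standard deviation

    red_cutoff = mu + n_stddev_red * sigma
    amber_cutoff = mu + n_stddev_amber * sigma

    labels = {}
    for company, count in keyword_counts.items():
        if count > red_cutoff:
            labels[company] = "RED"
        elif count > amber_cutoff:
            labels[company] = "AMBER"
        else:
            labels[company] = "GREEN"
    return labels


if __name__ == "__main__":
    # Made-up keyword counts for five hypothetical companies.
    sample = {"A": 0, "B": 0, "C": 0, "D": 4, "E": 6}
    print(classify(sample))
```

Because the cutoffs depend on the group's mean and spread, the same company can receive a different label when the list of companies in input.xlsx changes.
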
Licensed under the MIT License