Biographical data on candidates in national, state, and some local elections from archive.india.gov.in and myneta.info, along with scripts for retrieving the data. The data on the 15th Lok Sabha and on Rajya Sabha members as of June 2014 were used to produce this short note: (No) Missing daughters of Indian Politicians. Data on all political candidates in national, state, and some local elections from myNeta were used to analyze spousal income and movable and immovable assets by politician gender. (Analysis.)
Data on Indian MPs serving in the Lok Sabha and the Rajya Sabha.
To get the data, download the scripts in the get_data/archive_india_gov folder to your computer. The scripts require Python 3.x and BeautifulSoup 4 to run. The package dependency is listed in get_data/archive_india_gov/requirements.txt. Once you have installed the dependencies, you can run the scripts.
- To download the web pages containing the information, run scrape_indian_gov.py:
  python scrape_indian_gov.py
  The HTML files will be saved in ./rajyasabha and ./loksabha.
- To parse and extract information from the HTML files, run extract_indian_gov.py:
  python extract_indian_gov.py <dir>
  The script outputs a CSV file, saving it as dir-out.csv.
The data were scraped in June, 2014 and November, 2015.
- 15th Lok Sabha (Scraped June, 2014)
- 16th Lok Sabha (Scraped November, 2015)
- Rajya Sabha 2014 (Scraped June, 2014)
- Rajya Sabha 2015 (Scraped November, 2015)
Note: In 2015, the list of Rajya Sabha members on the site appears to differ slightly from the list posted on http://rajyasabha.nic.in/.
Select biographical and electoral data on candidates in national, state, and some local elections from myneta.info. The data were scraped in November, 2015.
There are three scripts. Why three? Information about gender is not provided on candidate pages and is integrated later. The three scripts are:
- india_mps.py to download basic profile data.
- india_mps_women.py to get information on gender.
- india_mps_gender.py to merge gender information into all three CSVs.
To begin using the scripts, install the requirements. Then download the scripts into a folder, and run scripts from the command line.
usage: india_mps.py [-h] [-o OUTPUT] [-n MAX_CONN] [-s FROM_STATE]
                    [-y FROM_YEAR] [-c FROM_CONSTITUENCY] [-t TYPE]
                    [--no-header]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output CSV file name
  -n MAX_CONN, --max-conn MAX_CONN
                        Max concurrent connections
  -s FROM_STATE, --from-state FROM_STATE
                        Start from a specific state
  -y FROM_YEAR, --from-year FROM_YEAR
                        Start from a specific election year
  -c FROM_CONSTITUENCY, --from-constituency FROM_CONSTITUENCY
                        Start from a specific constituency
  -t TYPE, --type TYPE  Type (all|state|nation|local)
  --no-header           Output without header at the first row
python india_mps.py -o india-mps-all.csv
python india_mps_women.py
URL of all women candidates saved as: output-women.csv
To merge all candidates with gender, run:
python india_mps_gender.py
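The merge step can be pictured with the short sketch below. It is not the code in india_mps_gender.py; it assumes that the women file holds one candidate URL per row and that the candidate file carries the same URL in a column, here given the hypothetical name url. The labels F and M, the helper add_gender, and the sample rows are likewise made up for illustration.

```python
import csv
import io


def add_gender(candidate_rows, women_urls, url_field="url"):
    """Return copies of the candidate rows with a 'gender' field added.

    Assumption: a candidate whose URL appears in women_urls is labelled
    'F' and every other candidate is labelled 'M'. The field name and the
    labels are illustrative, not taken from the actual scripts.
    """
    women = {u.strip() for u in women_urls}
    merged = []
    for row in candidate_rows:
        out = dict(row)  # copy so the input rows are left untouched
        out["gender"] = "F" if row[url_field].strip() in women else "M"
        merged.append(out)
    return merged


# In-memory stand-ins for the two CSV files (made-up data).
candidates_csv = io.StringIO(
    "name,url\n"
    "Candidate A,http://example.org/c/1\n"
    "Candidate B,http://example.org/c/2\n"
)
women_csv = io.StringIO("url\nhttp://example.org/c/2\n")

candidates = list(csv.DictReader(candidates_csv))
women_urls = [r["url"] for r in csv.DictReader(women_csv)]
merged = add_gender(candidates, women_urls)
```

Matching on the URL rather than on the name avoids mislabelling candidates who share a name.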
Meta Data
- Each row = politician per constituency per election year.
- Columns
  - Politician Name, Constituency, State, Party, Election Year, Whether They Won or Not, Type: State/National/Local
  - Education, Age, Address, Self Profession, Spouse Profession
  - Income Tax Return: Self Total Income, Spouse Total Income
  - Self Movable Assets, Spouse Movable Assets:
    - cash --- for self and spouse
    - jewellery --- for self and spouse
    - totals --- for self and spouse
  - Immovable Assets --- Self Totals, Spouse Totals
  - Liabilities --- Self Totals, Spouse Totals
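As a rough illustration of how these columns could feed a by-gender comparison, the sketch below computes a group median while skipping blank entries. The field names gender and spouse_total_income, the helper median_by_group, and the sample values are hypothetical; the actual CSV headers may differ.

```python
from collections import defaultdict
from statistics import median


def median_by_group(rows, group_field, value_field):
    """Median of value_field within each group_field value.

    Thousands separators are stripped; blank or non-numeric entries
    are skipped rather than treated as zero.
    """
    groups = defaultdict(list)
    for row in rows:
        raw = (row.get(value_field) or "").replace(",", "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # missing or unparseable amount
        groups[row[group_field]].append(value)
    return {g: median(v) for g, v in groups.items()}


# Made-up rows standing in for the merged candidate file.
rows = [
    {"gender": "F", "spouse_total_income": "1,200"},
    {"gender": "F", "spouse_total_income": "800"},
    {"gender": "M", "spouse_total_income": "300"},
    {"gender": "M", "spouse_total_income": ""},
]
result = median_by_group(rows, "gender", "spouse_total_income")
```

Skipping blanks matters here because some fields are not reported for every election year.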
Notes
- Some data are missing for election years before 2011:
  - There is no Income Tax Return data, so Self/Spouse Total Income is unavailable.
  - There is no Spouse column under Liabilities.
- In a few elections, multiple candidates with the same name contest the same constituency. For instance, check here, here, here, here, here, and here.
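Because of such name collisions, name plus constituency plus election year is not guaranteed to identify a row uniquely. A minimal sketch for flagging these cases follows; the field names name, constituency, and year, the helper find_name_collisions, and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def find_name_collisions(rows, key_fields=("name", "constituency", "year")):
    """Return keys that occur more than once, with their counts.

    Values are compared case-insensitively after trimming whitespace.
    """
    counts = Counter(
        tuple(str(row[f]).strip().lower() for f in key_fields) for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows: two candidates share a name in the same contest.
rows = [
    {"name": "A Kumar", "constituency": "X", "year": "2014"},
    {"name": "a kumar ", "constituency": "X", "year": "2014"},
    {"name": "B Singh", "constituency": "X", "year": "2014"},
]
collisions = find_name_collisions(rows)
```

Rows flagged this way need another identifier, such as party or candidate page URL, to be told apart.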
Scripts, figures, and writing are released under CC BY 2.0.