TSE (Tribunal Superior Eleitoral) is a official source of data, at http://www.tse.jus.br/eleicoes/estatisticas/repositorio-de-dados-eleitorais (old http://agencia.tse.jus.br/estatistica/sead/odsele/consulta_cand/ )
The scripts will transfer all zip files to /tmp/tse_transfer
folder and the selected data to the PostgreSQL database.
Use core refresh (the python core-refresh-csv.py tse
command) to update the PubPerson's data folder with the new TSE database.
If pubpeson core was started, make all TSE preparation with sh tse-mk00.sh
. To run in background, prefer copy/paste each line of the shell script.
Tips:
- for use PostgreSQL's Python extension in UBUNTU, use
apt install postgresql-contrib postgresql-plpython
. - The data source have problems with header, see LEIAME.pdf and complete the control spreadsheet
- ... Data of TSE is a bit dirty,
- there are unrevised names like "0DÁRICO", "0SEAS", and "ABRRAÃO".
- there are a lot of invalid birth dates and invalid CPF, losting records.
- ... and the dataset description is a mess: not use CSV header, but each file needs different interpretation.
There are 3.5 million (3520846) data items, but many are repeated (each election year). The newer data are better, so, when repeat we can select only the last version.
- all (100%) have name (full name) valid property.
- 59% (2081277 items) have birthDate valid property.
- xx% (xxx items) have vatID (Brazilian CPF) valid property.
- There are 1492419? distinct names.
- There are 310 source files
consulta_cand*.txt
, and the its fields are not uniform. - Supposing uniformity, the relevant fields are at positions 3,6,11,14 and 27 (from 1), and secondary positions 15,28 and 31. Supposing corresponds respectivally to fields (of the LEIAME documentation)
ANO_ELEICAO
,SIGLA_UF
,NOME_CANDIDATO
,CPF_CANDIDATO
,DATA_NASCIMENTO
,NOME_URNA_CANDIDATO
,NUM_TITULO_ELEITORAL_CANDIDATO
,SEXO
.
Other results:
- There are some little of names with problems, as "0DÁRICO", "0SEAS", and "ABRRAÃO", as showed by
pubperson.kx_firstname
table. - ... Unique name-birthDate entries
- ... unique valid vatIDs
Under construction. Not seems good data, no birthDate
and no CPF.
Official source at http://www.tse.jus.br/partidos/filiacao-partidaria/relacao-de-filiados