An analysis of voting patterns in São Paulo's 2024 elections, focusing on voter behavior, absenteeism, and geographic trends.
This project provides an in-depth analysis of voting patterns in the 2024 São Paulo municipal elections, with a focus on the first and second rounds of mayoral and city council races. It examines key aspects such as voter behavior, shifts between rounds, and regional variations in voter turnout.
The dataset, manually compiled from official sources, includes over 15,000 entries. To gather the relevant data, the project employed web scraping, followed by data cleaning and exploratory data analysis (EDA). These methods uncover insights into electoral trends and offer strategic guidance on São Paulo's political dynamics, which can inform future election strategies.
This work was developed as part of the Integrated Project and Storytelling course in the second semester of the undergraduate program in Data Science and Artificial Intelligence at PUC-SP in 2024, under the mentorship of the renowned Professor ✨ Rooney Ribeiro Albuquerque Coelho.
His expertise and unwavering dedication to teaching played a crucial role in deepening our understanding of both data science and the art of storytelling.
To access the full map, click the map below:
Access the dataset and explore the interactive dashboard via the Power BI link below, where you can use dynamic filters for detailed insights and visualizations.
- Introduction
- Study Objectives
- Theoretical Background
- Dataset Description
- Methodology
- Exploratory Data Analysis
- Charts and Dashboards
  - 7.1. Vote Distribution by Municipality
  - 7.2. Most Voted Mayoral Candidates
  - 7.3. Most Voted Councilor Candidates
  - 7.4. Most Voted Mayors by Electoral Zone
  - 7.5. Most Voted Councilors by Electoral Zone
  - 7.6. Most Voted Mayors by Municipality
  - 7.7. Most Voted Councilors by Municipality
  - 7.8. Vote Distribution by Political Party
- Interactive Dashboards
- Conclusion
- Extra Material
- References
- How to Run the Project
- Contributing
- Our Team
1. Introduction
This report presents a detailed analysis of the data from São Paulo's 2024 municipal elections, focusing on vote distribution, voter behavior, and the performance of mayoral and councilor candidates. Various visualizations and dashboards are used to explore voting patterns, emerging trends, and the factors influencing electoral outcomes.
2. Study Objectives
The study aims to understand electoral dynamics in São Paulo's urban and peripheral areas by identifying the factors that shape voter preferences, such as the most-voted parties, candidate profiles, and voting behavior.
3. Theoretical Background
Analyzing electoral data is crucial for understanding voter behavior, party preferences, and political trends across different regions. Data visualization offers a clear and efficient way to identify patterns that can inform future campaigns.
4. Dataset Description
The data used in this study were extracted from public sources, providing information on votes by municipality, electoral zone, and political party. The dataset includes details about mayoral and councilor candidates in São Paulo, including the number of votes received by each candidate.
👉🏻 Access all processed files here
The following CSV files were processed:
- `address_Mayor.csv`
- `Mayor_by_city.csv`
- `Mayor_by_city_round_2.csv`
- `Mayor.csv`
- `address_Councilor.csv`
- `Councilor_by_city.csv`
- `councilor.csv`
Here is an overview of the main columns in the processed CSV files:
- `NM_MUNICIPIO`: Municipality name
- `NR_ZONA`: Electoral zone number
- `DS_CARGO_PERGUNTA`: Election role, Mayor (`Prefeito`) or Councilor (`Vereador`)
- `NM_VOTAVEL`: Candidate name
- `SG_PARTIDO`: Party acronym
- `QT_VOTOS`: Number of votes received
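Because every later step depends on these six columns, it can help to check them right after loading. The sketch below is a minimal example, not the project's loader: the helper name `load_votes` and the two sample rows are hypothetical, and the comma separator and `latin-1` encoding mirror the `read_csv` call used in the charts section rather than a confirmed file layout.

```python
import io

import pandas as pd

# The six columns described above
REQUIRED_COLUMNS = ["NM_MUNICIPIO", "NR_ZONA", "DS_CARGO_PERGUNTA",
                    "NM_VOTAVEL", "SG_PARTIDO", "QT_VOTOS"]


def load_votes(source, sep=",", encoding="latin-1"):
    """Read an election CSV and fail early if a required column is missing (hypothetical helper)."""
    df = pd.read_csv(source, sep=sep, encoding=encoding)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Zone numbers and vote counts should be integers; raise if a value cannot be parsed
    df["NR_ZONA"] = pd.to_numeric(df["NR_ZONA"]).astype("int64")
    df["QT_VOTOS"] = pd.to_numeric(df["QT_VOTOS"]).astype("int64")
    return df


# Two made-up rows, only to exercise the function
sample = io.StringIO(
    "NM_MUNICIPIO,NR_ZONA,DS_CARGO_PERGUNTA,NM_VOTAVEL,SG_PARTIDO,QT_VOTOS\n"
    "SÃO PAULO,1,Prefeito,CANDIDATE A,PARTY_A,120\n"
    "SÃO PAULO,2,Vereador,CANDIDATE B,PARTY_B,45\n"
)
votes = load_votes(sample)
print(votes.dtypes)
```

With a real file, `load_votes("/path/to/your/data.csv")` would be called the same way; pass `sep=";"` if the export turns out to be semicolon-separated.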
5. Methodology
The methodology was divided into several steps:
- Data Preprocessing: Reading and concatenating datasets, cleaning invalid records.
- Exploratory Data Analysis (EDA): Identifying voting patterns and trends using graphs and tables.
- Data Visualization: Creating interactive charts with the Plotly library for dynamic result exploration.
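The preprocessing step can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes the per-file DataFrames share the columns listed above, treats as invalid any row whose `SG_PARTIDO` is `#NULO#` (the marker the filtering code in the charts section also excludes) or whose vote count is missing or non-numeric, and uses a hypothetical helper `combine_and_clean` with made-up sample frames.

```python
import pandas as pd


def combine_and_clean(frames):
    """Concatenate per-file DataFrames and drop invalid records (hypothetical helper)."""
    combined = pd.concat(frames, ignore_index=True)
    # Vote counts that cannot be parsed as numbers become NaN
    combined["QT_VOTOS"] = pd.to_numeric(combined["QT_VOTOS"], errors="coerce")
    valid = (
        combined["QT_VOTOS"].notna()
        & combined["NM_VOTAVEL"].notna()
        & (combined["SG_PARTIDO"] != "#NULO#")
    )
    cleaned = combined.loc[valid].copy()
    # After dropping NaN rows the counts can safely return to integers
    cleaned["QT_VOTOS"] = cleaned["QT_VOTOS"].astype("int64")
    return cleaned.reset_index(drop=True)


# Made-up stand-ins for two of the processed files
mayor_part = pd.DataFrame({
    "NM_MUNICIPIO": ["SÃO PAULO", "SÃO PAULO"],
    "NM_VOTAVEL": ["CANDIDATE A", "NULO"],
    "SG_PARTIDO": ["PARTY_A", "#NULO#"],
    "QT_VOTOS": ["100", "7"],
})
councilor_part = pd.DataFrame({
    "NM_MUNICIPIO": ["SÃO PAULO", "SÃO PAULO"],
    "NM_VOTAVEL": ["CANDIDATE B", "CANDIDATE C"],
    "SG_PARTIDO": ["PARTY_B", "PARTY_C"],
    "QT_VOTOS": ["40", "n/a"],
})
cleaned_sample = combine_and_clean([mayor_part, councilor_part])
print(cleaned_sample)
```

Only the rows for CANDIDATE A and CANDIDATE B survive: the `#NULO#` row and the unparseable `"n/a"` count are dropped.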
6. Exploratory Data Analysis
The exploratory analysis uncovered several notable trends, such as:
- The dominance of MDB and PSOL in the vote count.
- A geographic vote distribution showing high concentration in central São Paulo and greater support for progressive parties in peripheral areas.
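A comparison between central and peripheral areas comes down to each party's share of the votes within each area. The sketch below shows one way to compute it with `pd.crosstab`; it assumes a hypothetical `REGION` label has already been attached to every electoral zone (the dataset described above has no such column), and the party names and numbers are made up.

```python
import pandas as pd

# Made-up votes; REGION is a hypothetical zone-to-area label
votes = pd.DataFrame({
    "REGION": ["Central", "Central", "Central", "Periphery", "Periphery", "Periphery"],
    "SG_PARTIDO": ["PARTY_A", "PARTY_B", "PARTY_C", "PARTY_A", "PARTY_B", "PARTY_C"],
    "QT_VOTOS": [600, 300, 100, 200, 700, 100],
})

# Rows: regions; columns: parties; values: share of that region's votes
share = pd.crosstab(
    index=votes["REGION"],
    columns=votes["SG_PARTIDO"],
    values=votes["QT_VOTOS"],
    aggfunc="sum",
    normalize="index",
)
print(share.round(2))
```

Normalizing by row makes regions with very different turnout directly comparable: each row sums to 1.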
7. Charts and Dashboards
7.1. Vote Distribution by Municipality
The vote distribution revealed a large concentration in São Paulo and neighboring urban areas. The analysis also indicated that peripheral areas need specific strategies.
```python
import pandas as pd
import plotly.express as px

# Read the dataset (replace the path with the location of your CSV)
election = pd.read_csv("/path/to/your/data.csv", encoding="latin-1")

# Sum the votes of each municipality and plot them as bars
fig = px.histogram(election, x="NM_MUNICIPIO", y="QT_VOTOS", histfunc="sum",
                   title="Votes by Municipality",
                   color_discrete_sequence=["#1f77b4"])
fig.update_layout(bargap=0.2)
fig.show()
```
7.2. Most Voted Mayoral Candidates
Ricardo Nunes (MDB) stood out in central zones, while Guilherme Boulos (PSOL) had strong support in the peripheries.
```python
# Keep valid mayoral votes cast in the city of São Paulo
mayor = election[(election["DS_CARGO_PERGUNTA"] == "Prefeito") &
                 (election["NM_MUNICIPIO"] == "SÃO PAULO") &
                 (election["SG_PARTIDO"] != "#NULO#")].copy()

# Total votes per candidate, from most to least voted
mayor = (mayor.groupby(["NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"]
         .sum()
         .sort_values("QT_VOTOS", ascending=False))

# Each candidate's share of the valid votes
total_votes = mayor["QT_VOTOS"].sum()
mayor["PERCENTAGE"] = mayor["QT_VOTOS"] / total_votes

# Bar chart
fig = px.bar(mayor, x="NM_VOTAVEL", y="QT_VOTOS", color="SG_PARTIDO",
             title="Most Voted Mayoral Candidates",
             color_discrete_sequence=px.colors.qualitative.Dark24)
fig.show()
```
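Since São Paulo has more than 200,000 voters, a mayoral candidate wins in the first round only with an absolute majority of the valid votes (blank and null votes excluded); otherwise the two most voted candidates go to a second round. The percentages computed above make this check short. The sketch below assumes the `#NULO#` filter has already removed every blank and null vote, which may not hold for every export, and uses a hypothetical helper `runoff_candidates` with made-up totals.

```python
import pandas as pd


def runoff_candidates(totals):
    """Return the two candidates who go to a second round, or [] if someone wins outright.

    `totals` has one row per candidate with NM_VOTAVEL and QT_VOTOS, valid votes only
    (hypothetical helper).
    """
    ranked = totals.sort_values("QT_VOTOS", ascending=False)
    leader_share = ranked["QT_VOTOS"].iloc[0] / ranked["QT_VOTOS"].sum()
    # An absolute majority means strictly more than half of the valid votes
    if leader_share > 0.5:
        return []
    return ranked["NM_VOTAVEL"].head(2).tolist()


# Made-up first-round totals
first_round = pd.DataFrame({
    "NM_VOTAVEL": ["CANDIDATE A", "CANDIDATE B", "CANDIDATE C"],
    "QT_VOTOS": [450, 400, 150],
})
print(runoff_candidates(first_round))  # ['CANDIDATE A', 'CANDIDATE B']
```

Applied to the real totals, `runoff_candidates(mayor)` would run the same check on the candidate table built above. Ties for second place are not handled.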
7.3. Most Voted Councilor Candidates
Vote distribution showed a concentration among local candidates, with Tabata Amaral (PSB) and Renato Sorriso (PL) standing out in peripheral zones.
```python
# Keep valid councilor votes cast in the city of São Paulo
councilor = election[(election["DS_CARGO_PERGUNTA"] == "Vereador") &
                     (election["NM_MUNICIPIO"] == "SÃO PAULO") &
                     (election["SG_PARTIDO"] != "#NULO#")].copy()

# Total votes per candidate, from most to least voted
councilor = (councilor.groupby(["NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"]
             .sum()
             .sort_values("QT_VOTOS", ascending=False))

# Each candidate's share of the valid votes
total_votes = councilor["QT_VOTOS"].sum()
councilor["PERCENTAGE"] = councilor["QT_VOTOS"] / total_votes

# Bar chart
fig = px.bar(councilor, x="NM_VOTAVEL", y="QT_VOTOS", color="SG_PARTIDO",
             title="Most Voted Councilor Candidates",
             color_discrete_sequence=px.colors.qualitative.Dark24)
fig.show()
```
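One way to put a number on the concentration mentioned above is the share of all votes captured by the N most voted candidates. A minimal sketch with made-up totals; the helper name `top_n_share` and the default of N = 10 are assumptions, not part of the original analysis.

```python
import pandas as pd


def top_n_share(totals, n=10):
    """Fraction of all votes received by the n most voted candidates (hypothetical helper)."""
    votes = totals["QT_VOTOS"].sort_values(ascending=False)
    return votes.head(n).sum() / votes.sum()


# Made-up councilor totals
councilor_totals = pd.DataFrame({
    "NM_VOTAVEL": [f"CANDIDATE {i}" for i in range(1, 7)],
    "QT_VOTOS": [500, 300, 100, 50, 30, 20],
})
print(top_n_share(councilor_totals, n=2))  # 0.8
```

A value close to 1 for small N means a handful of candidates absorb most of the votes.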
7.4. Most Voted Mayors by Electoral Zone
Central zones favored Ricardo Nunes, while peripheral zones were dominated by Guilherme Boulos.
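```python
# Zone-to-neighborhood lookup; ZONE and NEIGHBORHOOD must have the same length.
# Only the first 15 zone entries have names; add the rest (zones 5, 6 and 246-252) with their neighborhoods
areas = pd.DataFrame({
    "ZONE": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5],
    "NEIGHBORHOOD": ["BELA VISTA", "CONSOLACAO", "LIBERDADE", "REPUBLICA", "SE", "BARRA FUNDA", "PERDIZES", "SANTA CECILIA", "BOM RETIRO", "BRAS", "PARI", "AGUA RASA", "BELEM", "MOOCA", "JD PAULISTA"]
})

# Mayoral votes per zone and candidate, computed from the unaggregated rows
mayor_zone = (election[(election["DS_CARGO_PERGUNTA"] == "Prefeito") &
                       (election["NM_MUNICIPIO"] == "SÃO PAULO") &
                       (election["SG_PARTIDO"] != "#NULO#")]
              .groupby(["NR_ZONA", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum())

# Attach neighborhoods; a zone covering several neighborhoods repeats its totals for each one
merged = mayor_zone.merge(areas, left_on="NR_ZONA", right_on="ZONE")

# Bar chart
fig = px.bar(merged, x="NEIGHBORHOOD", y="QT_VOTOS", color="SG_PARTIDO", title="Most Voted Mayor by Zone")
fig.show()
```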
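The chart above plots every candidate's votes in each zone. To name the single most voted candidate per zone, one option is to sort by votes and keep the first row of each zone. A sketch on made-up per-zone totals (in the real data these would first be aggregated by zone and candidate):

```python
import pandas as pd

# Made-up per-zone totals for two candidates
zone_votes = pd.DataFrame({
    "NR_ZONA": [1, 1, 2, 2, 3, 3],
    "NM_VOTAVEL": ["CANDIDATE A", "CANDIDATE B"] * 3,
    "SG_PARTIDO": ["PARTY_A", "PARTY_B"] * 3,
    "QT_VOTOS": [900, 600, 300, 800, 500, 450],
})

# Highest vote count first, then keep the first row seen for each zone
winners = (
    zone_votes.sort_values("QT_VOTOS", ascending=False)
    .drop_duplicates(subset="NR_ZONA")
    .sort_values("NR_ZONA")
    .reset_index(drop=True)
)
print(winners[["NR_ZONA", "NM_VOTAVEL", "QT_VOTOS"]])
```

Coloring `winners` by `SG_PARTIDO` on a zone map would show directly which zones each party carried.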
7.5. Most Voted Councilors by Electoral Zone
The analysis revealed candidates like Márcio Chagas (PSOL) and Luana Almeida (PL) performing well in suburban areas.
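```python
# Zone-to-neighborhood lookup for the councilor analysis
areas = pd.DataFrame({
    "ZONE": [1, 1, 1, 2, 2, 3, 3, 4, 5, 6],
    "NEIGHBORHOOD": ["BELA VISTA", "CONSOLACAO", "LIBERDADE", "MOOCA", "CAMPO BELO", "ITAQUERA", "CID DUTRA", "PIRITUBA", "VILA PRUDENTE", "TATUAPE"]
})

# Councilor votes per zone and candidate, computed from the unaggregated rows
councilor_zone = (election[(election["DS_CARGO_PERGUNTA"] == "Vereador") &
                           (election["NM_MUNICIPIO"] == "SÃO PAULO") &
                           (election["SG_PARTIDO"] != "#NULO#")]
                  .groupby(["NR_ZONA", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum())

# Attach neighborhoods to each zone
councilor_merged = councilor_zone.merge(areas, left_on="NR_ZONA", right_on="ZONE")

# Bar chart
fig = px.bar(councilor_merged, x="NEIGHBORHOOD", y="QT_VOTOS", color="SG_PARTIDO", title="Most Voted Councilor by Zone")
fig.show()
```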
7.6. Most Voted Mayors by Municipality
The municipality-level analysis confirmed Ricardo Nunes' dominance in urban areas and Boulos' strength in peripheral zones.
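```python
# Valid mayoral votes per municipality and candidate
mayor_city = (election[(election["DS_CARGO_PERGUNTA"] == "Prefeito") & (election["SG_PARTIDO"] != "#NULO#")]
              .groupby(["NM_MUNICIPIO", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum())
# Keep the most voted candidate in each municipality
municipality = mayor_city.loc[mayor_city.groupby("NM_MUNICIPIO")["QT_VOTOS"].idxmax()].sort_values("QT_VOTOS", ascending=False)
# Bar chart
fig = px.bar(municipality, x="NM_MUNICIPIO", y="QT_VOTOS", color="SG_PARTIDO", hover_name="NM_VOTAVEL", title="Most Voted Mayor by Municipality")
fig.show()
```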
7.7. Most Voted Councilors by Municipality
The analysis showed a strong presence of candidates like Eduardo Suplicy (PT) across several municipalities, reflecting broad political support.
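```python
# Valid councilor votes per municipality and candidate
councilor_city = (election[(election["DS_CARGO_PERGUNTA"] == "Vereador") & (election["SG_PARTIDO"] != "#NULO#")]
                  .groupby(["NM_MUNICIPIO", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum())
# Keep the most voted candidate in each municipality
municipality_councilor = councilor_city.loc[councilor_city.groupby("NM_MUNICIPIO")["QT_VOTOS"].idxmax()].sort_values("QT_VOTOS", ascending=False)
# Bar chart
fig = px.bar(municipality_councilor, x="NM_MUNICIPIO", y="QT_VOTOS", color="SG_PARTIDO", hover_name="NM_VOTAVEL", title="Most Voted Councilor by Municipality")
fig.show()
```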
7.8. Vote Distribution by Political Party
The vote distribution charts confirmed the dominance of MDB and PSOL, with PSOL's support growing in peripheral zones.
```python
# Valid votes per party (mayoral and councilor votes combined)
party_votes = (election[election["SG_PARTIDO"] != "#NULO#"]
               .groupby("SG_PARTIDO", as_index=False)["QT_VOTOS"].sum()
               .sort_values("QT_VOTOS", ascending=False))
# Bar chart
fig = px.bar(party_votes, x="SG_PARTIDO", y="QT_VOTOS", title="Distribution of Votes by Political Party")
fig.show()
```
8. Interactive Power BI Dashboards: Click to access the link
This dashboard provided a detailed view of electoral preferences by region, highlighting the polarization between urban and peripheral areas.
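```python
import json
import pandas as pd
import plotly.express as px
# Map of the vote distribution by municipality; the GeoJSON file and its "properties.name" key are hypothetical
df = pd.read_csv("distribution_votes.csv")
with open("sp_municipalities.geojson", encoding="utf-8") as f:
    municipalities_geo = json.load(f)
fig = px.choropleth(df, geojson=municipalities_geo, locations="municipality", featureidkey="properties.name", color="votes", hover_name="municipality", title="Geographic Distribution of Votes")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
```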
This dashboard was essential for understanding candidate performance across regions, using heatmaps and bar charts.
```python
import pandas as pd
import plotly.express as px

# Bar chart for vote analysis by party
df = pd.read_csv("votes_by_party.csv")
fig = px.bar(df, x="party", y="votes", color="party", title="Vote Analysis by Party")
fig.show()
```
The visualization made it possible to identify the vote distribution by party and electoral preferences by zone.
```python
import pandas as pd
import plotly.express as px

# Dashboard for candidate performance by electoral zone
df = pd.read_csv("candidates_performance.csv")
fig = px.scatter(df, x="zone", y="votes", color="party", title="Candidate Performance by Electoral Zone")
fig.show()
```
This dashboard analyzed voting by age, gender, and social class, highlighting preferences of younger voters and lower social classes for progressive candidates.
```python
import pandas as pd
import plotly.express as px

# Dashboard comparing mayoral and councilor votes
df = pd.read_csv("candidates_comparison.csv")
fig = px.scatter(df, x="votes_mayor", y="votes_councilor", color="party", title="Comparison of Mayoral and Councilor Candidates")
fig.show()
```
The comparison between the two elections revealed significant changes in electoral preferences, with PSOL gaining ground in the peripheries.
```python
import pandas as pd
import plotly.express as px

# Dashboard for voting by age group
df = pd.read_csv("votes_by_age_group.csv")
fig = px.pie(df, names="age_group", values="votes", title="Voting by Age Group")
fig.show()
```
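The change between rounds can be measured by aligning each candidate's vote share per zone in the first and second rounds and taking the difference. A minimal sketch, assuming both rounds have been reduced to rows with `NR_ZONA`, `NM_VOTAVEL` and `QT_VOTOS` (the exact layout of `Mayor.csv` and `Mayor_by_city_round_2.csv` is not confirmed here); the helper `zone_shares` is hypothetical and the numbers are made up.

```python
import pandas as pd


def zone_shares(df):
    """Each candidate's share of the valid votes in each zone (hypothetical helper)."""
    totals = df.groupby(["NR_ZONA", "NM_VOTAVEL"], as_index=False)["QT_VOTOS"].sum()
    zone_total = totals.groupby("NR_ZONA")["QT_VOTOS"].transform("sum")
    totals["SHARE"] = totals["QT_VOTOS"] / zone_total
    return totals[["NR_ZONA", "NM_VOTAVEL", "SHARE"]]


# Made-up results for two zones
round_1 = pd.DataFrame({
    "NR_ZONA": [1, 1, 1, 2, 2, 2],
    "NM_VOTAVEL": ["CANDIDATE A", "CANDIDATE B", "CANDIDATE C"] * 2,
    "QT_VOTOS": [500, 300, 200, 300, 500, 200],
})
round_2 = pd.DataFrame({
    "NR_ZONA": [1, 1, 2, 2],
    "NM_VOTAVEL": ["CANDIDATE A", "CANDIDATE B"] * 2,
    "QT_VOTOS": [600, 400, 350, 650],
})

# Inner join: only candidates present in both rounds are compared
shift = zone_shares(round_1).merge(
    zone_shares(round_2), on=["NR_ZONA", "NM_VOTAVEL"], suffixes=("_R1", "_R2")
)
shift["CHANGE"] = shift["SHARE_R2"] - shift["SHARE_R1"]
print(shift)
```

Shares rather than raw counts are compared because turnout differs between rounds; a positive `CHANGE` means the candidate gained ground in that zone.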
9. Conclusion
The analysis of the 2024 São Paulo municipal election data provided valuable insights into voter behavior and emerging trends. We observed increasing political polarization, with PSOL gaining strength in peripheral areas and MDB maintaining a solid base in central urban areas. Additionally, the analysis revealed a shift in electoral preferences, with growing support for more progressive parties, especially among younger voters and lower social classes.
The analysis of charts and dashboards enabled a more detailed understanding of vote distribution by geography, candidate performance by electoral zone, and vote segmentation by party and demographic profile. The trends observed suggest that future electoral campaigns should focus on more segmented strategies, considering the social and economic characteristics of each region.
- Personalize electoral communication for different regions, considering demographic and socioeconomic profiles.
- Leverage the growth of social media and other digital platforms to connect with younger voters and those with limited access to traditional media.
- Tailor campaign proposals according to local issues such as security, health, and education, which were decisive factors for votes in various peripheral zones.
10. Extra Material
- 🇺🇸 Data Analysis Report: Click 🔗
- 🇧🇷 Data Analysis Report: Click 🔗
- Power BI Access Link: Click 🔗
- Power BI File: Click 🔗
- QR Code: Scan the code to access the data and visualizations on Power BI.
11. References
- Superior Electoral Court (TSE)
- [Electoral Data Source]
- Articles on electoral data analysis and data visualization
12. How to Run the Project
This project was developed in Python and uses libraries such as pandas, Plotly, and Dash for data analysis and visualization. Follow the instructions below to set up the environment and run the code.
Before running the project, you need to have Python and Git installed on your system.
- Download Python
- Download Git
Additionally, you will need the dependencies listed in the requirements.txt file:
```
pandas
plotly
```
To get started, clone the repository to your computer:
git clone https://github.com/your_user/SP2024-Election-Analysis.git
Install the required dependencies by running the following command:
pip install -r requirements.txt
To create an executable of the project, you can use PyInstaller, which is installed separately (for example with `pip install pyinstaller`). Run the following command to generate the executable:
pyinstaller --onefile electoral_analysis.py
This will create an executable file in the dist/ folder, which can be run directly without needing to install Python.
After installing the dependencies, run the main script to generate the analyses and visualizations (if you built an executable, run the file in the dist/ folder instead):
python electoral_analysis.py
To launch the interactive dashboards locally, run the following command:
python app.py
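This starts a local server for the dashboard; open the address printed in the terminal (http://127.0.0.1:8050 by default for Dash apps) in your browser.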
13. Contributing
If you'd like to contribute to this project, feel free to fork it, make changes, and submit pull requests. Here are the steps to get started:
- Fork this repository.
- Create a branch for your feature:
git checkout -b new-feature
- Make the necessary changes and commit:
git commit -am 'Adds new feature'
- Push the branch to the remote repository:
git push origin new-feature
- Open a pull request for review and integration.
Make sure your changes do not break existing functionality and that the tests are up to date.
Copyright 2024 Mindful-AI-Assistants. Code released under the MIT license.