This project processes and analyzes speeches from the Greek Parliament, focusing on transforming raw data into a PostgreSQL database and providing data management functionality through Python scripts.
- Python: Version 3.10 - 3.12 (Download Python)
- PostgreSQL: Version 14 or higher, including pgAdmin (Download PostgreSQL)
- Microsoft C++ Build Tools: Required for some Python dependencies (Download Build Tools)
Create a .env file in the root directory with the following format:
DB_USER=your_database_username
DB_PASSWORD=your_database_password
DB_HOST=localhost
DB_PORT=5432
DB_NAME=greek_parliament
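As an illustration of how a script might consume these values, the following is a minimal sketch that parses `KEY=value` lines and assembles a PostgreSQL connection URL. The helper names `parse_env` and `build_db_url` are hypothetical and are not taken from this project's modules; the actual scripts may load the file differently (for example, through a dotenv library).

```python
from urllib.parse import quote_plus


def parse_env(text):
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_db_url(env):
    """Assemble a postgresql:// URL from the five DB_* settings (hypothetical helper)."""
    required = ["DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME"]
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("Missing settings in .env: " + ", ".join(missing))
    # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


# Sample content mirroring the format shown above.
sample = """
DB_USER=your_database_username
DB_PASSWORD=your_database_password
DB_HOST=localhost
DB_PORT=5432
DB_NAME=greek_parliament
"""
db_url = build_db_url(parse_env(sample))
```

Failing early on a missing key, as this sketch does, tends to give a clearer error than a connection failure later in the import.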
For the GreekStemmer library to read its rules file with the correct encoding, modify the library's __init__.py file as follows:
class GreekStemmer:
    def load_settings(self):
        custom_rules = ""
        with open(os.path.join(
                os.path.dirname(__file__), 'stemmer.yml'), 'r', encoding='utf-8') as f:
            custom_rules = yaml.load(f.read(), Loader=yaml.FullLoader)
        return custom_rules
This ensures that the stemmer.yml file is loaded with the correct encoding.
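To show why the explicit `encoding='utf-8'` argument matters, here is a small standalone sketch. It writes a line of Greek text to a temporary file and reads it back twice: once as UTF-8, and once with a legacy code page such as cp1252, which stands in for a platform default that is not UTF-8. The file name `sample.yml` and its content are made up for the demonstration and are unrelated to the library's real stemmer.yml.

```python
import os
import tempfile

greek = "βουλή: κοινοβούλιο\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.yml")  # hypothetical file, not the real stemmer.yml
    with open(path, "w", encoding="utf-8") as f:
        f.write(greek)

    # Explicit encoding: the text round-trips unchanged regardless of platform defaults.
    with open(path, "r", encoding="utf-8") as f:
        as_utf8 = f.read()

    # Simulated non-UTF-8 default: decoding either fails or yields garbled characters.
    try:
        with open(path, "r", encoding="cp1252") as f:
            as_cp1252 = f.read()
    except UnicodeDecodeError:
        as_cp1252 = None

print(as_utf8 == greek)    # the UTF-8 read matches the original text
print(as_cp1252 == greek)  # the cp1252 read does not
```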
Run the script to import the CSV data into your PostgreSQL database:
python modules/import_csv_to_db.py
After importing the data, create the final_speeches table:
python modules/create_final_speeches.py
Remove rows with NULL values in the member_name column:
python modules/clear_null_values.py
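The effect of this cleanup step can be sketched with an in-memory SQLite database standing in for PostgreSQL, so the example runs without a server. This is not the content of clear_null_values.py. It assumes the cleanup targets the final_speeches table, and the `id` and `speech` columns as well as the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the DELETE statement is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE final_speeches ("
    "id INTEGER PRIMARY KEY, member_name TEXT, speech TEXT)"  # hypothetical schema
)
conn.executemany(
    "INSERT INTO final_speeches (member_name, speech) VALUES (?, ?)",
    [
        ("Member A", "text 1"),
        (None, "text 2"),       # NULL member_name: should be removed
        ("Member B", "text 3"),
        (None, "text 4"),       # NULL member_name: should be removed
    ],
)

# NULL must be matched with IS NULL; "= NULL" never evaluates to true in SQL.
cursor = conn.execute("DELETE FROM final_speeches WHERE member_name IS NULL")
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM final_speeches").fetchone()[0]
nulls_left = conn.execute(
    "SELECT COUNT(*) FROM final_speeches WHERE member_name IS NULL"
).fetchone()[0]
conn.close()

print(deleted, remaining, nulls_left)
```

Checking the count of remaining NULL rows after the delete is a cheap way to confirm the step did what was intended before running further analysis.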
- Ensure PostgreSQL is running before executing the scripts.
- Place the Greek_Parliament_Proceedings_1989_2020.csv file in the data folder.
- The .env file is critical for securely passing database credentials.