This collection of scripts prepares data for the Hamburg Architecture project.
The data has been provided by the Architekten- und Ingenieurkammer Hamburg and the "Day of Architecture and Civil Engineering in Hamburg" event.
🔎 The scripts are complex because the layout and element positions vary across the PDFs of previous years' projects. Each year needs its own script to extract and format the data accurately.
- go to the parseNewData folder
- run node index.js to execute the script. The script updates the finalProjects.json file with the new projects, creates a new JSON file named after the current year in the json folder, and adds images to the webp_images folder.
Optional:
To store the data in a database, create a model using the Sequelize ORM and save it using the sequelize and pg packages. Refer to the populateDatabaseORM.js file in the helpers folder for an example.
🔑 In geocodeAddresses.js, you can obtain the lat and lng coordinates for each project using the Google Maps API, which requires a Google Cloud account and an API key for this project. Visit the Google Maps JavaScript API documentation for guidance on how to get and set up your API key.
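As a rough illustration of the geocoding step, here is a minimal sketch that assumes the Google Maps Geocoding web service and Node 18+ (for the built-in fetch). The function names and the GOOGLE_MAPS_API_KEY variable name are hypothetical, and geocodeAddresses.js in this repo may call the API differently.

```javascript
// Endpoint of the Google Maps Geocoding web service.
const GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json";

// Build the request URL for one address (URLSearchParams handles the encoding).
function buildGeocodeUrl(address, apiKey) {
  const params = new URLSearchParams({ address, key: apiKey });
  return `${GEOCODE_URL}?${params}`;
}

// Pull { lat, lng } out of a Geocoding response body, or return null if nothing matched.
function extractLatLng(body) {
  if (!body || body.status !== "OK" || !body.results || body.results.length === 0) {
    return null;
  }
  const { lat, lng } = body.results[0].geometry.location;
  return { lat, lng };
}

// Geocode a single address. GOOGLE_MAPS_API_KEY is an assumed variable name.
async function geocodeAddress(address, apiKey = process.env.GOOGLE_MAPS_API_KEY) {
  const response = await fetch(buildGeocodeUrl(address, apiKey));
  if (!response.ok) {
    throw new Error(`Geocoding request failed: HTTP ${response.status}`);
  }
  return extractLatLng(await response.json());
}
```

Calling geocodeAddress("Rathausmarkt 1, Hamburg") would resolve to an object with lat and lng, or to null when the service finds no match. Keeping the URL building and response parsing in separate functions makes them easy to check without a network call.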
🪛 To obtain translations for the project descriptions, you can use the Google Translate API. This also requires an API key.
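A translation request might look like the sketch below. It assumes the Cloud Translation Basic (v2) REST endpoint, Node 18+ for the built-in fetch, and German-to-English translation. The function names, the language defaults, and the GOOGLE_TRANSLATE_API_KEY variable name are illustrative assumptions, not taken from this repo.

```javascript
// REST endpoint of the Cloud Translation Basic (v2) API.
const TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2";

// Build the JSON payload for one piece of text.
function buildTranslatePayload(text, source, target) {
  return { q: text, source, target, format: "text" };
}

// Read the first translated string from a v2 response body, or return null.
function extractTranslation(body) {
  const translations = body && body.data && body.data.translations;
  if (!translations || translations.length === 0) return null;
  return translations[0].translatedText;
}

// Translate one description. The "de" -> "en" defaults are an assumption.
async function translateText(text, options = {}) {
  const {
    source = "de",
    target = "en",
    apiKey = process.env.GOOGLE_TRANSLATE_API_KEY, // assumed variable name
  } = options;
  const response = await fetch(`${TRANSLATE_URL}?key=${encodeURIComponent(apiKey)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTranslatePayload(text, source, target)),
  });
  if (!response.ok) {
    throw new Error(`Translation request failed: HTTP ${response.status}`);
  }
  return extractTranslation(await response.json());
}
```

Since the API bills per character, it may be worth caching translated descriptions in the generated JSON so that re-running the script does not translate the same text twice.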
❗️ Make sure to store your sensitive information (API keys, database credentials) in the .env file and load it with dotenv instead of hard-coding it in the scripts that connect to your database. Refer to the .env.example file.
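One possible way to read database settings from the environment is sketched below. The variable names (DB_HOST, DB_NAME, DB_USER, DB_PASSWORD, DB_PORT) are hypothetical, so check .env.example for the names this project actually uses. The try/catch around dotenv is there only so the sketch also runs where the package is not installed.

```javascript
// Load variables from .env into process.env when dotenv is available.
try {
  require("dotenv").config();
} catch {
  // dotenv is not installed here; rely on variables already set in the environment.
}

// Hypothetical variable names; align them with .env.example.
const REQUIRED_VARS = ["DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD"];

// Collect the database settings and fail early if any required value is missing.
function readDbConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    host: env.DB_HOST,
    port: Number(env.DB_PORT || 5432), // 5432 is the default PostgreSQL port
    database: env.DB_NAME,
    username: env.DB_USER,
    password: env.DB_PASSWORD,
  };
}
```

Failing early with the list of missing names tends to be easier to debug than a connection error later on. Remember to keep the .env file itself out of version control.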
The data has been provided by the Architekten- und Ingenieurkammer Hamburg.
All image rights belong to the Architekten- und Ingenieurkammer Hamburg. The images were taken from their program PDFs using PDF24 Tools.
Contributions to the data scraper application are welcome. If you have any suggestions, bug reports, or feature requests, please feel free to submit an issue or a pull request. 👋🏼
This project is released under the MIT License, so you are free to use, modify, and redistribute the code under the terms of the license.