This demo shows how one would use Airflow to copy data from one Postgres database to another. Docker containers are used to represent services in a production environment.
The docker-compose.yaml
describes the service definitions. Most of it is adapted from the official Airflow docker-compose, with some minor tweaks, mainly the use of the LocalExecutor
instead of the CeleryExecutor
to reduce overhead since this is a toy application. For more information about the Airflow-provided docker-compose
, see this page.
To set up the test environment, first run the startup script startup.sh
if running from Linux. This creates three directories Airflow is going to use as volumes as specified in the docker-compose
file and set the Airflow UID to the UID your user account is using.
you@host:$ docker-compose up airflow-init
you@host:$ docker-compose up
you@host:$ docker-compose down --volumes --remove-orphans
postgres
: Airflow metadata serverairflow-webserver
: Airflow webserver. Accessible atlocalhost:5884
. Web UI login credentials:user: airflow; password: airflow
airflow-scheduler
: Airflow schedulerairflow-init
: Performs some checks and initializes Airflowairflow-cli
: Airflow CLI
postgres-source
: Login credentials:user: postgres; password: postgres
postgres-target
: Login credentials:user: postgres; password: postgres
pg-admin
: Web-based frontend for inspecting the databases. Accessible atlocalhost:5050
. Login credentials:email: admin@admin.com; password: root
The demonstration can be found in dags/postgres_dag/postgres_dag.py
. The DAG should show up automatically on the Airflow Web UI.
At startup, postgres-source
gets populated with the sales
table
id | creation_date | sale_value |
---|---|---|
1 | 2022-06-15 | 1234.56 |
2 | 2022-06-16 | 9876.54 |
You can view this in pg-admin
by first linking to the Postgres database, then navigating to Servers/<registered server name>/Databases/postgres/Schemas/public/Tables/sales
. To link the database to pg-admin
- Give the server a name in the
General
tab (can be any name) - Go to the
Connection
tab and supply the following:
- Host name/address:
postgres-source
(orpostgres-target
) - Post:
5432
- Maintenance database:
postgres
- Username:
postgres
- Password:
postgres
The postgres-target
database starts off empty, but after running the DAG, the sales
table should be copied there.