Airflow Postgres Demo

This demo shows how one would use Airflow to copy data from one Postgres database to another. Docker containers are used to represent services in a production environment.

Setup

The docker-compose.yaml file contains the service definitions. Most of it is adapted from the official Airflow docker-compose.yaml, with some minor tweaks, mainly the use of the LocalExecutor instead of the CeleryExecutor to reduce overhead, since this is a toy application. For more information about the Airflow-provided docker-compose.yaml, see the official "Running Airflow in Docker" documentation.

To set up the test environment, first run the startup script startup.sh if running from Linux. This creates the three directories Airflow will use as volumes, as specified in the docker-compose file, and sets AIRFLOW_UID to the UID of your user account.
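The steps the startup script performs can be sketched in Python. This is a hypothetical stdlib-only equivalent of startup.sh, not the script itself; the directory names (dags, logs, plugins) are assumed to follow the official Airflow docker-compose layout:

```python
# Hypothetical equivalent of startup.sh: create the volume directories
# and record the host user's UID in .env as AIRFLOW_UID, so the
# containers write files owned by the host user.
import os
from pathlib import Path


def init_airflow_env(base: Path) -> None:
    # Directories mounted as volumes in the docker-compose file
    # (assumed names from the official Airflow layout).
    for name in ("dags", "logs", "plugins"):
        (base / name).mkdir(parents=True, exist_ok=True)

    # os.getuid() only exists on Unix; 50000 is Airflow's documented default.
    uid = os.getuid() if hasattr(os, "getuid") else 50000
    (base / ".env").write_text(f"AIRFLOW_UID={uid}\n")
```

Calling init_airflow_env(Path(".")) from the repository root would reproduce the script's effect; on Linux the shell script is the simpler option.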

Initializing Environment

you@host:$ docker-compose up airflow-init

Running Airflow

you@host:$ docker-compose up

Cleaning Up

you@host:$ docker-compose down --volumes --remove-orphans

Services

Airflow Components

  • postgres: Airflow metadata server
  • airflow-webserver: Airflow webserver. Accessible at localhost:5884. Web UI login credentials: user: airflow; password: airflow
  • airflow-scheduler: Airflow scheduler
  • airflow-init: Performs some checks and initializes Airflow
  • airflow-cli: Airflow CLI

Toy Database Components

  • postgres-source: Login credentials: user: postgres; password: postgres
  • postgres-target: Login credentials: user: postgres; password: postgres
  • pg-admin: Web-based frontend for inspecting the databases. Accessible at localhost:5050. Login credentials: email: admin@admin.com; password: root

Demonstration

The demonstration can be found in dags/postgres_dag/postgres_dag.py. The DAG should show up automatically on the Airflow Web UI.

Data

At startup, postgres-source gets populated with the sales table:

id | creation_date | sale_value
---|---------------|-----------
 1 | 2022-06-15    |    1234.56
 2 | 2022-06-16    |    9876.54
You can view this in pg-admin by first registering the Postgres server, then navigating to Servers/<registered server name>/Databases/postgres/Schemas/public/Tables/sales. To register the database in pg-admin:

  1. Give the server a name in the General tab (can be any name)
  2. Go to the Connection tab and supply the following:
  • Host name/address: postgres-source (or postgres-target)
  • Port: 5432
  • Maintenance database: postgres
  • Username: postgres
  • Password: postgres

The postgres-target database starts off empty, but after running the DAG, the sales table should be copied there.
