
Data pipeline: integrating dbt with DuckDB

dbt (data build tool) is a framework for transforming analytical data that uses SQL as its base syntax. It focuses on the Transformation (T) step of ETL (Extract, Transform, Load).
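To make the T step concrete: in dbt, a transformation (a "model") is a SELECT statement saved as a .sql file under the project's models/ directory, and dbt materializes it as a table or view named after the file. The sketch below is illustrative only; the model name example_orders and its columns are hypothetical, not part of this project.

```shell
# Hypothetical model: dbt builds one relation per .sql file under models/,
# named after the file (here "example_orders").
mkdir -p models
if [ -e models/example_orders.sql ]; then
  echo "models/example_orders.sql already exists; leaving it untouched" >&2
else
  cat > models/example_orders.sql <<'EOF'
{{ config(materialized='table') }}

-- The transformation itself is a plain SELECT; dbt handles creating the table.
select 1 as order_id, 'shipped' as status
union all
select 2 as order_id, 'pending' as status
EOF
fi
```

The config() call asks dbt to build a table instead of the default view.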

What is DuckDB?

DuckDB is an embeddable, relational, analytical DBMS built for analytical query workloads (OLAP). Like SQLite, DuckDB prioritizes simplicity and ease of integration: it has no external dependencies at compile time or at run time.

Why DuckDB?

DuckDB is designed to be embedded within applications, running in-process as a serverless database. You can integrate it directly into your data pipeline without installing or configuring a separate database server.

Dependencies

  • dbt-core
  • DuckDB (installed through the dbt-duckdb adapter)
  • DBeaver (optional)

Set up the project

  • Create an isolated virtual environment for dbt-core
    conda create --name dbtenv python=3.11
    
  • Activate the environment
    conda activate dbtenv
    
  • Install the DuckDB adapter for dbt
    pip install dbt-duckdb
    
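dbt reads its connection settings from profiles.yml. A minimal sketch for the DuckDB adapter, assuming a profile named duckdb_pipeline (hypothetical; it must match the profile: entry in your dbt_project.yml) and a database file dev.duckdb (also assumed). It writes profiles.yml into the current project directory, which dbt searches before falling back to ~/.dbt/.

```shell
# Hypothetical names: profile "duckdb_pipeline" and database file "dev.duckdb".
# Refuses to overwrite an existing profiles.yml.
if [ -e profiles.yml ]; then
  echo "profiles.yml already exists; leaving it untouched" >&2
else
  cat > profiles.yml <<'EOF'
duckdb_pipeline:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: dev.duckdb
      threads: 1
EOF
fi
```

DuckDB creates dev.duckdb on first connection if it does not exist yet.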

Run the project
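A typical run uses the standard dbt CLI subcommands. The sketch below wraps them in a hypothetical helper, run_pipeline, and assumes the dbtenv environment is active and the current directory is the dbt project root (the folder containing dbt_project.yml).

```shell
# Hypothetical helper: stops at the first failing step.
run_pipeline() {
  dbt debug || return 1   # check profiles.yml and the DuckDB connection
  dbt run   || return 1   # build the models into the DuckDB database file
  dbt test                # run the tests defined in the project
}
```

Call run_pipeline from the project root. Alternatively, dbt build runs models and tests in a single command.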

(Optional) Verify the data using DBeaver IDE

  • Connect DuckDB to DBeaver

    The Path should be the same as the path defined in profiles.yml, or choose Open to browse to the database file.

Resources: