s3-data-lake-example

Creating an S3 data lake with a PySpark ETL.

The first step uses pandas to extract only the required columns and to write them to the data lake as Parquet files.
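A minimal sketch of this extraction step might look like the following. The column names (`line_id`, `station_id`, `expected_arrival`, `actual_arrival`) and the helper names `extract_required` and `write_parquet` are hypothetical, since the README does not list the actual columns or code. Writing Parquet from pandas assumes `pyarrow` or `fastparquet` is installed, and writing directly to an `s3://` path additionally assumes `s3fs`.

```python
import pandas as pd

# Hypothetical column names; the README does not say which columns are required.
REQUIRED_COLUMNS = ["line_id", "station_id", "expected_arrival", "actual_arrival"]


def extract_required(raw: pd.DataFrame, columns=REQUIRED_COLUMNS) -> pd.DataFrame:
    """Return a copy of `raw` restricted to the required columns."""
    missing = [c for c in columns if c not in raw.columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    return raw.loc[:, list(columns)].copy()


def write_parquet(df: pd.DataFrame, path: str) -> None:
    """Write the frame as a single Parquet file.

    Requires pyarrow or fastparquet; an s3:// path also requires s3fs.
    """
    df.to_parquet(path, index=False)


# Small in-memory stand-in for the raw source data (illustrative values only).
raw = pd.DataFrame(
    {
        "line_id": ["A", "B"],
        "station_id": [1, 2],
        "expected_arrival": ["2024-01-01 08:00", "2024-01-01 08:05"],
        "actual_arrival": ["2024-01-01 08:10", "2024-01-01 08:06"],
        "unused_note": ["x", "y"],  # dropped by the extraction
    }
)

trimmed = extract_required(raw)
```

Calling `write_parquet(trimmed, "arrivals.parquet")` would then produce the file for the data lake; the path here is a placeholder.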

Queries to be supported

The lines where the gap between the expected and actual arrival times is large.
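One way to express this query is sketched below in pandas rather than PySpark, to keep the example small. It assumes hypothetical columns `line_id`, `expected_arrival`, and `actual_arrival`, and a hypothetical threshold of 5 minutes on the mean delay per line; the README does not define what counts as "large", so both the threshold and the use of the mean are assumptions.

```python
import pandas as pd


def delayed_lines(arrivals: pd.DataFrame, threshold_minutes: float = 5.0) -> pd.Series:
    """Return the mean delay in minutes per line, for lines above the threshold."""
    expected = pd.to_datetime(arrivals["expected_arrival"])
    actual = pd.to_datetime(arrivals["actual_arrival"])
    # Positive values mean the arrival was later than expected.
    delay_minutes = (actual - expected).dt.total_seconds() / 60.0
    mean_delay = delay_minutes.groupby(arrivals["line_id"]).mean()
    mean_delay = mean_delay.rename("mean_delay_minutes")
    return mean_delay[mean_delay > threshold_minutes].sort_values(ascending=False)


# Illustrative data only: line A averages 8 minutes late, line B 1.5 minutes.
arrivals = pd.DataFrame(
    {
        "line_id": ["A", "A", "B", "B"],
        "expected_arrival": [
            "2024-01-01 08:00",
            "2024-01-01 09:00",
            "2024-01-01 08:00",
            "2024-01-01 09:00",
        ],
        "actual_arrival": [
            "2024-01-01 08:10",
            "2024-01-01 09:06",
            "2024-01-01 08:01",
            "2024-01-01 09:02",
        ],
    }
)

result = delayed_lines(arrivals)
```

With the sample data, only line A exceeds the 5-minute threshold. The same grouping and filtering could be written against a Spark DataFrame read from the Parquet files.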