Distributed Frameworks
  spark
  https://github.com/ray-project/ray
  https://dask.org/

Warehouse
  redshift
  snowflake
  vertica

Data Lake
  databricks delta

change data capture / streaming
  https://www.attunity.com/
  https://streamsets.com/
  https://github.com/debezium/debezium

Streaming
  kafka
  flink
  beam

Notebooks / exploration
  jupyter
  https://zeppelin.apache.org/
  https://github.com/ironmussa/Optimus

labeling
  https://www.snorkel.org/
  https://www.figure-eight.com/
  https://scale.ai
  https://www.labelbox.com

Orchestration
  airflow
  luigi
  https://github.com/dagster-io/dagster

solutions
  snowplow
  https://www.prefect.io/
  https://www.ascend.io/
  https://www.datmo.com/
  TFX-OSS

a/b testing
  https://www.optimizely.com/
  https://github.com/YahooArchive/mendel

catalogue
  https://github.com/airbnb/knowledge-repo
  https://github.com/lyft/amundsen

BI
  superset
  looker
  periscope

neat utils
  query CSVs w/ SQL: cq

books
  Designing Data-Intensive Applications