• Implemented a distributed file system using Python and PySpark for rapid analysis of millions of data rows.
• Managed Spark clusters on Google Dataproc and hosted data in GCS buckets.
• Performed ad-hoc analysis with BigQuery on a cloud data warehouse holding 500k+ rows.
Video Presentation: https://www.youtube.com/watch?v=C9lxvqlx-7g&ab_channel=SaiRaina