The purpose of this repository is to process and prepare data, then build models that predict certain activities using machine learning (ML) techniques. The entire process uses PySpark for distributed data processing. I have also included some personal notes from IBM's Advanced Machine Learning and Signal Processing and Applied AI with DeepLearning courses, as well as from other big data courses.