Enhancing-performance-of-big-data-machine-learning-models-on-Google-Cloud-Platform

This project parallelises pre-processing, performance measurement and machine learning in the cloud, and evaluates and analyses the resulting cloud performance.
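As a rough illustration of timing a parallelised pre-processing step at several parallelism levels, here is a minimal local sketch. It is not the notebook's code: a thread pool stands in for the cluster, and `preprocess`, `timed_parallel_map` and `sweep` are hypothetical names introduced only for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def preprocess(item):
    # Hypothetical stand-in for per-image work (e.g. decode, resize, re-encode).
    return item * 2


def timed_parallel_map(func, items, workers):
    """Apply func to every item with `workers` threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with items.
        results = list(pool.map(func, items))
    elapsed = time.perf_counter() - start
    return results, elapsed


def sweep(func, items, worker_counts):
    """Time the same workload once per parallelism level."""
    timings = {}
    for workers in worker_counts:
        _, elapsed = timed_parallel_map(func, items, workers)
        timings[workers] = elapsed
    return timings
```

Calling `sweep(preprocess, items, [1, 2, 4])` returns a dictionary mapping each worker count to its wall time. On a real cluster the same pattern would apply to the number of partitions or worker nodes rather than local threads.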

Dataset - The public “Flowers” dataset (3600 images, 5 classes) is used for the analysis.

About the project - An in-depth analysis of how parallelisation affects the performance of various cluster configurations (GCP's Dataproc) in terms of CPU/memory utilisation, disk I/O operations and network bandwidth. The project also experiments with different VM configurations and distribution strategies to find the most efficient combination for training the ML model on Google Cloud's AI Platform.
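One common way to compare configurations like these is by speedup and parallel efficiency computed from measured wall times. The sketch below assumes that approach; the README does not say which metrics the notebook reports, and the function names are hypothetical.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Ratio of baseline wall time to parallel wall time (higher is better)."""
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, workers):
    """Speedup per worker; 1.0 would mean perfectly linear scaling."""
    return speedup(baseline_seconds, parallel_seconds) / workers


def fastest_configuration(timings):
    """Given {configuration name: wall time in seconds}, return the fastest name."""
    return min(timings, key=timings.get)
```

Efficiency well below 1.0 suggests that overheads such as data shuffling, disk I/O or network transfer are absorbing the added capacity. That is why the fastest configuration is not always the most cost-effective one.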

Repository Contents - BigData_ML_Model_on_Google_Cloud_Platform.ipynb - Google Colaboratory file containing the code.