- know the specs of each GCP component & vice versa
- be able to choose a component by its functionality (see the table below)
- determine the purpose of the use case presented in the question
- know the differences between GCP components & their open-source / Hadoop counterparts
- know how to migrate data warehouses, on-premise clusters...
- know the basics of TensorFlow, Stackdriver, Data Fusion & Data Studio, the relevant regulations...
- practice: work through the sample questions to get familiar with the exam format
- keep in mind the best practices and anti-patterns for each component
- acquire strong knowledge of NoSQL databases & machine learning
- be aware that some questions have multiple correct answers
- learn to spot wrong answers and eliminate them first
The exam roughly breaks down as:
- Storage (20%),
- Big Data Processing (35%),
- Machine Learning (18%),
- case studies (similar to the sample case studies, 15%) and
- others (Hadoop, security, Stackdriver, about 12%).
Or more precisely (as of mid-2019; since then Redis, Airflow & GDPR topics have been added):
GCP Service | Service Function | Certification weight |
---|---|---|
Cloud Storage | Unified object storage | 2 % |
Cloud SQL & Spanner | Fully managed SQL databases | 4 % |
Cloud Datastore | NoSQL database (think ad hoc storage) | 2 % |
Bigtable | NoSQL database for massive big data workloads | 16 % |
BigQuery | Petabyte-scale data warehouse | 16 % |
Pub/Sub | Asynchronous messaging service | 6 % |
Cloud Dataproc | Managed Hadoop and Spark | 12 % |
Cloud Dataflow | Data processing (pipelines) | 16 % |
TensorFlow | Machine learning framework | 20 % |
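As a study aid, the table above can be condensed into a quick "which service for which workload" lookup. This is a minimal revision sketch of the typical exam answers, not official Google guidance; the keyword descriptions are my own shorthand:

```python
# Study aid: map a workload description to the usual GCP exam answer.
# The descriptions are simplified revision shorthand, not official guidance.
GCP_CHOICES = {
    "object storage (files, blobs, backups)": "Cloud Storage",
    "relational OLTP, regional scale": "Cloud SQL",
    "relational OLTP, global scale / horizontal scaling": "Cloud Spanner",
    "NoSQL document store for app data": "Cloud Datastore",
    "NoSQL wide-column store, high throughput / low latency": "Bigtable",
    "petabyte-scale analytics / data warehouse": "BigQuery",
    "asynchronous messaging between services": "Pub/Sub",
    "lift-and-shift Hadoop / Spark clusters": "Cloud Dataproc",
    "serverless batch & streaming pipelines (Apache Beam)": "Cloud Dataflow",
}

def suggest(keyword: str) -> list[str]:
    """Return the services whose typical use case mentions the keyword."""
    keyword = keyword.lower()
    return [svc for desc, svc in GCP_CHOICES.items() if keyword in desc.lower()]

print(suggest("warehouse"))  # ['BigQuery']
print(suggest("hadoop"))     # ['Cloud Dataproc']
```

Quizzing yourself this way (requirement first, service second) mirrors how the exam questions are framed.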
- How I passed the Google Professional Data Engineer Exam in 2020
- A mini review of GCP for data science and engineering
- A GCP flowchart a day
- A Study Guide to the Google Cloud Professional Data Engineer Certification Path
- Google Cloud Certified Professional Data Engineer - 2019 Updated exam
- A TensorFlow Glossary/Cheat Sheet
- Big data on google cloud
- Google Cloud Official Icons and Solution Architectures
- Data and Analytics on Google Cloud Platform
Other links: