Google Cloud Platform Serverless Data Analysis with BigQuery and Cloud Dataflow
This 8-hour instructor-led course builds on CPB100, which is a prerequisite. Through a combination of presentations, demonstrations, and hands-on labs, students learn how to carry out no-ops data warehousing, analysis, and pipeline processing.
At the end of this one-day course, participants will be able to:
- Build complex BigQuery queries using clauses, inner selects, built-in functions, and joins
- Load and export data to/from BigQuery
- Identify the need for nested and repeated fields and for user-defined functions
- Understand pipeline processing, terms, and concepts
- Write pipelines in Java or Python and launch them locally or in the Cloud
- Implement MapReduce transforms in Dataflow pipelines
- Join datasets as side inputs
- Integrate Dataflow, BigQuery, and Cloud Pub/Sub for real-time streaming
Who Should Attend
This class is intended for data analysts and data scientists responsible for: analyzing and visualizing big data, implementing cloud-based big data solutions, deploying or migrating big data applications to the public cloud, implementing and maintaining large-scale data storage environments, and transforming/processing big data.
Prerequisites
- CPB100 - Google Cloud Platform Big Data & Machine Learning Fundamentals (or equivalent experience)
- Experience using a SQL-like query language to analyze data
- Knowledge of either Python or Java
Module 1: BigQuery Deep Dive
- A 3-hour deep dive into details of BigQuery.
- What is BigQuery?
- Queries and functions + lab
- Load and export data + lab
- Advanced Capabilities
- Performance and pricing
Module 2: Dataflow Deep Dive
- A 3-hour deep dive into details of Cloud Dataflow.
- What is Dataflow?
- Data pipeline + lab
- MapReduce in Dataflow + lab
- Side inputs + lab
- Streaming + demo
Module 3: Summary
- Where to go from here