This three-day course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers interested in a thorough, hands-on overview of the Apache Spark platform.
The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the platform, SQL and other high-level data access tools, as well as Spark's streaming capabilities and machine learning APIs.
Each topic includes slide and lecture content along with hands-on use of Spark through the elegant Databricks web-based notebook environment. Inspired by tools like IPython/Jupyter and MATLAB, Databricks notebooks allow attendees to write jobs, run data analysis queries, and generate visualizations using their own cloud-based Spark cluster, accessed through a web browser.*
Duration: 3 Days, Full Time (9AM to 5PM)
We will have a break from noon to 1 PM daily; lunch will not be provided, but there are several options nearby.
Prerequisites: Basic understanding of programming in Python or Scala. Experience with Java or SQL is beneficial but not essential.
After taking this class you will be able to:
• Describe Spark’s fundamental mechanics
• Use the core Spark APIs to operate on data
• Articulate and implement typical use cases for Spark
• Build data pipelines with SparkSQL and DataFrames
• Analyze Spark jobs using the UIs and logs
• Create Streaming and Machine Learning jobs
*Note: Safari and Internet Explorer are not supported. You will need to bring your own laptop.
The creators of Apache Spark founded Databricks in 2013 after spinning the project out of UC Berkeley. At Databricks we continue to grow the Spark project through software development, roadmap planning, and fostering the community. Our Spark engineering efforts and our training program are deeply integrated: the lead committers on Spark help design, create, and review our training curriculum and courseware. When you learn about Spark from Databricks, you are learning from the authority on Spark.