Machine Learning with Apache Spark and Python: 1.5-Day Workshop
Learn to use the open-source tools Apache Spark and Python to process large datasets and build predictive models from real-world data.
This workshop will introduce you to essential concepts and practices for using Apache Spark. You will learn:
- How to set up Spark and connect to other data infrastructure
- RDDs and basic operators: maps, joins, filters, etc.
- Spark as a tool for data exploration: notebooks and workflows
- Advanced data structures: DataFrames and Spark SQL
- Machine learning in Spark
- Essentials for deploying Spark in the cloud
- Data visualization for dashboarding and analysis from Spark pipelines
The workshop is designed to maximize the learning experience for everyone and is split evenly between theory (50%) and hands-on practice (50%).
Class will run from 4pm to 8pm on Friday and from 9am to 5pm on Saturday.
Is lunch provided?
Yes! Lunch on Saturday is included.
Are there any prerequisites?
Previous programming experience in Python or another language is recommended to make the best use of the workshop, since most activities are conducted in PySpark.
Over the last two years, Python has become a de facto standard in data science and has been widely adopted by major companies. Reasons for this success include:
- A large set of mature data science libraries => most needs are covered
- A worldwide community of enthusiasts => you can get help when you need it
- Easy to learn, read, and write => you can start contributing immediately
- Supports both functional and object-oriented programming => versatile and powerful
- A full-stack programming language => easier collaboration between data scientists and software engineers
Apache Spark has revolutionized how we build and deploy data pipelines for ETL, visualization, and machine learning. Reasons for this success include:
- Flexible enough to run SQL-style queries, machine learning algorithms, and everything in between
- Fast and scalable: efficient in-memory processing => runs up to 100x faster than Hadoop MapReduce on some workloads
- Supports both data exploration and production workflows => the same code that works on a laptop can be deployed to cloud-based computing clusters
- Free and open-source
The course is led by Abe Gong, Ph.D. Abe was the founding data scientist at Jawbone and Chief Data Officer at Aspire Health. He has deep experience building data systems and teams at growth-stage companies, especially in IoT and healthcare.
Terms & Conditions
In certain cases, we may need to cancel this workshop due to circumstances beyond our control or otherwise. If this happens, we will refund all registration fees for those who signed up. We are not responsible for any related expenses incurred by registered attendees (including but not limited to travel and hotel expenses).
More than 2 weeks before the course: full refund.
1-2 weeks before the course: 50% refund.
Less than 1 week before the course: no refund available.
All public workshops come with a no-questions-asked money-back guarantee. If you are unhappy for any reason after attending the class, you can ask for a full refund.
If you liked the program so much that you'd like your friends to attend, we will give you a 50% refund for each friend who signs up for the next workshop (2 friends = 100% money back). Ask us for details at the workshop.