Data Exploration Bootcamp with Python
Mon, May 15, 2017, 9:00 AM – Fri, May 19, 2017, 5:00 PM CDT
This bootcamp is designed to introduce you to data exploration using the Python programming language. Through this intense, weeklong program you will master the skills necessary to manipulate, visualize and explore datasets to extract valuable insights.
Is this course for you?
If you are excited about the world of machine learning and data science but have yet to fully dive in, this course will catapult your skills so that you can make a smooth transition into the field. Data exploration is the foundation of all good data analysis, and it is where data scientists spend the vast majority of their time. No prior programming experience is needed, as a thorough introduction to Python will be given in a pre-course assignment.
May 15th - 19th: 9 a.m. - 5 p.m.
If you are a student or unemployed, use discount code student25 to get a 25% discount. If you are a member of Houston Data Science, use hds10 to get 10% off. Become an affiliate and earn $100 for every attendee you refer.
Structure of Course
Learning is accomplished by working through difficult assignments and by receiving and reviewing model solutions. The course uses a 'flipped classroom': students will be expected to read and prepare each day's material before coming to class. In class, students will alternate between instructor-guided lessons and student-focused exercises and projects. The instructor will personally review all code and give feedback on all course assignments. Approximately 300 short-answer questions with detailed solutions will be available. Enrollment is capped at 10 students, ensuring personalized learning and participation.
Before the Course:
Students will need to set aside 10 to 20 hours to set up the programming environment and to complete a thorough overview of Python fundamentals. An additional class will be held the week before the bootcamp to ensure all students complete this assignment.
Day 1: Introduction to Pandas
Perhaps the most popular and widely used open-source data wrangling tool today, the Pandas library will be introduced along with its main data structures, the Series and the DataFrame.
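To make the two structures concrete, here is a minimal sketch, assuming pandas is installed. The column names and values are invented for illustration and are not course material; the calls shown (`pd.Series`, `pd.DataFrame`, `.loc`, `.iloc`) are standard pandas.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: a set of values plus an index.
# The temperatures below are made up for illustration.
temps = pd.Series([71.2, 75.8, 80.1], index=["Mon", "Tue", "Wed"], name="high_temp")

# A DataFrame is a two-dimensional table whose columns are Series
# that share a single index. Values are again invented.
df = pd.DataFrame(
    {
        "high_temp": [71.2, 75.8, 80.1],
        "rain_in": [0.0, 0.3, 1.2],
    },
    index=["Mon", "Tue", "Wed"],
)

# Selecting one column of a DataFrame returns a Series.
col = df["rain_in"]

# .loc selects by label; .iloc selects by integer position.
tue_row = df.loc["Tue"]       # the "Tue" row, returned as a Series
first_value = df.iloc[0, 0]   # first row, first column

print(type(col).__name__)  # Series
print(df.shape)            # (3, 2)
```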
Day 2: Split-Apply-Combine
The split-apply-combine paradigm is crucial for finding insights about particular groupings within your data. Public data from the City of Houston will be used to uncover many valuable insights.
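As a rough illustration of the paradigm, the sketch below uses a small hypothetical table of service requests. The departments and numbers are invented and are not City of Houston data. `groupby(...).agg(...)` performs all three phases in one call; the loop underneath spells the same phases out by hand.

```python
import pandas as pd

# Hypothetical service-request records; every value here is invented.
requests = pd.DataFrame(
    {
        "department": ["Water", "Solid Waste", "Water", "Public Works", "Solid Waste", "Water"],
        "days_to_close": [3, 1, 5, 10, 2, 4],
    }
)

# Split rows by department, apply count/mean to each group,
# and combine the per-group results into one DataFrame.
summary = requests.groupby("department")["days_to_close"].agg(["count", "mean"])
print(summary)

# The same mean computed step by step to make the three phases visible.
manual = {}
for dept, group in requests.groupby("department"):   # split
    manual[dept] = group["days_to_close"].mean()       # apply
manual = pd.Series(manual, name="mean")                # combine
```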
Day 3: Cleaning and Preparing Data for Machine Learning
Real-world data is almost always messy and rarely ready for immediate consumption by machine learning models. Many different methods for cleaning, tidying and preparing data for machine learning will be covered before some basic machine learning models are deployed.
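The following is one possible cleaning pass, sketched on a tiny invented dataset. It shows a few common steps: normalizing text, converting strings to numbers, removing duplicates, imputing missing values with the median, and one-hot encoding a categorical column. The specific choices, such as median imputation, are illustrative assumptions rather than the course's prescribed recipe.

```python
import pandas as pd

# A small, made-up messy dataset: inconsistent text, numbers stored
# as strings, missing values, and a duplicated row.
raw = pd.DataFrame(
    {
        "city": [" Houston", "houston ", "Dallas", "Dallas", "Austin"],
        "price": ["250000", "310000", None, None, "405000"],
        "bedrooms": [3, 4, 2, 2, 3],
    }
)

clean = raw.copy()
# Normalize text: strip surrounding whitespace and use title case.
clean["city"] = clean["city"].str.strip().str.title()
# Convert strings to numbers; unparseable or missing entries become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")
# Drop exact duplicate rows (pandas treats NaN values as equal here).
clean = clean.drop_duplicates()
# Fill missing prices with the median price (one simple imputation choice).
clean["price"] = clean["price"].fillna(clean["price"].median())
# One-hot encode the categorical column so a model receives numeric input.
features = pd.get_dummies(clean, columns=["city"])
```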
Day 4: Time Series
Reflecting its origins in financial data analysis, Pandas offers superior time series functionality, which will be explored by grabbing stock price data and building a prediction model for the major stock indices.
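A few core time series operations are sketched below on a synthetic price series. The prices are invented; downloading real market data is not shown and no particular data source is assumed. The "model" is only a naive persistence baseline (tomorrow's close equals today's close), included as the kind of reference point a real prediction model should beat. It is not the model built in class.

```python
import pandas as pd

# Synthetic closing prices for ten business days (values are invented).
dates = pd.date_range("2017-05-01", periods=10, freq="B")
close = pd.Series(
    [100, 101, 103, 102, 105, 107, 106, 108, 110, 109],
    index=dates, name="close", dtype=float,
)

# Daily percentage returns (the first value is NaN: no prior day).
returns = close.pct_change()

# 3-day moving average, a common smoothing step.
ma3 = close.rolling(window=3).mean()

# Downsample to weekly bars ending on Friday, keeping the last close.
weekly = close.resample("W-FRI").last()

# Lagged close as a feature for next-day prediction.
features = pd.DataFrame({"close": close, "prev_close": close.shift(1)}).dropna()

# Naive persistence baseline: predict each close with the previous close.
mae = (features["close"] - features["prev_close"]).abs().mean()
```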
Day 5: Visualization and Assessment
All good data explorations include visualizations that accurately and clearly describe the insights discovered. The fundamental plotting library Matplotlib and its enhancer Seaborn will be introduced. A web application will be deployed using Flask, featuring beautiful, modern interactive visualizations from the popular Bokeh library.
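A minimal Matplotlib sketch follows, assuming only Matplotlib and pandas are installed. It draws a labeled bar chart from invented monthly figures and renders it to PNG bytes in memory. Seaborn, Flask and Bokeh are not shown. The in-memory PNG is just one convenient way to hand a static chart to a web application.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Invented monthly figures, used only to demonstrate the plotting calls.
sales = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"], "units": [120, 135, 128, 150]}
)

# Object-oriented API: create a figure and axes, then label everything.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(sales["month"], sales["units"], color="steelblue")
ax.set_title("Units sold per month")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
fig.tight_layout()

# Render to an in-memory PNG instead of a file on disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```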
A mock data science interview assignment will test student progress on all course material. Additionally, a series of short-answer Pandas questions will be assigned intermittently after course completion to ensure retention of knowledge.
Ted Petrou is the author of the upcoming Pandas Cookbook. He is a data scientist at Schlumberger, where he spends the vast majority of his time exploring data. His projects include using targeted sentiment analysis to discover the root causes of part failures from engineers' written text, developing customized client/server dashboarding applications, and building real-time web services to avoid mispricing of sales items. Ted received his master's degree in statistics from Rice University and used his analytical skills to play poker professionally and teach math before becoming a data scientist. He is also head of Houston Data Science.