This bootcamp is designed to introduce you to data exploration using the Python programming language. Through this intense, weeklong program you will master the skills necessary to manipulate, visualize and explore datasets to extract valuable insights.
Is this course for you?
If you are excited about the world of machine learning and data science but have yet to fully dive in, this course will catapult your skills so that you can smoothly transition into that world. Data exploration is the foundation of all good data analysis and is where data scientists spend the vast majority of their time. No prior programming experience is needed, as a thorough introduction to Python will be given in a pre-course assignment.
December 12th - 16th
Monday and Thursday: 9 a.m. - 3 p.m.
Tuesday, Wednesday and Friday: 9 a.m. - 6 p.m.
Structure of Course
Learning is accomplished by working through difficult assignments and then receiving and reviewing model solutions. The class will alternate between instructor-guided lessons and student-focused exercises. The instructor will personally review all code and give feedback on all course assignments. Approximately 200 short-answer questions with detailed solutions will be available. No more than 10 students will be enrolled in the class, ensuring personalized learning and participation.
Before the Course:
Students will need to set aside 10 - 20 hours to set up the programming environment and to complete a thorough overview of the fundamentals of Python.
Day 1: Python Review
Since it is vital to have a firm grasp of Python, the most important concepts from the pre-course assignment will be reviewed.
Day 2: Introduction to Pandas
The Pandas library, perhaps the most popular and widely used open-source data wrangling tool today, will be introduced along with its main data structures, the Series and DataFrame.
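As a small taste of the two data structures, the sketch below builds a Series and a DataFrame by hand. The house names and numbers are hypothetical sample data invented for illustration, not course material.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
# The labels and values here are made-up sample data.
prices = pd.Series(
    [250_000, 310_000, 180_000],
    index=["house_a", "house_b", "house_c"],
)

# A DataFrame is a two-dimensional table whose columns share one index.
homes = pd.DataFrame(
    {
        "price": prices,
        "bedrooms": pd.Series([3, 4, 2], index=prices.index),
    }
)

# Selecting a single column of a DataFrame gives back a Series.
bedrooms = homes["bedrooms"]

# idxmax returns the index label of the largest value.
most_expensive = homes["price"].idxmax()
print(most_expensive)
```

Both objects carry their labels with them, so lookups and calculations line up by label rather than by position.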
Day 3: Split-Apply-Combine
The split-apply-combine paradigm is crucial for finding insights about particular groupings within your data. Many difficult questions from the popular quiz show Jeopardy will be answered.
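A minimal sketch of the idea, assuming a tiny hand-made table of quiz clues (the column names and values are hypothetical and are not the dataset used in class): the rows are split by category, a mean is applied to each group, and the results are combined into one Series.

```python
import pandas as pd

# Hypothetical quiz-show clues, invented for illustration only.
clues = pd.DataFrame(
    {
        "category": ["History", "Science", "History", "Science", "Sports"],
        "value": [200, 400, 600, 800, 1000],
    }
)

# Split the rows by category, apply the mean to each group's values,
# and combine the per-group results into a single Series.
mean_value = clues.groupby("category")["value"].mean()
print(mean_value)
```

The same pattern works with other aggregations, such as `sum`, `count`, or `max`, in place of `mean`.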
Day 4: Time Series
Pandas' superior time series functionality, which stems from its original purpose, will be explored by retrieving stock price data and building a simple prediction model for the major stock indices.
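The sketch below hints at the kind of time series tooling involved. It assumes a short, made-up series of closing prices rather than real stock data, and it shows only resampling, returns, and a rolling average, not the prediction model built in class.

```python
import pandas as pd

# Ten consecutive calendar days of made-up closing prices.
dates = pd.date_range("2016-12-12", periods=10, freq="D")
close = pd.Series(
    [100.0, 101.0, 103.0, 102.0, 105.0, 104.0, 106.0, 108.0, 107.0, 110.0],
    index=dates,
)

# Resample the daily values into weekly bins (weeks ending on Sunday)
# and take the mean of each bin.
weekly_mean = close.resample("W").mean()

# Day-over-day percentage change; the first value is NaN.
daily_return = close.pct_change()

# Three-day moving average; the first two values are NaN.
rolling_mean = close.rolling(window=3).mean()

print(weekly_mean)
```

Because the index holds real dates, Pandas can group by calendar periods without any manual date arithmetic.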
Day 5: Visualization and Assessment
House pricing data from a Kaggle competition will be visually inspected using Python's plotting libraries Matplotlib and Seaborn. Additionally, a mock data science assessment will test all of the skills covered.
The instructor will be available indefinitely to help students achieve their data exploration goals.
Ted Petrou is a data scientist at Schlumberger, where he spends the vast majority of his time exploring data. Some of his projects include using targeted sentiment analysis to discover the root cause of part failure from engineer text, and developing customized client/server dashboarding applications and real-time web services to avoid mispricing of sales items. He also enjoys teaching the fundamentals of data science and analytics and offers a wide range of classes. Ted received his master's degree in statistics from Rice University and used his analytical skills to play poker professionally and teach math before becoming a data scientist. He is also head of Houston Data Science.