This workshop will guide the aspiring Data Scientist through modeling and machine learning. You will use machine learning methods to validate and evaluate data models.
As a beginning Data Scientist, you will learn how to load data into Python, then interpret and visualize it while dealing with variables and missing values. We will teach you how to come to sound conclusions about your data, despite some real-world challenges.
By the end of this course, you will have an understanding of applied predictive modeling methods and the know-how to use existing machine learning methods in Python. This will allow you to work with team members on data science projects, find problems, and come up with solutions.
This course is for IT professionals aspiring to be Data Scientists, students who want to learn about Data Science, Statisticians, Project Managers who want to expand their horizon into Data Science, and any person who is interested in Data Science.
In this workshop you’ll learn an in-depth Data Science process:
- Collect data from a variety of sources (e.g., Excel, web scraping, APIs and others)
- Explore large data sets
- Learn to use Python for executing Data Science Projects
- Master the application of Analytics and Machine Learning techniques
- Know how to use the matplotlib and seaborn libraries to create beautiful data visualizations
This is a very practical and hands-on workshop that has lots of class exercises. Through this course, we strive to make you fully equipped to become a developer who can execute full-fledged Data Science projects.
Session I: Introduction to Data Science with Python
In our first class we will go over some Python fundamentals, including syntax, data structures, and built-in functions. We will then practice for loops and functions, and introduce the packages used throughout the course and how to install them.
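To give a feel for the fundamentals named above, here is a minimal sketch in plain Python. It is not taken from the course material; the sample records and the function name average_age are made up for illustration.

```python
# Data structures: a list of dictionaries (hypothetical sample records).
passengers = [
    {"name": "Ada", "age": 36, "fare": 71.5},
    {"name": "Ben", "age": None, "fare": 8.05},   # age is missing
    {"name": "Cleo", "age": 22, "fare": 13.0},
]


def average_age(records):
    """Return the mean age, skipping records whose age is missing."""
    ages = []
    for record in records:               # a for loop over a list
        if record["age"] is not None:
            ages.append(record["age"])
    return sum(ages) / len(ages)         # built-in functions sum() and len()


mean_age = average_age(passengers)
names = sorted(p["name"] for p in passengers)   # built-in sorted()
print(mean_age, names)
```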
Session II: Exploratory Data Analysis
We will start by introducing NumPy and Pandas and showcasing how to clean, manipulate, and analyze data. Students will practice on the Titanic dataset before moving on to web scraping techniques and extracting data from APIs.
Session III: Fundamental Modeling Techniques and Data Visualization
We will begin by reviewing NumPy and Pandas before delving deeper into more advanced techniques to clean and munge data. Using Matplotlib and Seaborn packages, students will learn to visualize data and identify trends.
Session IV: Data Mining and Machine Learning
We will introduce the Cross Industry Standard Process for Data Mining (CRISP-DM) and data mining with supervised and unsupervised learning. Afterwards, students will explore machine learning algorithms such as Linear Regression, Multivariable Regression, Logistic Regression, Naive Bayes, Decision Trees, and ensemble techniques.
Session V: Machine Learning Concepts and Recommendation Systems
Students will review machine learning concepts including K-Nearest Neighbors Classification and K-Means Clustering, and will start building their own recommendation system with a MovieLens dataset. They will also learn dimension reduction with Principal Component Analysis, explore Support Vector Machines, and learn A/B testing with T-Tests and P-Values.
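As a rough illustration of the idea behind K-Nearest Neighbors Classification, the sketch below classifies a point by majority vote among its closest labeled neighbors. It uses only the Python standard library rather than any machine learning package used in class, and the function name knn_predict and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest points."""
    distances = []
    for point, label in zip(train_points, train_labels):
        # Euclidean distance between a training point and the query.
        distances.append((math.dist(point, query), label))
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["low", "low", "low", "high", "high", "high"]

prediction_a = knn_predict(points, labels, (1.1, 1.0))
prediction_b = knn_predict(points, labels, (5.0, 5.0))
print(prediction_a, prediction_b)
```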
Session VI: Natural Language Processing and Sentiment Analysis
Students will explore the Natural Language Toolkit (NLTK) to process and extract text data. Students will then start a Natural Language Processing project with Yelp data before we move on to Sentiment Analysis to predict positive versus negative Yelp reviews.
Session VII: Big Data with Spark
Students will be introduced to Big Data and data engineering with the Hadoop ecosystem, the MapReduce paradigm, and the up-and-coming Apache Spark.
Session VIII: Deep Learning and Time Series
We will introduce deep learning, train neural networks, and visualize what a neural network has learned using TensorFlow Playground. Students will also learn about time series: what makes them special, how to load and handle them in Pandas, and how seasonality affects trends.
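One way to see how seasonality can mask a trend is to average over one full season, which cancels a repeating pattern and leaves the underlying slope. The sketch below uses plain Python rather than Pandas, and the series, season length, and function name trailing_moving_average are all invented for illustration.

```python
SEASON_LENGTH = 4
# Hypothetical seasonal effect that repeats every 4 steps and sums to zero.
seasonal_pattern = [2, -1, -2, 1]

# Synthetic series: a linear trend (slope 1 per step) plus the seasonal effect.
series = [t + seasonal_pattern[t % SEASON_LENGTH] for t in range(12)]


def trailing_moving_average(values, window):
    """Average each run of `window` consecutive values."""
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


# A window of one full season cancels the seasonal effect.
trend = trailing_moving_average(series, SEASON_LENGTH)
# Step-to-step change of the smoothed series recovers the slope of the trend.
steps = [later - earlier for earlier, later in zip(trend, trend[1:])]
print(trend)
print(steps)
```

The raw series rises and falls from step to step, while the smoothed series increases by the same amount each time.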
Session IX: Computer Vision with OpenCV
Students will be introduced to computer vision fundamentals using OpenCV to detect faces, people, cars, and other objects.
Session X: Hack Day
In the last session, we will host a private Kaggle competition amongst the students. Students will be grouped into teams and will showcase their group project at the end of class.
Prereqs & Preparation
Students must bring a laptop and should install Anaconda, a free distribution that includes Python and a number of tools that will be used in class (http://continuum.io/downloads).
Anyone taking this course should have at least some programming experience in R, Python, or another programming language.
If not, we offer our Python Foundation Course for free!