Machine learning is the practice of fitting a model to existing data with a statistical method, producing a parametric representation that describes the data. That is a mouthful, but it essentially means that a machine learning algorithm uses statistical processes to learn from examples, then applies what it has learned to new inputs to predict an outcome.
Machine learning can classically be summarized with two methodologies: supervised and unsupervised learning.
- In supervised learning, the “correct answers” are annotated ahead of time and the algorithm tries to fit a decision space based on those answers.
- In unsupervised learning, algorithms try to group like examples together, inferring similarities usually via distance metrics.
These learning types allow us to explore data and categorize it in a meaningful way, predicting where new data will fit into our models.
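As a rough illustration of the two methodologies above, the following sketch fits one supervised and one unsupervised model with Scikit-Learn. The choice of the bundled iris dataset, a k-nearest-neighbors classifier, and k-means clustering is an assumption made for this example, not necessarily what the workshop uses.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Example data (an assumption for this sketch): 150 flowers, 4 measurements each.
X, y = load_iris(return_X_y=True)

# Supervised: the "correct answers" y are supplied, and the classifier
# fits a decision space based on them.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X, y)
predicted_labels = classifier.predict(X[:5])

# Unsupervised: no labels are given; k-means groups similar examples
# using Euclidean distance to the cluster centers.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)

# Both fitted models can place new observations.
new_sample = [[5.0, 3.4, 1.5, 0.2]]  # hypothetical measurement
print(classifier.predict(new_sample), clusterer.predict(new_sample))
```

Note that the cluster ids returned by k-means are arbitrary group numbers; they are not guaranteed to line up with the annotated class labels.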
Receive a digital CERTIFICATE OF COMPLETION to display on your LinkedIn profile, with links back to the content and verification details that allow anyone to confirm your learning.
Divergence Academy is a Texas Workforce Commission-approved career school.
This workshop can be taken standalone or as part of the sequence of four workshops that make up Data Science for Analysts.
- WORKSHOP #1: Python for Data Analysis (3 days)
- WORKSHOP #2: Introduction to Machine Learning (1 day)
- WORKSHOP #3: Scaling Data Analysis with Spark (3 days)
- WORKSHOP #4: Enterprise Data Warehousing & Analytics with Hadoop and Tableau (3 days)
WHAT YOU WILL LEARN
Scikit-Learn is a powerful machine learning library for Python, built on the numeric and scientific computing libraries NumPy, SciPy, and matplotlib, for fast analysis of small to medium-sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason, Scikit-Learn is often the first tool in a data scientist's toolkit for machine learning on incoming data sets.
The purpose of this one-day seminar is to serve as an introduction to machine learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product: an actionable model that can be used in larger programs or algorithms, rather than simply a research or investigation methodology.
The workshop will cover the following topics:
- An introduction to machine learning
- Loading datasets into Scikit-Learn
- Building models and model persistence
- Feature extraction from data sets
- Model selection and evaluation
- Building a data pipeline
After this workshop you should understand the basics of machine learning and how to implement machine learning algorithms on your data sets using Python and Scikit-Learn. In particular, you should understand basic regressions, classifiers, and clustering algorithms, and how to fit a model and use it to predict future outcomes.
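To give a flavor of how the topics listed above fit together, here is a minimal sketch that loads a dataset, builds a pipeline, evaluates it, and persists the fitted model. The dataset (the bundled wine data), the pipeline steps, and the use of `pickle` for persistence are assumptions chosen for illustration; the workshop itself may use different data and tools.

```python
import pickle

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Loading a dataset (example data bundled with Scikit-Learn).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Building a data pipeline: scaling, feature extraction, then a classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("extract", PCA(n_components=5)),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Model evaluation: 5-fold cross-validation on the training split,
# then a final accuracy score on the held-out test split.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print("mean CV accuracy:", cv_scores.mean(), "test accuracy:", test_accuracy)

# Model persistence: serialize the fitted pipeline so that another
# program can load it and predict without retraining.
saved_bytes = pickle.dumps(pipeline)
restored = pickle.loads(saved_bytes)
predictions = restored.predict(X_test)
```

Wrapping the preprocessing and the estimator in one `Pipeline` means the whole chain is fitted, evaluated, and saved as a single object, which is what makes it usable as a component in a larger program. Only unpickle model files that come from a source you trust.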
When & Where