Data Science 202 - Ensemble Methods
Weekly Event - Every Saturday: 9:00 AM to 1:00 PM (PDT)
Ensemble methods are often described as the closest thing to a cure-all in machine learning. Google uses ensemble methods for determining what ads to place on pages and eBay uses them for determining what search results to return. Ensemble methods combine the predictions of a large number of mediocre machine learning models into a single answer. When done properly this results in an algorithm that resists over-training and that delivers top performance on a wide variety of problems - ensembles appear in many winning Kaggle competition entries, for example.
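To give a feel for why combining mediocre models can work, here is a minimal sketch in Python (the class itself uses R; Python is used here only for illustration). It simulates a majority vote among simple "voters" that are each right 60% of the time. The function name, the 60% figure and the assumption that the voters err independently are all made up for this example; real models make correlated errors, so the gain in practice is smaller.

```python
import random

def majority_vote_accuracy(n_voters, p_correct, n_trials=2000, seed=0):
    """Estimate how often a majority of independent voters is correct.

    Hypothetical helper for illustration: each voter is right with
    probability p_correct, independently of the others (an idealized
    assumption that real models do not satisfy exactly).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        # Count how many voters got this trial right.
        correct_votes = sum(rng.random() < p_correct for _ in range(n_voters))
        # The ensemble is right when more than half the voters are right.
        if correct_votes > n_voters / 2:
            wins += 1
    return wins / n_trials

single = majority_vote_accuracy(1, 0.6)      # one mediocre voter
ensemble = majority_vote_accuracy(101, 0.6)  # 101 mediocre voters combined
print("single voter accuracy:", single)
print("ensemble accuracy:    ", ensemble)
```

Under these assumptions the combined vote is correct far more often than any single voter, which is the basic intuition behind the algorithms covered in the class.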
This class will start with the basics of trees and will cover the background, usage, strengths and weaknesses of the major ensemble algorithms. By the end of the class attendees will understand when these algorithms are applicable and how to get the best performance from them. The class will meet for 4 hours on each of 4 Saturdays. Here are the topics we'll cover:
1. Decision Trees, Boosting
2. Gradient Boosting
3. Random Forests
4. Topics of Interest in Machine Learning (e.g. active learning, low-rank matrix approximation for recommender systems, the expectation-maximization algorithm)
If attendees are interested we can schedule a session for projects or a Kaggle machine learning competition.
The class is intended for computer programmers. No prior knowledge of machine learning is assumed. The course will primarily use the R statistical language. There will be a separate review of the R language for those requiring it. The class will include derivations that require undergraduate-level math - calculus and linear algebra.
Class dates: Sept 14, Sept 21, Sept 28 and Oct 5. One payment covers all 4 sessions.
Those interested in attending by webcast should sign up at least 12 hours before class starts in order to receive instructions and passwords.