Ever since Paco Nathan began offering his Introduction to Data Science, people have been asking if he would offer a similar class for Machine Learning. Paco first offered an abbreviated version of this workshop at the recent Big Data Tech Con.
The last time this class was offered in Austin, during Data Day Texas, it sold out. Since Paco is coming back to offer his Cluster Computing workshop, we asked if he would offer a repeat of the Machine Learning class as well.
If you really want to take a deep dive on Machine Learning, there are some good MOOCs, like Andrew Ng's course at Stanford or Pedro Domingos' Machine Learning course out of the University of Washington. What both of these courses are missing is the business case: how does this fit within a team? What's the engineering process for using predictive analytics / predictive modeling? How do you deploy machine learning apps that bring in revenue? This course will focus more on the business / use cases.
Paco begins the course with a history that goes all the way back to the Battle of London and the precursors of cybernetics: from early work in adaptive signal processing to Neural Networks, then to Genetic Programming and early AI systems, and ultimately back around to Neural Networks.
How much math is needed for the class? Basic linear algebra / matrix multiplication / pre-calc-level analytic geometry. That's all.
Paco Nathan (@pacoid) is a “player/coach” who has led innovative data teams building large-scale apps for 10+ years. He is an expert in distributed systems, machine learning, and enterprise data workflows. Paco is an O'Reilly author and an advisor for several firms, including The Data Guild, Agrepedia, and TagThisCar. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 25+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
Paco is a frequent speaker at data conferences. In the last year, his speaking dates include Strata, OSCON, Big Data Tech Con, and Data Day Texas.
This workshop provides a crash-course introduction to Machine Learning, one which is complementary to the popular MOOCs about ML.
We'll start by defining the terminology, drawing comparisons with related fields (statistical inference, optimization theory, etc.), and then reviewing a brief history of ML, from early neural nets onward.
We'll consider aspects of the practice of applying ML in business use cases: a process for feature engineering, tools for data prep and visualization, how to grapple with dimensionality reduction, etc.
The presentations will emphasize three aspects of Machine Learning in particular:
- Representation: a survey of useful algorithms, including probabilistic data structures, text analytics, plus issues to consider.
- Evaluation: distinguishing how some methods work better for given use cases, including issues of overfitting, bias, etc., and the use of quantitative measures.
- Optimization: methods for improving on a good thing, including how to move from graph theory to sparse matrices, ensemble models, plus a look at ML competition platforms.
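The "probabilistic data structures" listed under Representation are easiest to grasp with a concrete case. Below is a minimal sketch of a Bloom filter in Python (one of the workshop's two languages). It is an illustration rather than workshop material: the class name, its parameters, and the use of salted SHA-256 digests as the k hash functions are assumptions made for this example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash functions.

    Membership tests may return false positives, but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Pack the bit array into bytes, rounding up to a whole byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index
        # (an illustrative choice; any k independent hashes would do).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit that the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if all k bits are set; any clear bit proves absence.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["radar", "cybernetics", "neural"]:
    bf.add(word)
print(bf.might_contain("radar"))  # prints: True
```

The trade-off is typical of this family of structures: a small, fixed memory footprint in exchange for a false-positive rate that grows as more items are added relative to `num_bits`.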
We'll conclude with a recommender case study, plus suggestions for where to continue with further studies.
Prerequisites: some familiarity with programming, some math. We will be programming in R and Python, along with cluster computing examples in Hadoop, Spark, etc.
Note: This class is part lecture and part hands-on; you are required to bring a laptop.
- hands-on experience with foundational technologies
- set up your own environment for this kind of work
- evaluate different ML frameworks and their trade-offs
- discussion about advanced math use cases
- professional networking (perhaps the best part!)
- inter-disciplinary perspectives and how to build teams
- exchange about ML use cases that you need to solve
objectives, access to workshop materials, server login, etc.
foundational themes in ML: from Radar to Internet
"WHO, ...and now":
foundational themes in ML: from Internet to IoT
a general approach to ML, from real-world problems to parallelized solutions
feedback loops based on machine data, business disruptions, datacenter computing
"Just Enough Math":
leveraging advanced math in business ... beyond calculus
"Let Computers Perform the Heavy Lifting":
exercise with gradient descent; strategy at scale
lunch (networking is one of the best parts of these workshops)
"HOW, the business perspectives":
managing teams and effective processes
hands-on with popular tools (part 1) -- R deep-dive into algorithms
hands-on with popular tools (part 2) -- Spark, KNIME, Titan, Vowpal Wabbit (VW), etc.
hands-on with popular tools (part 3) -- Python stack, scikit-learn, etc.
integrating data workflows: Python, Scala, KNIME, etc.
case study: CoPA recommender
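The agenda's gradient descent exercise is not reproduced here, but the core idea fits in a few lines. The following is a minimal sketch, assuming a one-feature linear model y ≈ w*x + b fit by batch gradient descent on mean squared error; the function name, learning rate, step count, and toy data are illustrative choices, not the workshop's.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # prints: 2.0 1.0
```

Batch gradient descent touches every example on each step; for large datasets, stochastic or mini-batch variants that update on a subset of the data are the usual strategy.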
If you have any questions regarding the class, send them to email@example.com