Code Your Own Spam (or Awesome) Filter!
Sunday, December 6, 2009 from 1:00 PM to 3:00 PM (EST)
This class is a gentle introduction to the document classification techniques used in spam filters, news sites, and language detectors. We'll cover simple parsing, feature selection, and naive Bayes classification. By the end of the class, you'll have written your own Twitter spam (or interestingness or happiness or annoyingness) filter, and you'll have the code and tools to develop your own projects.
There are no math prerequisites, but you should be familiar with basic Python syntax (variables, conditionals, loops, functions, classes, lists, dictionaries, and importing packages).
We'll be using Python with the free Natural Language Toolkit (NLTK, http://www.nltk.org/download). Please bring a laptop with both Python and NLTK installed. If you need help installing NLTK, just arrive 15 minutes early.
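For a taste of the three topics named above (simple parsing, feature selection, and naive Bayes classification), here is a minimal sketch in plain Python. It is not the class material: the training messages, the "spam"/"ham" labels, and the function names `tokenize`, `train`, and `classify` are all made up for illustration, and it uses only the standard library rather than NLTK. It assumes a multinomial naive Bayes model with add-one smoothing, which is one common variant among several.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Simple parsing: lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(labeled_texts):
    """Count how often each word appears under each label."""
    label_counts = Counter()             # number of messages per label
    word_counts = defaultdict(Counter)   # word frequencies per label
    vocabulary = set()                   # every word seen in training
    for text, label in labeled_texts:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    # Feature selection (very crude): ignore words never seen in training.
    words = [w for w in tokenize(text) if w in vocabulary]
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # Start from the log prior, log P(label).
        score = math.log(count / total_messages)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for this example.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("lunch at noon today", "ham"),
    ("see you at the meeting today", "ham"),
]

model = train(TRAINING)
print(classify(model, "free money now"))
print(classify(model, "lunch meeting today"))
```

Swapping the labels for "interesting"/"boring" or "happy"/"sad" changes nothing in the code, which is why the same approach works for filters other than spam. NLTK ships its own `nltk.NaiveBayesClassifier`, which works on feature dictionaries rather than raw word counts.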