The goal of this workshop is to learn how to use Cascalog to build complex data processing workflows on top of Hadoop.
Cascalog's tight integration with Clojure lends itself to many powerful techniques, which this workshop will cover. I will use real BackType code to illustrate them.
We'll go through Cascalog's features briefly and spend most of our time learning how to use those features to build real applications.
1. Bring your laptop.
2. You should have a basic understanding of Clojure (e.g., have worked through the book Programming Clojure).
3. You should know how to use Leiningen to build Clojure applications.
4. No prior knowledge of Cascalog is necessary, but you'll get more value if you go through the tutorials and experiment with the playground beforehand.
1. Incremental development using Emacs and Leiningen
2. Basics of Cascalog
3. The Cascalog query planner in depth: Cascalog -> Cascading -> MapReduce
4. The when, how, and why of Cascalog’s custom operation types
5. Building queries dynamically: :<<, :>>, construct, and associated techniques
6. Abstraction and composition: functions and predicate macros
7. Understanding the performance of Cascalog queries
8. Custom taps
9. Unit testing Cascalog queries
10. Exporting data with ElephantDB
1. ElephantDB will be open-sourced sometime prior to the workshop.