This event has passed. Are you looking for Data Day Texas 2013?
Data Day Austin
Saturday, January 29, 2011
Norris Conference Center, Austin
Cassandra talks/workshops include:
A Billion Columns? No problem: an Introduction to Cassandra 0.7
Jonathan B. Ellis - Cassandra Project Lead and Co-Founder, DataStax
Introduction to Cassandra - for Java Developers
Nate McCall - Software Developer, DataStax
Beyond SQL - Cassandra Data Modeling
Tyler Hobbs - Software Developer, DataStax
Hadoop talks/workshops include:
What BigData folks need to know about DevOps
Matt Ray - Technical Evangelist, Opscode
Matt will walk attendees through launching a multi-node Hadoop cluster and then point them in the right direction to build their own. He will then show how Chef works and why it's relevant to Big Data, followed by a brief exploration of the Cluster Chef work that Infochimps has done. Matt will then hand the workshop over to Flip Kromer of Infochimps for more Hadoop-specific questions.
Higher Order Languages for Hadoop I - Wukong
Flip Kromer - Founder and CTO, Infochimps
Wukong allows you to treat your dataset like:
* a stream of lines when it’s efficient to process by lines
* a stream of field arrays when it’s efficient to deal directly with fields
* a stream of lightweight objects when it’s efficient to deal with objects
No one knows more about Wukong than Flip Kromer.
Higher Order Languages for Hadoop II - Pig
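The three views of a dataset listed above can be illustrated with a minimal sketch. This is not Wukong's API (Wukong is a Ruby library); it is a plain Python illustration of the same idea, and the `Visit` record, the tab-separated layout, and the function names are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record type for illustration only; not part of Wukong.
Visit = namedtuple("Visit", ["user", "url", "seconds"])


def as_lines(raw):
    """View 1: a stream of lines, with the trailing newline removed."""
    for line in raw:
        yield line.rstrip("\n")


def as_fields(raw, sep="\t"):
    """View 2: a stream of field arrays, one list of strings per line."""
    for line in as_lines(raw):
        yield line.split(sep)


def as_objects(raw, sep="\t"):
    """View 3: a stream of lightweight objects built from the fields."""
    for user, url, seconds in as_fields(raw, sep):
        yield Visit(user, url, int(seconds))


# Assumed sample input: tab-separated user, url, seconds-on-page.
sample = ["alice\t/home\t12\n", "bob\t/docs\t40\n"]

lines = list(as_lines(sample))
fields = list(as_fields(sample))
objects = list(as_objects(sample))
total_seconds = sum(visit.seconds for visit in objects)
```

Each view is a generator layered on the previous one, so a job can stop at whichever level is cheapest for the task: line filtering needs only `as_lines`, while an aggregate such as `total_seconds` reads more naturally over objects.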
Jacob Perkins - Hadoop Engineer, Infochimps
Pig is a Hadoop extension that simplifies Hadoop programming by giving you a high-level data processing language while keeping Hadoop’s simple scalability and reliability.
Web Crawling and Data Gathering with Apache Nutch
Steve Watt - IBM Big Data Lead, IBM Software Strategy
The first phase of any analytics pipeline is finding and loading the data. Apache Nutch is a Hadoop-based web crawler that is well suited to pulling content down from the web and loading it into HDFS, where it becomes available for Hadoop analytics. This session will teach you how to install and configure Nutch, how to use it to crawl and gather targeted content from the web, and how to fine-tune your crawls through the Nutch API.
Hadoop Analytics for the Business Professional
Gino Bustelo - IBM BigSheets Lead
IBM BigSheets is an emerging technology that provides an all-encompassing, Hadoop-based analytics tool for line-of-business users to gather, explore, and visualize both their structured and unstructured data. We'll explore the business patterns around the Big Data analytics space, take you through what the tool does, and show you how it can be used.
Other workshops include:
I Know Where You Are: an introduction to working with location data.
Sandeep Parikh - Co-founder, Argia, Inc
Shaun Dubuque - Co-founder, Argia, Inc
Thinking of developing location-based apps? Sandeep and Shaun show you sources for location data and strategies for managing it.
Additional workshops/presentations to be announced...
For comments, questions, or sponsorship opportunities, contact email@example.com