Excited about big data and want to get hands-on with data sets and popular data tools? It’s time to Code Big or Go Home!
- Date / time: December 4, 2012 @ 18:30 - 20:30
- Location: hack/reduce (275 3rd St, Cambridge, MA 02142)
- Cost: free (limited to 50 attendees)
See this event page for details.
- Background: Basic programming skills (e.g. with Java or Python). Knowledge of Big Data or Hadoop is NOT required.
- Laptop: Bring a Linux or Mac laptop with the software tools mentioned in the Prerequisites section here preinstalled. Users of a Windows laptop can install a Linux virtual machine (e.g. see notes here).
- Mindset: Excited to learn new tools and gain relevant skill sets
In this session, we will teach you how to program and operate Hadoop, the poster child of Big Data technology. Why should you care? Take a look at this exploding chart of Hadoop job trends. Even CEOs are starting to care about Hadoop. Oh, and by the way, it’s open source and free to use.
By attending this session, you will be able to:
- Gain hands-on big data experience with the experts
- Learn a cutting-edge tool that may help you tackle open problems at your current job, or open up new career opportunities
- Network with fellow hack / reduce technologists and find ways to work together on big data problems
In this 90-minute session, we will cover the following ground:
- Write and execute Java-based MapReduce (MR) jobs to analyze the data at hand
- Program MR jobs in other languages (e.g. Python, Ruby) via Hadoop Streaming
- Basic usage of HDFS, the Hadoop Distributed File System, to deploy and run MR jobs
- Introduction to monitoring and performance tuning on Hadoop
- Declarative data processing on Hadoop via Hive
By the end of this session, you will be able to program and run Hadoop jobs on your own computer as well as in the cloud.
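To give a flavor of the Hadoop Streaming topic above, here is a minimal word-count sketch in Python. It is not taken from the session materials. With Hadoop Streaming, the mapper and reducer are normally separate programs that read lines from standard input and write tab-separated key/value lines to standard output, with Hadoop sorting the mapper output by key in between. The function names below (`map_lines`, `reduce_lines`, `run_locally`) are hypothetical, and `run_locally` only imitates the sort step on a single machine so the logic can be tried without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts for each run of identical keys.

    Assumes the records arrive sorted by key, as they would after
    Hadoop's shuffle/sort phase.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Local stand-in for map -> sort -> reduce (an assumption, not Hadoop itself)."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in run_locally(["big data big tools", "data tools"]):
        print(record)
```

In a real streaming job, the mapper and reducer bodies would each live in their own script reading `sys.stdin`; the local pipeline here is only a convenient way to check the logic first.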
Chief Data Scientist at Hadapt
hack / reduce Contributor
Mingsheng Hong is Chief Data Scientist at Hadapt, driving the product roadmap and incubating analytic use cases. Prior to this role, Mingsheng was Field CTO at Vertica, an HP company, and was instrumental in its product development and positioning.
Mingsheng obtained his Ph.D. in Computer Science from Cornell University, where he built Cayuga, the world's first expressive and scalable complex event processing (CEP) engine. Mingsheng also co-founded the Microsoft CEDR event processing project, which became the Microsoft StreamInsight technology shipped with SQL Server 2008 and 2012.
Mingsheng is a frequent speaker on Big Data, and has given talks, lectures and demos at Hadoop World, TDWI conferences, the Cube and Harvard Business School.
Software Engineer at Hopper
Founding Member of hack / reduce Hackathons
Greg Lu is a software engineer at Hopper, a travel search engine company. He is well versed in Java, Hadoop/MapReduce (distributed filesystem and computation), Cassandra and HBase (distributed databases), Heritrix (web crawling), and Solr/Lucene (search and indexing).
Greg is also the technical organizer of HackReduce (hackreduce.org), where he implemented an automated cluster management system on EC2 for creating and expanding multiple Hadoop clusters (300-500 instances) and mentors participants during the one-day hackathon events.
Greg began his software career as a web developer, a role he held for over 6 years, starting with PHP and then working with Ruby on Rails for the last 5. He has also worked with many other languages and technologies through his own explorations and studies.
When & Where
hack/reduce // Boston's Big Data hackerspace