The 7th Hadoop User Group UK meetup (HUGUK #7)
Friday, November 19, 2010 at 2:00 PM (GMT)
London, United Kingdom
UPDATE: Slides and videos are available at http://lanyrd.com/2010/huguk7.
SCHEDULE (Friday 19th November):
2:30pm: HBase at Facebook by Jonathan Gray (Facebook, HBase committer)
Jonathan Gray is a software engineer and open source advocate at Facebook. He has been an HBase committer for several years, getting his start with HBase at his previous startup, Streamy, which ran solely on a Hadoop/HBase data stack. At Facebook, Jonathan works with different engineering teams on using, contributing to, and releasing open source software related to big data and infrastructure, working with technologies like Hadoop, HBase, Hive, Scribe, and Thrift.
Outline: This talk will discuss the usage of HBase at Facebook.
3:00pm: HBase project update by Michael Stack (StumbleUpon, HBase committer)
Michael Stack is an HBase committer and member of the Hadoop management committee. He works at StumbleUpon.
Outline: This talk will present the current state of HBase and what's coming in the next few months.
3:30pm: short break with refreshments
4:00pm: HBase and datamining at Mendeley by Dan Harvey (Mendeley)
Dan Harvey is the lead engineer of the datamining team at Mendeley, which works on the research and development of a range of information retrieval and machine learning projects. He is also architecting Mendeley's systems to support these projects at web scale.
Outline: We are using HBase for a wide range of data mining projects at Mendeley. I will talk about why we chose HBase over other solutions when we moved from MySQL, how we went about migrating our systems over, and the problems we faced along the way. I will also discuss where HBase fits in today alongside other open source technologies at Mendeley.
4:30pm: Using HFile outside HBase by Marc de Palol (Last.fm)
Marc is currently a Data Engineer at Last.fm, solving data-intensive problems using Hadoop. Previously he worked on semantics, distributed systems and grid technologies at the Barcelona Supercomputing Center.
Outline: HFile is the file format used to store data inside HBase: essentially a block-indexed format for storing sorted key-value pairs. These properties, plus the ability to create HFiles in Hadoop jobs, make it very useful for storing and serving data even outside HBase itself. This talk explains an approach to building a data server that uses HFiles as its storage format.
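As a rough illustration of the "block-indexed, sorted key-value" idea mentioned in the outline, here is a minimal in-memory sketch in Python. It is not the real HFile layout or API, and it is not taken from the talk: the function names, the tiny block size, and the sample data are all assumptions made for the example. It only shows why keeping the first key of each block in a small index lets a reader jump to a single block instead of scanning everything.

```python
from bisect import bisect_right

def build_blocks(sorted_pairs, block_size=2):
    """Split already-sorted (key, value) pairs into fixed-size blocks.

    Returns the blocks plus an index holding the first key of each block.
    (Hypothetical helper; a real HFile stores blocks on disk.)
    """
    blocks = [sorted_pairs[i:i + block_size]
              for i in range(0, len(sorted_pairs), block_size)]
    index = [block[0][0] for block in blocks]
    return blocks, index

def lookup(blocks, index, key):
    """Find the value for key by consulting the block index first."""
    # The candidate block is the last one whose first key is <= key.
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    # Only this one block has to be scanned.
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None

# Made-up sample data, sorted by key as the format requires.
pairs = sorted({"apple": 1, "banana": 2, "cherry": 3, "date": 4, "fig": 5}.items())
blocks, index = build_blocks(pairs)
```

With this data, `lookup(blocks, index, "date")` searches the three-entry index, then reads only the block that starts at "cherry".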
5:00pm: food and drinks