H2O World Registration 2014
When and where
Date and time
Location
Computer History Museum, 1401 N Shoreline Blvd, Mountain View, CA 94043
Description
Day 1 Hands-on training
Time / Event
08:00 AM - 09:00 AM: Registration & Breakfast
09:00 AM - 09:40 AM: Introduction to Data Science, Machine Learning, Predictive Analytics
09:50 AM - 10:10 AM: Hands-On: H2O on the Web UI
10:10 AM - 10:30 AM: H2O on Big Data Environments: Hadoop, Spark, and EC2
10:30 AM - 10:45 AM: Coffee Break
Tutorials Session
10:45 AM - 11:15 AM: Hands-On: R - Part 1 - Exploratory Data Analysis on R
11:15 AM - 12:30 PM: Hands-On: R - Part 2 - Supervised Learning: Regression and Classification
12:30 PM - 01:30 PM: Lunch Break
01:30 PM - 02:00 PM: Hands-On: R - Part 3 - Unsupervised Learning: Clustering, Dimensionality Reduction, Anomaly Detection
02:00 PM - 03:00 PM: Hands-On: Advanced Topics: Tools for real-world data science
03:00 PM - 03:15 PM: Coffee & Snacks
03:15 PM - 04:15 PM: Hands-On: Marketing Algorithms and Use-Case Demonstrations
04:15 PM - 04:35 PM: Demo: Tableau and Excel integration
04:35 PM - 04:55 PM: Demo: Real-time prediction with H2O and Storm for high-velocity data streams
Developer’s Section
04:55 PM - 05:25 PM: Sparkling Water
05:25 PM - 05:55 PM: Contributing to Open Source H2O (h2o-dev, droplet)
More Ongoing Events
Hacker’s Corner with Cliff Click
Hack Spark at the Sparkling Water Falls with Michal Malohlava
Commit Code with Chris Severs
Day 2
Time / Event
08:00 AM - 09:00 AM: Breakfast and Registration
Intro, Welcome and Keynote, Sri Ambati, CEO & Co-Founder, H2O.ai
Demo and Road Ahead - Team
Community Awards, presented by Jishnu Bhattacharjee
Values and Art of Scale in Business, Michael Marks
/ Paul Erdos Main Stage / / John Tukey Memorial Stage /
Gradient Boosting Machine, Trevor Hastie, Stanford
Brief reminiscences of John Tukey, John Chambers, Founder of S and R-core member
R and ROI for Big Data, Nachum Shacham, PayPal
Unsupervised Learning with H2O, Alex Tellez, Robert Half
Predictive Model Factory, Lou Carvalheira, Cisco
Ensembles in R, Erin LeDell
12:30 PM - 01:00 PM: Lunch & Book Signing by Trevor Hastie
Conversion Estimation in Display Advertising, Hassan Namarvar, ShareThis
Krylov - H2O On-demand, Chris Severs, eBay
Fraud Detection Using H2O’s Deep Learning, Venkatesh Ramanathan, PayPal
Bayesian Networks with R and Hadoop, Ofer Mendelevitch
MLlib and Spark, Sandy Ryza
A Brief, Opinionated History of the API, Josh Bloch, Lord of the APIs, http://en.wikipedia.org/wiki/Joshua_Bloch
Data.Table and R in Insurance, Matt Dowle
03:00 PM - 07:00 PM: Panels
Macro and Micro Trends in Big Data, Hadoop and Open Source: Mitch Ferguson, Hortonworks; Jai Ranganathan, Cloudera; Sandy Ryza, Spark
Competitive Data Science: Kaggle, KDD and data sports: Mark Landry, Guocong Song, Jose Guerrero, Arno Candel
Practical Data Science Panel: Prasanta Behera, Nachum Shacham, Chris Pouliot, Vinod Iyengar, Lou Carvalheira
07:00 PM - 08:00 PM: Food Truck & Cocktail Party, Book Signing by Josh Bloch
Sri Satish Ambati
State of the Union of H2O (Day 2, 9:00 AM)
Sri is co-founder and CEO of 0xdata (@hexadata).
Alex Tellez
Lead Machine Learning Scientist at Robert Half International
Alex Tellez is currently a Lead Machine Learning Scientist at Robert Half International. Alex, along with his teammates, researches and applies cutting-edge machine learning approaches to power the recruiting / HR industry. Alex’s research interest is in large-scale deep learning approaches utilizing stacked autoencoders (SdA) and Deep Belief Networks (DBN). When not neck-deep in code, Alex enjoys riding and racing bicycles.
-Unsupervised Learning Algorithm Analysis-
Unlike supervised learning, which requires labeled data, unsupervised learning attempts to find the hidden structure of unlabeled data. In this discussion we will cover two very common unsupervised learning algorithms, Principal Component Analysis and K-Means clustering, and showcase their application using real examples. Best of all, the analysis can all be run using H2O!
Chris Pouliot
VP of Data Science Lyft
Chris Pouliot is a real-life rocket scientist who has also spun astronauts until they were motion sick, split atoms to make an aircraft carrier go fast, provided insightful analysis that led Google to change their top ad color, and helped Netflix determine what movies and TV shows to buy and how much they should pay. He helped advise many Cowboy portfolio companies on analytics and data infrastructure and is now the VP of Data Science at Lyft.
Chris Severs, Ph.D.
Applied Researcher/Software Engineer at eBay
Chris works as an Applied Researcher/Software Engineer at eBay. He uses Hadoop daily to run jobs on large eBay data sets. He is an official contributor on Twitter's Scalding project and author of the scalding-avro module for reading Avro files. Chris has shared his passion for Hadoop, Scalding and Scala in numerous talks inside and outside eBay. Prior to joining eBay, Chris was a postdoctoral researcher in mathematics at Reykjavik University in Iceland and at The Mathematical Sciences Research Institute in Berkeley.
Cliff Click
CTO and Co-Founder of 0xdata
Cliff Click is the CTO and Co-Founder of 0xdata, a firm dedicated to creating a new way to think about web-scale data storage and real-time analytics. Cliff wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). He helped Azul Systems build an 864-core pure-Java mainframe that keeps GC pauses on 500 GB heaps to under 10 ms, and worked on all aspects of that JVM. Before that, Cliff worked on HotSpot at Sun Microsystems and is at least partially responsible for bringing Java into the mainstream. Cliff is invited to speak regularly at industry and academic conferences and has published many papers about HotSpot technology. He holds a PhD in Computer Science from Rice University and about 15 patents.
-Fast Analytics on Big Data-
We have built an open-source platform for dealing with in-memory distributed data. We've used it to build state-of-the-art predictive modeling and analytics (e.g. GLM & Logistic Regression, GBM, Random Forest, Neural Nets, PCA, to name a few) that are 1000x faster than the disk-bound alternatives and 100x faster than R (we love R, but it's too slow on big data!). We can run R expressions on tera-scale datasets, or munge data from Scala & Python. We're building our newest algorithms in a few weeks, start to finish, because the platform makes Big Math easy. We routinely test on 100 GB datasets and have customers using 1 TB datasets. This talk is about the platform, coding style & API that let us seamlessly deal with datasets from 1 KB to 1 TB without changing a line of code, and let us use clusters ranging from your laptop to 100-server clusters with many, many TB of RAM and hundreds of CPUs.
Hassan Namarvar
Principal Data Scientist at ShareThis Inc
Hassan is a Principal Data Scientist at ShareThis Inc. He works on online advertising optimization by modeling real time bidding transactional and user behavior datasets using large-scale machine learning techniques.
Before joining ShareThis, Hassan was a Principal Data Scientist at Shopzilla Inc (now Connexity), where he significantly improved the relevancy and revenue of Shopzilla's comparison shopping search engine.
Prior to that, Hassan was a Data Mining Engineer at Amazon.com, where he focused on web analytics, development of a parallel processing framework, graph analysis, recommendation systems, NLP, IR and an automated website performance monitoring system.
Hassan holds an M.Sc. and a Ph.D. in Biomedical Engineering from the University of Southern California. At USC, he developed large-scale dynamic synapse neural networks and applied them in speech recognition. He has authored more than 10 papers in journals and international conferences.
-Conversion Estimation in Display Advertising-
In online display advertising, the ultimate goal is to show the best, most relevant ad to an online user in order to influence him/her to take an action such as purchasing a product or signing up for a service. This requires estimating the probability of conversion for a given user, content and advertiser. Conversion estimation is an extremely challenging task, since conversion events are rare and the data dimensionality is huge. In this presentation, I will describe how, at ShareThis, we tackle the conversion estimation problem. More specifically, we build CPA models by leveraging ShareThis social media and ad exchange datasets and applying state-of-the-art machine learning algorithms such as GLM, GBM, Random Forest and Deep Learning provided by the H2O platform. I will illustrate some results from real advertising campaigns to show the effectiveness of our approach.
Jishnu Bhattacharjee
Managing Director at Nexus Venture Partners
Jishnu is Managing Director at Nexus Venture Partners, a leading venture capital firm in India and US, investing in innovative early to early-growth stage companies. Jishnu brings to Nexus several years of operating and investing expertise in hi-technology start-ups. He has invested in technology infrastructure, mobile, internet and technology enabled services companies and is interested in a wide range of start-ups. Jishnu currently serves on the boards of Druva, Biz2credit, Kaltura, 0xdata, Elastic Box, Unmetric, Helpshift, Vdopia, Blueshift, and Vnera. His investments also include Cloud.com (acquired by Citrix) and Gluster (acquired by Red Hat).
Jishnu holds an MBA from the Stanford Graduate School of Business and master's and undergraduate degrees in electrical engineering from Georgia Tech and IIT Kharagpur. He is an inventor on more than a dozen US patents.
Josh Bloch
API Engineer
Dr. Joshua Bloch is an expert on API design, with over a quarter century of experience. He was Chief Java Architect at Google and Distinguished Engineer at Sun Microsystems. He led the design and implementation of numerous Java APIs and language features. He is the author of several books, including the bestselling, Jolt Award-winning Effective Java (Addison-Wesley, 2001; Second Edition, 2008). He holds a Ph.D. in Computer Science from Carnegie Mellon and a B.S. from Columbia.
Lou Carvalheira
Data Scientist at Cisco
Lou works as a senior Data Scientist at Cisco, where he is responsible for predictive modeling in the Marketing organization. He creates and supports the deployment of customer valuation and propensity models for the Marketing and Sales processes. Before joining Cisco, Lou worked as the CRM and BI practice leader for the IBM Services Group in Latin America. He also spent many years with IBM Canada where he specialized in systems for the planning and deployment of airline resources. He holds graduate degrees in Business from McGill University and in Electrical Engineering from Universidade de Sao Paulo, where he focused on Knowledge Engineering and Artificial Intelligence.
-A Predictive Model Factory Picks Up Steam - H2O and Cisco’s Propensity to Buy Factory-
Cisco has a large, semi-automated model “factory” that has regularly produced, evaluated and deployed more than 60,000 distinct predictive models. These models determine the likelihood that any of the 160 million companies in Cisco’s databases will buy each of the most important technologies that Cisco sells. But the factory needed to expand and handle new products and services, new markets Cisco serves and the ever-increasing reliance on advanced analytics to drive marketing and selling efforts. To answer those challenges, Cisco has re-engineered its model factory around H2O and its powerful in-memory distributed computing algorithms. This talk covers that effort to modernize the predictive model “production line” and its benefits in terms of development speed and prediction accuracy.
Michal Malohlava
Data Scientist H2O
Michal is a geek, a developer, and a Java, Linux and programming languages enthusiast who has been developing software for over 10 years. He obtained a PhD from Charles University in Prague in 2012 and was a postdoc at Purdue University. During his studies he was interested in the construction of not only distributed but also embedded and real-time component-based systems using model-driven methods and domain-specific languages. He participated in the design and development of various systems, including the SOFA and Fractal component systems and the jPapabench control system.
Michael Marks
Founding Partner Riverwood Capital
Michael Marks is a founding Partner of Riverwood Capital. Prior to establishing Riverwood, he was a Partner and Senior Advisor at Kohlberg Kravis Roberts & Co. in 2006 and 2007. Before KKR, he spent 13 years as CEO of Flextronics International Ltd. and built the company into one of the largest technology companies in the world. As its CEO, Michael led Flextronics as it increased annualized revenues from $93 million to approximately $16 billion, while establishing operations in over 35 countries and integrating over 100 acquisitions. As Chairman, he helped Flextronics grow to annualized revenues of $36 billion. Electronic Business Magazine named Michael one of the Top Ten Most Influential Executives in Silicon Valley History and CEO of the Year for 2003. Michael earned an MBA from Harvard Business School and a BA and MA from Oberlin College in Oberlin, Ohio. He is a director of SanDisk Corporation (Chairman), Schlumberger Limited, Aptina, GoPro, and iFLY. In addition, Michael serves on the Board of The V Foundation for Cancer Research (non-profit), and as a Trustee of the Juilliard School.
Nachum Shacham
Principal Data Scientist at PayPal
Nachum Shacham is a Principal Data Scientist at PayPal, where he is working on modeling and extracting business value from large transactional, behavioral, and system performance datasets. Before that, he was with eBay, analyzing the performance of large data platforms.
Prior to that, he was with SRI, leading research in internet technologies, generation of wireless internet and real-time voice and video communications over mobile networks.
As co-founder and CTO of Metreo, he developed models for B2B pricing and subsequently created revenue models for online display and search advertising.
Nachum holds a BScEE and an MScEE from the Technion and a Ph.D. in EECS from UC Berkeley. Dr. Shacham is a Fellow of the IEEE.
Trevor Hastie
Gradient Boosting (Day 2, 10:00 AM)
Trevor Hastie was born in South Africa in 1953. He received his university education from Rhodes University, South Africa (BS), University of Cape Town (MS), and Stanford University (Ph.D. Statistics, 1984).
His first employment was with the South African Medical Research Council in 1977, during which time he earned his MS from UCT. In 1979 he spent a year interning at the London School of Hygiene and Tropical Medicine, the Johnson Space Center in Houston, Texas, and the Biomath Department at Oxford University. He joined the Ph.D. program at Stanford University in 1980. After graduating from Stanford in 1984, he returned to South Africa for a year with his earlier employer, the SA Medical Research Council. He returned to the USA in March 1986 and joined the statistics and data analysis research group at what was then AT&T Bell Laboratories in Murray Hill, New Jersey. After eight years at Bell Labs, he returned to Stanford University in 1994 as Professor in Statistics and Biostatistics. In 2013 he was named the John A. Overdeck Professor of Mathematical Sciences.
His main research contributions have been in applied statistics, and he has written three books in this area: "Generalized Additive Models" (with R. Tibshirani, Chapman and Hall, 1991), "Elements of Statistical Learning" (with R. Tibshirani and J. Friedman, Springer 2001; second edition 2009) and "An Introduction to Statistical Learning, with Applications in R" (with G. James, D. Witten and R. Tibshirani, Springer 2013). He has also made contributions in statistical computing, co-editing (with J. Chambers) a large software library of modeling tools in the S language (Statistical Models in S, Wadsworth, 1992), which forms the basis for much of the statistical modeling in R. His current research focuses on applied statistical modeling and prediction problems in biology and genomics, medicine and industry.
Venkatesh Ramanathan
Data Scientist PayPal
Venkatesh is a senior data scientist at PayPal, where he is working on building state-of-the-art tools for payment fraud detection. He has over 20 years of experience in designing, developing and leading teams to build scalable server-side software. In addition to being an expert in big-data technologies, Venkatesh holds a Ph.D. degree in Computer Science with specialization in Machine Learning and Natural Language Processing (NLP) and has worked on various problems in the areas of Anti-Spam, Phishing Detection, and Face Recognition.
-Fraud Detection Using H2O’s Deep Learning-
Deep Learning has shown superior performance in the areas of image processing, object recognition and text processing. In this talk, I will present how H2O’s Deep Learning can help with payment fraud detection. I will present results from experiments conducted on a very large data set containing over 100 million examples and 1000s of features. I will also explore several advanced features implemented in H2O such as adaptive learning rate and dropout regularization and their impact on runtime and predictive performance.