ADAM: Fast, Scalable Genome Analysis
Frank Nothaft, UC Berkeley
Friday, February 17th
12:00 noon - 1:00 PM
Clark S361 - Lunch provided
Abstract: The detection and analysis of rare genomic events requires integrative analysis across large cohorts with terabytes to petabytes of genomic data. Contemporary genomic analysis tools have not been designed for this scale of data-intensive computing. This talk presents ADAM, an Apache 2-licensed library built on top of the popular Apache Spark distributed computing framework. ADAM is designed to allow genomic analyses to be seamlessly distributed across large clusters, and presents a clean API for writing parallel genomic analysis algorithms. In this talk, we’ll look at how we’ve used ADAM to achieve a 3.5× improvement in end-to-end variant calling latency and a 66% cost improvement over current toolkits, without sacrificing accuracy. We will also talk about using ADAM alongside Apache HBase to interactively explore large variant datasets.
Bio: Frank Austin Nothaft is a PhD candidate in Computer Science at UC Berkeley. His research uses large-scale commodity computing systems to process scientific data. Frank works with Professors David Patterson and Anthony Joseph in the AMPLab and the ASPIRE lab. From 2013 to 2016, Frank was supported by an NSF Graduate Research Fellowship. Frank received an MS in Computer Science from Berkeley in 2015. Prior to Berkeley, Frank was an IC design engineer at Broadcom Corporation from 2011 until 2016, and his work focused on mixed-signal design automation. Before Broadcom, Frank received a Bachelor of Science with Honors in Electrical Engineering from Stanford University in 2011, where he was advised by Professor William J. Dally.
ABOUT THE SCGPM: The Stanford Center for Genomics and Personalized Medicine (SCGPM) seeks to advance genomic technology so that someday both genetic and molecular profiling will become powerful and routine tools for predicting disease risk and monitoring and treating a wide range of pathologies. Towards this mission, the SCGPM serves to centralize and develop collaborative intellectual and technological resources that promote genomic research and analysis, predict drug response, educate physicians, and examine the ethics of personalized medicine. This includes large basic science projects such as ENCODE that decipher the human genome as well as clinical research projects such as the sequencing of cancer genomes and individuals with inherited diseases. Through these efforts, the Center aims to bring genomics to the clinic.
For more information about the SCGPM, go to http://scgpm.stanford.edu.
The SCGPM supports the Genetics Bioinformatics Service Center, a School of Medicine core facility for genomics research that provides a secure on-premises computing infrastructure, a Google Cloud gateway, and bioinformatics consulting. The facilities are available to all faculty members at Stanford University.