SCGPM Seminar: Sasha Zaranek & Jonathan Sheffi, Curoverse
Thu, March 10, 2016, 2:30 PM – 3:30 PM PST
Arvados: A Free Platform for Big Data Science
Sasha Wait Zaranek & Jonathan Sheffi, Co-founders, Curoverse
Abstract: This talk will introduce the Arvados (http://arvados.org) platform for data science. Arvados is a software system for managing compute clusters built around a scale-out, content-addressed distributed file system (Arvados Keep) for storage, a cluster job queuing system designed for reproducibility (Arvados Crunch), and a user and group permission system for controlling and sharing access to those resources. Arvados provides web-based and command-line tools for transferring, managing, sharing, and computing on very large data sets.
In working with a diverse set of researchers, physicians, and patients who are all examining sequencing data, we have identified a need for a consistent naming scheme for parts of the genome. As an application within the Arvados platform, we invented tiling, a technique that divides the genome into about 10 million overlapping, variable-length sequences, or “tiles”, each with a unique 24-base tag at each end. We use examples from public data to show that tiling supports simple and consistent names, annotation, queries, machine learning, and clinical screening. We support tiling with Arvados Lightning, software that will scale to millions of genomes in a few racks of off-the-shelf hardware.
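The tiling idea in the abstract can be illustrated with a toy sketch. This is not the Arvados Lightning implementation: the function name `tile_sequence`, the `tag_starts` input, and the 4-base tags in the demo are assumptions made for illustration (the abstract specifies 24-base tags and does not say how tag positions are chosen). The sketch only shows how tiles that each begin and end with a tag come out overlapping and variable in length.

```python
from typing import List, Tuple

TAG_LEN = 24  # tag length stated in the abstract


def tile_sequence(
    seq: str, tag_starts: List[int], tag_len: int = TAG_LEN
) -> List[Tuple[str, str, str]]:
    """Split seq into overlapping tiles, each starting and ending with a tag.

    tag_starts is a hypothetical input: sorted start offsets of the tags
    within seq. Returns (start_tag, end_tag, tile) triples.
    """
    tags = [seq[s:s + tag_len] for s in tag_starts]
    if any(len(t) != tag_len for t in tags):
        raise ValueError("every tag must lie fully inside the sequence")
    if len(set(tags)) != len(tags):
        raise ValueError("tags must be unique")

    tiles = []
    for left, right in zip(tag_starts, tag_starts[1:]):
        if right - left < tag_len:
            raise ValueError("tags must not overlap each other")
        # The tile runs from the start of one tag to the end of the next,
        # so neighbouring tiles share exactly one tag.
        tile = seq[left:right + tag_len]
        tiles.append((tile[:tag_len], tile[-tag_len:], tile))
    return tiles


if __name__ == "__main__":
    # Toy data with 4-base tags so the example stays readable.
    toy = "AAAACGTGGGGTTACGACCCC"
    for start_tag, end_tag, tile in tile_sequence(toy, [0, 7, 17], tag_len=4):
        print(start_tag, end_tag, len(tile), tile)
```

In the toy run, the two tiles have different lengths and share the middle tag, which is the overlap the abstract refers to. A tile can then be named by its tags rather than by a coordinate, which is one plausible reading of how tiling yields consistent names.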
Bios: Alexander (Sasha) Wait Zaranek, PhD, is co-founder and Chief Scientist at Curoverse, a venture-backed company focused on building a free and open-source platform for storing, analyzing, and sharing biomedical data. Sasha works on open technologies that are part of the revolution that has reduced human DNA sequencing costs a million-fold since the completion of the Human Genome Project. A current research focus is the development of clinical-quality applications for processing massive data sets spanning millions of individuals across collaborating organizations, eventually encompassing exabytes of data. His contributions have led to highly cited publications in Science, Nature, The Lancet, and other leading scientific journals. Sasha is also a co-founder and Director of Informatics at the Harvard Personal Genome Project.
Jonathan Sheffi is co-founder and leader of customer & business development at Curoverse, a federated data service for genomics & health. Before co-founding Curoverse, he spent several years in the biotechnology industry, including roles with Novartis Diagnostics, Amgen, and Accenture. Jonathan holds an MBA from Harvard Business School, an MEng focused on computational molecular biology from MIT, and undergraduate degrees in mathematics & computer science, also from MIT.
ABOUT THE SCGPM: The Stanford Center for Genomics and Personalized Medicine (SCGPM) seeks to advance genomic technology so that someday both genetic and molecular profiling will become powerful and routine tools for predicting disease risk and monitoring and treating a wide range of pathologies. Towards this mission, the SCGPM serves to centralize and develop collaborative intellectual and technological resources that promote genomic research and analysis, predict drug response, educate physicians, and examine the ethics of personalized medicine. This includes large basic science projects such as ENCODE that decipher the human genome as well as clinical research projects such as the sequencing of cancer genomes and individuals with inherited diseases. Through these efforts, the Center aims to bring genomics to the clinic.
For more information about the SCGPM, go to http://scgpm.stanford.edu.
The SCGPM supports the Genetics Bioinformatics Service Center, a SoM core facility for genomics research that provides a secure on-premises computing infrastructure, a Google Cloud gateway, and bioinformatics consulting. The facilities are available to all faculty members at Stanford University.
More at http://gbsc.stanford.edu. Send inquiries to email@example.com.