CEHG Workshop: A roadmap to de-novo assembly of higher eukaryote genomes

Unfortunately, registration has ended. Please contact the event coordinators at stanfordcehg@stanford.edu if you have any questions.

By Stanford CEHG

Date and time

May 30, 2017 · 10am - May 31, 2017 · 4pm PDT

Location

Frances C. Arrillaga Alumni Center

Lane/Lyons/Lodato Room, 326 Galvez Street, Stanford, CA 94305

Description

Whole-genome analyses are becoming increasingly important in biological research, including in evolutionary, medical, and conservation contexts. Genome assembly, usually the first step in genomic analyses, is a fast-developing area of research, so it can be difficult to keep up with its current state, and harder still for researchers new to the field to find a way in. This workshop is aimed both at researchers with an advanced understanding of the topic and at those with no background knowledge. It will serve as a roadmap from designing a genome sequencing project to a “final” genome assembly, with brief discussions of downstream analyses. Workshop facilitator Stefan Prost plans to start with the basics, such as the sequencing technologies available and how to choose a sequencing platform and library preparation method. He will then outline the steps needed to process the raw sequencing data, as well as the methods for assembly and assembly quality assessment. To keep the workshop practical, he will discuss popular tools for each step to help researchers decide which to use.

Registration is free and capped at 40 participants. Lunch is not provided, but participants are encouraged to take advantage of The Alumni Cafe next door and Arbuckle Dining Pavilion, also nearby.

Topics

  • Basics and A Priori Knowledge of the Genome to be Sequenced: To begin, the facilitator, Stefan Prost, will cover some basics and then discuss different genome characteristics that strongly influence whether a genome will be easy or difficult to sequence and assemble successfully, and where to find information on genome characteristics for different taxa.

  • Sequencing Platforms: An outline of 1st, 2nd, and 3rd generation sequencing technologies. The platforms Prost will cover in this section include Illumina (MiSeq, HiSeq, and NovaSeq), Ion Torrent and Ion Proton, ABI SOLiD, PacBio, Nanopore, and Helicos.

  • Library Setup: Next, Prost will discuss the differences between Illumina library preparation methods and their pros and cons, covering paired-end (PE), mate-pair (MP), Dovetail Genomics' Chicago, and Hi-C libraries. He will also outline other strategies, such as BAC- or fosmid-based sequencing.

  • Raw Data Processing: A discussion of tools used to assess and improve read quality.

  • Assembly vs. Mapping: This section will cover the differences between de-novo genome assembly and reference-based mapping, and when each approach is preferable.

  • De-Novo Assemblers: Prost will outline popular tools for assembling large genomes and briefly discuss their underlying algorithms. Along the way, he will explain terms commonly used in genome assembly, such as "k-mer."

  • Assembly Quality Assessment: A critical step after assembling a genome is assessing the quality of the resulting assembly. When several assemblers or k-mer sizes have been tried, tools are needed to decide which of the assemblies is best.

  • Assembly Improvement: There are different tools that can be used to improve a genome sequence after the initial assembly, either by filling gap regions or finding and resolving misassembled regions. Furthermore, genome assemblies can be merged to improve quality.

  • Draft vs. Finished Assembly: A crucial decision in genomics is whether a genome assembly is good enough to address the desired research questions. Here, Prost will explain the differences between finished and draft genome assemblies, and give some guidance on deciding whether further sequencing is needed.

  • Downstream Analyses: To conclude the workshop, Prost will briefly outline downstream processing and analysis steps, such as repeat and gene annotation, and how to go from a haploid genome sequence to a diploid genome by mapping.
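
For readers new to two ideas mentioned in the topics above, here is a minimal Python sketch. It is an illustration only and is not taken from the workshop materials. It shows what a k-mer is (every overlapping substring of length k in a sequence) and computes N50, a contig-length summary that is widely used when comparing assemblies, although the listing does not say which metrics the workshop covers. The function names `count_kmers` and `n50` and the example values are assumptions made for this sketch.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def n50(contig_lengths):
    """Return the contig length at which the longest contigs, taken in
    descending order, first cover at least half of the total assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical toy inputs, chosen only to illustrate the definitions.
kmers = count_kmers("ATGATG", 3)      # 3-mers: ATG, TGA, GAT, ATG
print(kmers["ATG"])                   # ATG occurs twice
print(n50([100, 80, 50, 30, 20]))     # total 280; 100 + 80 passes half
```

A single number such as N50 says nothing about correctness, so in practice it would be read alongside other checks rather than on its own.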

About Stefan Prost

Stefan Prost is currently a Postdoctoral Fellow in Dmitri Petrov's lab at Stanford University. His research focuses on evolutionary genomics, genome architecture changes, and genome assembly. More specifically, he studies how genomes change in response to adaptation to new environments and living conditions in a variety of taxa. He started his research at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, working on ancient DNA analyses. He graduated with a Master’s in Microbiology and Genetics from the University of Vienna, Austria, before starting his Ph.D. at the University of Otago, Dunedin, New Zealand. After receiving his degree in New Zealand, he relocated to Sweden for a short-term postdoc on evolutionary genomics at the Swedish Museum of Natural History in Stockholm, before joining Rasmus Nielsen's lab at the University of California, Berkeley, for two years as a postdoc. He is currently setting up a large-scale comparative genomics project on Drosophila flies with Dmitri Petrov and other colleagues. Besides evolutionary genomics, he is also interested in genome assembly methods and in working with 2nd and 3rd generation sequencing technologies.

Organized by

The Stanford Center for Computational, Evolutionary and Human Genomics (CEHG) was founded in 2012 to foster interdisciplinary research at the University. A collaboration between the School of Humanities and Sciences and the School of Medicine, the Center is the intellectual home of 40 professors and over 200 postdoctoral scholars and graduate students. CEHG funding opportunities include annual fellowships and event partnerships.
