The Genome Reference Consortium (GRC) uses an assembly model for the human (and mouse) reference genome assemblies that allows different sequence paths to be represented at loci where allelic diversity needs to be captured (PLoS Biol. 2011 Jul;9(7):e1001091). Ideally, these alternate paths would be present only in regions where haplotypes cannot easily be reconstructed from a reference haplotype using a set of defined edits. In practice, however, some alternate alleles are added because they contribute sequence not represented in the primary assembly, which will likely improve the assembly's utility as an alignment substrate.
While this model may better represent population diversity, most commonly used analysis tools were not designed for this type of assembly model. This workshop brings together a range of software developers to discuss approaches for using the full assembly across diverse analysis tools, from alignment to variant calling. Its goals are to develop practical ideas that will allow adoption of the full reference assembly in the short term, and to identify the types of data structures and tools needed in the longer term to analyze a eukaryotic pan-reference genome.
- Alignment/Mapping tools for using the full assembly: distinguishing allelic duplication from paralogous duplication.
- Representing alignment data in BAM files
- Variant calling
- Representing variant calls in VCF (or other formats)
- Reporting results to users in biologically friendly ways
- Relationship to parallel interests in the Global Alliance for Genomics and Health (GA4GH) Data Working Group
- Chen-Shan (Jason) Chin (Pacific Biosciences)
- Aaron Quinlan (University of Virginia)
- Michael Schatz (Cold Spring Harbor Laboratory)
- Gabor Marth (USTAR Center for Genetic Discovery)
- Bronwen Aken (EBI)
- Paul Kitts (NCBI)
- Valerie Schneider (NCBI)
Workshop sponsored by Personalis.
When & Where
Genome Reference Consortium
The Genome Reference Consortium is the group responsible for updating and maintaining the human, mouse and zebrafish reference genome assemblies. The GRC works to create assemblies that better represent population diversity and provide more robust substrates for genome analysis. The GRC comprises the National Center for Biotechnology Information (NCBI), the Wellcome Trust Sanger Institute (WTSI), The Genome Institute at Washington University (TGI) and the European Molecular Biology Laboratory's European Bioinformatics Institute (EBI).