Best Practices for Benchmarking Germline Small Variant Calls in Human Genomes

bioRxiv pre-print
Abstract

Assessing accuracy of NGS variant calling is immensely facilitated by a robust benchmarking strategy, and tools to carry it out in a standard way. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. The Global Alliance for Genomics and Health (GA4GH) Benchmarking Team has developed standardized performance metrics and tools for benchmarking germline small variant calls. This Team includes representatives from sequencing technology developers, government agencies, academic bioinformatics researchers, clinical laboratories, and commercial technology and bioinformatics developers for whom benchmarking variant calls is essential to their work.

Benchmarking variant calls is a challenging problem for many reasons: Evaluating variant calls requires complex matching algorithms and standardized counting, because the same variant may be represented differently in truth and query callsets; Defining and interpreting resulting metrics such as precision (aka positive predictive value = TP/(TP+FP)) and recall (aka sensitivity = TP/(TP+FN)) requires standardization to draw robust conclusions about comparative performance for different variant calling methods; Performance of NGS methods can vary depending on variant types and genome context, and as a result understanding performance requires meaningful stratification; High-confidence variant calls and regions that can be used as "truth" to accurately identify false positives and negatives are difficult to define, and reliable calls for the most challenging regions and variants remain out of reach.

We have made significant progress on standardizing comparison methods, metric definitions and reporting, as well as developing and using truth sets. Our methods are publicly available on GitHub (https://github.com/ga4gh/benchmarking-tools) as well as in a web-based app on precisionFDA, which allows users to compare their variant calls against truth sets and to obtain a standardized report on their variant calling performance. Our methods have been piloted in the precisionFDA variant calling challenges to identify the best-in-class variant calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and critically evaluating the results.

https://www.biorxiv.org/content/early/2018/02/23/270157
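
The precision and recall definitions quoted in the abstract, together with its point about stratification, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Counts` class, the function names, the stratum labels, and the counts themselves are hypothetical, and the sketch does not reproduce the variant matching or counting performed by the GA4GH benchmarking tools. It only applies the two formulas to counts that are assumed to be already available.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for comparison counts within one stratum."""
    tp: int  # true positives: query calls that match the truth set
    fp: int  # false positives: query calls with no match in the truth set
    fn: int  # false negatives: truth calls missed by the query


def precision(c: Counts) -> float:
    """Precision (positive predictive value) = TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else math.nan


def recall(c: Counts) -> float:
    """Recall (sensitivity) = TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else math.nan


def summarize(strata: dict[str, Counts]) -> dict[str, tuple[float, float]]:
    """Return (precision, recall) per stratum, e.g. per variant type."""
    return {name: (precision(c), recall(c)) for name, c in strata.items()}


# Made-up counts, chosen only to show how metrics can differ by variant type.
example_strata = {
    "SNP": Counts(tp=990, fp=10, fn=10),
    "INDEL": Counts(tp=90, fp=10, fn=30),
}

for name, (p, r) in summarize(example_strata).items():
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

Reporting the two strata separately is the point of the example: pooling the made-up SNP and indel counts would hide the lower indel recall behind the much larger number of SNPs.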

GenomeWeb Article: Genome In A Bottle Consortium Developing New Reference Materials, Variant Call Sets

Julia Karow

NEW YORK (GenomeWeb) – The Genome in a Bottle Consortium has been developing new reference materials for genome sequencing and is working on a first set of high-quality structural variants for human genomes. In the meantime, laboratories have begun to adopt its first pilot genome as a standard for developing new sequencing technologies and assays.

Last month, the private-public consortium, which is spearheaded by the National Institute of Standards and Technology, released four new DNA reference materials, adding to the pilot sample it made available last year. The consortium counts about 20 members on the sample development and an equal number on the data analysis sides, including clinical laboratories, sequencing technology companies, professional organizations, and research groups at academic institutions and government agencies. While the majority of these are US-based, the group also has members from Australia, Asia, and Europe.

Read More

NIST Releases New 'Family' of Standardized Genomes

With the addition of four new reference materials (RMs) to a growing collection of “measuring sticks” for gene sequencing, the National Institute of Standards and Technology (NIST) can now provide laboratories with even more capability to accurately “map” DNA for genetic testing, medical diagnoses and future customized drug therapies. The new tools feature sequenced genes from individuals in two genetically diverse groups, Asians and Ashkenazic Jews; a father-mother-child trio set from Ashkenazic Jews; and four microbes commonly used in research.

Read More

 

Scientific Data Publication: Extensive sequencing of seven human genomes to characterize benchmark reference materials

Genome in a Bottle and collaborators have published a paper in Scientific Data describing the extensive sequencing of seven human genomes to characterize benchmark reference materials. The paper is publicly available via the link below.

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

Read More

 

7th GIAB Public Workshop

The Genome in a Bottle Consortium is holding its 7th open, public workshop at the Li Ka Shing Center on the Stanford University campus, January 28-29, 2016. GIAB is developing reference materials, reference data, and methods to use the materials and data to assess performance of human genome sequencing. At this workshop, we will discuss progress developing high-confidence vari...

Pre-Print: Extensive sequencing of seven human genomes to characterize benchmark reference materials

Genome in a Bottle and collaborators have published a paper describing the extensive sequencing of seven human genomes to characterize benchmark reference materials. The paper is publicly available at the links below.

Extensive sequencing of seven human genomes to characterize benchmark reference materials
Justin M Zook, David Catoe, Jennifer McDaniel, Lindsay Vang, Noah...