Seeking an optimal variant calling pipeline for medical genetics
Authors
Barbitoff A., Yury
Panteleeva, Alexandra
Predeus V., Alexander
Other contributors
Morić, Ivana
Đorđević, Valentina
Conference contribution (Published version)
© 2023 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade
Metadata
Abstract
Accurate and comprehensive variant discovery is extremely important for rare disease
diagnostics using next-generation sequencing (NGS) methods. In recent years, a
plethora of methods have been developed for short variant calling from NGS data, and the
most recent tools make extensive use of machine learning algorithms for both variant
discovery and filtering. In our study, we set out to systematically evaluate the
performance of different pipelines for short variant calling in the human genome.
To perform such a systematic comparison, we assembled a large collection of both “gold
standard” (provided by the Genome In A Bottle (GIAB) consortium) and in-house whole-exome
sequencing (WES) and whole-genome sequencing (WGS) datasets (20 datasets in total).
We tested all combinations of 4 popular short read aligners
(BWA, Bowtie2, Isaac, and Novoalign) and 9 novel and well-established variant calling
and filtering methods (Freebayes, Clair3, DeepVariant, Genome Analysis ToolKit (GATK),
Octopus, Strelka2). We also used several different tools for preprocessing of short reads.
Our analysis showed negligible effects of adapter trimming on the accuracy of short
variant calling. Among read aligners, Bowtie2 performed significantly worse than other
tools, suggesting it should not be used for medical variant calling. For pipelines based
on BWA, Isaac, and Novoalign, the accuracy of variant discovery mostly depended on
the variant caller and not the read aligner. DeepVariant consistently showed the best
performance and the greatest robustness compared to all other tested variant callers. We
also compared the consistency of variant calls in GIAB and non-GIAB samples. With
a few important caveats, the best-performing tools showed little evidence of overfitting.
Taken together, our study showed that modern strategies for NGS data analysis allow for
high accuracy of genetic variant discovery within coding regions of the human genome.
However, there is still a need for development of new library preparation and variant
calling methods to enhance variant discovery in the challenging regions of the human
genome.
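The "all combinations" design described in the abstract can be pictured as a simple grid of aligner and caller pairs. The sketch below is only an illustration, not the authors' code: it uses the tool names given in the abstract, and the function name `pipeline_grid` is hypothetical. The abstract counts 9 calling and filtering methods but names only six tools, so the sketch enumerates those six.

```python
from itertools import product

# Tool names as listed in the abstract. The abstract counts 9 calling and
# filtering methods but names only these six tools, so this grid is smaller
# than the one actually evaluated in the study.
ALIGNERS = ["BWA", "Bowtie2", "Isaac", "Novoalign"]
CALLERS = ["Freebayes", "Clair3", "DeepVariant", "GATK", "Octopus", "Strelka2"]


def pipeline_grid(aligners, callers):
    """Return every (aligner, caller) pair, i.e. a full factorial design.

    Hypothetical helper for illustration only.
    """
    return list(product(aligners, callers))


grid = pipeline_grid(ALIGNERS, CALLERS)
print(len(grid))  # 4 aligners x 6 named callers = 24 pipelines
for aligner, caller in grid[:3]:
    print(f"{aligner} + {caller}")
```

Each pair in the grid stands for one pipeline that would then be run on every dataset and scored against the reference calls.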
Keywords:
pipeline / variant calling / human genetics / medical genetics
Source:
4th Belgrade Bioinformatics Conference, 2023, 4, 98-98
Publisher:
- Belgrade : Institute of Molecular Genetics and Genetic Engineering
Funding / projects:
- We thank JetBrains Ltd. for providing financial support and computing resources for the project.
Note:
- Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023
Collections
Institution/group
Institute of Molecular Genetics and Genetic Engineering
TY  - CONF
AU  - Barbitoff A., Yury
AU  - Panteleeva, Alexandra
AU  - Predeus V., Alexander
PY  - 2023
UR  - https://belbi.bg.ac.rs/
UR  - https://imagine.imgge.bg.ac.rs/handle/123456789/2043
PB  - Belgrade : Institute of molecular genetics and genetic engineering
C3  - 4th Belgrade Bioinformatics Conference
T1  - Seeking an optimal variant calling pipeline for medical genetics
EP  - 98
SP  - 98
VL  - 4
UR  - https://hdl.handle.net/21.15107/rcub_imagine_2043
ER  -
Barbitoff A., Y., Panteleeva, A., & Predeus V., A. (2023). Seeking an optimal variant calling pipeline for medical genetics. In 4th Belgrade Bioinformatics Conference, 4, 98-98. Belgrade : Institute of Molecular Genetics and Genetic Engineering. https://hdl.handle.net/21.15107/rcub_imagine_2043