000 01973nam a22002177a 4500
008 220816b2020 |||||||| |||| 00| 0 eng d
020 _a9789352139934
082 _a004.6782 AUW-G
100 _aAuwera, Geraldine A. van der
245 _aGenomics in the cloud : using Docker, GATK, and WDL in Terra /
_cGeraldine A. Van der Auwera and Brian D. O'Connor
260 _aBeijing
_bO'Reilly and SPD
_c2020
300 _a467 p.
365 _aINR
_b1750.00
500 _aData in the genomics field is booming. In just a few years, organizations such as the National Institutes of Health (NIH) will host 50+ petabytes (over 50 million gigabytes) of genomic data, and they’re turning to cloud infrastructure to make that data available to the research community. How do you adapt analysis tools and protocols to access and analyze that volume of data in the cloud? With this practical book, researchers will learn how to work with genomics algorithms using open source tools including the Genome Analysis Toolkit (GATK), Docker, WDL, and Terra. Geraldine Van der Auwera, longtime custodian of the GATK user community, and Brian O’Connor of the UC Santa Cruz Genomics Institute guide you through the process. You’ll learn by working with real data and genomics algorithms from the field. This book covers: essential genomics and computing technology background; basic cloud computing operations; getting started with GATK, plus three major GATK Best Practices pipelines; automating analysis with scripted workflows using WDL and Cromwell; scaling up workflow execution in the cloud, including parallelization and cost optimization; interactive analysis in the cloud using Jupyter notebooks; secure collaboration and computational reproducibility using Terra.
650 _aGenomics
650 _aCloud computing
650 _aGenomics--Data processing
650 _aBig data
650 _aSPARK (Electronic resource)
700 _aO'Connor, Brian D.
999 _c79993
_d79993