WGS germline / somatic analysis
Works with Illumina DNA sequencing data, either WGS or targeted sequencing. The pipeline aligns reads to the reference genome, reports QC metrics, calls variants and finishes with annotation.
nf-core/sarek (paper) is an analysis pipeline for WGS and targeted sequencing data. Previously known as the Cancer Analysis Workflow (CAW), Sarek can handle regular samples or tumour/normal pairs, including relapse samples if required. Sarek was co-developed by NGI.
Sarek analysis can be divided into two different use cases: germline analysis and somatic analysis. These two use cases share the same main steps: mapping, variant calling and annotation.
When we run analysis
We routinely run Sarek germline analysis upon request for human WGS projects, while the decision on whether to run Sarek somatic analysis is made on a case-by-case basis. If you’re interested, please get in touch with us and mention that you would like us to run this analysis.
The analysis currently works with the human reference genomes available in AWS-iGenomes (GRCh37/GRCh38). If in doubt, please ask whether we can run the pipeline for you.
Sarek can start from the unprocessed, demultiplexed FastQ files from the sequencer, together with a small amount of contextual data in the form of a TSV file. For each sample, the TSV file should state the sex of the subject and whether the sample is tumour or normal. In most cases, this information needs to be submitted to NGI by the user.
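To illustrate the kind of per-sample information involved, here is a minimal Python sketch that writes such a TSV file. The column order (subject, sex, status, sample, lane, FastQ read 1, FastQ read 2), the `XX`/`XY` sex codes and the `0`/`1` status codes are assumptions for illustration; the exact layout depends on the Sarek version, so please check the official documentation. The helper `make_sample_tsv` and all sample and file names are hypothetical.

```python
import csv
import io

# Assumed encoding, for illustration only: 0 = normal, 1 = tumour.
STATUS_CODES = {"normal": 0, "tumour": 1}
# Assumed sex codes, for illustration only.
VALID_SEX = {"XX", "XY"}


def make_sample_tsv(records):
    """Return TSV text with one row per FastQ pair.

    Assumed column order: subject, sex, status, sample, lane, fastq1, fastq2.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for rec in records:
        if rec["sex"] not in VALID_SEX:
            raise ValueError(f"unexpected sex value: {rec['sex']!r}")
        if rec["status"] not in STATUS_CODES:
            raise ValueError(f"unexpected status value: {rec['status']!r}")
        writer.writerow([
            rec["subject"],
            rec["sex"],
            STATUS_CODES[rec["status"]],
            rec["sample"],
            rec["lane"],
            rec["fastq1"],
            rec["fastq2"],
        ])
    return buffer.getvalue()


# Hypothetical tumour/normal pair from a single subject.
example_records = [
    {"subject": "patient1", "sex": "XX", "status": "normal",
     "sample": "patient1_N", "lane": 1,
     "fastq1": "patient1_N_R1.fastq.gz", "fastq2": "patient1_N_R2.fastq.gz"},
    {"subject": "patient1", "sex": "XX", "status": "tumour",
     "sample": "patient1_T", "lane": 1,
     "fastq1": "patient1_T_R1.fastq.gz", "fastq2": "patient1_T_R2.fastq.gz"},
]

tsv_text = make_sample_tsv(example_records)
print(tsv_text, end="")
```

Each row describes one pair of FastQ files, so a sample sequenced on several lanes would simply appear on several rows with the same subject and sample identifiers.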
The pipeline generates BAM alignment files and variant-calling VCF files, along with numerous quality control metrics. For more information, please see the official documentation.