WGS and WES germline / somatic analysis
Runs on Illumina DNA-sequencing data, either WGS or targeted sequencing such as WES. The pipeline aligns reads to the reference genome, reports QC metrics, calls variants and finishes with annotation.
nf-core/sarek (paper) is an analysis pipeline for WGS and targeted sequencing data, e.g. WES. Previously known as the Cancer Analysis Workflow (CAW), Sarek can handle regular samples or tumour/normal pairs, including relapse samples if required. Sarek was co-developed by NGI.
Sarek analysis can be divided into two different use cases: germline analysis and somatic analysis. These two use cases share the same main steps: mapping, variant calling and annotation.
nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing - https://nf-co.re/sarek
When we run analysis
We routinely run Sarek germline analysis upon request for human WGS and WES projects. For Sarek somatic analysis, the decision to run it is made on a case-by-case basis. If you’re interested, please get in touch with us and mention that you would like us to run this analysis.
The analysis currently works with the human reference genomes available in AWS-iGenomes (GRCh37/GRCh38). If in doubt, please ask whether we can run the pipeline for you.
Input data
Sarek can start from the unprocessed, demultiplexed FastQ files from the sequencer together with a small amount of sample metadata in a TSV file. For each sample, the TSV file should state the sex of the subject and whether the sample is tumour or normal. In most cases, this information needs to be submitted to NGI by the user.
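As a rough illustration of the kind of metadata involved, the Python sketch below builds such a TSV file and checks the two per-sample fields mentioned above. The column order, the `XX`/`XY` and `normal`/`tumour` encodings, the absence of a header line, and all sample and file names are assumptions made for this example only; the exact layout depends on the Sarek version, so please check the official documentation before preparing a real file.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column order, chosen for illustration only.
COLUMNS = ("subject", "sex", "status", "sample", "lane", "fastq_1", "fastq_2")

# Assumed encodings for the two per-sample fields described in the text.
ALLOWED_SEX = {"XX", "XY"}
ALLOWED_STATUS = {"normal", "tumour"}


@dataclass(frozen=True)
class SampleRow:
    subject: str
    sex: str
    status: str
    sample: str
    lane: str
    fastq_1: str
    fastq_2: str


def validate(row: SampleRow) -> None:
    """Raise ValueError if the sex or tumour/normal status is not recognised."""
    if row.sex not in ALLOWED_SEX:
        raise ValueError(f"{row.sample}: unexpected sex {row.sex!r}")
    if row.status not in ALLOWED_STATUS:
        raise ValueError(f"{row.sample}: unexpected status {row.status!r}")


def to_tsv(rows: list[SampleRow]) -> str:
    """Validate each row and return the rows as tab-separated text, no header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        validate(row)
        writer.writerow([getattr(row, column) for column in COLUMNS])
    return buffer.getvalue()


if __name__ == "__main__":
    # A hypothetical tumour/normal pair from one subject.
    pair = [
        SampleRow("patient1", "XX", "normal", "patient1_N", "L001",
                  "patient1_N_R1.fastq.gz", "patient1_N_R2.fastq.gz"),
        SampleRow("patient1", "XX", "tumour", "patient1_T", "L001",
                  "patient1_T_R1.fastq.gz", "patient1_T_R2.fastq.gz"),
    ]
    print(to_tsv(pair), end="")
```

Validating the sex and status fields before the file is written catches the most common omissions early, since these are the values that usually have to come from the user rather than from the sequencer.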
Results
The pipeline generates BAM alignment files and variant-calling VCF files, along with numerous quality control metrics. For more information, please see the official documentation.
Last Updated: 1st February 2023