De novo sequencing

This method is useful for building novel reference genomes, which can serve as a foundation for future research. Long-read technologies such as PacBio and ONT can resolve many of the structural properties of a genome. While PacBio HiFi assemblies do not need to be polished with short reads, Nanopore data require an extra polishing step using Illumina data, e.g. paired-end libraries or HiC. HiC adds a further layer of information to long-read data: it arranges scaffolds into chromosomes and helps check and correct the assembly.

NGI now offers de novo projects as a single package. You send in your sample(s), and NGI takes care of the separate library preparations suited to your particular project. A typical setup starts with an initial draft genome assembled from long sequence reads, followed by scaffolding, which orders contigs into longer scaffolds, and error correction. The new reference genome is then annotated, e.g. for genes and other functional elements. We also offer DNA extraction as a service for de novo projects if required. For more information, please refer to our recent online webinar.

Project setup
To determine how contiguous your assembly needs to be, please have a look at the flowchart.

Each study setup is described in more detail below. Once you have chosen the setup suitable for your de novo project, the arrows in the flowchart direct you to the type of data you need. You can read more about the technologies NGI offers to generate the data in the technology section below.

More information about the applications, methods and bioinformatics options NGI provides can be found further down.

All new projects should be discussed with us before you apply. Please contact us here.

I. Simple population study

Applications

Genome-wide association studies (GWAS)
Marker-assisted breeding
Studies involving Mendelian traits, etc.

Assemblies of this type are commonly used to:

Find and select signals (predominantly SNPs)
Study allelic frequencies between populations, limited to small blocks of genomic regions.

These assemblies are built by assembling and scaffolding paired-end Illumina reads, and have:

A high base quality
Very low contiguity (N50 in kb-range)
Algorithm-based haplotyping as the only phasing option
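N50, the contiguity measure used above, is the length L such that contigs (or scaffolds) of length L or longer together make up at least half of the total assembly length. A minimal Python sketch, assuming the contig lengths are already available as a list of integers; the example lengths are made up for illustration:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical short-read assembly: many contigs in the kb range.
example_lengths = [12_000, 8_500, 5_000, 3_200, 2_000, 1_500, 900]
print(n50(example_lengths))
```

The same function applies to scaffold lengths; reporting contig N50 and scaffold N50 separately shows how much of the contiguity comes from scaffolding rather than from the underlying sequence.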

Structural variation analysis is very cumbersome, and mainly short indels can be analysed. It can be difficult to determine whether an observed variant is present at a single locus or is part of a larger genomic structure. Genomic repeats are usually collapsed or mis-assembled, and gene duplication events can be hard to detect.
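One way to see why repeats collapse: a read can only place a repeat copy uniquely if it spans the entire repeat and also reaches into unique flanking sequence on both sides. A simplified sketch of that rule, ignoring paired-end insert-size information; the 50 bp minimum anchor length and the example read and repeat lengths are arbitrary assumptions, and real assemblers use more elaborate criteria:

```python
def read_spans_repeat(read_length, repeat_length, min_anchor=50):
    """Return True if a single read can cover the whole repeat plus at least
    min_anchor bp of unique flanking sequence on each side.

    min_anchor is an illustrative assumption, not a value used by any tool.
    """
    return read_length >= repeat_length + 2 * min_anchor


# Illustrative read lengths (short read vs long read) and repeat lengths.
for read_len in (150, 15_000):
    for repeat_len in (300, 5_000):
        print(f"read {read_len} bp, repeat {repeat_len} bp:",
              read_spans_repeat(read_len, repeat_len))
```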

II. Complex population study

Long-read only assemblies (PacBio or Nanopore)

The latest iteration of PacBio chemistry (HiFi) does not require additional polishing.
Nanopore data should be polished with Illumina PE reads.
Assemblies show dramatic increases in contiguity, with megabase-size contigs (not scaffolds) and fewer gaps.
Haplotypes can be phased based on long stretches of overlapping sequence.
Causative differences between alleles can be better detected due to improved contiguity.
Better handling of short to medium repeat segments.
Better detection of retrovirus and transposon insertions in the genome.
Detection and phasing of structural variants of 5-10 kb, even if they are present in multiple copies across the genome. These assemblies are invaluable for studies of complex phenotypes (e.g. polygenic, non-Mendelian traits).
Complexity arising from evolutionary events can be resolved.
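Phasing from long reads works because a single read covering two heterozygous sites shows which alleles occur together on the same chromosome copy. A toy sketch, assuming each read has already been reduced to the alleles it carries at two heterozygous SNPs; the allele values and read counts are hypothetical, and real phasing tools solve a much harder optimisation over many sites and noisy alignments:

```python
from collections import Counter


def phase_two_snps(read_alleles, snp1, snp2):
    """Phase two heterozygous SNPs from reads that cover both sites.

    read_alleles: list of (allele_at_snp1, allele_at_snp2) tuples, one per read.
    snp1, snp2: the two alleles observed at each heterozygous site.
    Returns the two haplotypes as (allele1, allele2) tuples.
    """
    # Ignore reads whose alleles match neither expected allele (likely errors).
    counts = Counter(pair for pair in read_alleles
                     if pair[0] in snp1 and pair[1] in snp2)
    if not counts:
        raise ValueError("no informative reads cover both SNPs")
    # Reads support either the "cis" phasing (first alleles together)
    # or the "trans" phasing (first allele of one SNP with second of the other).
    cis = counts[(snp1[0], snp2[0])] + counts[(snp1[1], snp2[1])]
    trans = counts[(snp1[0], snp2[1])] + counts[(snp1[1], snp2[0])]
    if cis >= trans:
        return (snp1[0], snp2[0]), (snp1[1], snp2[1])
    return (snp1[0], snp2[1]), (snp1[1], snp2[0])


# Hypothetical reads: SNP1 is A/G, SNP2 is C/T; one read disagrees
# (e.g. a sequencing error) and is outvoted.
reads = [("A", "C")] * 6 + [("G", "T")] * 5 + [("A", "T")]
print(phase_two_snps(reads, snp1=("A", "G"), snp2=("C", "T")))
```

Short reads rarely cover two heterozygous sites at once, which is why read-based phasing over long distances depends on long reads.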

Hybrid long-read and Hi-C assembly

State-of-the-art approach for assembling reference genomes, recognized as the standard by the Earth BioGenome Project (EBP).
Long-read data are polished and scaffolded with HiC data, which also arranges the scaffolds into chromosomes.
The highest possible contiguity and base quality are achieved, and assemblies with one contig per chromosome are often obtained with this method.
Recommended by reference genome sequencing initiatives because it yields higher contiguity than long-read assemblies alone. This is essential for studies of complex genomic regions (e.g. MHC), centromeric and telomeric repeats, and previously unknown parts of genomes of functional interest (“genomic dark matter”).
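Contigs and scaffolds differ in that a scaffold may contain gaps of unknown sequence, conventionally written as runs of N, between the contigs it orders. A minimal sketch that breaks a scaffold into contigs at N-runs and counts the gaps; treating any run of one or more N as a gap is a simplifying assumption (some pipelines only break at longer runs), and the sequence is a made-up example:

```python
import re


def split_scaffold(scaffold):
    """Split a scaffold into contigs at runs of N and count internal gaps."""
    contigs = [part for part in re.split(r"[Nn]+", scaffold) if part]
    # Leading/trailing Ns are not gaps between contigs, so strip them first.
    gaps = len(re.findall(r"[Nn]+", scaffold.strip("Nn")))
    return contigs, gaps


scaffold = "ACGTACGTNNNNNNNNGGCCGGCCNNNNTTAATTAA"
contigs, gaps = split_scaffold(scaffold)
print(contigs, gaps)
```

A chromosome-scale scaffold for which this returns zero gaps is a single contig, the "one contig per chromosome" outcome mentioned above.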

Technologies offered at NGI

PacBio Revio
Long-read technology which enables contig assembly.
- Guidelines for sample preparation done by users
- HMW DNA extraction as a service
- Assembly strategies - contact NGI
- Sequencing and analysis costs for long-read sequencing

Oxford Nanopore (ONT) MinION and PromethION
Long-read technology which enables contig assembly.
- Guidelines for sample preparation done by users
- HMW DNA extraction as a service
- Assembly strategies - contact NGI
- Sequencing and analysis costs for long-read sequencing

Illumina for scaffolding purposes - HiC or Omni-C
HiC and Omni-C are powerful scaffolding tools that are also useful for polishing, SV detection, SNP calling and phasing.
- Sample requirements - Arima HiC or Dovetail Omni-C
- Illumina library prep and sequencing costs
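Hi-C scaffolding relies on read pairs linking two contigs being much more frequent when the contigs lie close together on the same chromosome. A minimal sketch of the first step, counting inter-contig links from Hi-C read pairs that have already been mapped to contigs; the input format and contig names are hypothetical, and real scaffolding tools also normalise for contig length and restriction-site density before ordering and orienting contigs:

```python
from collections import Counter


def count_links(mapped_pairs):
    """Count Hi-C read pairs joining each pair of distinct contigs.

    mapped_pairs: iterable of (contig_of_read1, contig_of_read2).
    Returns a Counter keyed by sorted (contig, contig) tuples.
    """
    links = Counter()
    for a, b in mapped_pairs:
        if a != b:  # intra-contig pairs say nothing about contig order
            links[tuple(sorted((a, b)))] += 1
    return links


def best_partner(links, contig):
    """Return the contig sharing the most Hi-C links with `contig`."""
    candidates = {(b if a == contig else a): n
                  for (a, b), n in links.items() if contig in (a, b)}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical mapped pairs: ctg1-ctg2 and ctg2-ctg3 are likely neighbours.
pairs = ([("ctg1", "ctg2")] * 40 + [("ctg2", "ctg3")] * 25
         + [("ctg1", "ctg3")] * 2 + [("ctg1", "ctg1")] * 100)
links = count_links(pairs)
print(links)
print(best_partner(links, "ctg1"))
```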

Illumina for polishing purposes
Illumina short-read data are used to build short-read scaffold assemblies, which have lower contiguity, as well as to polish long-read data.
- Sample requirements - Illumina TruSeq DNA PCR-free
- Illumina library prep and sequencing costs
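Polishing corrects residual errors in a long-read assembly using accurate short reads aligned to it. A deliberately simplified sketch of the idea as a per-position majority vote; it assumes the reads are already placed gap-free at known offsets and that three reads is enough depth (both assumptions for illustration), whereas real polishing tools work from alignment files and also correct insertions and deletions, a common Nanopore error type:

```python
from collections import Counter


def polish(draft, aligned_reads, min_depth=3):
    """Majority-vote polishing of a draft sequence.

    aligned_reads: list of (offset, read_sequence) tuples, each read placed
    gap-free on the draft starting at `offset` (a simplifying assumption).
    Positions covered by fewer than min_depth reads are left unchanged.
    """
    pileup = [Counter() for _ in draft]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(draft):
                pileup[pos][base] += 1
    polished = []
    for original, column in zip(draft, pileup):
        if sum(column.values()) >= min_depth:
            base, _ = column.most_common(1)[0]
            polished.append(base)
        else:
            polished.append(original)
    return "".join(polished)


draft = "ACGTTAGC"  # position 4 ('T') is a hypothetical assembly error
reads = [(0, "ACGTCAGC"), (2, "GTCAGC"), (1, "CGTCAG")]
print(polish(draft, reads))
```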

Other technologies that are valuable for de novo projects but currently not available at NGI:
- BioNano optical maps
- Linkage maps


Assembly

Methods for the initial sequencing of genomic DNA to build a draft reference genome.

Scaffolding

Methods to scaffold contigs together and correct genome assembly errors.

Annotation

RNA sequencing methods that can be used to annotate de novo genomes with transcript locations.

Illumina TruSeq Stranded mRNA

RNA sequencing of mRNAs selected through poly-A enrichment.

Arima HiC

Production of high-quality proximity ligation libraries using two restriction enzymes.
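The resolution of a restriction-enzyme-based Hi-C library depends on how often the enzymes' recognition sites occur in the genome. A minimal sketch that counts sites for two recognition motifs; GATC (recognised by DpnII/MboI) and GANTC (HinfI) are used purely as example motifs, not as a statement of which enzymes the Arima kit uses, and the sequence is made up:

```python
import re

# Only plain bases and the degenerate base N are supported in this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def count_sites(sequence, motif):
    """Count (possibly overlapping) occurrences of a recognition motif.

    Only the forward strand is scanned, which is sufficient for palindromic
    motifs such as GATC and GANTC.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping sites be counted.
    return len(re.findall(f"(?=({pattern}))", sequence.upper()))


seq = "AAGATCGAATCTTGATCGACTCA"
for motif in ("GATC", "GANTC"):
    print(motif, count_sites(seq, motif))
```

Using two enzymes with different motifs increases site density, and therefore the number of potential ligation junctions, compared with either motif alone.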

Dovetail Omni-C

A proximity-ligation protocol using a sequence-independent endonuclease, generating data for TAD identification and scaffolding.

Illumina DNA

Low-cost library preparation option for gDNA based on bead-linked transposase. Only available for full plates of samples.

Illumina DNA PCR-Free

Method for shotgun DNA libraries used for whole genome sequencing and metagenomics.

Illumina TruSeq DNA PCR-free

Gold standard method for shotgun DNA libraries used for whole genome sequencing and metagenomics.

Illumina TruSeq DNA Nano

Library preparation from limited input DNA, used in whole genome sequencing and metagenomics etc.

SMARTer ThruPLEX DNA-seq

Library preparation for DNA, ideal for preparing libraries from small amounts of input material. Works well for shotgun libraries, ChIP DNA and FFPE samples, amongst others.

Nanopore cDNA sequencing

Nanopore cDNA sequencing is able to sequence entire transcripts in one go, which makes it ideal for detecting isoforms and fusion events.

Nanopore DNA sequencing

Nanopore instruments can sequence very long continuous fragments of DNA. Sequencing native DNA allows detection of base modifications.

Nanopore Direct RNA sequencing

Nanopore direct RNA sequencing is able to sequence entire transcripts from native RNA, opening up opportunities to detect RNA modifications.

PacBio SMRT sequencing

PacBio SMRT sequencing generates reads tens of kilobases in length, enabling high-quality genome assembly, structural variant analysis, amplicon resequencing, full-length transcript isoform sequencing, full-length 16S rRNA sequencing and amplification-free epigenetic characterization.
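When planning long-read sequencing for an assembly, the expected mean depth of coverage is the total number of sequenced bases divided by the genome size, C = N × L / G for N reads of mean length L on a genome of size G. A minimal sketch of that arithmetic in both directions; the genome size, read length, read count and target depth below are hypothetical values, not NGI recommendations:

```python
import math


def mean_coverage(num_reads, mean_read_length, genome_size):
    """Expected mean sequencing depth: total bases / genome size."""
    return num_reads * mean_read_length / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Number of reads required to reach a target mean depth (rounded up)."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical project: 1 Gb genome, 15 kb mean read length.
genome_size = 1_000_000_000
read_length = 15_000
print(mean_coverage(2_000_000, read_length, genome_size))
print(reads_needed(30, read_length, genome_size))
```

This is a mean only; actual depth varies along the genome, so real projects plan with some margin above the target.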

Bioinformatics

Genome assemblies with HiFi data

Nanopore analysis

PromethION secondary analysis

Illumina QC analysis