PacBio secondary analysis

NGI provides secondary analysis using PacBio’s open-source SMRT Analysis software suite.

Depending on the PacBio sequencing application, the following analyses can be performed at NGI.

Assembly (HGAP 4)

This application (Hierarchical Genome Assembly Process) generates high-quality de novo genome assemblies from PacBio data. HGAP 4 comprises pre-assembly, de novo assembly, and assembly polishing steps; it uses Falcon for de novo assembly and Arrow for polishing.

The following results are provided:

  • Polished Assembly: The final polished assembly, in Data Set, FASTA and FASTQ formats.
  • Draft Assembly: The unpolished draft assembly.
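
As a quick sanity check on a delivered assembly, contig count, total length, and N50 can be computed directly from the FASTA file. The sketch below is only an illustration using the Python standard library; the function names and the toy contigs are hypothetical and are not part of the HGAP 4 output.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {contig_name: sequence} dict."""
    contigs: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line)
    return {n: "".join(parts) for n, parts in contigs.items()}


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs given


if __name__ == "__main__":
    # Toy input; in practice pass an open file handle for the polished assembly FASTA.
    example = [">contig_1", "ACGTACGTAC", ">contig_2", "ACGT", "AC", ">contig_3", "ACG"]
    lengths = [len(s) for s in read_fasta(example).values()]
    print("contigs:", len(lengths), "total bp:", sum(lengths), "N50:", n50(lengths))
```

Comparing these numbers between the draft and the polished assembly is a simple way to confirm that polishing did not change the contig structure.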

Circular Consensus Sequencing (CCS)

This application identifies consensus sequences for single molecules.

The following results are provided:

  • CCS Statistics: Summary of CCS performance and yield.
  • FASTQ File(s), FASTA File(s), BAM File(s): Consensus sequences generated from CCS, in FASTA, FASTQ, and BAM formats.
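
To get a first impression of the consensus reads, read length and mean per-base error probability can be derived from the FASTQ file. This is a minimal sketch, assuming four-line FASTQ records and Phred+33 quality encoding; the function names are hypothetical, and the CCS Statistics report remains the authoritative summary.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header[1:].split()[0], seq, qual


def mean_error_probability(qual: str, offset: int = 33) -> float:
    """Average per-base error probability (assumes Phred+33 and a non-empty string)."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    # Toy record; in practice iterate over an open FASTQ file handle.
    example = ["@read_1", "ACGT", "+", "IIII"]
    for name, seq, qual in read_fastq(example):
        print(name, len(seq), mean_error_probability(qual))
```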

CCS with Mapping

This application generates consensus sequences from single molecules, maps the CCS reads to a provided reference sequence, and then identifies consensus and variants against this reference. Haploid variants and small indels, but not diploid variants, are called as a result of alignment to the reference sequence.

The following results are provided:

  • Alignments: Data Set containing alignment results.
  • Consensus Sequences: Consensus sequences generated from CCS.
  • CCS Statistics: Summary of CCS performance and yield.
  • Coverage Summary: Coverage summary for regions (bins) spanning the reference.
  • FASTQ File(s), FASTA File(s), BAM File(s): Consensus sequences generated from CCS, in FASTA, FASTQ, and BAM formats.

Iso-Seq Analysis

This application enables analysis and functional characterization of full-length transcript isoforms from sequencing data generated on PacBio instruments. The core analysis is performed de novo, without a reference genome; a reference is used only in the optional Collapse step. The application generates full-length transcript isoforms directly, eliminating the need for computational reconstruction, and provides accurate information about alternatively spliced exons and transcriptional start and end sites.

The application includes four main steps:

  1. CCS: Build Circular Consensus Sequences (CCSs) from each sequencing ZMW.
  2. Classify: Identify and remove primers. Identify strandedness based on the 5’ and 3’ primers.
  3. Cluster (Optional): Trim off polyA tails. Perform de novo clustering and consensus calling. Output full-length consensus isoforms that are further separated into high-quality (HQ) and low-quality (LQ) based on estimated accuracies.
  4. Collapse (Optional): When a reference genome is selected, the Iso-Seq application maps HQ isoforms to the selected reference genome, and then collapses isoforms that map to similar genomic loci into unique isoform groups.
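
To make the polyA trimming mentioned in the Cluster step concrete, the toy sketch below removes a trailing run of A bases from a read. It is only an illustration of the idea, not the Iso-Seq implementation: the function name, the exact-match rule, and the `min_tail` threshold are all assumptions, and a real tool would have to tolerate sequencing errors within the tail.

```python
import re
from typing import Tuple


def trim_polya_tail(seq: str, min_tail: int = 10) -> Tuple[str, int]:
    """Remove a trailing run of at least `min_tail` A bases.

    Returns (trimmed_sequence, tail_length); tail_length is 0 if nothing was trimmed.
    """
    match = re.search(r"A+$", seq.upper())
    if match and len(match.group()) >= min_tail:
        return seq[: match.start()], len(match.group())
    return seq, 0


if __name__ == "__main__":
    read = "GATTACAGGCT" + "A" * 15
    trimmed, tail = trim_polya_tail(read)
    print(trimmed, tail)
```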

The following results are provided:

  • CCS FASTQ: Circular Consensus Sequences in FASTQ format.
  • Full-Length Non-Concatemer Reads: Full-length reads that have primers and polyA tails removed, in BAM format.
  • Full-Length Non-Concatemer Report: Includes strand, 5’ primer length, 3’ primer length, polyA tail length, insertion length, and primer IDs for each full-length read that has primers and polyA tail, in CSV format.
  • Low-Quality Isoforms: Isoforms with low consensus accuracy, in FASTQ and FASTA formats. We recommend that you work only with High-Quality isoforms, unless there are specific reasons to analyze Low-Quality isoforms.
  • High-Quality Isoforms: Isoforms with high consensus accuracy, in FASTQ and FASTA formats. This is the recommended output file to work with.
  • Cluster Report: Report of the assignment of each full-length read to an isoform cluster.
  • Mapped High Quality Isoforms: Alignments mapping isoforms to the reference genome, in BAM and BAI (index) formats.
  • Collapsed Filtered Isoforms GFF: Mapped, unique isoforms, in GFF format. This is the recommended Mapping step output to work with.
  • Collapsed Filtered Isoforms FASTQ: Mapped, unique isoforms, in FASTQ format. This is the recommended Mapping step output to work with.
  • Collapsed Filtered Isoforms Groups: Report of isoforms mapped into collapsed filtered isoforms.
  • Full-length Non-Concatemer Read Assignments TXT: Report of full-length read association with collapsed filtered isoforms.
  • Collapsed Filtered Isoform Counts: Report of read count information for each collapsed filtered isoform.

Microbial Assembly

This application generates de novo assemblies of small prokaryotic genomes of 1.9 to 10 Mb and companion plasmids of 2 to 220 kb. In addition, it includes polishing and rotation of the origin of replication for each circular contig. It also facilitates assembly of larger genomes (e.g. yeast).

The following results are provided:

  • Polished Assembly: The polished assembly before oriC rotation is applied, in FASTA and FASTQ formats.
  • Final Assembly: The final polished assembly with applied oriC rotation and header adjustment for NCBI submission, in FASTA format (.fsa extension).
  • Polished Contigs After oriC Rotation: Polished contigs with oriC rotation applied, before the NCBI adjustment process is applied.
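
Rotating a circular contig only changes where the linear representation starts; the sequence itself is unchanged. The sketch below illustrates this, assuming the position of the origin is already known (locating oriC is not shown, and the function name is hypothetical).

```python
def rotate_circular(seq: str, new_start: int) -> str:
    """Rotate a circular sequence so that 0-based index `new_start` becomes position 0."""
    if not seq:
        return seq
    k = new_start % len(seq)
    return seq[k:] + seq[:k]


if __name__ == "__main__":
    contig = "ACGTTG"
    rotated = rotate_circular(contig, 2)
    print(rotated)
    # Rotation preserves length and base composition.
    assert sorted(rotated) == sorted(contig)
```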

Structural Variant Calling

This application identifies structural variants (by default, ≥20 bp) in a sample or set of samples relative to a reference. The variant types identified are insertions, deletions, duplications, copy number variants (CNVs), inversions, and translocations.

The following results are provided:

  • Aligned Reads (per sample): Aligned reads, in BAM format, separated by individual.
  • Index of Aligned Reads (per sample): BAM index files associated with the Aligned Reads BAM files.
  • Structural Variants: All the structural variants, in VCF format.
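
A delivered VCF file can be summarized by variant type with a few lines of Python. The sketch below assumes that each record carries the `SVTYPE` and, where applicable, an integer `SVLEN` key in its INFO column, as reserved by the VCF specification; whether a given file contains them should be checked first. The function names and the example records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ('K1=V1;K2=V2;FLAG') into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def count_sv_types(vcf_lines: Iterable[str], min_len: int = 20) -> Counter:
    """Count records per SVTYPE, skipping those with |SVLEN| < min_len.

    Records without an SVLEN value are counted as they are.
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or empty line
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the eighth VCF column
        svlen = info.get("SVLEN")
        if svlen and abs(int(svlen.split(",")[0])) < min_len:
            continue
        counts[info.get("SVTYPE", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-50;END=150",
        "chr1\t500\t.\tA\tACGTACGTAC\t.\tPASS\tSVTYPE=INS;SVLEN=9",
        "chr2\t900\t.\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=2000",
    ]
    print(count_sv_types(example))
```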

Base Modification Analysis

This application can identify putative sites of base modification as well as common bacterial base modifications (6mA, 4mC), and can then optionally analyze the methyltransferase recognition motifs. Detection can use an in silico control consisting of expected kinetic signals.

The following results are provided:

  • Alignments: Data Set of alignment results.
  • IPD Ratios: BigWig file containing encoded base IPD ratios.
  • Modifications: Duplicate of the modification summary file.
  • Full Kinetics Summary: HDF5 file containing per-base information.
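
An IPD (inter-pulse duration) ratio compares the kinetics observed at a position with the signal expected for an unmodified base. The sketch below shows the idea with a plain mean, assuming the expected value comes from the control; the function name and the numbers are hypothetical, and the actual software may summarize the observed IPDs differently.

```python
from statistics import mean
from typing import Sequence


def ipd_ratio(observed_ipds: Sequence[float], expected_ipd: float) -> float:
    """Ratio of the mean observed IPD at a position to the expected (control) IPD."""
    if expected_ipd <= 0:
        raise ValueError("expected_ipd must be positive")
    return mean(observed_ipds) / expected_ipd


if __name__ == "__main__":
    # Toy values: a ratio well above 1 indicates slower incorporation than expected.
    print(ipd_ratio([0.5, 1.5, 1.0], expected_ipd=0.5))
```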

Resequencing

This application maps length- and quality-filtered reads against a reference sequence, then identifies consensus and variant sequences. It can be used for whole-genome or targeted resequencing analysis.

The following results are provided:

  • Consensus FASTQ: Consensus sequences, in FASTQ format.
  • Consensus Contigs: Consensus contigs in FASTQ format.
  • Coverage Summary: Coverage summary for regions (bins) spanning the reference.
  • Coverage and Variant Call Summary: Coverage and variant call summary for regions (bins) spanning the reference.
  • Variant Calls: List of variants from the reference, in BED, GFF or VCF format.
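
The idea behind a binned coverage summary can be sketched as follows: per-base depths are averaged over consecutive fixed-size regions of the reference. This is an illustration only; the function name, the bin size, and the depth values are hypothetical, and the delivered summary files have their own format.

```python
from typing import List, Sequence, Tuple


def coverage_by_bin(depths: Sequence[int], bin_size: int) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean_depth) per consecutive bin; `end` is exclusive."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    summary = []
    for start in range(0, len(depths), bin_size):
        window = depths[start:start + bin_size]
        # The last bin may be shorter than bin_size.
        summary.append((start, start + len(window), sum(window) / len(window)))
    return summary


if __name__ == "__main__":
    per_base_depth = [10, 10, 20, 20, 30]
    for start, end, depth in coverage_by_bin(per_base_depth, bin_size=2):
        print(start, end, depth)
```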

Long Amplicon Analysis (LAA)

This application determines phased consensus sequences for pooled amplicon data, allowing accurate allelic phasing and variant calling in large genomic amplicons. It supports the phasing and consensus of novel haplotypes in loci of biomedical interest, such as the HLA genes in the MHC region of the human genome. Reads are clustered into high-level groups; each group is then phased, and a consensus is generated for each resulting phase using the Arrow algorithm.

The following results are provided:

  • Consensus Sequences: Consensus amplicons that passed all sequence quality filters, in FASTQ and zipped-FASTQ format.
  • Chimeric/Noise Consensus Sequences: Consensus amplicons that failed one or more sequence quality filters, in FASTQ and zipped-FASTQ format.
  • Consensus Sequences Summary: Combined consensus sequences, summary information and sample map as a single ZIP file for ease of importing into third-party applications for sequence typing.

Method Status

Service

We are routinely running this method. Please visit the Order Portal to place an order.
