Falcon assemblies with CLR or HiFi data
NGI can generate high-quality assemblies from PacBio CLR or HiFi data using the FALCON and FALCON-Unzip assemblers.
FALCON and FALCON-Unzip are de novo genome assemblers for PacBio long reads. Either CLR or HiFi data can be used in the assembly process.
FALCON is a diploid-aware assembler that follows the hierarchical genome assembly process and is optimized for large genomes. FALCON produces a set of primary contigs (p-contigs), which form the primary assembly, and a set of associate contigs (a-contigs), which represent divergent allelic variants. Each a-contig is associated with a homologous genomic region on a p-contig.
Image: Courtesy of Pacific Biosciences of California, Inc.
FALCON-Unzip is a true diploid assembler. It takes the contigs from FALCON and phases the reads based on heterozygous SNPs identified in the initial assembly. It then produces a set of partially-phased primary contigs and fully-phased haplotigs which represent divergent haplotypes.
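The core idea of phasing reads by heterozygous SNPs can be illustrated with a toy example. This is not FALCON-Unzip's implementation; it is a minimal sketch that assumes the heterozygous SNP positions and the alleles of both haplotypes are already known, and that each read is reduced to a hypothetical mapping of SNP position to observed base. Each read is assigned to the haplotype whose alleles it matches most often.

```python
from typing import Dict

# Hypothetical representation: SNP position -> observed base.
Read = Dict[int, str]


def assign_read(read: Read, hap1: Dict[int, str], hap2: Dict[int, str]) -> str:
    """Return 'hap1', 'hap2', or 'unphased' for a single read (toy example)."""
    # Count how many of the read's SNP alleles agree with each haplotype.
    score1 = sum(1 for pos, base in read.items() if hap1.get(pos) == base)
    score2 = sum(1 for pos, base in read.items() if hap2.get(pos) == base)
    if score1 > score2:
        return "hap1"
    if score2 > score1:
        return "hap2"
    # No informative SNPs on the read, or equal support for both haplotypes.
    return "unphased"


if __name__ == "__main__":
    # Made-up alleles at three heterozygous SNP positions.
    hap1 = {100: "A", 250: "G", 400: "T"}
    hap2 = {100: "C", 250: "T", 400: "A"}
    reads = {"r1": {100: "A", 250: "G"}, "r2": {250: "T", 400: "A"}, "r3": {}}
    for name, read in reads.items():
        print(name, assign_read(read, hap1, hap2))
```

Reads that cover no heterozygous SNPs stay unphased, which is one reason regions with low heterozygosity remain collapsed rather than phased.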
Recommended coverage for genome assembly, by read type:
- CLR – 30-50X unique molecular coverage per haplotype
- HiFi – 15-20X coverage for haploids or diploids
Coverage requirements scale linearly with the number of unique haplotypes. For example, a highly heterozygous diploid may require double the recommended coverage. Likewise, a homozygous tetraploid may also require double coverage (in the case where haplotypes are identical but homeologs are not).
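The scaling rule above amounts to multiplying the per-haplotype recommendation by the number of unique haplotypes. The sketch below assumes the CLR figure of 30-50X per haplotype; the function name and constant are illustrative only, not part of any NGI or PacBio tool.

```python
# Per-haplotype CLR recommendation from the text (X coverage).
CLR_PER_HAPLOTYPE = (30, 50)


def scaled_coverage(per_haplotype: tuple[int, int], unique_haplotypes: int) -> tuple[int, int]:
    """Return the (low, high) total coverage range, scaling linearly."""
    if unique_haplotypes < 1:
        raise ValueError("unique_haplotypes must be at least 1")
    low, high = per_haplotype
    return low * unique_haplotypes, high * unique_haplotypes


if __name__ == "__main__":
    # Highly heterozygous diploid: two distinct haplotypes.
    print("heterozygous diploid:", scaled_coverage(CLR_PER_HAPLOTYPE, 2))
    # Homozygous tetraploid: haplotypes within each subgenome are identical,
    # but the homeologous subgenomes differ, giving two unique haplotypes.
    print("homozygous tetraploid:", scaled_coverage(CLR_PER_HAPLOTYPE, 2))
```

Both examples come out at 60-100X, i.e. double the single-haplotype range, matching the examples in the text.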
The magnitude of haplotype divergence determines the structure of the resulting FALCON-Unzip assembly. Genomic regions with low heterozygosity will be assembled as a collapsed haplotype on a single primary contig. Haplotypes up to ~5% diverged will be unzipped, while highly divergent haplotypes will be assembled on different primary contigs. In the latter case, it is up to the user to identify these contigs as homologous using gene annotation or sequence alignment.
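The three outcomes described above can be summarized as a simple decision rule. In this sketch only the ~5% unzipping limit comes from the text, and it is approximate; the lower cutoff for "low heterozygosity" is a hypothetical placeholder, since no value is given, and neither number is a FALCON-Unzip parameter.

```python
# Approximate upper limit for unzipping, from the text ("up to ~5% diverged").
UNZIP_MAX = 0.05
# Hypothetical cutoff for "low heterozygosity"; not taken from FALCON-Unzip.
LOW_HET_CUTOFF = 0.001


def expected_structure(divergence: float) -> str:
    """Map a region's haplotype divergence (as a fraction) to its likely assembly structure."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    if divergence < LOW_HET_CUTOFF:
        return "collapsed on a single primary contig"
    if divergence <= UNZIP_MAX:
        return "unzipped into a primary contig plus haplotigs"
    return "separate primary contigs (homology must be identified by the user)"


if __name__ == "__main__":
    for d in (0.0, 0.02, 0.10):
        print(f"{d:.0%} divergence -> {expected_structure(d)}")
```

In practice the boundaries are gradual rather than sharp, so the rule is only a guide to interpreting an assembly.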
The following results are provided:
- Falcon Assembly: Full output of the FALCON pipeline.
- Falcon preads (CLR only): Pre-assembled reads, i.e. error-corrected reads produced by the pre-assembly process.
- Falcon-Unzip results: Partially-phased primary contigs and fully-phased haplotigs.
- Polished Falcon-Unzip assembly: Highly accurate consensus sequences generated with the Arrow algorithm (CLR) or Racon (HiFi).