Tandem repeats (TRs) are a type of structural variant, and TR expansions are important genetic aberrations associated with a variety of neurological diseases. Studying these diseases requires accurate sizing of the expanded repeat, which has proven challenging with traditional methods such as short-read sequencing. Long nanopore reads can span repeat expansions end to end in a single read, without PCR, enabling unambiguous size determination.
What are repeat expansions?
Tandem repeats (TRs) are stretches of DNA in which a unit sequence is repeated contiguously. Units of 1-6 bp define short tandem repeats (STRs), and units longer than 6 bp define variable number tandem repeats (VNTRs) (see Figure 1 and Table 1). Approximately 3% of the human genome consists of TRs, with around 500,000 TR loci mapped. TRs are also highly mutable, with a notable propensity to expand, and it is these expanded repeats that are implicated in neurological diseases. Yet despite their well-documented contribution to genetic variation, TRs remain poorly understood, and their impact on phenotype and disease is likely underestimated. This is largely due to their repetitive nature, high GC content, and length, which make them refractory to PCR amplification, a typical step prior to short-read sequencing. In addition, repeat expansions regularly exceed 10 kb in length, so many cannot be spanned by short reads, which makes their accurate computational resolution a major challenge. For these reasons, most technologies cannot resolve repeat expansions at base level.
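The STR/VNTR distinction above depends only on unit length, and the number of units in a tract follows directly from the tract's length. The Python sketch below illustrates both. The function names and the 12 bp example unit are hypothetical, and the copy-number estimate assumes a pure, uninterrupted repeat.

```python
def classify_tandem_repeat(unit: str) -> str:
    """Classify a tandem repeat by unit length: 1-6 bp is an STR, >6 bp a VNTR."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    return "STR" if len(unit) <= 6 else "VNTR"


def copy_number(tract_length_bp: int, unit: str) -> float:
    """Approximate number of units in a tract, assuming a pure (uninterrupted) repeat."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    return tract_length_bp / len(unit)


# Illustrative units only; the 10 kb tract length echoes the figure in the text.
for unit in ("CAG", "GGGGCC", "GGCCTCAGTGCA"):
    print(unit, classify_tandem_repeat(unit), round(copy_number(10_000, unit)))
```

A 10 kb tract of a 3 bp unit holds over 3,000 copies, which shows why a single read must be very long to span it.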
Comprehensive repeat expansion analysis, including direct detection of modified bases
With nanopore technology, there is no fixed limit to read length: single reads frequently reach hundreds of kilobases, with a current record of over 4 Mb. This means that even the largest repeat expansions can be sequenced end to end in a single read, enabling unambiguous determination of repeat length and removing the need for assembly, which simplifies downstream computational analysis. Amplification is not required, which eliminates PCR bias and enables repeat expansion detection across the genome, irrespective of GC content or sequence complexity.
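When one read spans the whole expansion, the repeat size can be read off as the distance between the sequences flanking it. The sketch below is a minimal illustration that assumes exact matches to two hypothetical flank sequences. Real reads contain sequencing errors, so practical tools align flanks approximately rather than relying on exact string search.

```python
from typing import Optional


def repeat_length_from_read(read: str, left_flank: str, right_flank: str) -> Optional[int]:
    """Return the bases between the two flanks, or None if the read does not span the repeat.

    Uses exact string matching, so it only works on error-free sequence.
    """
    left = read.find(left_flank)
    if left == -1:
        return None  # read does not contain the left flank
    start = left + len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None  # read ends inside the repeat or lacks the right flank
    return end - start


# Hypothetical flanks and a synthetic read carrying 50 CAG units.
LEFT, RIGHT = "ACGTTGCA", "TTGACCGA"
read = LEFT + "CAG" * 50 + RIGHT
length = repeat_length_from_read(read, LEFT, RIGHT)
print(length, length // 3)  # 150 bp, 50 units
```

Returning None for reads that lack either flank mirrors the point in the text: only reads that span the expansion end to end give an unambiguous size.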
Repeat expansion loci have been shown to have altered methylation status, which can change the disease phenotype. Characterising the methylation status of expansion loci, and untangling its effects on different disease phenotypes, will therefore be an important area for further study. Because nanopore sequencing reads native DNA without amplification, base modifications are preserved and can be detected directly (see Figure 2) alongside the nucleotide sequence, enabling comprehensive repeat expansion interrogation.
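Methylation at these loci is commonly assessed at CpG sites. As a minimal sketch, the hypothetical helpers below list the CpG positions in a repeat tract and compute the fraction of sites called methylated from per-site probabilities. The input format (a dict mapping position to probability) and the 0.5 cutoff are assumptions, not the output of any particular tool. In a GGGGCC tract, for example, a CpG forms at each junction between adjacent units.

```python
from typing import Dict, List


def cpg_positions(seq: str) -> List[int]:
    """Return the 0-based position of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def methylated_fraction(calls: Dict[int, float], cutoff: float = 0.5) -> float:
    """Fraction of CpG sites whose (hypothetical) methylation probability is >= cutoff."""
    if not calls:
        raise ValueError("no CpG calls supplied")
    return sum(p >= cutoff for p in calls.values()) / len(calls)


tract = "GGGGCC" * 3
print(cpg_positions(tract))  # [5, 11]: one CpG at each junction between units
calls = {5: 0.92, 11: 0.18}  # made-up per-site probabilities
print(methylated_fraction(calls))  # 0.5
```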
Investigating repeat expansions in dementia using PromethION
‘We show that long-read sequencing with a single Oxford Nanopore Technologies PromethION flow cell per individual achieves 30× human genome coverage and enables accurate assessment of tandem repeats including the 10,000-bp Alzheimer’s disease-associated ABCA7 VNTR.’
De Roeck et al. demonstrated the utility of nanopore sequencing for accurately resolving repeat expansions. Using the PromethION for whole genome sequencing, the group accurately characterised many tandem repeats, including the 10,000 bp Alzheimer’s disease-associated ABCA7 VNTR. They also developed a novel algorithm that uses raw nanopore signal (squiggle) data to robustly determine repeat sequence composition. Resolving nucleotide composition opens up the prospect of exploring interruption motifs, which are known to act as disease modifiers in other repeat disorders.
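The algorithm described above works on raw signal. The sketch below instead applies a different and much simpler technique to already-basecalled sequence, only to illustrate what an interruption motif is. It assumes that the tract starts in phase with the motif and that each interruption replaces a whole unit without insertions or deletions. `find_interruptions` is a hypothetical name, and the CGG/AGG example is synthetic.

```python
from typing import List, Tuple


def find_interruptions(tract: str, motif: str) -> List[Tuple[int, str]]:
    """Report (offset, unit) for each motif-sized unit that differs from the motif.

    Assumes the tract starts in phase with the motif and that interruptions
    replace whole units (no insertions/deletions). A trailing partial unit is ignored.
    """
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    usable = len(tract) - len(tract) % k
    return [(i, tract[i:i + k]) for i in range(0, usable, k) if tract[i:i + k] != motif]


# Synthetic example: an AGG unit interrupting a CGG tract.
tract = "CGG" * 9 + "AGG" + "CGG" * 9
print(find_interruptions(tract, "CGG"))  # [(27, 'AGG')]
```

Because it compares fixed-width windows, this approach breaks down as soon as a basecalling indel shifts the phase. This is one reason signal-level or alignment-based methods are preferred in practice.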
Cas9 enrichment and nanopore sequencing for repeat expansion resolution
‘We demonstrate the precise quantification of repeat numbers in conjunction with the determination of CpG methylation states in the repeat expansion.’
Combining Cas9 enrichment with nanopore sequencing substantially increases coverage of target sequences without the need for PCR amplification. This makes it possible to enrich targets in genomic regions that are refractory to PCR. Enrichment also preserves epigenetic modifications, so methylation can be detected at the same time. Giesselmann et al. demonstrated accurate determination of both repeat length and methylation status at the C9ORF72 locus.
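Once each read spanning an enriched locus has both a repeat count and a methylation fraction, the two measurements can be summarised together per allele. The sketch below splits reads into normal and expanded groups using a size threshold that the caller supplies. The threshold, the function name and the example reads are all hypothetical, and this is an illustration rather than the published method.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def summarise_alleles(
    reads: List[Tuple[int, float]], expansion_threshold: int
) -> Dict[str, Dict[str, float]]:
    """Group spanning reads by repeat count and summarise each group.

    reads: (repeat_count, methylated_fraction) for each read spanning the locus.
    expansion_threshold: hypothetical cutoff; reads at or above it count as expanded.
    """
    groups: Dict[str, List[Tuple[int, float]]] = {"normal": [], "expanded": []}
    for count, meth in reads:
        groups["expanded" if count >= expansion_threshold else "normal"].append((count, meth))
    summary = {}
    for name, rows in groups.items():
        if rows:  # skip alleles with no supporting reads
            summary[name] = {
                "n_reads": len(rows),
                "median_repeats": median([c for c, _ in rows]),
                "mean_methylation": mean([m for _, m in rows]),
            }
    return summary


# Made-up reads: three short alleles and two expanded ones.
reads = [(8, 0.05), (9, 0.10), (8, 0.00), (700, 0.85), (750, 0.90)]
for allele, stats in summarise_alleles(reads, expansion_threshold=30).items():
    print(allele, stats)
```

The median is used for repeat counts because it tolerates a few reads with unusually large or small size estimates.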
Repeat expansion detection: in action
For high-throughput whole genome sequencing with repeat expansion detection and characterisation, we recommend the following: