NCM 2021: ORG.one: a new program to promote sequencing biodiversity
Tomas Marques-Bonet (Institute of Evolutionary Biology, Spain) explained that we are in ‘a very complicated situation’: it has been stated that we are in the sixth mass extinction of species, and ‘this time, it is because of us’. The International Union for Conservation of Nature (IUCN) has declared that tens of thousands of species are currently threatened with extinction; this is 28% of all assessed species (including both plants and animals). Amphibians are under particular threat.
Tomas pointed out that there isn’t an easy solution to this, or the problem would have been solved by now; he is one of the ‘strong defenders’ of the view that one solution cannot apply to all species. Action needs to be taken from multiple perspectives, including economic, social, ecological, and genetic. Genetics alone isn’t the solution here, but Tomas believes that geneticists have a lot to contribute when it comes to taking steps to address the problem. Understanding the genetic diversity of a species is important for identifying, for example, how healthy the population is and whether inbreeding or outbreeding depression is present.
‘Bringing you all to my territory’, which is great ape genomics, Tomas explained that it has been a ‘long road’ since the chimp genome was first published in 2005, stressing that the combination of a good-quality reference genome assembly, plenty of population data, and informative markers is key to making informed decisions about the species.
There are now many global, complementary initiatives that are ‘taking a genomic picture’ of today’s species. ‘I am really, really ecstatic’ that ‘Nanopore would like also to contribute to that’, in the form of the ORG.one initiative (for more information on this, see the website: org.one); Tomas stated that ‘we are very blessed’ to be part of that initiative, and acknowledged the team involved in the project presented.
In the pilot phase of the project, nine critically endangered species were selected, including birds, mammals, and amphibians, for which snap-frozen tissues or cell lines were stored via zoo biobanking efforts. ‘It was really fantastic’ to see what could be achieved in generating high-quality genomes for these species in only a couple of months, from sample acquisition to sequencing and genome assembly. The computational steps in genome assembly were: filtering and pre-QC; assembly with Flye (v2.8.3), for which Tomas noted that amphibian genomes required more time and bird genomes less, due to genome size and complexity; polishing (Racon and Medaka); assembly evaluation (e.g. BUSCO score and contiguity); and, lastly, purging haplotigs using purge_dups. In total, this assembly pipeline required around seven working days per genome (the exact time being species dependent). Displaying representative read length distribution plots for samples from each of the nine species sequenced, Tomas stated that the starting material was good enough to extract high molecular-weight DNA, allowing them to obtain read length N50s of ~20–40 kb, which is ‘really good’ for subsequent assembly. Regarding the assembled genomes themselves, the contig N50s were ~30–50 Mb, ‘which is quite remarkable’.
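For readers unfamiliar with the N50 metric quoted for both reads and contigs, the following is a minimal Python sketch of how it is commonly calculated: sort the lengths from longest to shortest and report the length at which the running total first reaches half of all bases. The function name `n50` and the toy lengths are illustrative assumptions; this is not the code used by the team, and assembly tools report the value directly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example (hypothetical lengths, not data from the talk).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

The same function applies to read lengths (giving a read length N50) and to contig lengths (giving a contig N50); only the input list changes.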
Moving on, Tomas displayed sequencing data and assembly metrics for a few example species. He started with the spider monkey, for which 44x genome depth of coverage was obtained (138.2 Gb); the resultant genome assembly had a contig N50 of 50.52 Mb, a total length of 2.65 Gb, and a largest scaffold of 142.9 Mb. By integrating this data into another project that Tomas’s team are running on the population genomics of spider monkey and crested macaque species, the team will be able to use cross-species analyses to better understand aspects of population and conservation status.
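As a rough illustration of how depth of coverage relates to sequencing yield, the sketch below divides total sequenced bases by a genome size. The talk summary does not say which genome size was used for the 44x figure, so the two calculations here are assumptions for illustration only: one uses the reported assembly length, the other works backwards from the reported depth. The function names are hypothetical.

```python
def depth_of_coverage(total_bases, genome_size):
    """Mean depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases, depth):
    """Genome size implied by a given yield and mean depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases / depth


# Figures quoted for the spider monkey in the talk summary.
yield_bases = 138.2e9      # 138.2 Gb of sequence data
assembly_length = 2.65e9   # 2.65 Gb total assembly length
reported_depth = 44        # 44x depth of coverage

# Assumption: depth measured against the assembly length.
print(depth_of_coverage(yield_bases, assembly_length))

# Assumption: depth measured against some other genome size estimate.
print(implied_genome_size(yield_bases, reported_depth) / 1e9, "Gb")
```

The two results differ, which suggests the reported depth was calculated against something other than the final assembly length (for example, an estimated genome size or a filtered read set); the source does not specify which.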
Tomas next described how the nanopore whole-genome sequencing data they have produced on the blue-throated macaw could be used to improve the current short-read-based genome for this species. The nanopore-only assembly had a contig N50 of >42.5 Mb, a length of 1.14 Gb, and a largest scaffold of >121.9 Mb. By combining all the data, a ‘much better assembly’ was produced.
Lastly, Tomas focused on the most endangered amphibian in Europe — the Montseny brook newt. This newt is only found in one specific mountain region of Spain, close to where Tomas lives. There have been a lot of breeding and reintroduction efforts to expand the population numbers, and there are now ~2,000 individuals. However, human visitors continue to have a major impact on the habitat of these newts.
There is no reference genome for this newt, and very few genetic studies have been undertaken. The problem is that it has a ‘colossal genome’, which is full of repeats and redundancies that Tomas stated only long reads can help us resolve. This work is in progress because, ‘as you can imagine’, assembling this genome is very complicated! A depth of coverage of ~20x has been produced with nanopore data from one individual; the contig N50 is on ‘the megabase scale’, but is expected to improve. Nonetheless, Tomas pointed out that the data in its current state is still very useful and allows us to do ‘a lot of beautiful, beautiful stuff’. For example, in collaboration with another team, they are aiming to generate data from around 200 newts throughout the area, to understand more about the genetic diversity of the Montseny newt population. The aim is that all the information will help in population restoration efforts.
Tomas stated that the data the team are producing is all online and open access, with the idea of exciting and engaging the community and promoting further studies into these species, to prevent them from ultimately disappearing. In the next month, Tomas plans to assemble genomes from further species.