Trial & Error
Progress in biodiversity genomics rarely happens in a straight line. In this section, EBP contributors reflect on the experimentation, troubleshooting, and persistence required to sequence life at scale. From failed libraries and difficult samples to unexpected technical breakthroughs, these stories highlight how iteration, collaboration, and continual refinement are driving the field forward.
Which features — size, repeats, polyploidy, degradation, preservation — create the biggest challenges?
Erna King is collecting intertidal mud from the Blackwater Estuary in Essex. Nematodes and other meiofauna are isolated from the samples, and single individuals are identified, imaged as morphological vouchers, and frozen ahead of genome and transcriptome generation using the Picogram input Multimodal Sequencing (PiMmS) technique.
Mark Blaxter: As Head of the Tree of Life programme at the Wellcome Sanger Institute, where we are sequencing all of biodiversity at scale, I am not supposed to have any favourites. But, having worked on nematodes for nearly all of my career, I must admit to wanting to sequence the genomes of the whole phylum Nematoda. Some species are big (like many parasites) and some can be cultured and inbred, but most nematodes are about 1 mm long, have about 1,000 cells, and contain only 100 pg or so of DNA.
We have been working hard to generate genomes from single nematodes. Using the Picogram input Multimodal Sequencing (PiMmS) method developed by Chris Laumer (then in my lab, now at the Natural History Museum, London), Erna King at Sanger is able to generate high-contiguity contig assemblies from tiny marine nematodes, and has over 100 pretty good genomes. The challenge now is to get Hi-C methods to work at similarly small scales. Erna can get down to 10 nematodes (sometimes), so we are close.
However, it’s not just nematodes that are small: the majority of species on this planet are small, and giant species like mice, trees, and dragonflies are the exception. PiMmS, and related methods using phi29 polymerase amplification, promise to unlock the genomes of all of diversity.
How has the push toward telomere-to-telomere genomes changed what we consider “complete”?
Giulio Formenti: The push toward telomere-to-telomere (T2T) assemblies has fundamentally redefined what we mean by a “complete” genome. Until recently, many chromosome-level references were considered finished despite containing unresolved gaps, collapsed repeats, and missing centromeric or subtelomeric regions. T2T efforts have shown that these omitted regions often contain important biology, including genes, regulatory elements, structural variants, and key chromosomal features. In our recent T2T zebra finch assembly, for example, completing the genome added nearly 90 million base pairs of previously missing sequence and enabled the first sequence-level characterization of avian centromeres in this species and of a large amplicon gene array on chrZ. “Complete” no longer means simply scaffolded into chromosomes—it increasingly means every chromosome is resolved end-to-end, with all major repetitive and structurally complex regions represented.
Giulio Formenti is a Research Assistant Professor at The Rockefeller University, Co-Director and Bioinformatics Lead of the Vertebrate Genome Laboratory, and Chair of the Assembly Group for the Vertebrate Genomes Project (VGP).
Which species has most surprised you by how difficult it was to work with, and why?
Mark Blaxter: In the first days of the Tree of Life programme at Sanger, we collected and froze specimens of some very common land snails in the UK: the banded grove and field snails Cepaea hortensis and Cepaea nemoralis. These banded snails have been the subject of genetic and ecological research for a century, and I hoped that one of the first fruits of our genomics efforts would be reference genomes that would allow snail colour pattern researchers to finally solve the genetic riddle of how the banding patterns are controlled. Fast forward five years and finally we released reference genomes…
Why was it so hard? It turned out to be very difficult to extract long DNA that would sequence well with either PacBio or ONT technologies: umpteen sequencing cells were run, with tiny, tiny data yields and ever more frustrated lab teams and Cepaea collaborators. Extensive development of extraction methods to solve the Cepaea problem now means we are confident in being able to generate sequenceable DNA from any snail…