Trial & Error

Progress in biodiversity genomics rarely happens in a straight line. In this section, EBP contributors reflect on the experimentation, troubleshooting, and persistence required to sequence life at scale. From failed libraries and difficult samples to unexpected technical breakthroughs, these stories highlight how iteration, collaboration, and continual refinement are driving the field forward.


What have you learned from working on hard genomes?

Kerstin Howe in her office (with her favourite painting of lichens by Samantha Clark and a Hi-C map of a tetraploid assembly on her screen).

Kerstin Howe: I have learned about the importance of patience and collaboration.

In the Tree of Life programme at the Sanger Institute, we are working on many Biodiversity Genomics projects, for instance the Darwin Tree of Life programme, Project Psyche, AEGIS, and the Aquatic Symbiosis Genomics Project. Since we are producing genome assemblies at scale – currently up to 40 per week – we can afford the luxury of sidelining certain species or even whole clades that are recalcitrant to current methods of extraction, sequencing, or assembly and instead turn to others. It’s better to be patient, pause things and invest in R&D, rather than running the risk of using up all available material and/or driving your lab scientists and bioinformaticians insane.

Our R&D never sleeps and we work closely with others in the field, both in-house and internationally, to constantly exchange the latest advances and enable previously impossible things. This way we found out that some things we thought we’d never get sequenceable DNA out of are actually good to go now. Even ultra-low amounts of DNA are sequenceable when suitably amplified. Many people contributed to identifying the ideal enzymes and conditions for this and are still working on further optimisations. When it comes to sequencing, it turns out that no single technology rules them all: you have to pick and sometimes mix to get the best outcome. And we’re all benefitting from the constant improvements in algorithms and available software that allow us to correctly piece together what was left terribly fragmented and misordered not long ago. The EBP provides a fantastic ecosystem for swift dissemination of knowledge on the latest successes and failures so that new approaches can immediately benefit many projects out there.


How has the push toward telomere-to-telomere genomes changed what we consider “complete”?

Giulio Formenti: The push toward telomere-to-telomere (T2T) assemblies has fundamentally redefined what we mean by a “complete” genome. Until recently, many chromosome-level references were considered finished despite containing unresolved gaps, collapsed repeats, and missing centromeric or subtelomeric regions. T2T efforts have shown that these omitted regions often contain important biology, including genes, regulatory elements, structural variants, and key chromosomal features. In our recent T2T zebra finch assembly, for example, completing the genome added nearly 90 million base pairs of previously missing sequence and enabled the first sequence-level characterization of avian centromeres in this species and of a large amplicon gene array on chrZ. “Complete” no longer means simply scaffolded into chromosomes; it increasingly means every chromosome is resolved end-to-end, with all major repetitive and structurally complex regions represented.

Giulio Formenti is a Research Assistant Professor at The Rockefeller University, Co-Director and Bioinformatics Lead of the Vertebrate Genome Laboratory, and Chair of the Assembly Group for the Vertebrate Genomes Project (VGP).


Mark Blaxter

Which species has most surprised you by how difficult it was to work with, and why?

Mark Blaxter: In the first days of the Tree of Life programme at Sanger, we collected and froze specimens of some very common land snails in the UK: the banded grove and field snails Cepaea hortensis and Cepaea nemoralis. These banded snails have been the subject of genetic and ecological research for a century, and I hoped that one of the first fruits of our genomics efforts would be reference genomes that would allow snail colour pattern researchers to finally solve the genetic riddle of how the banding patterns are controlled. Fast forward five years and finally we released reference genomes…

Why was it so hard? It turned out to be very difficult to extract long DNA that would sequence well with either PacBio or ONT technologies: umpteen cells were run with tiny, tiny data yields and ever more frustrated lab teams and Cepaea collaborators. Extensive development of extraction methods to solve the Cepaea problem now means we are confident in being able to generate sequenceable DNA from any snail…