Technology & Infrastructure: How We Actually Do This
Solutions in motion.
Advances in technology and infrastructure are rapidly changing what’s possible in biodiversity genomics. This section highlights emerging tools, overlooked innovations, and pipeline improvements that are helping to reduce costs, increase quality, and scale genome production across diverse taxa.
Mark Blaxter, happily lost in a hillside of bracken at Beinn Eighe National Nature Reserve in northern Scotland during a Darwin Tree of Life collection trip—coordinating sampling efforts while searching regenerating Scots pine forest for beetle larvae and the right kind of beetle frass to collect nematodes.
What new technologies will most reduce cost over the next 5–10 years?
Mark Blaxter: The most important breakthrough for the success of the whole project is likely to be the validation of cold-chain-free collection methods that allow both long-read and long-range (Hi-C) sequencing from samples collected and shipped at room temperature. This will open up new vistas in collection and make it possible for many more naturalists, parataxonomists, and field staff to participate.
I have faith that we will also drive down lab costs in collaboration with technology providers and make assembly and annotation faster through the deployment of AI, but the “first mile” breakthrough will be a game changer.
What infrastructure gaps currently slow progress, and how could they be solved?
Erick Duarte: At the Vertebrate Genome Lab, we have established a well-organized infrastructure that supports a continuous production flow of reference genomes. However, the volume of data generated is growing rapidly, increasing the demand on computational resources. This challenge becomes particularly evident for large or complex genomes, which require significant memory, storage, and processing power during assembly. As a result, computational capacity can become a limiting factor.
Addressing this gap will require scaling computational infrastructure through expanded high-performance computing resources, the use of cloud-based solutions, and further optimization of pipelines to better match the growing sequencing output.
Erick Duarte (top left) and members of the Vertebrate Genomes Project team during a hike in Beacon.
Mark Blaxter samples a teasel head for spiders and earwigs in the wetlands beside the Wellcome Sanger Institute.
What’s the most underrated piece of technology in your pipeline?
Mark Blaxter: We have developed a method called PiMmS (picogramme input multimodal sequencing) that allows us to generate a good genome from the tiniest of animals and plants—single tardigrades or one leaf of a tiny moss. The first step in the process is to lyse the organism, but this can be tricky when you cannot even see the tiny worm in the ice at the bottom of the tube.
We use a powermasher—basically a tiny motorised pestle that fits into a microfuge tube—to turn the tardigrade (or whatever) and the 2–5 µL of frozen buffer into a puff of foamed ice. Simple, handheld, effective: the DNA and RNA can then be extracted, amplified, and sequenced with minimal loss.
What automation step would most reduce hands-on time or error?
Erick Duarte: One of the current bottlenecks in the production pipeline is the manual curation of genome assemblies. This process requires extensive human intervention and attention to detail to identify and resolve assembly errors, improve genome contiguity, and assign chromosomes.
To address this challenge, the Vertebrate Genome Lab is developing an AI-assisted genome curation tool. This tool is designed to support curators by automatically detecting potential errors and suggesting corrections, thereby accelerating the curation process while reducing hands-on effort and the risk of human error.
Erick Duarte in the Vertebrate Genomes Project lab conference room.
This pill bug is a terrestrial isopod—one of the groups long considered difficult to sequence, but now increasingly within reach.
Which organismal groups have historically been the most challenging to sequence, and are they still challenging today?
Mark Blaxter: Looking back on the first six years of work at Tree of Life, it is clear that some species we considered “impossible” in the early months are now zipping through our systems at pace, because we have learnt the tricks needed to get DNA out of them while avoiding their exoskeletons, exotic chemical compositions, or recently eaten food items.
Today, I am confident that any group that has not yet yielded to the careful skill of the ToL Core Lab will do so very soon. That said, isopods—marine, freshwater, and terrestrial—are still annoyingly difficult: we extract DNA that looks “good”, but when this is offered to the sequencers, the answer is “no.” Some isopods have worked well recently, so perhaps the solution is just round the corner.
Paris japonica, a plant with one of the largest known genomes, which makes it exceptionally challenging to sequence.