Summary (200 words): The EBP Assembly Standards define targets for a high-quality reference genome: contig NG50 above 1 Mb, more than 90% of the sequence assigned to chromosomes, and base accuracy of Q40 or better (at most 1 error per 10,000 bases), alongside proper identification of organelles, contaminants, and sex chromosomes. The standards are technology-agnostic but are accompanied by practical recommendations.
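
The two numeric thresholds above can be checked with a few lines of Python. This is a minimal sketch, not the tooling used by the EBP: the function names `phred_q` and `ng50` are hypothetical, and the error count and estimated genome size are assumed to come from separate tools.

```python
import math


def phred_q(errors, bases):
    """Phred-scaled quality: Q = -10 * log10(errors / bases)."""
    if errors <= 0 or bases <= 0:
        raise ValueError("errors and bases must be positive")
    return -10.0 * math.log10(errors / bases)


def ng50(contig_lengths, genome_size):
    """Length L such that contigs of length >= L together cover at least
    half of the *estimated genome size* (not half of the assembly size,
    which would give N50). Returns None if the assembly is too small."""
    target = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# One error in 10,000 bases corresponds to Q40.
print(round(phred_q(1, 10_000), 6))

# Toy contig set against an assumed genome size of 20 units.
print(ng50([5, 4, 3, 2, 1], 20))
```

In the toy example, the three largest contigs (5 + 4 + 3 = 12) are the first to reach half of the assumed genome size, so the NG50 is 3.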

Achieving these goals typically requires long-read sequencing (PacBio HiFi or Oxford Nanopore) for primary assembly and long-range scaffolding data (primarily Hi-C). Ultra-long ONT reads can sometimes reduce the need for scaffolding. Additional useful data include parental genomes for haplotype phasing and short reads for polishing, though polishing is less critical with high-quality long reads.

Recommended strategies include ~12.5–20× HiFi coverage per haplotype or ~20× ONT coverage plus short-read polishing, with ~50× Hi-C coverage for scaffolding. Data should ideally come from a single individual, preferably the heterogametic sex.
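
To turn these coverage figures into a sequencing budget, the sketch below multiplies coverage by genome size. It rests on assumptions not stated in the summary: that "per haplotype" coverage is multiplied by ploidy (2 for a diploid) to get depth relative to the haploid genome size, and that the Hi-C and ONT figures are relative to the haploid genome size. The function name `required_yield_gb` is hypothetical.

```python
def required_yield_gb(haploid_genome_size_gb, coverage,
                      per_haplotype=False, ploidy=2):
    """Sequencing yield in Gb needed to reach the given coverage.

    Assumption: a per-haplotype coverage figure is multiplied by ploidy
    to obtain depth relative to the haploid genome size.
    """
    depth = coverage * ploidy if per_haplotype else coverage
    return haploid_genome_size_gb * depth


# Example for an assumed 1 Gb haploid genome.
genome_gb = 1.0
hifi_low = required_yield_gb(genome_gb, 12.5, per_haplotype=True)
hifi_high = required_yield_gb(genome_gb, 20, per_haplotype=True)
hic = required_yield_gb(genome_gb, 50)
print(f"HiFi: {hifi_low}-{hifi_high} Gb, Hi-C: {hic} Gb")
```

Under these assumptions, a 1 Gb diploid genome needs roughly 25 to 40 Gb of HiFi data and about 50 Gb of Hi-C data.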

The assembly workflow involves k-mer-based genome profiling, haplotype-aware assembly, removal of duplicates and contaminants, organelle separation, scaffolding, and optional polishing. Final assemblies require curation, validation (e.g., of completeness and accuracy), and submission together with the raw data.
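
As an illustration of the first step, k-mer genome profiling, the sketch below counts k-mers in reads and estimates genome size as the total number of k-mers divided by the depth of the main histogram peak. This is a deliberately naive estimator under stated assumptions: it ignores reverse complements, heterozygosity, and repeats, which dedicated profiling tools model explicitly. The function names and the `min_depth` cutoff for discarding likely error k-mers are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(counts, min_depth=2):
    """Naive estimate: (total k-mers) / (depth at the histogram peak).

    k-mers seen fewer than min_depth times are treated as sequencing
    errors and ignored. Returns None if nothing passes the cutoff.
    """
    histogram = Counter(counts.values())  # depth -> number of distinct k-mers
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        return None
    peak_depth = max(solid, key=lambda d: solid[d])
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


# Toy data: a 14-base "genome" sequenced error-free ten times.
reads = ["ACGTAGCTTGACCA"] * 10
print(estimate_genome_size(kmer_counts(reads, k=5)))
```

The toy genome contains 10 distinct 5-mers, each seen 10 times, so the estimate recovers 10 k-mer positions.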

More advanced “telomere-to-telomere” assemblies require substantially higher coverage and a combination of sequencing technologies.
