Repeat masking:

These tools are used to identify, classify, curate, and mask repetitive DNA within genome assemblies before downstream analyses such as gene annotation, comparative genomics, and evolutionary studies. Many genomes contain large amounts of repetitive sequence (transposable elements, tandem repeats, satellites, and low-complexity regions), which can interfere with genome assembly, cause spurious gene predictions, complicate sequence alignment, and obscure biological interpretation. Together, these tools help researchers build species-specific repeat libraries, detect lineage-specific repetitive elements, refine repeat annotations, and mask repetitive regions so that genes and other functional genomic features can be analyzed more accurately. They are especially important for large, repeat-rich, or evolutionarily complex genomes across the Tree of Life.
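
As a rough illustration of what masking means in practice, the sketch below applies soft masking (repeat bases in lowercase) or hard masking (repeat bases replaced with N) to a sequence. The function name mask_sequence, the example sequence, and the interval coordinates are hypothetical and are not taken from any of the tools listed here.

```python
def mask_sequence(seq, intervals, hard=False):
    """Mask 0-based, half-open (start, end) intervals in seq.

    Soft masking lowercases repeat bases; hard masking replaces them with N.
    """
    chars = list(seq.upper())
    for start, end in intervals:
        # Clamp each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


genome = "ACGTACGTTTTTTTTTACGT"   # hypothetical sequence
repeats = [(8, 16)]               # hypothetical repeat coordinates
print(mask_sequence(genome, repeats))             # ACGTACGTttttttttACGT
print(mask_sequence(genome, repeats, hard=True))  # ACGTACGTNNNNNNNNACGT
```

Soft masking keeps the underlying bases available to downstream tools, whereas hard masking discards them.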

RepeatModeler2

Builds a de novo repeat library that can be used for repeat masking. Includes an optional step for additional LTR identification.

URL: https://www.repeatmasker.org/RepeatModeler/

DOI: https://doi.org/10.1073/pnas.1921046117
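
A typical run has two steps: building a sequence database, then modeling repeat families from it. The sketch below only assembles the two command lines and prints them; it does not execute anything. The file name genome.fa, the database name my_species, and the helper repeatmodeler_commands are hypothetical. The -threads option is assumed to be available (older RepeatModeler releases used -pa instead), so the installed version's help output should be checked.

```python
import shlex


def repeatmodeler_commands(genome_fasta, db_name, threads=8, ltr_struct=True):
    """Return the BuildDatabase and RepeatModeler command lines as argument lists."""
    build = ["BuildDatabase", "-name", db_name, genome_fasta]
    model = ["RepeatModeler", "-database", db_name, "-threads", str(threads)]
    if ltr_struct:
        # Optional additional LTR structural discovery step.
        model.append("-LTRStruct")
    return [build, model]


# Print the commands for inspection; pass each list to subprocess.run to execute.
for cmd in repeatmodeler_commands("genome.fa", "my_species"):
    print(shlex.join(cmd))
```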

TEtrimmer

Automates manual curation of TE libraries, including TE boundary definition and classification, to improve library construction.

URL: https://github.com/qjiangzhao/TEtrimmer

DOI: https://doi.org/10.1101/2024.06.27.600963

RepeatMasker

For annotating and masking repetitive elements in genomic sequences.

URL: https://www.repeatmasker.org/
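
RepeatMasker can write soft-masked output (repeats in lowercase) when run with its -xsmall option. A quick sanity check on such output is the fraction of bases that were masked. The sketch below computes this from FASTA text; the function name masked_fraction and the example records are hypothetical.

```python
def masked_fraction(fasta_text):
    """Return the fraction of lowercase (soft-masked) bases in FASTA text."""
    masked = total = 0
    for line in fasta_text.splitlines():
        # Skip header lines and blank lines.
        if line.startswith(">") or not line.strip():
            continue
        bases = line.strip()
        total += len(bases)
        masked += sum(1 for b in bases if b.islower())
    return masked / total if total else 0.0


example = ">chr1\nACGTacgtACGT\n>chr2\nttttACGT\n"  # hypothetical soft-masked records
print(f"{masked_fraction(example):.2f}")  # 0.40
```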

Tandem Repeats Finder

Integrated in RepeatMasker, but can also be run separately with different parameters to mask more repeats in large genomes.

URL: https://github.com/Benson-Genomics-Lab/TRF

DOI: https://doi.org/10.1093/nar/27.2.573
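
To make the notion of a tandem repeat concrete, the toy sketch below reports only perfect tandem repeats, that is, a short unit copied back to back with no mismatches or indels. It is not the algorithm used by Tandem Repeats Finder, which also detects approximate repeats; the function name, parameters, and example sequence are hypothetical.

```python
def find_perfect_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, end, unit, copies) tuples; coordinates are 0-based, half-open."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many times the unit repeats consecutively.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * period
                # Keep the longest span; ties favor the shorter unit.
                if best is None or span > best[0]:
                    best = (span, unit, copies)
        if best:
            span, unit, copies = best
            hits.append((i, i + span, unit, copies))
            i += span
        else:
            i += 1
    return hits


print(find_perfect_tandem_repeats("GGCACACACACATTAGAGAGAGT", max_period=3))
# [(2, 12, 'CA', 5), (14, 22, 'AG', 4)]
```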

ULTRA

For the identification and classification of tandem repeats (including satellites). 

URL: https://github.com/TravisWheelerLab/ULTRA

DOI: https://doi.org/10.1101/2024.06.03.597269

RepeatDetector (Red)

Detects repeats in genomic sequences using a k-mer-based approach. Works well for vertebrates, insects, and plants.

URL: https://github.com/BioinformaticsToolsmith/Red

DOI: https://doi.org/10.1093/nargab/lqac089
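
The general idea behind k-mer-based repeat detection is that words occurring unusually often in a genome are likely to be repetitive. The toy sketch below soft-masks every position covered by a k-mer seen at least min_count times. It is a deliberately simplified illustration and not the method implemented by any specific tool; the function name, the threshold values, and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_soft_mask(seq, k=4, min_count=3):
    """Lowercase every base covered by a k-mer that occurs at least min_count times."""
    seq = seq.upper()
    n_kmers = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_kmers))
    masked = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] >= min_count:
            # Mark all positions spanned by this frequent k-mer.
            for j in range(i, i + k):
                masked[j] = True
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))


print(kmer_soft_mask("ACGTAAAAAAAACGTA"))  # ACGTaaaaaaaaCGTA
```

Real tools choose k and the frequency threshold from the genome itself and add further modeling, so this sketch should only be read as a picture of the underlying intuition.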

WindowMasker

Masks low-complexity regions in genomic sequences using a k-mer-based approach. Tends to mask less sequence than other tools.

URL: https://www.ncbi.nlm.nih.gov/tools/windowmasker/

DOI: https://doi.org/10.1093/bioinformatics/bti774

The Extensive de novo TE Annotator (EDTA)

For automated de novo TE annotation and species-specific TE library construction.

URL: https://github.com/oushujun/EDTA

DOI: https://doi.org/10.1186/s13059-019-1905-y

About the Subcommittee

This Report on Annotation Tools Recommendations was developed by EBP’s Scientific Subcommittee for Annotation.