Repeat masking:
These tools are used to identify, classify, curate, and mask repetitive DNA within genome assemblies before downstream analyses such as gene annotation, comparative genomics, and evolutionary studies. Many genomes contain large amounts of repetitive sequence — including transposable elements, tandem repeats, satellites, and low-complexity regions — which can interfere with genome assembly, create false gene predictions, complicate sequence alignment, and obscure biological interpretation. Together, these tools help researchers build species-specific repeat libraries, detect lineage-specific repetitive elements, refine repeat annotations, and mask repetitive regions so that genes and other functional genomic features can be analyzed more accurately. They are especially important for large, repeat-rich, or evolutionarily complex genomes across the Tree of Life.
RepeatModeler2
For building a de novo repeat library that can then be used for repeat masking. Includes an optional LTR structural discovery pipeline for improved identification of LTR retrotransposons.
URL: https://www.repeatmasker.org/RepeatModeler/
DOI: https://doi.org/10.1073/pnas.1921046117
TEtrimmer
Automates the manual curation of transposable element (TE) libraries, including TE boundary definition and classification, to improve library construction.
URL: https://github.com/qjiangzhao/TEtrimmer
DOI: https://doi.org/10.1101/2024.06.27.600963
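One idea behind TE boundary definition during curation is to align many genomic copies of a candidate element and trim the consensus where support from those copies drops off. The sketch below illustrates only that idea; it does not reproduce TEtrimmer's pipeline, and the function name `trimmed_consensus`, the `-` gap character, and the 50% coverage cut-off are assumptions chosen for illustration.

```python
from collections import Counter


def trimmed_consensus(alignment, min_coverage=0.5):
    """Majority-rule consensus of aligned TE copies with poorly supported ends trimmed.

    alignment: equal-length strings, '-' marks a gap (hypothetical input format).
    Returns (start, end, consensus) in alignment columns, or None if no column
    reaches min_coverage.
    """
    if not alignment:
        return None
    n = len(alignment)
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    columns = []
    for j in range(length):
        bases = [row[j].upper() for row in alignment if row[j] != "-"]
        coverage = len(bases) / n  # fraction of copies with a base here
        base = Counter(bases).most_common(1)[0][0] if bases else "-"
        columns.append((coverage, base))

    # First and last well-supported columns act as putative element boundaries.
    good = [j for j, (cov, _) in enumerate(columns) if cov >= min_coverage]
    if not good:
        return None
    start, end = good[0], good[-1] + 1
    # Poorly supported internal columns (e.g. an insertion in one copy) are dropped.
    consensus = "".join(base for cov, base in columns[start:end] if cov >= min_coverage)
    return start, end, consensus


copies = [
    "--ACGTAC--",
    "TTACGTACG-",
    "--ACCTAC-A",
    "--ACGTAC--",
]
print(trimmed_consensus(copies))
```

Here the flanking columns supported by only one of four copies fall outside the boundaries, and the single C/G mismatch is resolved by majority vote.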
RepeatMasker
For annotating and masking repetitive elements in genomic sequences.
URL: https://www.repeatmasker.org/
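Masked output is commonly produced in one of two conventions: soft masking, where repeat bases are written in lowercase, and hard masking, where they are replaced with N. The sketch below applies either convention to a sequence, assuming repeat coordinates are already available as 0-based, half-open intervals (as in BED), possibly pooled from several tools so that overlaps must be merged first. It does not parse RepeatMasker output, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def mask_sequence(seq, intervals, mode="soft"):
    """Soft-mask (lowercase) or hard-mask (N) every base inside the repeat intervals."""
    if mode not in ("soft", "hard"):
        raise ValueError("mode must be 'soft' or 'hard'")
    chars = list(seq)
    for start, end in merge_intervals(intervals):
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower() if mode == "soft" else "N"
    return "".join(chars)


repeats = [(2, 5), (4, 8)]  # e.g. overlapping hits reported by two different tools
print(mask_sequence("ACGTACGTTTGCA", repeats))               # soft-masked
print(mask_sequence("ACGTACGTTTGCA", repeats, mode="hard"))  # hard-masked
```

Soft masking keeps the underlying bases available to downstream tools while still flagging them as repetitive; hard masking removes them from consideration entirely.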
Tandem Repeats Finder (TRF)
Integrated into RepeatMasker, but can also be run separately with different parameters to mask additional tandem repeats in large genomes.
URL: https://github.com/Benson-Genomics-Lab/TRF
DOI: https://doi.org/10.1093/nar/27.2.573
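A tandem repeat is a unit of some period p copied several times in a row. TRF detects approximate repeats (tolerating mismatches and indels) with a probabilistic model; the simplified sketch below finds only exact, whole-copy repeats, but it shows how period and minimum copy number define what gets reported. The function and parameter names are hypothetical and are not TRF's command-line parameters.

```python
def is_primitive(unit):
    """True if unit is not itself a repeat of a shorter unit ('CA' yes, 'CACA' no)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_tandem_repeats(seq, min_period=1, max_period=6, min_copies=3):
    """Report exact tandem repeats as (start, end, period, unit), 0-based half-open.

    Simplified: mismatches, indels and partial trailing copies are ignored.
    """
    seq = seq.upper()
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period <= len(seq):
            unit = seq[i:i + period]
            if not is_primitive(unit):
                # e.g. 'AA' at period 2 is better described as 'A' at period 1
                i += 1
                continue
            j = i + period
            # extend while the next block is an exact copy of the unit
            while seq[j:j + period] == unit:
                j += period
            if (j - i) // period >= min_copies:
                hits.append((i, j, period, unit))
                i = j  # resume scanning after this repeat
            else:
                i += 1
    return hits


print(find_tandem_repeats("GGCACACACATT"))
```

The `CA` dinucleotide repeat is reported once at period 2; the short `GG` and `TT` runs fall below the three-copy minimum.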
ULTRA
For the identification and classification of tandem repeats (including satellites).
URL: https://github.com/TravisWheelerLab/ULTRA
DOI: https://doi.org/10.1101/2024.06.03.597269
RepeatDetector (Red)
For de novo detection of repeats in genomic sequences using a k-mer-based approach. Works well for vertebrate, insect, and plant genomes.
URL: https://github.com/BioinformaticsToolsmith/Red
DOI: https://doi.org/10.1093/nargab/lqac089
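The intuition behind k-mer-based repeat detection is that k-mers occurring far more often than expected tend to come from repeat copies. Red's actual scoring is more sophisticated; the sketch below shows only the basic idea, counting forward-strand k-mers within a single sequence (real tools count across the whole assembly) and reporting regions covered by k-mers above a fixed count. The function name, `k`, and `min_count` are assumptions for illustration.

```python
from collections import Counter


def high_frequency_kmer_regions(seq, k=4, min_count=3):
    """Return merged [start, end) regions covered by k-mers seen >= min_count times."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Mark every position covered by at least one high-frequency k-mer.
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            for p in range(i, i + k):
                covered[p] = True

    # Collapse runs of covered positions into intervals.
    regions, start = [], None
    for p, flag in enumerate(covered + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = p
        elif not flag and start is not None:
            regions.append((start, p))
            start = None
    return regions


print(high_frequency_kmer_regions("CGAAAAAATC", k=3, min_count=3))
```

Only the `AAA` 3-mer reaches the count threshold, so the reported region spans the homopolymer run.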
WindowMasker
For masking repetitive and low-complexity regions in genomic sequences using k-mer counts. Tends to mask less sequence than other tools.
URL: https://www.ncbi.nlm.nih.gov/tools/windowmasker/
DOI: https://doi.org/10.1093/bioinformatics/bti774
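Low-complexity sequence can be scored by how often short words recur inside a window. The sketch below is not WindowMasker's algorithm; it illustrates one classic approach in the spirit of the DUST family of filters, summing c(c-1)/2 over the counts c of each trinucleotide in a window and dividing by (l - 1), where l is the number of trinucleotides. The window size, step, threshold, and function names are assumptions.

```python
from collections import Counter


def triplet_score(window):
    """Sum of c*(c-1)/2 over trinucleotide counts, divided by (l - 1).

    l is the number of trinucleotides in the window; repetitive,
    low-complexity windows score high, diverse windows score near 0.
    """
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    l = len(triplets)
    if l < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (l - 1)


def low_complexity_windows(seq, window=20, step=10, threshold=2.0):
    """Return (start, end) of sliding windows whose triplet score exceeds threshold."""
    seq = seq.upper()
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        w = seq[start:start + window]
        if triplet_score(w) > threshold:
            hits.append((start, start + len(w)))
    return hits


print(triplet_score("AAAAAAAAAA"))  # homopolymer: every triplet is 'AAA'
print(low_complexity_windows("ACGTTGCAAG" + "A" * 20, window=10, step=10))
```

The first window contains eight distinct trinucleotides and scores 0, while the two poly-A windows exceed the threshold and are reported.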
The Extensive de novo TE Annotator (EDTA)
For automated de novo TE annotation and species-specific TE library construction.
About the Subcommittee
This Report on Annotation Tools Recommendations was developed by EBP’s Scientific Subcommittee for Annotation.