Evo 2 is the DNA model that finally made genomics feel like NLP. Released by the Arc Institute in early 2025 as a follow-on to the original Evo model, it combines three things that had not been put together before at this scale: a massive training corpus of roughly 9 trillion base pairs, a context window near one million tokens, and a parameter count in the tens of billions. The result is a DNA language model that performs zero-shot variant effect prediction and generative sequence design well enough to be genuinely useful.
This guide explains what Evo 2 actually is, how the training corpus and long context change what a DNA model can do, the kinds of zero-shot tasks it unlocks, and how to run it through the SciRouter DNA Lab without wiring up a GPU cluster.
The one-line summary of Evo 2
Evo 2 is a large autoregressive sequence model, built on the StripedHyena 2 hybrid convolution-attention architecture rather than a pure transformer, trained on roughly 9 trillion base pairs of DNA spanning bacteria, archaea, phages, and eukaryotic genomes. It processes context windows approaching one million nucleotide tokens. It is capable of zero-shot variant scoring, zero-shot regulatory element annotation, and generative design of functional DNA sequences conditioned on context.
Why training corpus matters
Earlier DNA models like DNABERT and Nucleotide Transformer were trained on much smaller slices of the tree of life. Evo 2's training corpus is an order of magnitude larger and spans a much wider range of organisms, which matters for two reasons:
- Evolutionary coverage. A model trained on a diverse enough slice of evolution starts to learn sequence conservation patterns implicitly. Highly conserved regions look like high-probability tokens; variable regions look like low-probability tokens. That is the foundation of zero-shot variant effect prediction.
- Functional coverage. More organisms means more regulatory element types, more gene structures, more codon usage patterns, and more opportunities for the model to learn functional constraints on sequence.
Why context length matters
Earlier DNA transformers were limited to short context windows — typically a few thousand base pairs. That was enough to handle a single gene but nowhere near enough to capture long-range regulatory relationships. Enhancers can be hundreds of kilobases away from the genes they regulate. Chromatin domains span millions of base pairs. A model with a short context is structurally unable to learn these long-range dependencies.
Evo 2's million-token context window is a qualitative jump. It can see a whole gene plus its regulatory region plus neighboring genes in a single forward pass. This unlocks things that were out of reach for earlier models:
- Long-range regulatory effect prediction, where a variant hundreds of kilobases from a gene affects its expression.
- Full-gene variant scoring that accounts for splicing and regulatory context together.
- Generative sequence design at the scale of a complete transcriptional unit.
How variant effect prediction works
Evo 2 is trained as an autoregressive model that predicts the next nucleotide from the preceding context. That turns out to be exactly the right loss for zero-shot variant effect prediction.
The trick is simple. For a given variant, you compute the model's log-likelihood of the sequence with the reference allele and the log-likelihood of the same sequence with the alternate allele. The difference — the log ratio — is a zero-shot score for how unusual the variant looks to the model. Large negative log ratios flag variants that would disrupt patterns the model has learned across millions of genomes.
This is analogous to how ESM-2 is used for protein variant effect prediction. The DNA version works for the same reason: when a model trained on a large slice of evolution sees an unusual base in a conserved position, it reports low likelihood for that base. Unusual plus conserved usually means functional.
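The log-ratio computation above is a few lines of arithmetic once you have per-position log-probabilities from the model. Here is a minimal sketch with toy numbers standing in for Evo 2 outputs (the `variant_score` helper and the specific probabilities are illustrative, not part of any Evo 2 API):

```python
import math

def variant_score(ref_logprobs, alt_logprobs):
    """Zero-shot variant score: total log-likelihood of the sequence with
    the alternate allele minus that of the reference sequence. A large
    negative value means the alt allele looks unusual to the model."""
    return sum(alt_logprobs) - sum(ref_logprobs)

# Toy per-position log-probabilities standing in for model outputs.
# Only position 1 differs: the model assigns the alt base p = 0.05
# where it assigned the ref base p = 0.8, i.e. a conserved position.
ref_logprobs = [math.log(0.9), math.log(0.80), math.log(0.95)]
alt_logprobs = [math.log(0.9), math.log(0.05), math.log(0.95)]

score = variant_score(ref_logprobs, alt_logprobs)
# score is strongly negative: the variant looks disruptive
```

Note that everything cancels except the positions where the two sequences differ, which is why the score isolates the variant's effect even over long contexts.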
Zero-shot capabilities
Beyond variant scoring, Evo 2 supports several zero-shot tasks that previously required bespoke models:
Regulatory element annotation
Per-position probabilities along a stretch of DNA tend to rise inside conserved regulatory elements and dip inside less constrained regions. This gives you a cheap first-pass annotation without running a chromatin-state model.
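A first-pass annotation along these lines is just windowed smoothing plus a threshold. The sketch below assumes you already have per-position next-base probabilities from the model; the window size and threshold are arbitrary illustration values, not recommended settings:

```python
def flag_conserved(probs, window=4, threshold=0.8):
    """Return start indices of windows whose mean next-base probability
    exceeds `threshold`: candidate conserved (constrained) elements."""
    hits = []
    for i in range(len(probs) - window + 1):
        if sum(probs[i:i + window]) / window >= threshold:
            hits.append(i)
    return hits

# Toy per-position probabilities standing in for model outputs:
# a confident (conserved-looking) stretch flanked by uncertain sequence.
probs = [0.3, 0.25, 0.9, 0.95, 0.92, 0.88, 0.3, 0.28]
hits = flag_conserved(probs)  # only the confident stretch is flagged
```

In practice you would calibrate the threshold against known annotations rather than picking it by hand.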
Generative design
Because it is autoregressive, Evo 2 can sample new DNA sequences conditioned on genomic context. This enables applications like generating candidate promoter variants, designing synthetic regulatory elements, and proposing mutations for directed evolution.
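The sampling loop itself is the standard autoregressive recipe: condition on everything generated so far, sample one base, append, repeat. The sketch below uses a trivial stand-in distribution in place of Evo 2 (the `toy_next_base_probs` function is entirely hypothetical) so the structure of the loop is visible:

```python
import random

BASES = "ACGT"

def toy_next_base_probs(context):
    """Stand-in for the model's next-token distribution (hypothetical):
    a trivial model that mildly favors repeating the previous base."""
    probs = {b: 1.0 for b in BASES}
    if context:
        probs[context[-1]] = 2.0
    total = sum(probs.values())
    return {b: p / total for b, p in probs.items()}

def sample_sequence(prompt, n_new, rng):
    """Autoregressive sampling: extend `prompt` one base at a time,
    drawing each base from the conditional distribution."""
    seq = prompt
    for _ in range(n_new):
        probs = toy_next_base_probs(seq)
        bases = list(probs)
        seq += rng.choices(bases, weights=[probs[b] for b in bases])[0]
    return seq

rng = random.Random(0)
designed = sample_sequence("ATGC", 20, rng)  # 4-base prompt + 20 sampled bases
```

Conditioning on genomic context is what the prompt does here: feed the model the flanking sequence you care about, and the sampled continuation is shaped by it.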
Essentiality prediction
The model's per-position entropy gives a soft signal for essentiality. Regions where the model is very confident about the next base are usually under strong purifying selection, which correlates with essential function.
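Per-position entropy is straightforward to compute from the model's next-base distribution. A minimal sketch, with toy distributions in place of real model outputs:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (bits) of a next-base distribution. Low entropy
    means the model is confident, a soft proxy for purifying selection."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

confident = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}  # conserved-looking
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}    # unconstrained

# The uniform distribution gives the maximum 2 bits for a 4-letter
# alphabet; the confident one is far below it.
```

Averaging this entropy over a gene gives a crude essentiality ranking: genes the model predicts with consistently low entropy are the ones to suspect are essential.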
Limits and caveats
Evo 2 is a powerful model, but there are a few things it is not and a few pitfalls worth knowing about:
- Not a classifier. Evo 2 does not tell you whether a variant is benign or pathogenic. It tells you how unusual it looks. Clinical interpretation requires orthogonal evidence.
- Biased toward what it has seen. Like all foundation models, Evo 2 reflects the biases of its training corpus. Genes and regions under-represented in the training data will have less reliable scores.
- Computationally heavy. Running the full model on long contexts requires serious GPU resources. Do not expect to run it on a laptop.
- Evolutionary framing. The model scores variants through an evolutionary lens. De novo mutations in genes with strong purifying selection are well-handled. Lineage-specific functional variants are more subtle.
Running Evo 2 through SciRouter
The full Evo 2 model needs multiple A100-class GPUs for inference, which is why the SciRouter DNA Lab workspace exposes it as a hosted endpoint. You POST a reference sequence plus a list of variants and receive log-likelihood ratios back without provisioning GPUs yourself. The Evo 2 tool page documents the exact request and response schemas.
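As a rough sketch of the shape of such a request, here is an illustrative payload. The field names and structure here are assumptions for illustration only; the authoritative schema is the one on the Evo 2 tool page:

```python
import json

# Illustrative request body. Field names ("reference", "variants",
# "position", "ref", "alt") are assumptions, not the documented schema.
payload = {
    "reference": "ACGT" * 64,  # reference sequence window
    "variants": [
        {"position": 17, "ref": "A", "alt": "G"},
        {"position": 130, "ref": "C", "alt": "T"},
    ],
}
body = json.dumps(payload)

# A client would POST `body` to the hosted endpoint and read one
# log-likelihood ratio per variant from the JSON response.
```

The point is that the client-side work is assembling sequence plus variant list; scoring happens server-side.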
When to reach for Evo 2 vs something else
Reach for Evo 2 when…
- You need zero-shot variant scores across the genome, not just in coding regions.
- Long-range regulatory context matters for the variants you care about.
- You want to generate new DNA sequences conditioned on genomic context.
Reach for a smaller model when…
- You only care about protein-coding variants. ESM-2 on the translated protein can be faster and equally accurate.
- You need real-time inference on a laptop without a GPU.
- Your workflow is limited to a few kilobases of context.
Bottom line
Evo 2 is the clearest sign that the foundation model playbook works for DNA. The combination of massive corpus, long context, and autoregressive training gives it zero-shot capabilities that earlier models could only dream about. For any project that involves variant effects, regulatory elements, or sequence design, it belongs in your toolkit.