Evo 2 is the DNA model that finally made genomics feel like NLP. Released by the Arc Institute in early 2025 as a follow-on to the original Evo model, it combines three things that had not been put together before at this scale: a massive training corpus of roughly 9 trillion base pairs, a context window near one million tokens, and a parameter count in the tens of billions. The result is a DNA language model that performs zero-shot variant effect prediction and generative sequence design well enough to be genuinely useful.
This guide explains what Evo 2 actually is, how the training corpus and long context change what a DNA model can do, the kinds of zero-shot tasks it unlocks, and how to run it through the SciRouter DNA Lab without wiring up a GPU cluster.
The one-line summary of Evo 2
Evo 2 is a large autoregressive sequence model, built on the StripedHyena 2 hybrid convolution-attention architecture rather than a pure transformer, trained on roughly 9 trillion base pairs of DNA spanning bacteria, archaea, phages, and eukaryotic genomes. It processes context windows approaching one million nucleotide tokens. It is capable of zero-shot variant scoring, zero-shot regulatory element annotation, and generative design of functional DNA sequences conditioned on context.
Why training corpus matters
Earlier DNA models like DNABERT and Nucleotide Transformer were trained on much smaller slices of the tree of life. Evo 2's training corpus is an order of magnitude larger and spans a much wider range of organisms, which matters for two reasons:
- Evolutionary coverage. A model trained on a diverse enough slice of evolution starts to learn sequence conservation patterns implicitly. Highly conserved regions look like high-probability tokens; variable regions look like low-probability tokens. That is the foundation of zero-shot variant effect prediction.
- Functional coverage. More organisms means more regulatory element types, more gene structures, more codon usage patterns, and more opportunities for the model to learn functional constraints on sequence.
Why context length matters
Earlier DNA transformers were limited to short context windows — typically a few thousand base pairs. That was enough to handle a single gene but nowhere near enough to capture long-range regulatory relationships. Enhancers can be hundreds of kilobases away from the genes they regulate. Chromatin domains span millions of base pairs. A model with a short context is structurally unable to learn these long-range dependencies.
Evo 2's million-token context window is a qualitative jump. It can see a whole gene plus its regulatory region plus neighboring genes in a single forward pass. This unlocks things that were out of reach for earlier models:
- Long-range regulatory effect prediction, where a variant hundreds of kilobases from a gene affects its expression.
- Full-gene variant scoring that accounts for splicing and regulatory context together.
- Generative sequence design at the scale of a complete transcriptional unit.
How variant effect prediction works
Evo 2 is trained as an autoregressive model that predicts the next nucleotide from the preceding context. That turns out to be exactly the right loss for zero-shot variant effect prediction.
The trick is simple. For a given variant, you compute the model's log-likelihood of the sequence with the reference allele and the log-likelihood of the same sequence with the alternate allele. The difference — the log ratio — is a zero-shot score for how unusual the variant looks to the model. Large negative log ratios flag variants that would disrupt patterns the model has learned across millions of genomes.
This is analogous to how ESM-2 is used for protein variant effect prediction. The DNA version works for the same reason: when a model trained on a large slice of evolution sees an unusual base in a conserved position, it reports low likelihood for that base. Unusual plus conserved usually means functional.
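The log-ratio computation above is a few lines of arithmetic once you have per-position log-probabilities from the model. Here is a minimal sketch with toy numbers standing in for Evo 2 outputs (the `variant_score` helper and the specific probabilities are illustrative, not part of any Evo 2 API):

```python
import math

def variant_score(ref_logprobs, alt_logprobs):
    """Zero-shot variant score: total log-likelihood of the sequence with
    the alternate allele minus that of the reference sequence. A large
    negative value means the alt allele looks unusual to the model."""
    return sum(alt_logprobs) - sum(ref_logprobs)

# Toy per-position log-probabilities standing in for model outputs.
# Only position 1 differs: the model assigns the alt base p = 0.05
# where it assigned the ref base p = 0.8, i.e. a conserved position.
ref_logprobs = [math.log(0.9), math.log(0.80), math.log(0.95)]
alt_logprobs = [math.log(0.9), math.log(0.05), math.log(0.95)]

score = variant_score(ref_logprobs, alt_logprobs)
# score is strongly negative: the variant looks disruptive
```

Note that everything cancels except the positions where the two sequences differ, which is why the score isolates the variant's effect even over long contexts.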
Zero-shot capabilities
Beyond variant scoring, Evo 2 supports several zero-shot tasks that previously required bespoke models:
Regulatory element annotation
Per-position probabilities along a stretch of DNA tend to rise inside conserved regulatory elements and dip inside less constrained regions. This gives you a cheap first-pass annotation without running a chromatin-state model.
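A first-pass annotation along these lines is just windowed smoothing plus a threshold. The sketch below assumes you already have per-position next-base probabilities from the model; the window size and threshold are arbitrary illustration values, not recommended settings:

```python
def flag_conserved(probs, window=4, threshold=0.8):
    """Return start indices of windows whose mean next-base probability
    exceeds `threshold`: candidate conserved (constrained) elements."""
    hits = []
    for i in range(len(probs) - window + 1):
        if sum(probs[i:i + window]) / window >= threshold:
            hits.append(i)
    return hits

# Toy per-position probabilities standing in for model outputs:
# a confident (conserved-looking) stretch flanked by uncertain sequence.
probs = [0.3, 0.25, 0.9, 0.95, 0.92, 0.88, 0.3, 0.28]
hits = flag_conserved(probs)  # only the confident stretch is flagged
```

In practice you would calibrate the threshold against known annotations rather than picking it by hand.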
Generative design
Because it is autoregressive, Evo 2 can sample new DNA sequences conditioned on genomic context. This enables applications like generating candidate promoter variants, designing synthetic regulatory elements, and proposing mutations for directed evolution.
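The sampling loop itself is the standard autoregressive recipe: condition on everything generated so far, sample one base, append, repeat. The sketch below uses a trivial stand-in distribution in place of Evo 2 (the `toy_next_base_probs` function is entirely hypothetical) so the structure of the loop is visible:

```python
import random

BASES = "ACGT"

def toy_next_base_probs(context):
    """Stand-in for the model's next-token distribution (hypothetical):
    a trivial model that mildly favors repeating the previous base."""
    probs = {b: 1.0 for b in BASES}
    if context:
        probs[context[-1]] = 2.0
    total = sum(probs.values())
    return {b: p / total for b, p in probs.items()}

def sample_sequence(prompt, n_new, rng):
    """Autoregressive sampling: extend `prompt` one base at a time,
    drawing each base from the conditional distribution."""
    seq = prompt
    for _ in range(n_new):
        probs = toy_next_base_probs(seq)
        bases = list(probs)
        seq += rng.choices(bases, weights=[probs[b] for b in bases])[0]
    return seq

rng = random.Random(0)
designed = sample_sequence("ATGC", 20, rng)  # 4-base prompt + 20 sampled bases
```

Conditioning on genomic context is what the prompt does here: feed the model the flanking sequence you care about, and the sampled continuation is shaped by it.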
Essentiality prediction
The model's per-position entropy gives a soft signal for essentiality. Regions where the model is very confident about the next base are usually under strong purifying selection, which correlates with essential function.
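Per-position entropy is straightforward to compute from the model's next-base distribution. A minimal sketch, with toy distributions in place of real model outputs:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (bits) of a next-base distribution. Low entropy
    means the model is confident, a soft proxy for purifying selection."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

confident = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}  # conserved-looking
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}    # unconstrained

# The uniform distribution gives the maximum 2 bits for a 4-letter
# alphabet; the confident one is far below it.
```

Averaging this entropy over a gene gives a crude essentiality ranking: genes the model predicts with consistently low entropy are the ones to suspect are essential.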
Limits and caveats
Evo 2 is a powerful model, but there are a few things it is not and a few pitfalls worth knowing about:
- Not a classifier. Evo 2 does not tell you whether a variant is benign or pathogenic. It tells you how unusual it looks. Clinical interpretation requires orthogonal evidence.
- Biased toward what it has seen. Like all foundation models, Evo 2 reflects the biases of its training corpus. Genes and regions under-represented in the training data will have less reliable scores.
- Computationally heavy. Running the full model on long contexts requires serious GPU resources. Do not expect to run it on a laptop.
- Evolutionary framing. The model scores variants through an evolutionary lens. De novo mutations in genes with strong purifying selection are well-handled. Lineage-specific functional variants are more subtle.
Running Evo 2 through SciRouter
The full Evo 2 model needs multiple A100-class GPUs for inference, which is why the SciRouter DNA Lab workspace exposes it as a hosted endpoint. You POST a reference sequence plus a list of variants and receive log-likelihood ratios back without provisioning GPUs yourself. The Evo 2 tool page documents the exact request and response schemas.
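As a rough sketch of the shape of such a request, here is an illustrative payload. The field names and structure here are assumptions for illustration only; the authoritative schema is the one on the Evo 2 tool page:

```python
import json

# Illustrative request body. Field names ("reference", "variants",
# "position", "ref", "alt") are assumptions, not the documented schema.
payload = {
    "reference": "ACGT" * 64,  # reference sequence window
    "variants": [
        {"position": 17, "ref": "A", "alt": "G"},
        {"position": 130, "ref": "C", "alt": "T"},
    ],
}
body = json.dumps(payload)

# A client would POST `body` to the hosted endpoint and read one
# log-likelihood ratio per variant from the JSON response.
```

The point is that the client-side work is assembling sequence plus variant list; scoring happens server-side.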
When to reach for Evo 2 vs something else
Reach for Evo 2 when…
- You need zero-shot variant scores across the genome, not just in coding regions.
- Long-range regulatory context matters for the variants you care about.
- You want to generate new DNA sequences conditioned on genomic context.
Reach for a smaller model when…
- You only care about protein-coding variants. ESM-2 on the translated protein can be faster and equally accurate.
- You need real-time inference on a laptop without a GPU.
- Your workflow is limited to a few kilobases of context.
Bottom line
Evo 2 is the clearest sign that the foundation model playbook works for DNA. The combination of massive corpus, long context, and autoregressive training gives it zero-shot capabilities that earlier models could only dream about. For any project that involves variant effects, regulatory elements, or sequence design, it belongs in your toolkit.