
ESM3 and ESM Cambrian: The Next Generation of Protein Language Models

What is ESM3? How does ESM Cambrian differ from ESM-2? Understand EvolutionaryScale's generative protein foundation models, licensing, and what they mean for drug discovery and protein engineering.

Ryan Bethencourt
March 25, 2026
10 min read

What Is ESM3?

ESM3 is a generative protein foundation model from EvolutionaryScale, a company founded by former Meta FAIR researchers behind the original ESM family of protein language models. Released as a preprint in 2024 and published in Science in 2025, ESM3 marks a major shift from earlier ESM models: instead of only analyzing or predicting protein properties, it can generate new protein sequences and structures.

The key innovation is that ESM3 jointly reasons over three modalities (amino acid sequence, three-dimensional structure, and biological function) within a single generative framework. You can condition generation on any combination of these inputs. Provide a desired function and a partial structure, and ESM3 generates candidate sequences intended to satisfy both constraints. Provide a sequence with masked regions, and it fills them in while aiming to keep the overall structure coherent.
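To make the masked-region idea concrete, here is a minimal sketch of building a sequence prompt in which selected positions are hidden for the model to fill in. It assumes "_" as the mask character, which matches the convention used in EvolutionaryScale's esm SDK examples. The helpers build_masked_prompt and masked_fraction are hypothetical, and the actual model call is deliberately left out.

```python
# Hypothetical helpers for preparing a masked sequence prompt.
# Assumption: "_" marks positions the generative model should fill in.
MASK = "_"


def build_masked_prompt(sequence: str, redesign: list[tuple[int, int]]) -> str:
    """Replace each half-open, 0-based range [start, end) in `redesign` with MASK."""
    residues = list(sequence)
    for start, end in redesign:
        if not 0 <= start < end <= len(residues):
            raise ValueError(f"invalid range [{start}, {end}) for length {len(residues)}")
        for i in range(start, end):
            residues[i] = MASK
    return "".join(residues)


def masked_fraction(prompt: str) -> float:
    """Fraction of positions the model is being asked to generate."""
    return prompt.count(MASK) / len(prompt) if prompt else 0.0


if __name__ == "__main__":
    scaffold = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF"
    # Keep most of the scaffold fixed and open up two short loops for redesign
    prompt = build_masked_prompt(scaffold, [(4, 8), (20, 24)])
    print(prompt)
    print(f"{masked_fraction(prompt):.2f} of positions masked")
```

Passing a prompt like this to ESM3 (through the esm SDK or the Forge API) is where generation actually happens; check EvolutionaryScale's documentation for the current call signatures rather than relying on this sketch.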

The flagship ESM3 model has 98 billion parameters and was trained on billions of protein sequences together with structures and functional annotations. A smaller 1.4 billion parameter open version, ESM3-open, is available for non-commercial research. The paper also reports an intermediate 7B-parameter model; the larger models are accessed through EvolutionaryScale's Forge API rather than as downloadable weights.

The Breakthrough: A Fluorescent Protein from 500 Million Years of Evolution

The most striking demonstration of ESM3's capabilities was the generation of esmGFP, a novel green fluorescent protein with less than 60% sequence identity to any known fluorescent protein. The researchers estimated that this level of divergence from natural fluorescent proteins is equivalent to roughly 500 million years of natural evolution. The protein was synthesized in the lab, expressed in bacteria, and confirmed to fluoresce — validating that ESM3 had learned enough about protein biology to design functional proteins far outside the space of known sequences.

Note
Fluorescent proteins are notoriously difficult to design because fluorescence depends on a chromophore that forms spontaneously from three amino acids, which must be precisely positioned inside a specific beta-barrel fold. That ESM3 produced a working one suggests the model has captured real structure-function relationships.
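The "less than 60% identity" figure depends on how identity is computed, and several conventions are in use. Below is a minimal sketch of one common convention, assuming the two sequences have already been aligned by a pairwise aligner with "-" marking gaps; the paper's exact procedure may differ, and percent_identity is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Convention used here (one of several): identical residues divided by the
    number of alignment columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue
    paired = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not paired:
        return 0.0
    matches = sum(1 for a, b in paired if a.upper() == b.upper())
    return 100.0 * matches / len(paired)


if __name__ == "__main__":
    # 7 gap-free columns, 6 identical residues
    print(f"{percent_identity('MKT-AYIA', 'MKTLAYVA'):.1f}%")
```

Dividing by the full alignment length (gaps included) instead would give a lower number for the same pair, which is why identity figures are only comparable when the convention is stated.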

What Is ESM Cambrian (ESM C)?

ESM Cambrian, or ESM C, is a separate family of protein language models from EvolutionaryScale that focuses specifically on protein representations and embeddings rather than generative design. While ESM3 is built for creating new proteins, ESM C is optimized for understanding existing ones.

ESM C produces per-residue and per-sequence embeddings that serve as features for downstream prediction tasks: function annotation, variant effect prediction, subcellular localization, protein-protein interaction scoring, and more. It is positioned as a successor to ESM-2, the model whose embeddings power ESMFold, and EvolutionaryScale reports that ESM C models match or exceed the performance of larger ESM-2 models on its benchmarks.
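A per-sequence embedding is often derived from the per-residue ones by averaging (mean pooling). The sketch below shows that step in plain Python to stay dependency-free; in practice you would do this with NumPy or PyTorch. Whether the model output includes extra rows for special start/end tokens is an assumption to verify against the tokenizer you actually use, and mean_pool is a hypothetical helper.

```python
from typing import Sequence


def mean_pool(per_residue: Sequence[Sequence[float]], drop_special: bool = True) -> list[float]:
    """Average per-residue vectors into a single per-sequence vector.

    If drop_special is True, the first and last rows are assumed to be
    special tokens (e.g. beginning/end of sequence) and are excluded.
    """
    rows = list(per_residue)
    if drop_special:
        rows = rows[1:-1]
    if not rows:
        raise ValueError("no residue embeddings to pool")
    dim = len(rows[0])
    if any(len(r) != dim for r in rows):
        raise ValueError("all residue embeddings must have the same dimension")
    # Column-wise mean across residues
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings: special token, two residues, special token
    tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    print(mean_pool(tokens))
```

Mean pooling is a simple default; some pipelines instead use a dedicated summary token or learn a pooling layer, so treat this as a baseline.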

The most significant practical difference between ESM C and ESM3-open is licensing. The smaller ESM C models (300M and 600M parameters) are released with open weights under a license that permits commercial use, which makes them usable by pharmaceutical companies, biotech startups, and commercial software products; the largest ESM C model (6B) is offered through EvolutionaryScale's Forge API. ESM3-open's non-commercial license, by contrast, limits it to research use.

ESM3 vs ESMFold vs ESM-2: The Evolution of ESM Models

The ESM family has evolved significantly over several generations. Understanding how they relate helps clarify when to use each one:

  • ESM-2 (2022): A protein language model trained with masked language modeling on sequences. Produces embeddings that encode evolutionary and structural information. The foundation for ESMFold. Available in sizes from 8M to 15B parameters.
  • ESMFold (2022): A structure prediction model that takes ESM-2 embeddings and predicts 3D protein structure from a single sequence. Faster than AlphaFold2 because it skips the MSA step. Purely predictive — it folds, it does not design.
  • ESM3 (2024): A generative model that operates across sequence, structure, and function simultaneously. Can create new proteins, not just analyze existing ones. The 98B flagship is considerably more capable than the 1.4B open version.
  • ESM Cambrian (2024): An embedding-focused model optimized for protein representations. Successor to ESM-2 for downstream tasks. The 300M and 600M models are licensed for commercial use.
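Because ESMFold is purely predictive, its output comes with a per-residue confidence estimate (pLDDT), and ESMFold writes these values into the B-factor column of the PDB text it produces. A minimal sketch of averaging them over C-alpha atoms, assuming a PDB string from ESMFold (for example from the open-source esm package); mean_plddt_from_pdb is a hypothetical helper, and some tools report pLDDT on a 0 to 1 scale rather than 0 to 100, so check which one you have.

```python
def mean_plddt_from_pdb(pdb_text: str) -> float:
    """Average the B-factor column over C-alpha ATOM records of a PDB string.

    Assumes the structure came from ESMFold, which stores per-residue pLDDT
    in the B-factor field. For experimental structures this column holds
    temperature factors and means something else entirely.
    """
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


if __name__ == "__main__":
    example = (
        "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00 80.00           N\n"
        "ATOM      2  CA  MET A   1       1.458   0.000   0.000  1.00 85.00           C\n"
    )
    print(mean_plddt_from_pdb(example))
```

Averaging over C-alpha atoms gives one value per residue; a per-structure mean hides local low-confidence regions, so inspecting the per-residue values is often more informative.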

Predictive vs Generative: The Key Distinction

ESMFold and ESM-2 are predictive tools. Given a protein sequence, they tell you something about it — its structure, its properties, its evolutionary relationships. ESM3 is a generative tool. Given constraints on what you want a protein to do, it proposes sequences that could work. This is the difference between reading a book and writing one.

For most day-to-day computational biology workflows — folding a sequence, computing embeddings, predicting variant effects — the predictive models remain the right choice. ESM3's generative capabilities shine in protein engineering contexts where you need to design new sequences: therapeutic antibodies, industrial enzymes, biosensors, or novel binders.

Licensing and Access: What You Can Actually Use

Licensing is one of the most practically important aspects of the ESM ecosystem and it varies significantly across models:

  • ESM-2 and ESMFold: Released by Meta under the permissive MIT license, which allows commercial use. This is what SciRouter currently serves for protein folding and embeddings.
  • ESM3-open (1.4B): Available under a non-commercial license from EvolutionaryScale. Fine for academic research but cannot be used in commercial products or services.
  • ESM3 (larger models, up to the 98B flagship): Accessible only through EvolutionaryScale's Forge API; commercial use requires an agreement with EvolutionaryScale. Not available for self-hosting.
  • ESM Cambrian: The 300M and 600M models are released with open weights under a license that permits commercial use, making ESM C the practical choice for embeddings in industry. The 6B model is served through the Forge API.
Tip
If you need protein embeddings for a commercial application, ESM Cambrian is the recommended path forward. For structure prediction, ESMFold (built on the permissively licensed ESM-2) remains freely available and is what SciRouter currently serves.

Implications for Drug Discovery and Protein Engineering

ESM3 opens several new capabilities for computational drug discovery and protein engineering workflows:

  • De novo protein design: Generate new protein sequences conditioned on desired structural and functional properties, exploring regions beyond natural sequence space.
  • Enzyme engineering: Propose candidate catalytic proteins by conditioning on active site geometry and function annotations, then validate their activity experimentally.
  • Therapeutic protein optimization: Explore sequence variants that maintain function while improving stability, solubility, or manufacturability.
  • Better embeddings for ML pipelines: ESM Cambrian embeddings as replacements for ESM-2 features in variant effect prediction, function classification, and protein property models. Because embedding dimensions differ between the models, downstream models generally need to be retrained rather than swapped in directly.
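For variant effect prediction, one widely used approach with masked protein language models is the masked-marginal score: mask the mutated site, then compare the model's log-probability of the mutant residue with that of the wild-type residue. A minimal sketch follows; the predict callable is a hypothetical stand-in for a real model call (it is not SciRouter's or EvolutionaryScale's API), and "_" as the mask character is an assumption.

```python
import math
from typing import Callable, Mapping

# Hypothetical model interface: given a sequence with one masked position and
# that position's 0-based index, return probabilities over amino acids there.
MaskedPredictor = Callable[[str, int], Mapping[str, float]]


def masked_marginal_score(sequence: str, mutation: str, predict: MaskedPredictor, mask: str = "_") -> float:
    """Score a substitution like 'A3G' (1-based position) as log p(mut) - log p(wt).

    Positive scores mean the model prefers the mutant residue at that site.
    """
    if len(mutation) < 3:
        raise ValueError(f"malformed mutation string: {mutation!r}")
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1
    if not 0 <= idx < len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[idx] != wt:
        raise ValueError(f"wild-type mismatch at position {pos}: found {sequence[idx]}")
    masked = sequence[:idx] + mask + sequence[idx + 1:]
    probs = predict(masked, idx)
    return math.log(probs[mt]) - math.log(probs[wt])


if __name__ == "__main__":
    # Toy predictor that ignores context; a real one would run the model
    def toy_predict(seq: str, i: int) -> Mapping[str, float]:
        return {"A": 0.5, "G": 0.25, "V": 0.25}

    print(round(masked_marginal_score("MKAV", "A3G", toy_predict), 3))
```

This uses the model's output probabilities rather than its embeddings; embedding-based variant predictors instead train a supervised model on top of the features.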

How SciRouter Uses ESM Models Today

SciRouter currently serves ESMFold for protein structure prediction and ESM-2 for protein embeddings through its API. These remain excellent tools for the majority of computational biology workflows:

Predict structure with ESMFold via SciRouter
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Submit a protein sequence for folding; the API responds with a job ID
response = requests.post(
    f"{BASE}/proteins/fold",
    headers=headers,
    json={"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "model": "esmfold"},
    timeout=30,  # avoid hanging indefinitely on network problems
)
response.raise_for_status()  # fail loudly on HTTP errors (e.g. an invalid API key)
job = response.json()
print(f"Job submitted: {job['job_id']}")
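Since the request above returns a job ID rather than a structure, the result has to be retrieved once the job finishes. Below is a minimal polling sketch under stated assumptions: the fetch_status callable, the "status" field, and the "completed"/"failed" values are all hypothetical placeholders, not SciRouter's documented API. The status lookup is injected so the loop can be tested without a network.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_job(
    fetch_status: Callable[[str], Mapping[str, Any]],
    job_id: str,
    timeout_s: float = 300.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    terminal: tuple[str, ...] = ("completed", "failed"),
) -> Mapping[str, Any]:
    """Poll fetch_status(job_id) until a terminal status or the timeout.

    Assumption: the job record is a dict with a "status" field whose
    terminal values are listed in `terminal`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} s")
        sleep(poll_interval_s)
```

Wiring it to the API might look like wait_for_job(lambda jid: requests.get(f"{BASE}/jobs/{jid}", headers=headers, timeout=30).json(), job["job_id"]), where the /jobs/ path is a hypothetical placeholder; check SciRouter's API reference for the real status endpoint and field names. Injecting sleep also makes it easy to add exponential backoff later.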

As ESM Cambrian and ESM3 mature, SciRouter plans to integrate them as upgrade options. ESM Cambrian embeddings would provide improved representations for downstream tasks through the existing protein embeddings endpoint. ESM3 generative capabilities would complement the existing ProteinMPNN inverse folding workflow by adding a function-conditioned design mode.

For now, ESMFold remains one of the fastest and most accessible options for protein structure prediction, and ESM-2's permissive license allows commercial use. The SciRouter API handles GPU infrastructure, batching, and scaling so you can focus on the science.

What to Watch For

The ESM ecosystem is evolving rapidly. Several developments are worth tracking:

  • Forge API expansion: EvolutionaryScale is gradually expanding access to the larger ESM3 models through their API platform, which may enable new integration pathways.
  • ESM Cambrian adoption: As more benchmarks confirm ESM C's advantages over ESM-2, expect migration of embedding-dependent pipelines to the newer model.
  • Competition: Models like ProGen, ProtGPT2, and EvoDiff are also tackling generative protein design. The field is moving quickly, and the landscape may shift as new models are published.
  • Multimodal integration: ESM3's joint sequence-structure-function approach may become the standard paradigm, displacing single-modality tools for many applications.

Ready to start with protein structure prediction? Try ESMFold through the SciRouter API, or explore our complete guide to ESMFold to understand the foundation that ESM3 builds upon. For protein design workflows, check out our ProteinMPNN tutorial to start designing sequences today.

Frequently Asked Questions

What is ESM3?

ESM3 is a generative protein foundation model developed by EvolutionaryScale, a company founded by former Meta FAIR researchers. Unlike earlier ESM models that only analyze proteins, ESM3 can generate new proteins by jointly reasoning over sequence, structure, and function. The flagship model has 98 billion parameters, and a smaller 1.4 billion parameter version called ESM3-open is publicly available for non-commercial research.

What is the difference between ESM3 and ESMFold?

ESMFold is a predictive model that takes an amino acid sequence and outputs a 3D structure. ESM3 is a generative model that can create entirely new proteins by reasoning across sequence, structure, and function simultaneously. ESMFold is built on ESM-2 embeddings and addresses structure prediction, while ESM3 addresses protein design. They serve fundamentally different purposes: ESMFold predicts what exists, ESM3 designs what could exist.

What is ESM Cambrian?

ESM Cambrian (ESM C) is a family of protein language models from EvolutionaryScale optimized for generating high-quality protein representations and embeddings. Unlike ESM3, which focuses on generative protein design, ESM C is purpose-built for downstream tasks like function prediction, variant effect scoring, and clustering. Critically, the 300M and 600M ESM C models are released under a license that permits commercial use, making them suitable for industry applications where ESM3-open's non-commercial restriction is a barrier.

Is ESM3 open source?

Partially. EvolutionaryScale released ESM3-open, a 1.4 billion parameter version, under a non-commercial license. The larger models, including the 98 billion parameter flagship, are only accessible through EvolutionaryScale's Forge API, and commercial use requires an agreement with the company. The smaller ESM Cambrian models, by contrast, are released under a license that permits commercial use.

What is EvolutionaryScale?

EvolutionaryScale is a biotechnology company founded by researchers from Meta FAIR who created the ESM family of protein language models. The company raised over $140 million to build large-scale protein foundation models for drug discovery and protein engineering. Their flagship products are ESM3 (generative protein design) and ESM Cambrian (protein embeddings), available through their Forge API platform, with some smaller models also released as downloadable weights.
