What Is ESM3?
ESM3 is a generative protein foundation model from EvolutionaryScale, a company founded by the Meta FAIR researchers who created the original ESM family of protein language models. First released as a preprint in 2024 and published in Science in January 2025, ESM3 represents a fundamental shift from earlier ESM models: instead of only analyzing or predicting protein properties, it can design entirely new proteins from scratch.
The key innovation is that ESM3 jointly reasons over three modalities — amino acid sequence, three-dimensional structure, and biological function — within a single generative framework. You can condition generation on any combination of these inputs. Provide a desired function and partial structure, and ESM3 will generate sequences that satisfy both constraints. Provide a sequence with masked regions, and it fills them in while maintaining structural coherence.
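To make the masked-sequence case concrete, here is a minimal sketch of how such a prompt could be assembled before it is handed to a generative model. The helper name `mask_region` and the use of `_` as the mask character are assumptions made for illustration; check the model's own documentation for its actual mask token and prompt format.

```python
def mask_region(sequence: str, start: int, end: int, mask_char: str = "_") -> str:
    """Return `sequence` with residues in [start, end) replaced by `mask_char`."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("start/end must satisfy 0 <= start <= end <= len(sequence)")
    return sequence[:start] + mask_char * (end - start) + sequence[end:]


# Hide residues 5..9 (0-based, end-exclusive) of a short example sequence.
prompt = mask_region("MVLSPADKTNVKAAWGK", 5, 10)
print(prompt)  # MVLSP_____VKAAWGK
```

The unmasked residues act as the fixed context, and the masked positions are the ones the model would be asked to fill in.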
The flagship ESM3 model has 98 billion parameters and was trained on billions of protein sequences, structures, and functional annotations. A smaller 1.4 billion parameter open version, ESM3-open, is available for non-commercial research. The larger models, a 7B variant and the 98B flagship, are accessible through EvolutionaryScale's Forge API.
The Breakthrough: A Fluorescent Protein from 500 Million Years of Evolution
The most striking demonstration of ESM3's capabilities was the generation of esmGFP, a novel green fluorescent protein with only about 58% sequence identity to the closest known fluorescent protein. The researchers estimated that this level of divergence from natural fluorescent proteins is equivalent to roughly 500 million years of natural evolution. The protein was synthesized in the lab, expressed in bacteria, and confirmed to fluoresce, which is evidence that ESM3 has learned enough about protein biology to design functional proteins far outside the space of known sequences.
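For readers unfamiliar with the metric, the sketch below shows one common way to compute percent sequence identity from two sequences that have already been aligned. It is an illustration only: the function name and the toy sequences are made up, and the choice of denominator (columns where neither sequence has a gap) is one convention among several, not necessarily the one used in the ESM3 paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns where both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Toy pre-aligned pair: 9 gap-free columns, 7 of them identical.
identity = percent_identity("MSKGEELFTG", "MSRGEE-FSG")
print(round(identity, 1))  # 77.8
```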
What Is ESM Cambrian (ESM C)?
ESM Cambrian, or ESM C, is a separate family of protein language models from EvolutionaryScale that focuses specifically on protein representations and embeddings rather than generative design. While ESM3 is built for creating new proteins, ESM C is optimized for understanding existing ones.
ESM C produces high-quality per-residue and per-sequence embeddings that serve as features for downstream prediction tasks: function annotation, variant effect prediction, subcellular localization, protein-protein interaction scoring, and more. It improves on the earlier ESM-2 embeddings that power tools like ESMFold, with better performance across standard benchmarks.
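The relationship between per-residue and per-sequence embeddings can be sketched with mean pooling, a common way to collapse one vector per residue into a single vector per protein. This is an illustrative assumption, not a description of how ESM C itself derives sequence-level embeddings, and the numbers are toy values rather than real model output.

```python
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into one per-sequence vector (D)."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    dim = len(per_residue[0])
    if any(len(vec) != dim for vec in per_residue):
        raise ValueError("all residue embeddings must share one dimension")
    n = len(per_residue)
    return [sum(vec[i] for vec in per_residue) / n for i in range(dim)]


# Three residues, each with a 2-dimensional toy embedding.
toy = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
pooled = mean_pool(toy)
print(pooled)  # [2.0, 2.0]
```

Per-residue vectors suit position-level tasks such as variant effect prediction, while a pooled vector is the usual input for whole-protein tasks such as localization or function classification.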
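The most significant practical difference between ESM C and ESM3-open is licensing. ESM C's smallest model (300M parameters) is released under an open license that permits commercial use, making it immediately usable by pharmaceutical companies, biotech startups, and commercial software products. Terms for the larger ESM C sizes differ, so check the license for the size you plan to deploy. ESM3-open's non-commercial restriction limits its use to academic and other non-commercial research.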
ESM3 vs ESMFold vs ESM-2: The Evolution of ESM Models
The ESM family has evolved over several generations. Understanding how the models relate helps clarify when to use each one:
- ESM-2 (2022): A protein language model trained with masked language modeling on sequences. Produces embeddings that encode evolutionary and structural information. The foundation for ESMFold. Available in sizes from 8M to 15B parameters.
- ESMFold (2022): A structure prediction model that takes ESM-2 embeddings and predicts 3D protein structure from a single sequence. Faster than AlphaFold2 because it skips the MSA step. Purely predictive — it folds, it does not design.
- ESM3 (2024): A generative model that operates across sequence, structure, and function simultaneously. Can create new proteins, not just analyze existing ones. The 98B flagship is vastly more capable than the 1.4B open version.
- ESM Cambrian (2024): An embedding-focused model optimized for protein representations. Successor to ESM-2 for downstream tasks. Licensed to permit commercial use, with terms that vary by model size.
Predictive vs Generative: The Key Distinction
ESMFold and ESM-2 are predictive tools. Given a protein sequence, they tell you something about it — its structure, its properties, its evolutionary relationships. ESM3 is a generative tool. Given constraints on what you want a protein to do, it proposes sequences that could work. This is the difference between reading a book and writing one.
For most day-to-day computational biology workflows — folding a sequence, computing embeddings, predicting variant effects — the predictive models remain the right choice. ESM3's generative capabilities shine in protein engineering contexts where you need to design new sequences: therapeutic antibodies, industrial enzymes, biosensors, or novel binders.
Licensing and Access: What You Can Actually Use
Licensing is one of the most practically important aspects of the ESM ecosystem, and it varies significantly across models:
- ESM-2 and ESMFold: Released by Meta under permissive open-source licenses. Free to use commercially. This is what SciRouter currently serves for protein folding and embeddings.
- ESM3-open (1.4B): Available under a non-commercial license from EvolutionaryScale. Fine for academic research but cannot be used in commercial products or services.
- ESM3 (7B, 98B): Accessible only through EvolutionaryScale's Forge API under commercial agreements. Not available for self-hosting.
- ESM Cambrian: The 300M model is released under an open license that permits commercial use, making it the practical choice for industry embeddings. Check the terms for larger sizes before deploying them commercially.
Implications for Drug Discovery and Protein Engineering
ESM3 and ESM Cambrian open up several new capabilities for computational drug discovery and protein engineering workflows:
- De novo protein design: Generate entirely new protein sequences conditioned on desired structural and functional properties, bypassing the limitations of natural sequence space.
- Enzyme engineering: Design catalytic proteins with specific activity profiles by conditioning on active site geometry and function annotations.
- Therapeutic protein optimization: Explore sequence variants that maintain function while improving stability, solubility, or manufacturability.
- Better embeddings for ML pipelines: ESM Cambrian embeddings as drop-in replacements for ESM-2 in variant effect prediction, function classification, and protein property models.
How SciRouter Uses ESM Models Today
SciRouter currently serves ESMFold for protein structure prediction and ESM-2 for protein embeddings through its API. These remain excellent tools for the majority of computational biology workflows:
```python
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Fold a protein sequence
response = requests.post(
    f"{BASE}/proteins/fold",
    headers=headers,
    json={"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "model": "esmfold"},
    timeout=30,
)
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error body
job = response.json()
print(f"Job submitted: {job['job_id']}")
```

As ESM Cambrian and ESM3 mature, SciRouter plans to integrate them as upgrade options. ESM Cambrian embeddings would provide improved representations for downstream tasks through the existing protein embeddings endpoint. ESM3 generative capabilities would complement the existing ProteinMPNN inverse folding workflow by adding a function-conditioned design mode.
For now, ESMFold remains the fastest and most accessible option for protein structure prediction, and the permissive ESM-2 license means there are no restrictions on commercial use. The SciRouter API handles GPU infrastructure, batching, and scaling so you can focus on the science.
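Because the fold request above returns a job identifier rather than a structure, a client typically has to poll until the job finishes. The sketch below shows a generic polling loop. It deliberately takes a `fetch_status` callable instead of calling a specific URL, since the status endpoint, the `status` field, and the `"completed"` and `"failed"` state names are all assumptions not described here; adapt them to the actual API reference.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch_status` until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        # "completed" and "failed" are assumed state names, not documented values.
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)


# Stand-in for a real status request: reports "running" twice, then "completed".
_responses = iter([{"status": "running"}, {"status": "running"}, {"status": "completed"}])
result = wait_for_job(lambda: next(_responses), interval_s=0.0)
print(result["status"])  # completed
```

In real use, `fetch_status` would wrap an authenticated GET for the job in question and return its parsed JSON body.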
What to Watch For
The ESM ecosystem is evolving rapidly. Several developments are worth tracking:
- Forge API expansion: EvolutionaryScale is gradually expanding access to the larger ESM3 models through their API platform, which may enable new integration pathways.
- ESM Cambrian adoption: As more benchmarks confirm ESM C's advantages over ESM-2, expect migration of embedding-dependent pipelines to the newer model.
- Competition: Models like ProGen, ProtGPT2, and EvoDiff are also tackling generative protein design. The field is moving quickly, and the landscape may shift as new models are published.
- Multimodal integration: ESM3's joint sequence-structure-function approach may become the standard paradigm, displacing single-modality tools for many applications.
Ready to start with protein structure prediction? Try ESMFold through the SciRouter API, or explore our complete guide to ESMFold to understand the foundation that ESM3 builds upon. For protein design workflows, check out our ProteinMPNN tutorial to start designing sequences today.