Why AI Protein Design Tools Matter
Designing proteins with specific structures and functions is one of the most impactful applications of machine learning in biology. Three tools have defined the current landscape: ProteinMPNN for inverse folding, RFdiffusion for backbone generation, and Chroma for generative protein design. Each solves a different part of the design problem, and understanding when to use which tool can save weeks of experimental effort.
This guide provides a practical comparison of all three tools across architecture, use cases, compute requirements, and accessibility. Whether you are designing binders, engineering enzymes, or exploring novel folds, this comparison will help you choose the right tool for your project.
What Each Tool Does
ProteinMPNN: Inverse Folding (Sequence Design)
Published by the Baker Lab in 2022, ProteinMPNN solves the inverse folding problem: given a 3D protein backbone, it designs amino acid sequences predicted to fold into that structure. It uses a message-passing neural network that operates on the protein graph (residues as nodes, spatial contacts as edges) to predict sequences with high recovery rates.
- Input: Protein backbone coordinates (PDB file or coordinates)
- Output: Designed amino acid sequences with confidence scores
- Architecture: Message-passing neural network on protein graphs
- Published: September 2022 (Dauparas et al., Science)
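To make "residues as nodes, spatial contacts as edges" concrete, here is a minimal sketch of a k-nearest-neighbor graph built from C-alpha coordinates. The function name `knn_graph`, the toy coordinates, and the choice of `k` are illustrative assumptions; the real model uses richer edge features and a much larger neighborhood.

```python
import math

def knn_graph(ca_coords, k=3):
    """Return directed edges (i, j) where j is one of the k residues nearest to i."""
    edges = []
    for i, ci in enumerate(ca_coords):
        # Distance from residue i to every other residue
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(ca_coords) if j != i]
        dists.sort()
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Five hypothetical C-alpha positions spaced 3.8 angstroms apart on a line
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
edges = knn_graph(coords, k=2)
```

Each residue ends up connected to its closest neighbors in space rather than only its neighbors in sequence, which is what lets a graph network reason about 3D contacts.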
RFdiffusion: Backbone Generation
Published by the Baker Lab in 2023, RFdiffusion generates novel protein backbone structures using denoising diffusion. Built on the RoseTTAFold architecture, it can create backbones from scratch or conditioned on specific constraints such as binding a target protein, scaffolding a functional motif, or incorporating symmetry.
- Input: Design constraints (target protein, motif, symmetry, or unconditional)
- Output: Novel protein backbone 3D coordinates
- Architecture: Denoising diffusion on RoseTTAFold structure prediction network
- Published: July 2023 (Watson et al., Nature)
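The iterative refinement idea behind denoising diffusion can be sketched with a toy loop. This is not RFdiffusion's algorithm: the `toy_denoiser` below is a hypothetical oracle that returns a fixed target, standing in for the trained network, and the update rule and noise scale are simplified assumptions chosen only to show noise being refined into coordinates over many steps.

```python
import random

def toy_denoiser(noisy, target):
    """Stand-in for a trained network: returns its guess of the clean coordinates.
    Here it is an oracle that simply returns the fixed target (hypothetical)."""
    return target

def reverse_diffusion(target, steps=50, seed=0):
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one (x, y, z) point per residue
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in target]
    for t in range(steps, 0, -1):
        x0_hat = toy_denoiser(x, target)
        noise_scale = (t - 1) / steps   # shrinks to 0 at the final step
        blend = 1.0 / t                 # fraction of the way to move toward the prediction
        x = [
            [xi + blend * (pi - xi) + 0.1 * noise_scale * rng.gauss(0, 1)
             for xi, pi in zip(point, pred)]
            for point, pred in zip(x, x0_hat)
        ]
    return x

# Hypothetical target: ten points spaced 3.8 angstroms apart
target = [(3.8 * i, 0.0, 0.0) for i in range(10)]
backbone = reverse_diffusion(target)
```

In a real diffusion model the denoiser is learned, its prediction changes at every step, and conditioning signals such as a binding target steer that prediction.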
Chroma: Generative Protein Design
Published by Generate Biomedicines in 2023, Chroma is a generative model that can produce novel protein structures conditioned on high-level properties. It uses a diffusion process over protein structure and sequence simultaneously, enabling generation of proteins with desired symmetry groups, shape constraints, or functional annotations.
- Input: Property constraints (symmetry, substructure, natural language prompts)
- Output: Full protein structures (backbone and sequence together)
- Architecture: Score-based diffusion with a graph neural network denoiser
- Published: November 2023 (Ingraham et al., Nature); preprint released December 2022
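To illustrate what a symmetry constraint asks of a generated structure, the sketch below replicates a subunit under cyclic C_n symmetry by rotating it about the z-axis. It shows the geometric property only, not how Chroma enforces it; `cyclic_copies` and the coordinates are hypothetical.

```python
import math

def cyclic_copies(subunit, n):
    """Replicate a subunit n times by rotating about the z-axis (C_n symmetry)."""
    copies = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        # Standard 2D rotation in the xy-plane; z is unchanged
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in subunit])
    return copies

# Two hypothetical atoms placed 10 angstroms from the symmetry axis
subunit = [(10.0, 0.0, 0.0), (10.0, 0.0, 3.8)]
trimer = cyclic_copies(subunit, 3)
```

A C3-symmetric design must look identical after a 120 degree rotation, so every copy keeps the same distance from the axis and the same internal geometry.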
Head-to-Head Comparison
Problem Solved
- ProteinMPNN: Sequence design for a fixed backbone. You already have the structure you want; you need a sequence that folds into it.
- RFdiffusion: Backbone generation with structural constraints. You know the function (e.g., bind this target) but need a new structure that achieves it.
- Chroma: Exploratory structure generation. You want to sample diverse protein architectures with optional property conditioning.
Experimental Validation
- ProteinMPNN: Extensively validated. Designed sequences consistently fold as predicted in the lab, with sequence recovery rates above 50% on native backbones. Widely adopted in dozens of published experimental studies.
- RFdiffusion: Strong experimental results for binder design and motif scaffolding. De novo binders to targets like the insulin receptor and PD-L1 have been validated experimentally.
- Chroma: Early experimental validation shows generated proteins express and fold, though fewer independent experimental studies have been published compared to ProteinMPNN and RFdiffusion.
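Sequence recovery, the metric quoted for ProteinMPNN above, is the fraction of positions where the designed residue matches the native one. A minimal sketch, with a hypothetical helper name and made-up sequences:

```python
def sequence_recovery(designed, native):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must be the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

# Made-up ten-residue example with two substitutions (T->S and I->L)
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
recovery = sequence_recovery(designed, native)  # 8 of 10 positions match
```

High recovery indicates that the model has learned which residues a backbone prefers, but it is a proxy: a design can differ from the native sequence at many positions and still fold correctly.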
Compute Requirements
- ProteinMPNN: Lightweight. Runs on a single GPU in seconds or even on CPU. Batch design of hundreds of sequences is fast and inexpensive.
- RFdiffusion: Moderate to heavy. Requires a GPU with 16GB+ VRAM. Each diffusion trajectory takes 1 to 5 minutes depending on protein size and number of steps.
- Chroma: Heavy. Requires a high-end GPU (A100 recommended). Generation is slower than RFdiffusion for equivalent-sized proteins due to joint structure-sequence diffusion.
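These per-design timings add up quickly at campaign scale. As a rough worked example, the sketch below converts the 1 to 5 minutes per trajectory quoted for RFdiffusion into GPU-hours for a hypothetical campaign of 1,000 backbones; the campaign size is an assumption for illustration.

```python
def gpu_hours(num_trajectories, minutes_per_trajectory):
    """Total GPU time in hours for a batch of sequential trajectories."""
    return num_trajectories * minutes_per_trajectory / 60

# Hypothetical campaign: 1,000 backbones at the quoted 1 to 5 minutes each
low = gpu_hours(1000, 1)
high = gpu_hours(1000, 5)
print(f"{low:.1f} to {high:.1f} GPU-hours")  # prints "16.7 to 83.3 GPU-hours"
```

By comparison, a step that runs in seconds per design adds only a small fraction to that budget.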
Open Source and Licensing
- ProteinMPNN: Fully open source (MIT license). Code and weights available on GitHub.
- RFdiffusion: Open source under a BSD license covering both code and weights, available on GitHub. Check the repository's LICENSE file for current terms before commercial use.
- Chroma: Code released under the Apache 2.0 license by Generate Biomedicines. Model weights are distributed under separate terms, so check the repository before use.
When to Use Each Tool
Choose ProteinMPNN When:
- You have a backbone structure and need sequences that fold into it
- You are redesigning an existing protein for improved stability or expression
- You need to design sequences for a scaffold from RFdiffusion or other backbone generators
- You want fast turnaround: seconds per design, not minutes
- You need experimentally reliable results with strong published validation
Choose RFdiffusion When:
- You need a completely new protein backbone that binds a specific target
- You are scaffolding a functional motif (e.g., placing a catalytic site in a new protein)
- You want to design symmetric assemblies (dimers, trimers, cages)
- You have a clear structural constraint but no starting backbone
Choose Chroma When:
- You want to explore diverse protein architectures without specific structural constraints
- You are conditioning generation on high-level properties (symmetry, shape class)
- You need joint backbone-and-sequence generation in a single step
- You are running exploratory research to discover novel protein topologies
Typical Protein Design Workflow
In practice, these tools are most powerful when used together. A common workflow chains backbone generation with sequence design and then validates the result with structure prediction:
- Step 1: Generate a backbone with RFdiffusion (or start from an existing PDB structure)
- Step 2: Design sequences for that backbone with ProteinMPNN
- Step 3: Validate designed sequences by folding them with ESMFold or AlphaFold2
- Step 4: Compare the predicted structure to the design target (RMSD check)
- Step 5: Send top candidates to the lab for experimental validation
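Steps 3 and 4 amount to a pass/fail filter on each design. The sketch below uses distance-matrix RMSD (dRMSD), a superposition-free stand-in for the aligned RMSD a production pipeline would typically compute with a Kabsch fit. The function names and the 2.0 angstrom cutoff are assumptions; the pLDDT threshold of 80 matches the one used in the API example in this article.

```python
import math
from itertools import combinations

def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be the same length")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        raise ValueError("need at least two residues")
    sq = 0.0
    for i, j in pairs:
        # Compare the same inter-residue distance in both structures
        da = math.dist(coords_a[i], coords_a[j])
        db = math.dist(coords_b[i], coords_b[j])
        sq += (da - db) ** 2
    return math.sqrt(sq / len(pairs))

def passes_filter(design_ca, predicted_ca, mean_plddt,
                  max_drmsd=2.0, min_plddt=80.0):
    """Keep a design only if the fold is confident and close to the target."""
    return mean_plddt > min_plddt and drmsd(design_ca, predicted_ca) <= max_drmsd

# Hypothetical C-alpha traces: the prediction is the design shifted in space
design = [(3.8 * i, 0.0, 0.0) for i in range(5)]
predicted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in design]
keep = passes_filter(design, predicted, mean_plddt=85.0)
```

Because dRMSD compares internal distances, it ignores rigid-body shifts and rotations, but it also cannot distinguish a structure from its mirror image, which is one reason aligned RMSD is the more common final check.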
Using ProteinMPNN via SciRouter API
SciRouter hosts ProteinMPNN as a managed API endpoint. You can design sequences for any backbone structure with a single API call: no GPU setup, no model download, and no dependency management.
```python
from scirouter import SciRouter

client = SciRouter(api_key="sk-sci-your-api-key")

# Design sequences for a backbone structure
result = client.design.proteinmpnn(
    pdb_id="1QYS",       # PDB ID or upload coordinates
    num_sequences=8,     # Number of sequences to design
    temperature=0.1,     # Lower = more conservative designs
    chain="A",           # Target chain
)

for seq in result.sequences:
    print(f"Sequence: {seq.sequence[:40]}...")
    print(f"Score: {seq.score:.3f}")
    print(f"Recovery: {seq.recovery:.1%}")
    print()
```
You can chain ProteinMPNN with ESMFold to validate designs in the same script:
```python
from scirouter import SciRouter

client = SciRouter(api_key="sk-sci-your-api-key")

# Step 1: Design sequences
designs = client.design.proteinmpnn(
    pdb_id="1QYS",
    num_sequences=4,
    temperature=0.1,
)

# Step 2: Validate each design with ESMFold
for seq in designs.sequences:
    fold = client.proteins.fold(
        sequence=seq.sequence,
        model="esmfold",
    )
    print(f"Score: {seq.score:.3f} | pLDDT: {fold.mean_plddt:.1f}")
    if fold.mean_plddt > 80:
        print(" -> High confidence fold. Good candidate.")
```
Architecture Comparison at a Glance
Understanding the architectural differences helps explain why each tool excels at different tasks:
- ProteinMPNN uses a message-passing neural network that propagates information along the edges of a protein structure graph. This graph-based approach is naturally suited to reasoning about local structural contacts and designing sequences that satisfy spatial constraints.
- RFdiffusion adapts the RoseTTAFold structure prediction network for generative use via denoising diffusion. The model starts from random noise and iteratively refines it into a valid protein backbone, guided by optional conditioning signals like a binding target.
- Chroma uses a score-based diffusion framework with a graph neural network that operates on both backbone coordinates and sequence identity simultaneously. This joint generation avoids the two-step backbone-then-sequence approach but requires more compute.
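The phrase "propagates information along the edges" can be illustrated with a single hand-written message-passing round. This is a toy sketch: real networks replace the plain averaging below with learned functions over feature vectors, and the function name, graph, and scalar features here are hypothetical.

```python
def message_passing_round(features, edges):
    """One round: each node adds the average of its incoming neighbors' features."""
    incoming = {i: [] for i in range(len(features))}
    for src, dst in edges:
        incoming[dst].append(features[src])
    updated = []
    for i, h in enumerate(features):
        msgs = incoming[i]
        agg = sum(msgs) / len(msgs) if msgs else 0.0
        updated.append(h + agg)
    return updated

# Three residues in a chain 0 - 1 - 2, with edges in both directions
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = [1.0, 0.0, 0.0]          # only residue 0 carries a signal at the start
after_one = message_passing_round(features, edges)   # [1.0, 0.5, 0.0]
after_two = message_passing_round(after_one, edges)  # [1.5, 1.0, 0.5]
```

After one round the signal from residue 0 has reached its direct neighbor; after two it has reached residue 2. Stacking rounds is how a graph network lets each residue's prediction depend on progressively more distant structural context.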
Summary: Choosing the Right Tool
The choice between ProteinMPNN, RFdiffusion, and Chroma depends on where you are in the design process:
- Have a backbone, need a sequence? Use ProteinMPNN.
- Need a new backbone for a specific function? Use RFdiffusion.
- Want to explore diverse protein architectures? Use Chroma.
- Building a full pipeline? Use RFdiffusion for backbone generation, ProteinMPNN for sequence design, and ESMFold for validation.
ProteinMPNN is available now on SciRouter with free credits to get started. Read our ProteinMPNN tutorial for a step-by-step walkthrough, or explore the ProteinMPNN tool page to see full API documentation.
Sign up for a free API key and design your first protein sequence in under a minute. No GPU setup, no model downloads, no dependency management.