
ProteinMPNN Tutorial: AI Protein Design for Beginners

Learn inverse folding with ProteinMPNN — design amino acid sequences for target protein structures. Step-by-step tutorial with SciRouter API examples.

Ryan Bethencourt
May 5, 2026
9 min read

What Is Inverse Folding?

Protein structure prediction answers the question: given a sequence, what shape does the protein adopt? Inverse folding asks the opposite: given a desired shape, what sequence will fold into it? This is the core challenge of computational protein design, and ProteinMPNN is one of the most widely used deep-learning tools for solving it.

Developed by David Baker's lab at the University of Washington, ProteinMPNN uses a message-passing neural network (the "MPNN" in its name) trained on thousands of experimentally determined protein structures. It takes a protein backbone as input and outputs amino acid sequences predicted to fold into that backbone geometry.

Why ProteinMPNN Is a Breakthrough

Before ProteinMPNN, protein sequence design relied on energy-based methods like Rosetta that were slow and had modest success rates. ProteinMPNN changed the field in four significant ways:

  • Higher success rates: Designed sequences fold into target structures 50-70% of the time, compared to 10-30% with earlier methods
  • Speed: Designs complete in seconds rather than hours, enabling high-throughput exploration of sequence space
  • Generalization: Works across diverse protein folds, including de novo designs that do not exist in nature
  • Multi-chain support: Can design sequences for protein complexes, maintaining interface contacts between chains
Note
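ProteinMPNN was developed in the lab of David Baker, who was awarded half of the 2024 Nobel Prize in Chemistry for computational protein design (the other half went to Demis Hassabis and John Jumper for protein structure prediction). ProteinMPNN is now a foundational tool in protein engineering pipelines worldwide.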

How ProteinMPNN Works

ProteinMPNN represents the protein backbone as a graph where each residue position is a node and edges connect spatially neighboring residues. A message-passing neural network propagates information through this graph, and at each position the model predicts a probability distribution over the 20 amino acids.
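To make the graph picture concrete, here is a minimal pure-Python sketch that links each residue to its k nearest neighbors by Cα distance and runs one round of mean-aggregation message passing on a toy scalar feature per residue. It illustrates the data flow only, under simplifying assumptions: the real model uses learned edge features and learned update functions, and the helper names and the value of k below are hypothetical.

```python
import math

def knn_graph(ca_coords, k):
    """For each residue, return the indices of its k nearest neighbors by C-alpha distance."""
    neighbors = []
    for i, a in enumerate(ca_coords):
        # Sort other residues by distance; ties are broken by index
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(ca_coords) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def message_pass(features, neighbors):
    """One round of message passing: each node averages its own and its neighbors' features."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        messages = [features[j] for j in nbrs]
        updated.append((features[i] + sum(messages)) / (1 + len(messages)))
    return updated

# Toy backbone: six C-alpha positions on a straight line, 4 units apart
ca = [(4.0 * i, 0.0, 0.0) for i in range(6)]
graph = knn_graph(ca, k=2)
print(graph[0])  # [1, 2]
print(graph[3])  # [2, 4]

feats = [float(i) for i in range(6)]  # toy scalar feature per residue
print(message_pass(feats, graph))  # [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

Stacking several such rounds lets information from residues further away reach each position, which is the intuition behind using a message-passing network on the backbone graph.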

The model uses an autoregressive decoding strategy: it designs one residue at a time, visiting positions in a random order rather than strictly from N- to C-terminus, and conditions each choice on the backbone geometry and the residues already placed. This captures dependencies between positions; for example, a hydrophobic residue at one position favors hydrophobic neighbors to form a stable core.
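The decoding loop can be sketched as follows. The scoring function is a hypothetical stand-in for the neural network: it favors hydrophobic residues at positions flagged as buried and non-hydrophobic ones elsewhere, plus a small bonus for matching the hydrophobicity of already-placed sequence neighbors. Only the control flow mirrors the description above: positions are visited in a random order, one at a time, and each choice is conditioned on what has already been placed.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AVILMFW")

def toy_logits(pos, buried, placed):
    """Hypothetical stand-in for the network: one logit per amino acid at `pos`."""
    logits = []
    for aa in AMINO_ACIDS:
        is_hydrophobic = aa in HYDROPHOBIC
        # Favor hydrophobic residues at buried positions, non-hydrophobic ones elsewhere
        score = 2.0 if is_hydrophobic == buried[pos] else 0.0
        # Context dependence: bonus for matching already-placed sequence neighbors
        for nb in (pos - 1, pos + 1):
            if nb in placed and (placed[nb] in HYDROPHOBIC) == is_hydrophobic:
                score += 0.5
        logits.append(score)
    return logits

def sample_index(logits, rng):
    """Draw one index with probability softmax(logits)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

def decode(buried, seed=0):
    """Design a sequence one position at a time, in a random order."""
    rng = random.Random(seed)
    order = list(range(len(buried)))
    rng.shuffle(order)  # random decoding order
    placed = {}  # position -> residue chosen so far
    for pos in order:
        placed[pos] = AMINO_ACIDS[sample_index(toy_logits(pos, buried, placed), rng)]
    return "".join(placed[i] for i in range(len(buried)))

print(decode([True, True, False, False, True, False]))
```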

Sampling Temperature

ProteinMPNN supports a temperature parameter that controls diversity. A low temperature (0.1) produces conservative sequences close to the model's most probable design, while a high temperature (0.5 or above) produces diverse sequences that explore more of sequence space. For most applications, 0.1 to 0.2 is recommended when you want the highest-confidence designs; 0.3 to 0.5 is useful for generating diverse libraries for experimental screening.
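In ProteinMPNN, temperature rescales the per-position logits before sampling: the probabilities are softmax(logits / T), so a small T sharpens the distribution toward the top-scoring amino acid and a larger T flattens it. The sketch below applies that rule to four made-up logits; the numbers are illustrative only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the sampling temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.5, 0.0]  # hypothetical scores for four amino acids at one position
for t in (0.1, 0.3, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At T = 0.1 more than 99% of the probability lands on the first amino acid, whereas at T = 1.0 it drops to about half and the alternatives get real weight, which is why higher temperatures yield more diverse designs.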

Using ProteinMPNN via the SciRouter API

SciRouter hosts ProteinMPNN as an API endpoint so you do not need to install PyTorch, download model weights, or configure GPU environments. Submit a PDB structure and receive designed sequences.

Design sequences with ProteinMPNN
import os
import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Read your target backbone structure
with open("target_backbone.pdb") as f:
    pdb_content = f.read()

# Submit a design job
resp = requests.post(f"{BASE}/proteins/design", headers=HEADERS, json={
    "model": "proteinmpnn",
    "pdb": pdb_content,
    "num_sequences": 8,       # Generate 8 sequence variants
    "temperature": 0.1,       # Low temp = conservative, high-confidence designs
    "fixed_positions": [],    # Optional: fix specific residue indices
}, timeout=120)
resp.raise_for_status()       # Stop early on HTTP errors such as an invalid API key
result = resp.json()

for i, seq in enumerate(result["sequences"]):
    print(f"Design {i+1}: {seq['sequence']}")
    print(f"  Score: {seq['score']:.3f}")
    print(f"  Recovery: {seq['recovery']:.1%}")
    print()
Tip
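The recovery metric shows what fraction of designed residues match the original sequence (when one is present in the input PDB). Low recovery combined with a favorable design score suggests a sequence that differs substantially from the native one yet that the model still judges compatible with the backbone, which is often the goal in redesign. Check whether a higher or lower score is better (in the original ProteinMPNN code the score is an average negative log-likelihood, so lower is better), and confirm folding with structure prediction before trusting any design.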
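Recovery is easy to compute yourself when you have the native sequence, for example to compare designs across runs. A minimal sketch, assuming the designed and native sequences are aligned position by position and have equal length; the `sequence_recovery` helper is hypothetical and not part of the SciRouter API.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length (position-by-position alignment)")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

print(f"{sequence_recovery('MKVLAAG', 'MKILSAG'):.1%}")  # 71.4%
```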

Step-by-Step Protein Design Workflow

Step 1: Obtain or Design a Backbone

You need a target backbone structure in PDB format. This can come from experimental data (X-ray crystallography, cryo-EM), from a structure prediction tool like ESMFold, or from de novo backbone generation tools like RFdiffusion.
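Because ProteinMPNN reads the backbone atoms (N, CA, C, O), it is worth checking that every residue in your input file has all four before submitting it. The sketch below parses ATOM records using the fixed column positions of the PDB format (atom name in columns 13-16, chain in column 22, residue number in columns 23-26, coordinates in columns 31-54). It is a simplified reader written for illustration: it ignores alternate locations, insertion codes, and HETATM records, and all helper names are hypothetical.

```python
BACKBONE = ("N", "CA", "C", "O")

def read_backbone(pdb_text):
    """Map (chain, residue number) -> {atom name: (x, y, z)} for backbone atoms in ATOM records."""
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom = line[12:16].strip()  # columns 13-16
        if atom not in BACKBONE:
            continue
        key = (line[21], int(line[22:26]))  # chain (col 22), residue number (cols 23-26)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, {})[atom] = xyz
    return residues

def missing_backbone_atoms(residues):
    """Return (residue key, missing atom names) for each incomplete residue."""
    report = []
    for key, atoms in residues.items():
        missing = [a for a in BACKBONE if a not in atoms]
        if missing:
            report.append((key, missing))
    return report

def atom_line(serial, name, res_name, chain, res_num, x, y, z):
    """Format a minimal ATOM record (columns 1-54) for the demo below."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:3s} {chain}{res_num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Toy input: residue 1 is complete, residue 2 lacks its O atom
demo = "\n".join([
    atom_line(1, " N", "GLY", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, " CA", "GLY", "A", 1, 1.458, 0.0, 0.0),
    atom_line(3, " C", "GLY", "A", 1, 2.009, 1.420, 0.0),
    atom_line(4, " O", "GLY", "A", 1, 1.251, 2.390, 0.0),
    atom_line(5, " N", "GLY", "A", 2, 3.332, 1.536, 0.0),
    atom_line(6, " CA", "GLY", "A", 2, 3.970, 2.845, 0.0),
    atom_line(7, " C", "GLY", "A", 2, 5.476, 2.698, 0.0),
])
print(missing_backbone_atoms(read_backbone(demo)))  # [(('A', 2), ['O'])]
```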

Step 2: Run ProteinMPNN

Submit the backbone to ProteinMPNN with your desired parameters. Generate multiple sequence variants (8 to 64 is typical) to have a diverse set for experimental testing.
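One way to assemble such a library is to split the total number of designs across a few sampling temperatures and drop duplicate sequences, which low-temperature sampling can produce. The sketch below only builds request bodies with the same fields as the earlier `/proteins/design` example (`model`, `pdb`, `num_sequences`, `temperature`, `fixed_positions`); the helper names and the even split across temperatures are assumptions made for illustration.

```python
def plan_library(pdb_content, total, temperatures=(0.1, 0.2, 0.3)):
    """Split `total` designs as evenly as possible across temperatures; return request bodies."""
    base, extra = divmod(total, len(temperatures))
    payloads = []
    for i, t in enumerate(temperatures):
        n = base + (1 if i < extra else 0)  # spread the remainder over the first temperatures
        if n == 0:
            continue
        payloads.append({
            "model": "proteinmpnn",
            "pdb": pdb_content,
            "num_sequences": n,
            "temperature": t,
            "fixed_positions": [],
        })
    return payloads

def unique_sequences(designs):
    """Keep the first occurrence of each sequence, preserving order."""
    seen = set()
    unique = []
    for d in designs:
        if d["sequence"] not in seen:
            seen.add(d["sequence"])
            unique.append(d)
    return unique

plan = plan_library("<PDB text here>", total=32)
print([(p["temperature"], p["num_sequences"]) for p in plan])  # [(0.1, 11), (0.2, 11), (0.3, 10)]
```

Each payload would be posted as in the earlier design example, and the returned `sequences` lists concatenated before passing them to `unique_sequences`.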

Step 3: Validate with Structure Prediction

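For each designed sequence, run structure prediction with ESMFold to check whether the sequence folds into the target shape. Compare the predicted structure to your target backbone using RMSD (root-mean-square deviation) over matching Cα atoms after optimal superposition; designs with RMSD below 2 angstroms are strong candidates. ESMFold also reports pLDDT, a per-residue confidence score (higher is better), so the most trustworthy designs combine low RMSD with high mean pLDDT.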
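The RMSD comparison needs an optimal superposition first; otherwise a simple offset or rotation between the two models inflates the value. A pure-Python sketch, assuming you have already extracted matching Cα coordinates (same residues, same order) from the target backbone and the predicted structure. It uses the quaternion formulation of superposition (Horn's method), whose largest eigenvalue gives the same minimum RMSD as the Kabsch algorithm, and finds that eigenvalue with Jacobi rotations so no NumPy is needed. The helper names are hypothetical; in practice an established structural-biology library is preferable.

```python
import math

def _centered(coords):
    """Subtract the centroid from every point."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - centroid[k] for k in range(3)] for p in coords]

def _max_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix, via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):  # a 4x4 matrix converges in a handful of sweeps
        if all(a[p][q] == 0.0 for p in range(n) for q in range(n) if p != q):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def superposed_rmsd(coords_a, coords_b):
    """Minimum RMSD between two equal-length coordinate lists over all rigid-body superpositions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    x, y = _centered(coords_a), _centered(coords_b)
    # S[i][j] = sum over atoms of x_i * y_j (cross-covariance of the centered coordinates)
    S = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    key = [  # Horn's symmetric 4x4 key matrix
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    lam = _max_eigenvalue(key)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(x)))

target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# The same points rotated 90 degrees about z and shifted: an ideal "prediction"
moved = [(5.0 - py, -1.0 + px, 2.0 + pz) for px, py, pz in target]
print(f"{superposed_rmsd(target, moved):.3f}")  # 0.000
```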

Validate designs with ESMFold
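import time

# Continues from the previous snippet: reuses requests, BASE, HEADERS, and result.
# Predict the structure of each designed sequence (here, the first three)
for seq_data in result["sequences"][:3]:
    fold_resp = requests.post(f"{BASE}/proteins/fold", headers=HEADERS, json={
        "sequence": seq_data["sequence"],
        "model": "esmfold",
    }, timeout=60)
    fold_resp.raise_for_status()
    fold_job = fold_resp.json()

    # Poll until the job finishes; give up after about 5 minutes.
    # A "failed" status value is an assumption; check the API docs for the exact names.
    deadline = time.time() + 300
    while True:
        fold_result = requests.get(
            f"{BASE}/proteins/fold/{fold_job['job_id']}", headers=HEADERS, timeout=60
        ).json()
        if fold_result["status"] == "completed":
            break
        if fold_result["status"] == "failed" or time.time() > deadline:
            raise RuntimeError(f"Folding job {fold_job['job_id']} did not complete")
        time.sleep(3)

    print(f"Sequence: {seq_data['sequence'][:30]}...")
    print(f"  pLDDT: {fold_result['mean_plddt']:.1f}")
    print()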
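Once you have an RMSD to the target and a mean pLDDT for each design, shortlisting is a simple filter. The 2 angstrom RMSD cutoff comes from Step 3; the pLDDT cutoff of 80 and the record layout (`name`, `rmsd`, `mean_plddt`) are assumptions for illustration, so adjust both to your project.

```python
def shortlist(designs, max_rmsd=2.0, min_plddt=80.0):
    """Keep designs under the RMSD cutoff and at or above the pLDDT cutoff, best RMSD first."""
    passing = [
        d for d in designs
        if d["rmsd"] < max_rmsd and d["mean_plddt"] >= min_plddt
    ]
    return sorted(passing, key=lambda d: (d["rmsd"], -d["mean_plddt"]))

# Hypothetical validation results, one record per designed sequence
designs = [
    {"name": "design_1", "rmsd": 1.2, "mean_plddt": 88.5},
    {"name": "design_2", "rmsd": 3.4, "mean_plddt": 91.0},  # confident, but the wrong fold
    {"name": "design_3", "rmsd": 0.9, "mean_plddt": 72.0},  # close, but low confidence
    {"name": "design_4", "rmsd": 0.8, "mean_plddt": 85.1},
]
print([d["name"] for d in shortlist(designs)])  # ['design_4', 'design_1']
```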

Common Use Cases

  • Enzyme engineering: Redesign enzyme surfaces for thermostability while preserving the active site
  • Therapeutic proteins: Optimize protein drugs for stability, expression, and reduced immunogenicity
  • De novo protein design: Design sequences for entirely new backbone geometries created by generative models
  • Interface design: Design sequences at protein-protein interfaces for tighter or more specific binding

Next Steps

ProteinMPNN pairs naturally with structure prediction tools on SciRouter. Use ESMFold to validate that designed sequences fold correctly, or combine ProteinMPNN with the Antibody Design Lab for CDR sequence optimization on antibody scaffolds.

Sign up for a free SciRouter API key and start designing proteins in minutes. With 500 free credits per month and no infrastructure to manage, it is a quick way to go from backbone to sequence.

Frequently Asked Questions

What is inverse folding?

Inverse folding is the reverse of structure prediction. Instead of predicting a 3D structure from a sequence, inverse folding designs a sequence that will fold into a desired 3D structure. ProteinMPNN is the leading tool for this task.

How accurate is ProteinMPNN?

ProteinMPNN designs sequences that experimentally fold into the target structure about 50-70% of the time, depending on the complexity of the fold. This is a dramatic improvement over earlier methods like Rosetta fixed-backbone design, which achieved roughly 10-30% success rates.

What input does ProteinMPNN need?

ProteinMPNN requires a protein backbone structure as input, typically provided as a PDB file. The model reads the 3D coordinates of backbone atoms (N, CA, C, O) and outputs amino acid sequences predicted to fold into that structure.

Can I fix certain residues during design?

Yes, ProteinMPNN supports position-specific constraints. You can fix catalytic residues, binding site residues, or any positions you want to preserve while letting the model redesign the rest of the sequence for stability.
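When choosing positions to fix, note that PDB residue numbers often do not start at 1 and can contain gaps, so they may not match positional indices along the chain. The sketch below converts PDB residue numbers into positions in file order. Whether the SciRouter `fixed_positions` field expects 0-based or 1-based indices, or per-chain numbering, is not stated here, so the `one_based` flag and the helper itself are assumptions to check against the API documentation.

```python
def residue_numbers_to_indices(chain_residue_numbers, to_fix, one_based=False):
    """Map PDB residue numbers to positions along the chain (in file order)."""
    position = {num: i for i, num in enumerate(chain_residue_numbers)}
    missing = [n for n in to_fix if n not in position]
    if missing:
        raise ValueError(f"residue numbers not found in chain: {missing}")
    offset = 1 if one_based else 0
    return sorted(position[n] + offset for n in to_fix)

# Chain numbered from 24, with residues 28-29 unresolved (a gap in the numbering)
numbers = [24, 25, 26, 27, 30, 31, 32]
print(residue_numbers_to_indices(numbers, [26, 31]))                  # [2, 5]
print(residue_numbers_to_indices(numbers, [26, 31], one_based=True))  # [3, 6]
```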

How long does ProteinMPNN take?

ProteinMPNN is very fast — most designs complete in under 10 seconds via the SciRouter API. Generating 100 sequence variants for a typical protein domain takes about 30 seconds.
