What Is Inverse Folding?
Protein structure prediction answers the question: given a sequence, what shape does the protein adopt? Inverse folding asks the opposite: given a desired shape, what sequence will fold into it? This is the core challenge of computational protein design, and ProteinMPNN is one of the most widely adopted deep-learning solutions to it.
Developed by the David Baker lab at the University of Washington, ProteinMPNN uses a message-passing neural network trained on thousands of experimentally determined protein structures. It takes a protein backbone as input and outputs amino acid sequences optimized to fold into that exact backbone geometry.
Why ProteinMPNN Is a Breakthrough
Before ProteinMPNN, protein sequence design relied on energy-based methods like Rosetta that were slow and had modest success rates. ProteinMPNN changed the field in four significant ways:
- Higher success rates: Designed sequences fold into target structures 50-70% of the time, compared to 10-30% with earlier methods
- Speed: Designs complete in seconds rather than hours, enabling high-throughput exploration of sequence space
- Generalization: Works across diverse protein folds, including de novo designs that do not exist in nature
- Multi-chain support: Can design sequences for protein complexes, maintaining interface contacts between chains
How ProteinMPNN Works
ProteinMPNN represents the protein backbone as a graph where each residue position is a node and edges connect spatially neighboring residues. A message-passing neural network propagates information through this graph, and at each position the model predicts a probability distribution over the 20 amino acids.
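As a rough illustration of the graph construction described above, the sketch below builds nearest-neighbor edges from one coordinate per residue. The function name `knn_edges`, the use of a single C-alpha point per residue, and the value of `k` are assumptions made for illustration; this is not ProteinMPNN's actual featurization code.

```python
import math

def knn_edges(coords, k=3):
    """Map each residue index to the indices of its k nearest neighbors.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    """
    edges = {}
    for i, ci in enumerate(coords):
        # Distance from residue i to every other residue
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        edges[i] = [j for _, j in dists[:k]]
    return edges

# Toy backbone: five residues spaced 3.8 angstroms apart along a line
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
toy_edges = knn_edges(toy_coords, k=2)
print(toy_edges)
```

Each node's edge list is what a message-passing layer would iterate over when gathering information from spatial neighbors.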
The model decodes autoregressively: it designs one residue at a time, conditioning each choice on the backbone geometry and the residues already placed. This captures dependencies between positions. For example, a hydrophobic residue at one position favors hydrophobic neighbors that pack together into a stable core.
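To make the idea of conditioning on already-placed residues concrete, here is a toy sketch. The scoring function `toy_logits` is a hypothetical stand-in for the neural network, the left-to-right decoding order is a simplifying assumption, and the hydrophobic bonus is invented purely to mimic the example above.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")

def toy_logits(pos, placed, neighbors):
    """Hypothetical stand-in for the network: score each amino acid at `pos`.

    Hydrophobic residues get a bonus for every already-placed hydrophobic neighbor.
    """
    bonus = sum(1.0 for j in neighbors[pos] if placed.get(j) in HYDROPHOBIC)
    return [bonus if aa in HYDROPHOBIC else 0.0 for aa in AMINO_ACIDS]

def decode(neighbors, rng):
    """Place one residue at a time, conditioning on the residues already chosen."""
    placed = {}
    for pos in sorted(neighbors):
        logits = toy_logits(pos, placed, neighbors)
        weights = [math.exp(value) for value in logits]
        placed[pos] = rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return "".join(placed[p] for p in sorted(placed))

# Four positions in a chain; each position lists its spatial neighbors
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
designed = decode(neighbors, random.Random(0))
print(designed)
```

The key point is that `placed` grows as decoding proceeds, so later positions see the choices made at earlier ones.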
Sampling Temperature
ProteinMPNN supports a temperature parameter that controls diversity. Low temperature (0.1) produces conservative sequences close to the most probable design. High temperature (0.5 or above) produces diverse sequences that explore more of sequence space. For most applications, a temperature of 0.1 to 0.2 is recommended for stability, while 0.3 to 0.5 is useful for generating diverse libraries for experimental screening.
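The effect of temperature can be seen in a small sketch that divides per-residue scores by the temperature before a softmax. The logit values below are made up, and the function is an illustration of temperature scaling in general rather than code taken from ProteinMPNN.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.5, 0.5]  # made-up scores for three candidate residues
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all probability mass lands on the top-scoring residue, which is why low-temperature designs stay close to the most probable sequence.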
Using ProteinMPNN via the SciRouter API
SciRouter hosts ProteinMPNN as an API endpoint so you do not need to install PyTorch, download model weights, or configure GPU environments. Submit a PDB structure and receive designed sequences.
import os
import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Read your target backbone structure
with open("target_backbone.pdb") as f:
    pdb_content = f.read()

# Submit a design job
resp = requests.post(f"{BASE}/proteins/design", headers=HEADERS, json={
    "model": "proteinmpnn",
    "pdb": pdb_content,
    "num_sequences": 8,     # Generate 8 sequence variants
    "temperature": 0.1,     # Low temp = high confidence designs
    "fixed_positions": [],  # Optional: fix specific residue indices
})
resp.raise_for_status()  # Stop early on an HTTP error
result = resp.json()

# Print each designed sequence with its score and sequence recovery
for i, seq in enumerate(result["sequences"]):
    print(f"Design {i+1}: {seq['sequence']}")
    print(f"  Score: {seq['score']:.3f}")
    print(f"  Recovery: {seq['recovery']:.1%}")
    print()

Step-by-Step Protein Design Workflow
Step 1: Obtain or Design a Backbone
You need a target backbone structure in PDB format. This can come from experimental data (X-ray crystallography, cryo-EM), from a structure prediction tool like ESMFold, or from de novo backbone generation tools like RFdiffusion.
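If the starting structure carries side chains, it can help to look at the backbone on its own. The sketch below keeps only the N, CA, C and O atom records from PDB-format text, relying on the fixed-column layout in which the atom name sits in columns 13 to 16. The helper name `backbone_only` and the toy coordinates are illustrative assumptions; whether the API needs this step at all is not stated here, so treat it as optional.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

def backbone_only(pdb_text):
    """Keep only backbone ATOM records (N, CA, C, O) from PDB-format text."""
    kept = []
    for line in pdb_text.splitlines():
        # In the PDB fixed-column format, the atom name occupies columns 13-16
        if line.startswith("ATOM") and line[12:16].strip() in BACKBONE_ATOMS:
            kept.append(line)
    return "\n".join(kept)

# Toy single-residue example with made-up coordinates
toy_pdb = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      3  C   ALA A   1      10.507   6.231  -4.127  1.00  0.00           C",
    "ATOM      4  O   ALA A   1       9.343   6.301  -4.500  1.00  0.00           O",
    "ATOM      5  CB  ALA A   1      12.700   7.150  -4.950  1.00  0.00           C",
])
stripped = backbone_only(toy_pdb)
print(stripped)
```

The side-chain CB record is dropped, leaving the four backbone atoms that define the geometry the design step works from.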
Step 2: Run ProteinMPNN
Submit the backbone to ProteinMPNN with your desired parameters. Generate multiple sequence variants (8 to 64 is typical) to have a diverse set for experimental testing.
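One simple way to check that a batch of designs really is diverse is to measure how similar the sequences are to each other. The sketch below computes mean pairwise identity, assuming all designs have the same length because they come from the same backbone. The function names and the toy sequences are illustrative only.

```python
from itertools import combinations

def identity(seq_a, seq_b):
    """Fraction of positions with the same residue (equal-length sequences)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

def mean_pairwise_identity(sequences):
    """Average identity over all pairs; lower values mean a more diverse set."""
    pairs = list(combinations(sequences, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)

toy_designs = ["MKVLA", "MKVLG", "MRILA"]  # made-up sequences
print(mean_pairwise_identity(toy_designs))
```

If the value comes out very close to 1.0, the designs are nearly identical, and raising the sampling temperature or the number of sequences may give a more useful library.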
Step 3: Validate with Structure Prediction
For each designed sequence, run structure prediction with ESMFold to verify the sequence folds into the target shape. Compare the predicted structure to your target backbone using RMSD (root-mean-square deviation). Designs with RMSD below 2 angstroms are strong candidates.
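The RMSD comparison mentioned above can be sketched with the Kabsch algorithm, which finds the rotation that best superimposes one set of coordinates on another before measuring the deviation. This sketch assumes NumPy is installed and that you have already extracted matching coordinates (for example, one C-alpha atom per residue) from both structures; how predicted coordinates are returned by the API is not covered here.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Center both point sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so the result is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q and measure the remaining deviation
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))

# Toy check: Q is P rotated 90 degrees about the z axis and then shifted
P = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]]
Q = [[5, 5, 5], [5, 6, 5], [3, 5, 5], [5, 5, 8]]
print(kabsch_rmsd(P, Q))
```

Because the two toy point sets differ only by a rotation and a translation, the printed value should be essentially zero; real designs would be compared against the 2 angstrom guideline.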
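import time

# For each designed sequence, predict its structure
for seq_data in result["sequences"][:3]:  # validate the first 3 designs
    fold_resp = requests.post(f"{BASE}/proteins/fold", headers=HEADERS, json={
        "sequence": seq_data["sequence"],
        "model": "esmfold",
    })
    fold_resp.raise_for_status()
    fold_job = fold_resp.json()

    # Poll until the folding job completes, giving up after 10 minutes
    deadline = time.time() + 600  # arbitrary limit; adjust as needed
    while True:
        fold_result = requests.get(
            f"{BASE}/proteins/fold/{fold_job['job_id']}", headers=HEADERS
        ).json()
        if fold_result["status"] == "completed":
            break
        if time.time() > deadline:
            raise TimeoutError("fold job did not complete in time")
        time.sleep(3)

    print(f"Sequence: {seq_data['sequence'][:30]}...")
    print(f"  pLDDT: {fold_result['mean_plddt']:.1f}")
    print()

Common Use Cases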
- Enzyme engineering: Redesign enzyme surfaces for thermostability while preserving the active site
- Therapeutic proteins: Optimize protein drugs for stability, expression, and reduced immunogenicity
- De novo protein design: Design sequences for entirely new backbone geometries created by generative models
- Interface design: Design sequences at protein-protein interfaces for tighter or more specific binding
Next Steps
ProteinMPNN pairs naturally with structure prediction tools on SciRouter. Use ESMFold to validate that designed sequences fold correctly, or combine ProteinMPNN with the Antibody Design Lab for CDR sequence optimization on antibody scaffolds.
Sign up for a free SciRouter API key and start designing proteins in minutes. With 500 free credits per month and no infrastructure to manage, it is the fastest path from backbone to sequence.