ProteinsProtein Engineering

Predict Protein Stability: How to Score Mutations Before You Make Them

Score protein mutations for stability with ThermoMPNN. DDG prediction explained, hands-on scoring 10 mutations, and combining with solubility for optimal engineering.

Ryan Bethencourt
April 8, 2026
9 min read

Why Predict Protein Stability?

Every protein engineering project eventually hits the same bottleneck: you have a list of candidate mutations and need to know which ones will make your protein more stable, which will be neutral, and which will cause it to misfold. Traditionally, this meant weeks of cloning, expression, and thermal shift assays to test each variant.

Computational stability prediction flips this workflow. Instead of testing mutations blind, you score them in silico first, rank by predicted effect, and only take the most promising candidates to the lab. A typical directed evolution campaign might start with 1,000 candidate mutations and use stability prediction to narrow to 20–50 variants for experimental validation – a 20x reduction in wet lab effort.

In this guide, we walk through the fundamentals of protein stability prediction, introduce ThermoMPNN as a practical tool, and demonstrate hands-on scoring of 10 real mutations using the SciRouter API.

DDG: The Language of Protein Stability

Protein stability is quantified by the Gibbs free energy of folding (DG). A protein folds spontaneously when DG is negative – meaning the folded state is thermodynamically favored over the unfolded state. When we introduce a mutation, the change in folding free energy is called DDG (delta delta G):

  • DDG < 0 kcal/mol: Stabilizing mutation – the mutant folds more tightly than wild type
  • DDG = 0 kcal/mol: Neutral mutation – no effect on stability
  • DDG > 0 kcal/mol: Destabilizing mutation – the mutant is less stable than wild type

Most random mutations are destabilizing. In a typical protein, roughly 70% of single-point mutations have a positive DDG (destabilizing), 20% are neutral, and only 10% are stabilizing. This is why computational pre-screening is so valuable – it helps you find the rare stabilizing mutations without testing all possibilities experimentally.

Note
A DDG of -1.0 kcal/mol corresponds roughly to a 5–10 degree Celsius increase in melting temperature (Tm), depending on the protein. Even small DDG improvements can significantly impact shelf life, expression yield, and activity at elevated temperatures.

ThermoMPNN: Structure-Based Stability Prediction

ThermoMPNN is a deep learning model for predicting mutation stability effects. It builds on the ProteinMPNN architecture – the same model used for protein sequence design – and fine-tunes it on experimental DDG data from databases like ProTherm and Megascale.

The model takes a protein structure (PDB format) and a mutation specification as input, then predicts the DDG in kcal/mol. Because it operates on 3D structure rather than sequence alone, it captures the geometric context of each mutation – whether the residue is buried or exposed, what contacts it makes, and how the local backbone geometry constrains substitutions.

How ThermoMPNN Works

  • Input: Protein 3D structure (PDB) + mutation(s) specified as position and substitution
  • Encoding: Structure is encoded as a graph with residues as nodes and spatial contacts as edges
  • Message passing: Graph neural network propagates information between neighboring residues
  • Prediction: A regression head outputs the predicted DDG value for the mutation
  • Speed: Under 1 second per mutation on CPU, enabling rapid screening of large mutation libraries

Hands-On: Scoring 10 Mutations on T4 Lysozyme

T4 lysozyme (PDB: 2LZM) is one of the most extensively studied proteins for stability mutations, with hundreds of experimentally measured DDG values available in the literature. This makes it an ideal test case for validating computational predictions against known results.

We will score 10 well-characterized mutations and compare our predictions to the published experimental values.

Step 1: Get the Structure

First, we need a 3D structure. We can either download the crystal structure from the PDB or predict it with ESMFold. For this example, we will use ESMFold to demonstrate the full computational workflow:

Predict T4 lysozyme structure with ESMFold
import scirouter

client = scirouter.SciRouter()

# T4 Lysozyme sequence
t4l_seq = (
    "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELD"
    "KAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAV"
    "RRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYND"
    "QTPNRAKRVITTFRTGTWDAYKNL"
)

# Predict structure
fold_job = client.proteins.fold(sequence=t4l_seq, model="esmfold")
fold_result = client.proteins.fold_result(fold_job.job_id, poll=True)

print(f"Mean pLDDT: {fold_result.mean_plddt:.1f}")
pdb_string = fold_result.pdb

Step 2: Score Individual Mutations

Now we score 10 well-known T4 lysozyme mutations. These include both stabilizing mutations (like the engineered cavity-filling mutations) and destabilizing mutations (like the alanine-scanning mutations that remove side chain contacts):

Score 10 mutations with ThermoMPNN
# Define mutations to score
# Format: (wild_type_aa, position, mutant_aa, expected_ddg_kcal_mol)
mutations = [
    ("L", 99, "A", 5.1),    # Destabilizing: removes buried leucine
    ("A", 98, "V", -1.2),   # Stabilizing: fills cavity
    ("T", 157, "I", -1.4),  # Stabilizing: hydrophobic packing
    ("S", 117, "A", 0.3),   # Mildly destabilizing
    ("A", 42, "G", 2.8),    # Destabilizing: glycine flexibility
    ("V", 149, "I", -0.6),  # Mildly stabilizing
    ("I", 3, "A", 3.9),     # Destabilizing: core disruption
    ("L", 133, "A", 4.2),   # Destabilizing: removes buried contact
    ("M", 102, "L", -0.3),  # Near-neutral
    ("A", 82, "P", 1.7),    # Destabilizing: proline strain
]

print(f"{'Mutation':<12} {'Predicted DDG':>14} {'Experimental':>13} {'Match':>6}")
print("-" * 50)

for wt, pos, mut, exp_ddg in mutations:
    result = client.design.stability(
        pdb=pdb_string,
        mutations=[f"{wt}{pos}{mut}"],
    )

    pred_ddg = result.ddg
    direction_match = (pred_ddg > 0 and exp_ddg > 0) or (pred_ddg < 0 and exp_ddg < 0)

    print(
        f"{wt}{pos}{mut:<8} {pred_ddg:>+10.2f}      {exp_ddg:>+9.2f}    "
        f"{'Yes' if direction_match else 'No':>5}"
    )
Tip
The most important metric is whether the predicted sign (stabilizing vs. destabilizing) matches the experimental result. Getting the exact DDG magnitude right is less critical than correctly ranking mutations relative to each other.

Step 3: Interpret the Results

When interpreting ThermoMPNN stability predictions, focus on these practical guidelines:

  • DDG below -1.0: Strongly predicted as stabilizing. High confidence, good candidate for experimental validation.
  • DDG between -1.0 and 0: Mildly stabilizing or neutral. Could go either way experimentally. Test if the mutation is otherwise beneficial (e.g., improves solubility).
  • DDG between 0 and +1.0: Mildly destabilizing. May be tolerable if the protein is already very stable.
  • DDG above +1.0: Likely destabilizing. Avoid unless there is a strong functional reason to make this mutation.
  • DDG above +3.0: Severely destabilizing. Will likely cause misfolding or aggregation.

Scanning an Entire Position

A common workflow in protein engineering is saturation mutagenesis scanning – testing all 20 amino acid substitutions at a single position. ThermoMPNN makes this trivial computationally:

Saturation scanning at position 99 of T4 lysozyme
amino_acids = "ACDEFGHIKLMNPQRSTVWY"
position = 99
wild_type = "L"

results = []
for aa in amino_acids:
    if aa == wild_type:
        continue
    result = client.design.stability(
        pdb=pdb_string,
        mutations=[f"{wild_type}{position}{aa}"],
    )
    results.append({"mutation": f"{wild_type}{position}{aa}", "ddg": result.ddg})

# Sort by DDG (most stabilizing first)
results.sort(key=lambda x: x["ddg"])

print(f"Saturation scan at position {position} (wild type: {wild_type})")
print(f"{'Mutation':<10} {'DDG (kcal/mol)':>15}")
print("-" * 28)
for r in results:
    marker = " <-- stabilizing" if r["ddg"] < -0.5 else ""
    print(f"{r['mutation']:<10} {r['ddg']:>+12.2f}{marker}")

Combining Stability with Sequence Design

ThermoMPNN stability predictions pair naturally with ProteinMPNN sequence design. A typical workflow is:

  • Design: Use ProteinMPNN to generate candidate sequences for a target backbone structure
  • Score: Run ThermoMPNN on each designed sequence to predict stability relative to wild type
  • Filter: Keep only designs with predicted DDG below a threshold (e.g., -0.5 kcal/mol for each mutation)
  • Validate: Predict structures of top candidates with ESMFold to verify they fold correctly
Design + stability scoring pipeline
# Step 1: Design sequences with ProteinMPNN
design_result = client.design.proteinmpnn(
    pdb=pdb_string,
    num_designs=50,
    temperature=0.1,
)

# Step 2: Score each design for stability
scored_designs = []
for design in design_result.designs:
    # Identify mutations relative to wild type
    mutations = []
    for i, (wt, mut) in enumerate(zip(t4l_seq, design.sequence)):
        if wt != mut:
            mutations.append(f"{wt}{i+1}{mut}")

    if not mutations:
        continue

    stability = client.design.stability(
        pdb=pdb_string,
        mutations=mutations,
    )

    scored_designs.append({
        "sequence": design.sequence,
        "mutations": mutations,
        "total_ddg": stability.ddg,
        "recovery": design.sequence_recovery,
    })

# Step 3: Filter and rank
stable_designs = [d for d in scored_designs if d["total_ddg"] < 0]
stable_designs.sort(key=lambda x: x["total_ddg"])

print(f"Stabilizing designs: {len(stable_designs)} / {len(scored_designs)}")
for d in stable_designs[:5]:
    print(f"  DDG: {d['total_ddg']:+.2f} kcal/mol, "
          f"{len(d['mutations'])} mutations, "
          f"recovery: {d['recovery']:.1%}")

Combining Stability with Solubility

A mutation that improves thermostability but causes the protein to aggregate in solution is not useful in practice. This is a common failure mode – hydrophobic core-packing mutations often improve DDG but can expose hydrophobic surface area that promotes aggregation.

The solution is to combine stability prediction with solubility prediction. Run both ThermoMPNN and SoluProt on your candidate mutations and only advance variants that score well on both metrics:

Combined stability + solubility screening
# Score mutations for both stability and solubility
candidates = [
    ("A", 98, "V"),
    ("T", 157, "I"),
    ("V", 149, "I"),
    ("M", 102, "L"),
    ("S", 117, "V"),
]

print(f"{'Mutation':<12} {'DDG':>8} {'Solubility':>11} {'Verdict':>10}")
print("-" * 45)

for wt, pos, mut in candidates:
    # Stability
    stab = client.design.stability(
        pdb=pdb_string,
        mutations=[f"{wt}{pos}{mut}"],
    )

    # Solubility (on mutant sequence)
    mutant_seq = t4l_seq[:pos-1] + mut + t4l_seq[pos:]
    sol = client.design.solubility(sequence=mutant_seq)

    verdict = "PASS" if stab.ddg < 0 and sol.score > 0.5 else "REVIEW"
    print(f"{wt}{pos}{mut:<8} {stab.ddg:>+7.2f} {sol.score:>10.2f} {verdict:>10}")

Common Pitfalls in Stability Prediction

1. Ignoring Structural Context

Stability predictions are only as good as the input structure. If your structure has low confidence in the region around the mutation site (pLDDT below 70), the DDG prediction will be unreliable. Always check the local confidence score before trusting a stability prediction.

2. Assuming Additivity for Multi-Mutants

Summing individual DDG values for multi-mutant designs works as a rough approximation, but can fail when mutations are structurally coupled. Two mutations that each stabilize by -1.0 kcal/mol individually might only stabilize by -1.2 kcal/mol when combined (negative epistasis) or by -2.5 kcal/mol (positive epistasis). Experimental validation of multi-mutant combinations is essential.

3. Over-Optimizing for Stability Alone

Maximally stable proteins are not always the most useful. Over-stabilization can reduce conformational flexibility needed for catalysis, reduce solubility, or impair protein-protein interactions. Target a modest stability improvement (DDG of -2 to -5 kcal/mol total) rather than engineering a thermophilic brick.

Next Steps

Stability prediction is one component of a complete protein engineering toolkit. To build a full optimization pipeline, combine it with:

Sign up for a free SciRouter API key and start scoring mutations today. 500 free credits per month – enough to screen hundreds of candidate mutations.

Frequently Asked Questions

What is DDG in protein stability prediction?

DDG (delta delta G) is the predicted change in folding free energy caused by a mutation. A negative DDG value indicates the mutation stabilizes the protein (makes it more thermostable), while a positive DDG indicates destabilization. Values are typically reported in kcal/mol. A DDG of -1.0 kcal/mol or lower is generally considered a significantly stabilizing mutation.

How accurate is ThermoMPNN?

ThermoMPNN achieves a Pearson correlation of 0.6 to 0.7 with experimental DDG values on standard benchmarks like S669 and Megascale. This is comparable to or better than physics-based methods like Rosetta ddg_monomer and FoldX. The model is most accurate for buried mutations and less reliable for surface-exposed positions where solvent effects dominate.

Do I need a crystal structure to predict stability?

ThermoMPNN requires a 3D structure as input, but this does not need to be an experimental crystal structure. You can use an ESMFold or AlphaFold2 predicted structure. For best results, use a predicted structure with high confidence (mean pLDDT above 80). SciRouter can chain ESMFold structure prediction with ThermoMPNN stability scoring automatically.

What is the difference between DDG and melting temperature?

DDG predicts the change in thermodynamic stability (folding free energy) from a single mutation, measured in kcal/mol. Melting temperature (Tm) is the temperature at which 50% of the protein is unfolded, measured in degrees Celsius. While correlated, they are not the same measurement. A stabilizing mutation (negative DDG) generally increases Tm, but the exact Tm change depends on the protein context.

Can I predict stability for multiple simultaneous mutations?

ThermoMPNN predicts DDG for single-point mutations. For multiple simultaneous mutations, the simplest approach is to sum the individual DDG values (additive approximation), which works well when mutations are not adjacent in structure. For closely spaced mutations, the additive model can break down due to epistatic interactions. Always validate multi-mutant designs experimentally.

How do I combine stability and solubility predictions?

Run stability prediction (ThermoMPNN) to identify stabilizing mutations, then run solubility prediction (SoluProt) on the mutant sequences to ensure the stabilized variants will still express well in your host organism. Mutations that improve stability but destroy solubility are not useful in practice. SciRouter lets you chain both predictions through the same API.

Try It Free

No Login Required

Try this yourself

500 free credits. No credit card required.