Understanding pLDDT Scores: Protein Folding Confidence

What pLDDT scores mean in protein structure prediction, how to interpret them, color-coding conventions, and when to trust (or distrust) your folding results.

Ryan Bethencourt
April 6, 2026
8 min read

What Is pLDDT?

When a protein structure prediction model like ESMFold or AlphaFold2 outputs a 3D structure, it also tells you how confident it is about each part of that structure. This confidence metric is called pLDDT – the predicted Local Distance Difference Test.

pLDDT is a per-residue score ranging from 0 to 100. It estimates how accurately the model has placed each amino acid in three-dimensional space. A score of 95 means the model is highly confident that the residue's position is correct; a score of 30 means it is essentially guessing. Understanding pLDDT is not optional – it is the single most important factor in deciding whether you can trust a predicted structure.

The name comes from the LDDT-Cα metric, which was originally developed to evaluate how well a predicted structure matches an experimentally determined reference. Structure prediction models adapted this into a self-assessment: the model predicts its own LDDT score for each residue, hence the "p" for predicted. This clever approach means you get reliability estimates without needing a reference structure.

How pLDDT Is Calculated

The underlying LDDT metric works by examining local distance relationships. For each residue, it takes every pairwise distance to atoms within a 15-angstrom radius in the reference structure and checks whether that distance is preserved in the predicted structure. The fraction of preserved distances is computed at four tolerance thresholds (0.5, 1, 2, and 4 angstroms), and the score is the average of those four fractions.
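The calculation described above can be sketched in a few lines. This is a simplified illustration, not a reference implementation: it uses only Cα coordinates (one point per residue), and the function name `lddt_ca` and its inputs are assumptions made for this example.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # tolerance thresholds in angstroms
INCLUSION_RADIUS = 15.0            # neighbors are chosen in the reference structure


def lddt_ca(pred, ref, radius=INCLUSION_RADIUS):
    """Per-residue Cα-only LDDT on a 0-1 scale.

    pred and ref are equal-length lists of (x, y, z) Cα coordinates.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of residues")

    scores = []
    for i in range(len(ref)):
        preserved = 0
        total = 0
        for j in range(len(ref)):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # outside the local neighborhood
            diff = abs(math.dist(pred[i], pred[j]) - d_ref)
            # Count how many of the four tolerances this distance satisfies
            preserved += sum(diff < t for t in THRESHOLDS)
            total += len(THRESHOLDS)
        scores.append(preserved / total if total else 0.0)
    return scores
```

Multiplying each value by 100 puts the result on the same 0–100 scale as pLDDT. Because only distances are compared, no superposition of the two structures is needed.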

In the predicted version (pLDDT), the model estimates this score without a reference. During training, the model learns to predict its own accuracy by comparing its outputs to known experimental structures. The result is a well-calibrated confidence estimate – when a model reports pLDDT = 90, the actual LDDT-Cα against experimental data is typically close to 0.90 (LDDT is conventionally reported on a 0–1 scale, pLDDT on 0–100). This calibration is what makes pLDDT genuinely useful rather than just a vague quality indicator.

Note
pLDDT is a local confidence metric. It tells you whether individual residues are placed correctly relative to their neighbors, but it does not assess the global arrangement of domains. Two well-predicted domains might be oriented incorrectly relative to each other even if both have high pLDDT scores. For multi-domain proteins, also check the PAE (Predicted Aligned Error) matrix if available.
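If your prediction tool does provide a PAE matrix, a quick way to check inter-domain confidence is to average the PAE values between the two domains. The sketch below assumes the matrix is a square list of lists in angstroms and that you already know the domain boundaries; `mean_interdomain_pae` is a hypothetical helper, not part of any API mentioned in this article.

```python
def mean_interdomain_pae(pae, domain_a, domain_b):
    """Mean PAE (angstroms) between two residue ranges.

    pae is a square matrix (list of lists); domain_a and domain_b are
    1-indexed, inclusive (start, end) residue ranges.
    """
    rows = range(domain_a[0] - 1, domain_a[1])
    cols = range(domain_b[0] - 1, domain_b[1])
    values = []
    for i in rows:
        for j in cols:
            # PAE is generally not symmetric, so include both directions
            values.append(pae[i][j])
            values.append(pae[j][i])
    return sum(values) / len(values)
```

A low value within each domain combined with a high value between them suggests that the individual domains are well predicted but their relative placement is not.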

The pLDDT Color Scale

The structural biology community has adopted a standard color scheme for visualizing pLDDT scores. When you view a predicted structure in PyMOL, ChimeraX, Mol*, or the AlphaFold database, the coloring follows this convention:

  • Dark blue (pLDDT > 90) – Very high confidence. The backbone conformation is accurate and side-chain rotamers are likely correct. These regions are suitable for detailed structural analysis, docking studies, and mutation effect predictions.
  • Cyan / light blue (70 < pLDDT ≤ 90) – High confidence. The backbone trace is reliable, but side-chain positions may have some uncertainty. These regions are trustworthy for most applications.
  • Yellow (50 < pLDDT ≤ 70) – Low confidence. The general fold may be approximately correct, but specific atom positions are uncertain. Treat these regions with caution – they often correspond to flexible loops or regions with limited evolutionary information.
  • Orange (pLDDT ≤ 50) – Very low confidence. The predicted structure in these regions should not be interpreted as meaningful. These typically correspond to intrinsically disordered regions (IDRs), unstructured termini, or regions where the model lacks sufficient training data.

The pLDDT values themselves are stored in the B-factor column of PDB files generated by prediction models. Visualization software can apply the coloring automatically by selecting "color by B-factor" with the appropriate spectrum.
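Because the scores live in the B-factor column, you can also read them straight from a predicted PDB file without any structural biology library. The following is a minimal sketch that relies on the fixed-column layout of PDB ATOM records (B-factor in columns 61–66); the function name `plddt_from_pdb_lines` is an assumption for this example, and insertion codes are ignored for simplicity.

```python
def plddt_from_pdb_lines(lines):
    """Return {(chain_id, residue_number): pLDDT} read from Cα atoms.

    lines is any iterable of PDB-format text lines, such as an open file.
    """
    scores = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and other records
        if line[12:16].strip() != "CA":
            continue  # one value per residue: use the Cα atom
        chain_id = line[21]
        residue_number = int(line[22:26])
        scores[(chain_id, residue_number)] = float(line[60:66])  # B-factor field
    return scores
```

To use it, pass an open file handle, for example `plddt_from_pdb_lines(open("prediction.pdb"))`, where `prediction.pdb` is a placeholder filename.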

What Each pLDDT Range Means in Practice

Very High Confidence (> 90): Trust the Details

Regions with pLDDT above 90 are predicted with near-experimental accuracy. Multiple benchmarks have shown that these regions typically have Cα RMSD below 1 angstrom compared to experimental structures. You can confidently use these regions for:

  • Molecular docking and virtual screening
  • Active site analysis and substrate binding predictions
  • Mutation effect analysis (how point mutations alter structure)
  • Homology-based functional annotation

High Confidence (70–90): Trust the Backbone

The backbone fold is correct, but individual side-chain positions have meaningful uncertainty. This is common for surface-exposed residues where multiple rotamer states are energetically similar. These regions are suitable for:

  • Overall fold classification and topology analysis
  • Identifying secondary structure elements (α-helices, β-sheets)
  • Protein-protein interface identification (at the domain level)
  • Homology modeling template selection

Low Confidence (50–70): Interpret with Caution

These regions often correspond to flexible loops connecting secondary structure elements, or to regions where the model has limited evolutionary signal. The general backbone path may be approximately correct, but specific coordinates are unreliable. Common causes include limited homologous sequences in training data, genuine conformational flexibility, or crystal contacts that stabilize a loop in experimental structures but are absent in isolated prediction.

Very Low Confidence (< 50): Likely Disordered

Regions below pLDDT 50 often indicate intrinsically disordered regions – stretches of protein that do not adopt a stable three-dimensional structure. In those cases the low score is not a failure of the prediction model; the model is correctly recognizing that these regions are natively unstructured. Approximately 30–40% of the human proteome contains disordered regions, so encountering very-low-confidence regions is common and expected.

Tip
Low pLDDT is information, not noise. If you are studying intrinsically disordered proteins or identifying disordered linker regions, pLDDT below 50 is exactly the signal you want. Multiple studies have shown that pLDDT-based disorder prediction is competitive with dedicated tools like IUPred2A.

Using pLDDT in Research

Identifying Disordered Regions

One of the most impactful applications of pLDDT is rapid disorder prediction. Rather than running a separate disorder predictor, you can fold your protein with ESMFold and extract regions where pLDDT drops below 50. This is particularly useful when studying multi-domain proteins where you need to distinguish structured domains from disordered linkers.

Validating Predictions Before Downstream Analysis

Before using a predicted structure for docking, molecular dynamics, or any quantitative analysis, check the pLDDT in the regions that matter for your question. If you want to dock a ligand into a binding pocket, every residue lining that pocket should have pLDDT above 70 (ideally above 90). If key binding residues fall in low-confidence regions, the docking results will be unreliable regardless of how sophisticated the docking algorithm is.
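This check is easy to automate. The sketch below assumes you already have a per-residue pLDDT list and a list of 1-indexed residue numbers lining the pocket; the helper name `low_confidence_pocket_residues` and the example residue numbers are hypothetical.

```python
def low_confidence_pocket_residues(plddt_scores, pocket_residues, min_plddt=70.0):
    """Return (residue_number, score) pairs for pocket residues that are
    not above min_plddt. Residue numbers are 1-indexed."""
    flagged = []
    for residue in pocket_residues:
        if not 1 <= residue <= len(plddt_scores):
            raise ValueError(f"residue {residue} is outside the sequence")
        score = plddt_scores[residue - 1]
        if score <= min_plddt:
            flagged.append((residue, score))
    return flagged


# Illustrative values only
example_scores = [95.0, 92.0, 68.0, 88.0, 45.0]
for residue, score in low_confidence_pocket_residues(example_scores, [1, 3, 5]):
    print(f"Residue {residue} has pLDDT {score:.1f}; treat docking here with caution")
```

An empty result means every pocket residue clears the threshold. Raise `min_plddt` to 90 if your analysis depends on side-chain positions.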

Comparing Models

When you have predictions from multiple tools – say ESMFold for speed and AlphaFold2 for accuracy – pLDDT provides a consistent basis for comparison. If both models agree on high confidence for a region, you can be especially confident. If they disagree, the region warrants closer inspection. For a detailed comparison of these models, see our guide to ESMFold.
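One simple way to find the regions that warrant closer inspection is to compare the two per-residue profiles directly. This is a sketch under stated assumptions: both predictions cover the same sequence, the function name `disagreeing_residues` is made up for this example, and the default gap of 15 points is an arbitrary starting value rather than an established cutoff.

```python
def disagreeing_residues(scores_a, scores_b, max_gap=15.0):
    """Return (residue_number, score_a, score_b) for residues where two
    pLDDT profiles differ by more than max_gap. Residues are 1-indexed."""
    if len(scores_a) != len(scores_b):
        raise ValueError("profiles must cover the same number of residues")
    flagged = []
    for index, (a, b) in enumerate(zip(scores_a, scores_b), start=1):
        if abs(a - b) > max_gap:
            flagged.append((index, a, b))
    return flagged
```

Residues returned by this function are the ones where the two models tell different stories, so they are a natural place to start manual inspection.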

Extracting pLDDT Scores from the SciRouter API

When you fold a protein using SciRouter's API, the response includes per-residue pLDDT scores alongside the predicted structure. Here is a complete example that folds a protein and analyzes its confidence profile:

Fold a protein and extract pLDDT scores
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"

# Fold a short protein sequence
sequence = "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGE"

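response = requests.post(
    f"{BASE}/proteins/fold",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "sequence": sequence,
        "model": "esmfold"
    },
    timeout=120,  # seconds; avoids hanging forever if the request stalls
)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body

result = response.json()
plddt_scores = result["plddt_scores"]  # one value per residue
avg_plddt = result["average_plddt"]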

print(f"Average pLDDT: {avg_plddt:.1f}")
print(f"Residues predicted: {len(plddt_scores)}")

# Classify regions by confidence
very_high = sum(1 for s in plddt_scores if s > 90)
high = sum(1 for s in plddt_scores if 70 < s <= 90)
low = sum(1 for s in plddt_scores if 50 < s <= 70)
very_low = sum(1 for s in plddt_scores if s <= 50)

print(f"Very high confidence (>90): {very_high} residues")
print(f"High confidence (70-90):    {high} residues")
print(f"Low confidence (50-70):     {low} residues")
print(f"Very low confidence (<=50): {very_low} residues")

Finding Disordered Regions Programmatically

You can use the pLDDT array to identify contiguous stretches of disorder automatically:

Identify disordered regions from pLDDT
def find_disordered_regions(plddt_scores, threshold=50, min_length=5):
    """Find contiguous regions where pLDDT is below threshold."""
    regions = []
    start = None

    for i, score in enumerate(plddt_scores):
        if score < threshold:
            if start is None:
                start = i
        else:
            if start is not None and (i - start) >= min_length:
                regions.append((start + 1, i))  # 1-indexed
            start = None

    # Handle region at the end of the sequence
    if start is not None and (len(plddt_scores) - start) >= min_length:
        regions.append((start + 1, len(plddt_scores)))

    return regions

# Using the pLDDT scores from the fold result
disordered = find_disordered_regions(plddt_scores)
for start, end in disordered:
    avg = sum(plddt_scores[start-1:end]) / (end - start + 1)
    print(f"Disordered region: residues {start}-{end} "
          f"(avg pLDDT: {avg:.1f})")
Tip
For a complete walkthrough of the protein folding API including job submission and polling for async results, see our protein structure prediction API guide.

pLDDT Across Different Models

Different prediction models produce different pLDDT distributions for the same protein. This is important to understand when comparing results:

  • AlphaFold2 – Tends to produce the highest pLDDT scores because it uses multiple sequence alignments (MSAs) that provide rich evolutionary context. Average pLDDT across the human proteome is approximately 80.
  • ESMFold – Operates on single sequences without MSAs, resulting in slightly lower pLDDT scores (typically 5–15 points lower than AlphaFold2 for equivalent regions). The trade-off is dramatically faster inference – seconds rather than minutes.
  • Boltz-2 – Designed for protein complexes and multi-chain predictions. pLDDT scores at interfaces between chains tend to be lower than for isolated chains, reflecting genuine uncertainty about binding geometry.
  • OmegaFold – Similar single-sequence approach to ESMFold with comparable pLDDT distributions. Useful as an independent validation when ESMFold results are borderline.

For a comprehensive comparison of these tools, see our ESMFold vs AlphaFold comparison.

Common Mistakes When Interpreting pLDDT

Even experienced researchers misinterpret pLDDT scores. Here are the most common pitfalls:

  • Treating low pLDDT as a model failure – Low scores in disordered regions are correct behavior. The model is telling you something real about the protein's biology.
  • Comparing pLDDT across different models without calibration – A pLDDT of 75 from ESMFold is not equivalent to 75 from AlphaFold2. Always compare within the same model or account for systematic differences.
  • Ignoring domain boundaries – pLDDT measures local confidence. Two domains can each have pLDDT above 90 while their relative orientation is completely wrong. Check the PAE matrix for inter-domain confidence.
  • Using low-confidence regions for docking – If your target binding site includes residues below pLDDT 70, docking results are unreliable. Either use an experimental structure for that region or acknowledge the limitation.
  • Averaging pLDDT as a single quality score – A protein with average pLDDT of 75 could have a well-folded core at 95 and disordered tails at 30. The average hides critical regional variation. Always examine the per-residue profile.
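The last pitfall is easy to demonstrate with numbers. The sketch below builds a synthetic profile (a core at 95 and tails at 30, illustrative values only) and reports the mean alongside the fraction of residues in each confidence band; `summarize_profile` is a hypothetical helper written for this example.

```python
from statistics import mean


def summarize_profile(plddt_scores):
    """Return the mean pLDDT and the fraction of residues in each band."""
    counts = {"very_high": 0, "high": 0, "low": 0, "very_low": 0}
    for score in plddt_scores:
        if score > 90:
            counts["very_high"] += 1
        elif score > 70:
            counts["high"] += 1
        elif score > 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    total = len(plddt_scores)
    fractions = {band: count / total for band, count in counts.items()}
    return {"mean": mean(plddt_scores), "fractions": fractions}


# Synthetic profile: 90 well-folded residues and 40 disordered ones
profile = [95.0] * 90 + [30.0] * 40
summary = summarize_profile(profile)
print(f"Mean pLDDT: {summary['mean']:.1f}")
for band, fraction in summary["fractions"].items():
    print(f"{band}: {fraction:.0%}")
```

The mean lands in the "high confidence" band even though no single residue in this profile actually scores between 70 and 90, which is why the per-band breakdown is the more honest summary.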

Next Steps

pLDDT is your most important guide when working with predicted protein structures. By understanding what the scores mean and applying them systematically, you can separate trustworthy structural insights from unreliable noise.

To put this into practice, fold your protein of interest with ESMFold and examine the per-residue pLDDT profile. Use the code examples above to identify disordered regions and validate that the regions relevant to your research question fall within the confidence range you need.

Ready to start folding? Sign up for a free SciRouter API key and predict your first protein structure in seconds.

Frequently Asked Questions

What does pLDDT stand for?

pLDDT stands for predicted Local Distance Difference Test. It is a per-residue confidence metric ranging from 0 to 100 that estimates how accurately a protein structure prediction model has placed each amino acid residue. The score is derived from the LDDT-Cα metric, which evaluates a predicted structure against an experimentally determined reference, adapted to work as a self-assessment during prediction.

What is a good pLDDT score?

A pLDDT score above 90 (colored dark blue) indicates very high confidence – the backbone and side-chain positions are likely correct. Scores between 70 and 90 (cyan) indicate a confident backbone prediction. Scores between 50 and 70 (yellow) suggest low confidence, often corresponding to flexible loops or regions with limited evolutionary information. Scores of 50 or below (orange) indicate very low confidence, and the predicted structure in those regions should not be trusted.

Do ESMFold and AlphaFold2 use the same pLDDT scale?

Yes, both ESMFold and AlphaFold2 report pLDDT on the same 0–100 scale with the same interpretation. However, the two models produce different score distributions for the same proteins. AlphaFold2 tends to produce higher pLDDT scores overall because it uses multiple sequence alignments for additional evolutionary context. ESMFold scores are typically 5–15 points lower for equivalent regions, so a pLDDT of 75 from ESMFold may represent a prediction of similar quality to a pLDDT of 85 from AlphaFold2.

Can pLDDT identify intrinsically disordered regions?

Yes, pLDDT is one of the most effective computational tools for identifying intrinsically disordered regions (IDRs). Disordered regions lack a stable 3D structure by nature, so structure prediction models correctly assign them low confidence scores, typically below 50. Multiple studies have shown that pLDDT-based disorder prediction is competitive with purpose-built disorder predictors like IUPred and MobiDB-lite.

Where are pLDDT scores stored in a PDB file?

In PDB files generated by AlphaFold2, ESMFold, and other structure prediction tools, pLDDT scores are stored in the B-factor column (columns 61–66 of ATOM records). This is a practical convention — the B-factor field was originally designed for crystallographic temperature factors, but since predicted structures have no experimental B-factors, the field is repurposed for confidence scores. Visualization tools like PyMOL and ChimeraX can color structures by B-factor to display pLDDT directly.

How do I get pLDDT scores from the SciRouter API?

When you call the /v1/proteins/fold endpoint, the response includes a plddt_scores array containing one confidence value per residue. The response also includes an average_plddt field for a quick overall assessment. You can also download the full PDB file from the pdb_url field, where pLDDT values are encoded in the B-factor column.
