
How to Predict Protein Structure from Sequence Using an API

Learn why protein structure matters, compare ESMFold, AlphaFold2, and Boltz-2, and predict your first structure in 10 lines of Python via SciRouter's API.

Ryan Bethencourt
March 29, 2026
10 min read

Why Protein Structure Matters

Every protein in your body is a molecular machine, and its function is determined almost entirely by its three-dimensional shape. Hemoglobin carries oxygen because of the precise geometry of its heme-binding pocket. Antibodies recognize pathogens because their variable loops fold into complementary surfaces. Enzymes catalyze reactions because their active sites position substrates with sub-angstrom precision.

For decades, determining protein structure required experimental methods like X-ray crystallography or cryo-EM – techniques that cost thousands of dollars per structure and take months to years. The computational revolution that began with AlphaFold2 in 2020 changed this fundamentally. Today, you can predict a protein's structure from its amino acid sequence in seconds, and you can do it with a single API call.

This guide walks you through the landscape of protein structure prediction tools, explains the key concepts you need to understand, and shows you how to predict your first structure using SciRouter's API in about ten lines of Python.

The Protein Structure Prediction Landscape

Three tools dominate the current landscape. Each takes a fundamentally different approach to the same problem, and understanding those differences will help you choose the right tool for your work.

AlphaFold2: The MSA-Based Gold Standard

AlphaFold2, developed by DeepMind, uses multiple sequence alignments (MSAs) as its primary input signal. It searches large databases of protein sequences (UniRef90, MGnify, BFD) to find evolutionary relatives of your query protein, aligns them, and extracts co-evolutionary patterns. These patterns reveal which residues are spatially close in the 3D structure, because residues that contact each other tend to co-evolve.

The result is exceptional accuracy: AlphaFold2 achieved a median GDT_TS above 90 on CASP14 targets, a result widely regarded as solving single-chain structure prediction for proteins with known homologs. The trade-off is speed: MSA construction requires searching terabytes of sequence databases, which takes minutes to hours per protein.
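
To make the co-evolution idea concrete, here is a minimal sketch that scores how strongly two alignment columns vary together using mutual information. This is a classic textbook statistic, not AlphaFold2's actual method (AlphaFold2 learns from the MSA with a neural network), and the toy alignment and the function name `column_mutual_information` are made up for illustration.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment (invented): columns 1 and 3 change together
# (K pairs with E, R pairs with D), while column 2 is fully conserved.
toy_msa = [
    "AKLEG",
    "AKLEG",
    "ARLDG",
    "ARLDG",
]

mi_coupled = column_mutual_information(toy_msa, 1, 3)
mi_conserved = column_mutual_information(toy_msa, 1, 2)
print(f"MI(col 1, col 3) = {mi_coupled:.2f} bits")
print(f"MI(col 1, col 2) = {mi_conserved:.2f} bits")
```

Columns that co-vary score high, while a conserved column carries no pairing information. Real alignments need corrections for phylogenetic bias and small sample sizes, which this sketch ignores.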

ESMFold: Speed Through Language Models

ESMFold from Meta AI takes a radically different approach. Instead of building MSAs, it uses ESM-2, a protein language model trained on millions of protein sequences. The model learns evolutionary information implicitly during pre-training, so at inference time it needs only the single input sequence – no database search required.

This makes ESMFold dramatically faster: a typical prediction completes in 5 to 15 seconds rather than minutes to hours. Accuracy is within striking distance of AlphaFold2 for proteins with many homologs, though it drops off for orphan proteins where the language model has less implicit evolutionary context. For high-throughput screening and rapid prototyping, ESMFold is often the best first choice.

Boltz-2: Complex Prediction for the Real World

Boltz-2 from MIT addresses a limitation of both AlphaFold2 and ESMFold: predicting multi-chain complexes. Proteins rarely act alone. They bind other proteins, small-molecule ligands, DNA, and RNA. Boltz-2 can model all of these interactions in a single prediction.

Boltz-2 accepts multiple chains as input and predicts how they arrange in space relative to each other. It handles protein-protein interfaces, protein-ligand binding, and protein-nucleic acid complexes. Prediction time ranges from 30 seconds to several minutes depending on the number and size of chains. For a deeper comparison of all three tools, see our ESMFold vs AlphaFold2 vs Boltz-2 comparison.

Key Concepts Before You Start

Amino Acid Sequences

Protein structure prediction starts with a sequence of amino acids represented as a string of single-letter codes. For example, the first 20 residues of human hemoglobin subunit alpha are MVLSPADKTNVKAAWGKVGA. There are 20 standard amino acids, each with a one-letter code (A for alanine, M for methionine, and so on). Your input sequence must use this standard alphabet.
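
Before submitting a sequence, it can help to check it against the 20-letter alphabet locally. The sketch below is one simple way to do that; the helper name `find_invalid_residues` is hypothetical and is not part of any SciRouter client.

```python
# The 20 standard amino acids, as one-letter codes
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence):
    """Return (1-based position, character) for every non-standard character."""
    return [
        (pos, ch)
        for pos, ch in enumerate(sequence.upper(), start=1)
        if ch not in STANDARD_AMINO_ACIDS
    ]


hba_start = "MVLSPADKTNVKAAWGKVGA"
print(find_invalid_residues(hba_start))    # clean sequence, nothing reported
print(find_invalid_residues("MVLSXADK*"))  # reports the X and the stop symbol
```

Reporting positions rather than silently stripping characters keeps residue numbering in the predicted structure aligned with your original sequence.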

pLDDT: Your Confidence Metric

Every prediction comes with a per-residue confidence score called pLDDT (predicted Local Distance Difference Test). This score ranges from 0 to 100 and tells you how reliable each part of the predicted structure is:

  • Above 90: High confidence. The backbone and side-chain positions are likely accurate.
  • 70 to 90: Moderate confidence. The backbone fold is probably correct, but side-chain details may vary.
  • 50 to 70: Low confidence. The predicted structure in this region should be interpreted with caution.
  • Below 50: Very low confidence. These regions are often intrinsically disordered and may not adopt a stable 3D structure, although a low score can also mean the model simply lacks enough information to place them.
Tip
Low pLDDT scores are not failures – they are information. Intrinsically disordered regions play critical roles in signaling, regulation, and phase separation. A pLDDT map is one of the fastest ways to identify disordered regions in a protein.
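
The bands above translate directly into code. The following sketch labels each score and groups consecutive low-confidence residues into stretches, which is usually more useful than a flat list of positions. The `min_length` cutoff of 3 residues is an arbitrary assumption for illustration, not an established convention, and the example scores are invented.

```python
def confidence_band(score):
    """Map a pLDDT score (0-100) to the confidence bands described above."""
    if score > 90:
        return "high"
    if score >= 70:
        return "moderate"
    if score >= 50:
        return "low"
    return "very low"


def low_confidence_stretches(scores, threshold=50.0, min_length=3):
    """Return 1-based inclusive (start, end) ranges where scores stay below threshold."""
    stretches = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:
                stretches.append((start, pos - 1))
            start = None
    # Close a stretch that runs to the end of the sequence
    if start is not None and len(scores) - start + 1 >= min_length:
        stretches.append((start, len(scores)))
    return stretches


example_scores = [95, 92, 88, 45, 40, 38, 42, 75, 91, 30, 93, 44, 41, 39]
print([confidence_band(s) for s in example_scores[:4]])
print(low_confidence_stretches(example_scores))
```

The isolated low score at position 10 is skipped because a single residue is shorter than `min_length`; lower the cutoff if you want to see every dip.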

PDB Format

Predicted structures are returned in PDB (Protein Data Bank) format, a text-based format that lists the 3D coordinates of every atom in the protein. PDB files can be visualized in tools like PyMOL, ChimeraX, Mol*, or any molecular viewer. The API returns the PDB content as a string that you can save directly to a file.
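
Because PDB is a fixed-column text format, you can pull basic information out of it without a dedicated library. The sketch below extracts the C-alpha atom of each residue from ATOM records. AlphaFold and ESMFold outputs commonly store pLDDT in the B-factor column; whether the PDB string returned by SciRouter does the same is an assumption you should verify. The coordinates in the example are illustrative values, not a real prediction.

```python
def parse_ca_atoms(pdb_text):
    """Extract residue info, coordinates, and B-factor for each C-alpha atom."""
    residues = []
    for line in pdb_text.splitlines():
        # ATOM records use fixed columns; the atom name sits in columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append({
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
                "bfactor": float(line[60:66]),
            })
    return residues


example_pdb = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00 91.20           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 91.20           C
ATOM      3  C   MET A   1      26.913  26.639   3.531  1.00 91.20           C
ATOM      9  N   GLN A   2      26.335  27.770   3.258  1.00 94.85           N
ATOM     10  CA  GLN A   2      26.850  29.021   3.898  1.00 94.85           C
"""

ca_atoms = parse_ca_atoms(example_pdb)
for atom in ca_atoms:
    print(atom["resname"], atom["resseq"], atom["xyz"], atom["bfactor"])
```

For anything beyond quick inspection, a maintained parser such as Biopython's `Bio.PDB` handles edge cases (alternate locations, insertion codes, multiple models) that this sketch does not.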

Your First Structure Prediction in 10 Lines of Python

Let's predict the structure of a real protein. We'll use the sequence of human ubiquitin, a small, well-characterized protein with 76 residues. This is a great test case because its structure is known experimentally, so you can verify the prediction.

Predict protein structure with ESMFold
import requests, time

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Human ubiquitin sequence (76 residues)
sequence = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"

# Submit the folding job
job = requests.post(f"{BASE}/proteins/fold",
                    headers=headers,
                    json={"sequence": sequence, "model": "esmfold"}).json()

# Poll until complete
while job["status"] != "completed":
    time.sleep(2)
    job = requests.get(f"{BASE}/proteins/fold/{job['job_id']}",
                       headers=headers).json()

# Save the predicted structure
with open("ubiquitin_predicted.pdb", "w") as f:
    f.write(job["result"]["pdb_string"])

print(f"Mean pLDDT: {job['result']['mean_plddt']:.1f}")
print("Structure saved to ubiquitin_predicted.pdb")

That's it. The response includes the predicted 3D coordinates in PDB format, a mean pLDDT score for overall confidence, and per-residue pLDDT values so you can identify which regions are well-predicted and which are uncertain.
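
The polling loop in the example runs until the job reports "completed", so a job that fails or stalls would leave the script waiting forever. A more defensive version might look like the sketch below. The "failed", "queued", and "running" status values are assumptions (only "completed" appears in this guide), and `wait_for_job` is a hypothetical helper, so check the API documentation for the real status names.

```python
import time


def wait_for_job(fetch_status, timeout_s=300.0, interval_s=2.0, sleep=time.sleep):
    """Poll fetch_status() until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":  # hypothetical failure status; confirm in the API docs
            raise RuntimeError(f"Job failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job not finished after {timeout_s} s")
        sleep(interval_s)


# Offline demonstration: a fake status sequence stands in for real HTTP calls
fake_responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "result": {"mean_plddt": 90.0}},
])
done = wait_for_job(lambda: next(fake_responses), sleep=lambda seconds: None)
print(done["status"])
```

With the real API, `fetch_status` would wrap the GET request from the example above, for instance `lambda: requests.get(f"{BASE}/proteins/fold/{job_id}", headers=headers).json()`. Passing the fetch function in as an argument also makes the polling logic easy to test without network access.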

Understanding Your Results

Interpreting pLDDT Scores

For ubiquitin, you should see a mean pLDDT in the high 80s or 90s – this is a well-studied protein with many homologs, so ESMFold predicts it with high confidence. Here is how to extract and analyze the per-residue scores:

Analyze per-residue confidence
# Continues from the previous example: `job` holds the completed response
plddt_scores = job["result"]["plddt_per_residue"]

# Find high-confidence and low-confidence residues (1-based positions)
high_conf = [i + 1 for i, s in enumerate(plddt_scores) if s > 90]
low_conf = [i + 1 for i, s in enumerate(plddt_scores) if s < 50]

print(f"High-confidence residues (>90): {len(high_conf)} of {len(plddt_scores)}")
print(f"Potentially disordered residues (<50): {len(low_conf)}")

# Report the low-confidence positions, if any
if low_conf:
    print(f"Low-confidence positions: {low_conf}")
else:
    print("No residues with pLDDT below 50")

Visualizing the Structure

The PDB file you saved can be loaded into any molecular visualization tool. For quick inspection, web-based viewers like Mol* (used by the RCSB PDB) work well. For publication figures, PyMOL or ChimeraX give you more control over rendering. Color by pLDDT to see confidence mapped directly onto the structure – blue for high confidence, red for low.
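
If you are building your own viewer or plot rather than using a built-in coloring scheme, the blue-for-high, red-for-low idea can be sketched as a simple linear color ramp. The clamping range of 50 to 90 and the helper name `plddt_to_hex` are arbitrary choices for illustration; established viewers use their own palettes.

```python
def plddt_to_hex(score, low=50.0, high=90.0):
    """Map a pLDDT score to a red-to-blue hex color (red = low, blue = high)."""
    # Clamp to [low, high], then rescale to the 0..1 range
    t = (min(max(score, low), high) - low) / (high - low)
    red = round(255 * (1 - t))
    blue = round(255 * t)
    return f"#{red:02x}00{blue:02x}"


for score in (30, 70, 95):
    print(score, plddt_to_hex(score))
```

Scores at or below `low` come out pure red, scores at or above `high` pure blue, and everything in between blends through purple.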

Going Further: Batch Processing

One of the biggest advantages of API-based structure prediction is automation. Instead of submitting one sequence at a time through a web form, you can script batch predictions over hundreds or thousands of sequences:

Batch structure prediction
import requests, time

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

sequences = {
    "ubiquitin": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG",
    "insulin_a": "GIVEQCCTSICSLYQLENYCN",
    "lysozyme_fragment": "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK",
}

jobs = {}
for name, seq in sequences.items():
    resp = requests.post(f"{BASE}/proteins/fold", headers=headers,
                         json={"sequence": seq, "model": "esmfold"}).json()
    jobs[name] = resp["job_id"]
    print(f"Submitted {name}: job {resp['job_id']}")

# Poll all jobs
for name, job_id in jobs.items():
    while True:
        result = requests.get(f"{BASE}/proteins/fold/{job_id}",
                              headers=headers).json()
        if result["status"] == "completed":
            with open(f"{name}.pdb", "w") as f:
                f.write(result["result"]["pdb_string"])
            print(f"{name}: pLDDT = {result['result']['mean_plddt']:.1f}")
            break
        time.sleep(2)
Note
SciRouter's free tier includes 500 credits per month, which is enough for hundreds of ESMFold predictions. For larger batches, the Pro tier provides higher throughput and priority queuing.
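
For real batches, sequences usually live in a FASTA file rather than a hard-coded dictionary. The sketch below shows one way to turn FASTA text into the name-to-sequence mapping used in the batch example. `parse_fasta` is a hypothetical helper, and using the first word of each header as the name is a design choice, not a requirement of the API.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of each header."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a positional name if the header is empty
            name = parts[0] if parts else f"seq{len(sequences) + 1}"
            sequences[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so concatenate them
            sequences[name] += line
    return sequences


fasta_text = """\
>ubiquitin human ubiquitin, 76 residues
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPP
DQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
>insulin_a insulin A chain
GIVEQCCTSICSLYQLENYCN
"""

sequences = parse_fasta(fasta_text)
for name, seq in sequences.items():
    print(f"{name}: {len(seq)} residues")
```

To read from disk, pass the file contents, for example `parse_fasta(open("proteins.fasta").read())`, and feed the resulting dictionary into the submission loop.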

When to Use Complex Prediction

If your protein interacts with other molecules – another protein chain, a small-molecule drug, or a nucleic acid – single-chain prediction only tells part of the story. The Boltz-2 endpoint on SciRouter lets you submit multiple chains and predict how they assemble into a complex:

Predict a protein complex with Boltz-2
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Predict a two-chain complex (N-terminal fragments of human hemoglobin alpha and beta)
job = requests.post(f"{BASE}/proteins/fold", headers=headers,
                    json={
                        "sequences": [
                            "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH",
                            "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST"
                        ],
                        "model": "boltz2"
                    }).json()

print(f"Complex prediction submitted: {job['job_id']}")

Practical Tips for Better Predictions

  • Sequence length matters. ESMFold handles sequences up to about 1024 residues well. For longer proteins, consider splitting into domains.
  • Remove signal peptides. If your sequence includes a signal peptide or transit peptide, remove it before prediction. These regions are cleaved in vivo and will produce low-confidence noise.
  • Check for non-standard residues. Replace selenocysteine (U) with cysteine (C) and pyrrolysine (O) with lysine (K) before submitting.
  • Use pLDDT for triage. If mean pLDDT is below 60, the prediction may not be reliable enough for downstream analysis like docking or active site characterization.
  • Validate against known structures. When possible, compare predictions against experimental structures in the PDB to calibrate your expectations.
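
Two of these tips, the length guideline and pLDDT triage, are easy to automate. The sketch below uses the approximate thresholds quoted in the list (1024 residues and a mean pLDDT of 60). The helper names and the example scores are invented for illustration.

```python
MAX_LENGTH = 1024       # approximate ESMFold guideline from the tips above
MIN_MEAN_PLDDT = 60.0   # triage threshold from the tips above


def check_length(sequence, max_length=MAX_LENGTH):
    """Return True if the sequence is within the recommended length."""
    return len(sequence) <= max_length


def triage(mean_plddt_by_name, min_mean_plddt=MIN_MEAN_PLDDT):
    """Split {name: mean pLDDT} into (usable, needs_review) lists of names."""
    usable = sorted(n for n, s in mean_plddt_by_name.items() if s >= min_mean_plddt)
    needs_review = sorted(n for n, s in mean_plddt_by_name.items() if s < min_mean_plddt)
    return usable, needs_review


# Made-up scores standing in for the mean_plddt values returned by the API
example_results = {"ubiquitin": 91.3, "orphan_x": 48.7, "kinase_domain": 77.0}
usable, needs_review = triage(example_results)
print("Usable for downstream analysis:", usable)
print("Needs review before use:", needs_review)
print("1100-residue sequence within limit:", check_length("A" * 1100))
```

Treat both thresholds as starting points: a low mean can hide a well-predicted domain next to a disordered tail, so check per-residue scores before discarding a prediction.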

Next Steps

Now that you can predict structures, consider what you can do with them. Predicted structures are inputs to molecular docking (finding how drugs bind), binding site analysis, protein engineering, and phylogenetic studies. Read our comparison of ESMFold, AlphaFold2, and Boltz-2 to understand which tool to use for different scenarios, or explore the ESMFold and Boltz-2 tool pages for detailed API documentation.

Sign up for a free API key and start predicting structures today. No GPU, no database downloads, no Docker containers – just send a sequence and get a structure back.

Frequently Asked Questions

What is protein structure prediction?

Protein structure prediction is the computational process of determining a protein's three-dimensional shape from its amino acid sequence. Since a protein's function depends on its 3D structure, predicting structure from sequence is one of the most important problems in computational biology.

How accurate is ESMFold compared to AlphaFold2?

ESMFold achieves accuracy within 5-10% of AlphaFold2 for well-studied protein families. On proteins with many known homologs, the difference is often negligible. ESMFold is less accurate on orphan proteins with few evolutionary relatives, where AlphaFold2's MSA-based approach provides more signal.

What is a pLDDT score?

pLDDT (predicted Local Distance Difference Test) is a per-residue confidence score ranging from 0 to 100. Scores above 90 indicate high confidence, 70-90 is moderate confidence, 50-70 suggests low confidence, and below 50 typically indicates disordered regions. It tells you how much to trust each part of the predicted structure.

How long does protein structure prediction take via API?

ESMFold predictions typically complete in 5-15 seconds depending on sequence length. Boltz-2 predictions for complexes take 30 seconds to several minutes. Both are significantly faster than running AlphaFold2 locally, which requires MSA construction taking minutes to hours.

Can I predict protein complexes through the API?

Yes. SciRouter's Boltz-2 endpoint supports multi-chain complex prediction, including protein-protein, protein-ligand, and protein-nucleic acid interactions. ESMFold is limited to single-chain prediction. Use ESMFold for fast single-chain work and Boltz-2 when you need complex modeling.

Do I need a GPU to predict protein structure?

Not when using an API. SciRouter runs ESMFold and Boltz-2 on hosted GPU infrastructure, so you just send a sequence and receive the predicted structure back. If running locally, ESMFold requires at least one GPU with 16 GB of VRAM, and AlphaFold2 typically needs an A100 or equivalent.
