ESMFold Tutorial: Predict Protein Structure in 10 Lines of Python

Step-by-step tutorial to predict protein structure from amino acid sequence using ESMFold via SciRouter's API. Includes working Python code.

Ryan Bethencourt
March 18, 2026

Prerequisites

Before you begin, you need two things: Python 3.8 or later (the minimal example below uses the := assignment expression, which was introduced in Python 3.8) and a SciRouter API key. If you do not have an API key yet, sign up at scirouter.ai/register to get 500 free credits per month with no credit card required.

We will use the requests library, a widely used HTTP client for Python. Install it if you have not already:

Install dependencies
pip install requests

Step 1: Set Up Your API Key

Store your API key as an environment variable rather than hardcoding it in your scripts. This is a security best practice that prevents accidental exposure in version control.

Set your API key
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"

Step 2: The Minimal Example

Here is the simplest possible ESMFold prediction in Python. It submits a sequence, polls every two seconds until the job reports "completed", and saves the predicted structure as a PDB file:

Minimal ESMFold prediction (10 lines)
import os, requests, time

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

sequence = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
job = requests.post(f"{BASE}/proteins/fold", headers=headers,
                    json={"sequence": sequence, "model": "esmfold"}).json()

while (r := requests.get(f"{BASE}/proteins/fold/{job['job_id']}", headers=headers).json())["status"] != "completed":
    time.sleep(2)

print(f"Mean pLDDT: {r['mean_plddt']:.1f}")
open("structure.pdb", "w").write(r["pdb"])

Tip: The sequence used here is a fragment of the human hemoglobin alpha chain. Replace it with any valid amino acid sequence using the standard 20 single-letter codes.
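
If your sequences live in a FASTA file rather than in a string literal, a small parser can turn them into inputs for the request above. The following is a minimal sketch using only the standard library. The function name read_fasta and the record ID hba_fragment are illustrative and not part of the SciRouter API.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            current_id = header[0] if header else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap; join them and normalize to upper case.
            records[current_id] += line.upper()
    return records


example = """>hba_fragment human hemoglobin alpha (fragment)
MVLSPADKTNVKAAWGKVGAHAGEYG
AEALERMFLSFPTTKTYFPHFDLSH
"""
sequences = read_fasta(example)
print(sequences["hba_fragment"])
```

Each value in the returned dictionary can be passed as the "sequence" field of the request body.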

Step 3: Production-Ready Example

The minimal example works but lacks error handling, timeouts, and input validation. For example, if the job ends with status "failed", its polling loop never exits. Here is a more robust version suitable for production scripts and automated pipelines:

Full example with error handling
import os
import requests
import time
import sys

API_KEY = os.environ.get("SCIROUTER_API_KEY")
if not API_KEY:
    print("Error: Set the SCIROUTER_API_KEY environment variable")
    sys.exit(1)

BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 1024
POLL_INTERVAL = 3  # seconds
TIMEOUT = 120      # seconds

def validate_sequence(seq: str) -> str:
    seq = seq.strip().upper()
    invalid = set(seq) - VALID_AA
    if invalid:
        raise ValueError(f"Invalid amino acids: {invalid}")
    if len(seq) > MAX_LENGTH:
        raise ValueError(f"Sequence too long ({len(seq)} > {MAX_LENGTH})")
    if len(seq) < 10:
        raise ValueError("Sequence too short (minimum 10 residues)")
    return seq

def fold_protein(sequence: str) -> dict:
    sequence = validate_sequence(sequence)

    # Submit job
    resp = requests.post(
        f"{BASE}/proteins/fold",
        headers=HEADERS,
        json={"sequence": sequence, "model": "esmfold"},
        timeout=30,
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]
    print(f"Job submitted: {job_id}")

    # Poll for results
    start = time.time()
    while time.time() - start < TIMEOUT:
        resp = requests.get(
            f"{BASE}/proteins/fold/{job_id}",
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        result = resp.json()

        if result["status"] == "completed":
            return result
        elif result["status"] == "failed":
            raise RuntimeError(f"Folding failed: {result.get('error', 'unknown')}")

        time.sleep(POLL_INTERVAL)

    raise TimeoutError(f"Job {job_id} did not complete within {TIMEOUT}s")

if __name__ == "__main__":
    sequence = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
    result = fold_protein(sequence)
    print(f"Mean pLDDT: {result['mean_plddt']:.1f}")

    with open("prediction.pdb", "w") as f:
        f.write(result["pdb"])
    print("Structure saved to prediction.pdb")

Understanding the Output

The API response includes three key pieces of data:

  • pdb: A PDB-format string containing 3D atomic coordinates for the predicted structure.
  • mean_plddt: The average confidence score across all residues (0 to 100).
  • plddt_per_residue: An array of per-residue confidence scores, useful for identifying well-folded vs. disordered regions.

Note: A mean pLDDT above 70 generally indicates a reliable fold. Scores above 90 suggest the prediction is accurate at near-experimental resolution. Scores below 50 often point to intrinsically disordered regions.
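
The per-residue scores make it possible to locate low-confidence stretches rather than relying on the mean alone. The sketch below assumes plddt_per_residue is a plain list of floats on the 0 to 100 scale, as described above. The helper name low_confidence_regions, the min_length parameter, and the sample scores are illustrative.

```python
from typing import List, Tuple


def low_confidence_regions(plddt: List[float], threshold: float = 50.0,
                           min_length: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges where pLDDT < threshold."""
    regions = []
    start = None
    for i, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = i  # a new low-confidence run begins here
        elif start is not None:
            # The run ended at residue i - 1; keep it only if long enough.
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Handle a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_length:
        regions.append((start, len(plddt)))
    return regions


# Made-up scores for illustration; in practice use result["plddt_per_residue"].
scores = [92.1, 88.4, 45.0, 41.2, 39.9, 47.5, 76.3, 81.0, 48.0, 90.2]
print(low_confidence_regions(scores))  # [(3, 6)]
```

The single low score at residue 9 is ignored because it is shorter than min_length, which keeps isolated dips from being reported as disordered regions.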

Step 4: Saving and Visualizing Results

The PDB file you saved can be visualized with any standard molecular viewer. For quick inspection in a Jupyter notebook, you can use NGLview:

Visualize in Jupyter with NGLview
# pip install nglview
import nglview as nv

view = nv.show_file("prediction.pdb")
view.add_representation("cartoon", color="bfactor")  # color by pLDDT
view

Coloring by B-factor (which stores the pLDDT values in ESMFold output) gives you an immediate visual indication of prediction confidence: blue regions are high-confidence and red regions are low-confidence.
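
Because the confidence values sit in the B-factor column, they can also be read back from a saved file without a viewer. The sketch below parses the fixed-width ATOM records of the PDB format (residue number in columns 23 to 26, B-factor in columns 61 to 66) and keeps one value per residue from its C-alpha atom. The helper names plddt_from_pdb and demo_atom are illustrative, and the demo records are synthetic rather than real ESMFold output.

```python
from typing import Dict


def plddt_from_pdb(pdb_text: str) -> Dict[int, float]:
    """Map residue number -> B-factor of its C-alpha atom (fixed PDB columns)."""
    scores: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])             # columns 23-26: residue number
            scores[res_num] = float(line[60:66])   # columns 61-66: B-factor
    return scores


def demo_atom(serial: int, name: str, res_num: int, b: float) -> str:
    """Build a synthetic ATOM record for the demo (coordinates are dummy values)."""
    return (f"ATOM  {serial:5d} {name:<4s} MET A{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")


pdb_text = "\n".join([
    demo_atom(1, " N", 1, 85.5),
    demo_atom(2, " CA", 1, 85.5),
    demo_atom(3, " CA", 2, 42.25),
])
print(plddt_from_pdb(pdb_text))  # {1: 85.5, 2: 42.25}
```

For a real prediction, pass the contents of prediction.pdb (or the "pdb" string from the API response) instead of the synthetic records.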

Next Steps

Now that you can predict structures, consider exploring related tools on SciRouter. Use ESMFold for rapid single-chain prediction, or read our complete ESMFold guide to understand the science behind the model. If you need to model protein complexes, check out the ESMFold vs AlphaFold vs Boltz-2 comparison to find the right tool for your use case.

Sign up for a free SciRouter API key and start predicting protein structures in minutes. With 500 free credits per month and no infrastructure to manage, it is the fastest way to go from sequence to structure.

Frequently Asked Questions

How long does protein folding take with ESMFold?

Most sequences under 500 residues complete in 5 to 15 seconds. Longer sequences (up to 1024 residues) may take 15 to 30 seconds. The API returns a job ID immediately so your application is not blocked during inference.
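
Since completion times vary with sequence length, a polling loop can start with short waits and lengthen them for slower jobs. The sketch below generates such a schedule. It is one possible design rather than anything the API requires, and the name poll_delays and its default values are illustrative.

```python
from typing import Iterator


def poll_delays(first: float = 2.0, factor: float = 1.5, cap: float = 10.0,
                budget: float = 120.0) -> Iterator[float]:
    """Yield sleep intervals that grow geometrically until the budget is spent."""
    elapsed, delay = 0.0, first
    while elapsed < budget:
        # Never wait longer than the cap or than the time left in the budget.
        step = min(delay, cap, budget - elapsed)
        yield step
        elapsed += step
        delay *= factor


print(list(poll_delays(first=2, factor=2, cap=10, budget=30)))  # [2, 4, 8, 10, 6.0]
```

In a polling loop, iterate over poll_delays(), call time.sleep(delay) on each step, and then check the job status. When the generator is exhausted, the time budget has been used up and the caller can raise a timeout error.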

What format is the output?

ESMFold returns a PDB (Protein Data Bank) file as a text string, along with a mean pLDDT confidence score and per-residue pLDDT values. The PDB file contains 3D atomic coordinates that can be opened in any molecular viewer.

What are the rate limits for the SciRouter API?

Free tier accounts receive 500 credits per month. Each ESMFold prediction costs 1 credit. Pro tier accounts have higher limits. Rate limiting is applied per API key with a sliding window to prevent burst abuse.
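
A client that may hit the rate limit needs to decide how long to wait before retrying. The sketch below assumes the API signals rate limiting with HTTP status 429 and may send a Retry-After header in whole seconds. Neither detail is confirmed by this tutorial, so check the SciRouter documentation before relying on it. The function name retry_delay is illustrative.

```python
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not rate limited.

    Assumption: rate limiting is reported as HTTP 429 (not confirmed here).
    """
    if status_code != 429:
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # Honor the server's hint when it is a whole number of seconds.
        return min(float(retry_after), cap)
    # Otherwise fall back to capped exponential backoff: base, 2*base, 4*base, ...
    return min(base * (2 ** attempt), cap)


print(retry_delay(429, {"Retry-After": "5"}, attempt=0))  # 5.0
print(retry_delay(429, {}, attempt=3))                    # 8.0
print(retry_delay(200, {}, attempt=0))                    # None
```

With requests, pass resp.status_code and resp.headers. The headers object there is case-insensitive, whereas the plain dict used in this demo is not.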

How do I visualize the PDB output?

You can view PDB files using free tools like PyMOL, ChimeraX, Mol* (web-based), or NGLview in Jupyter notebooks. For quick visualization, upload your PDB file to the RCSB 3D viewer at rcsb.org.

Can I fold multiple sequences in a batch?

Yes. Submit multiple folding jobs in parallel by making a separate POST request for each sequence. Each request returns an independent job ID that you can poll individually. This is the recommended approach for batch processing.
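
One way to run such independent jobs side by side is a thread pool, sketched below. The fold argument is expected to be a function like fold_protein from the production example. Here a stand-in named fake_fold is used so the sketch runs without network access. The names fold_many and fake_fold and the worker count are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fold_many(sequences: Dict[str, str], fold: Callable[[str], dict],
              max_workers: int = 4) -> Dict[str, dict]:
    """Run `fold` on each sequence concurrently; failures become {'error': ...}."""
    def safe_fold(seq: str) -> dict:
        try:
            return fold(seq)
        except Exception as exc:  # keep one bad sequence from aborting the batch
            return {"error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = pool.map(safe_fold, sequences.values())
        return dict(zip(sequences.keys(), results))


# Stand-in for fold_protein so the sketch runs offline.
def fake_fold(seq: str) -> dict:
    if len(seq) < 10:
        raise ValueError("Sequence too short (minimum 10 residues)")
    return {"status": "completed", "length": len(seq)}


batch = {"ok": "MVLSPADKTNVKAAWGKVGA", "short": "MVLS"}
print(fold_many(batch, fake_fold))
```

Keep max_workers modest so that parallel submissions stay within the per-key rate limit.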
