The Problem: Protein Folding Requires Serious Hardware
Running ESMFold locally means downloading several gigabytes of model weights (ESMFold is built on the 3-billion-parameter ESM-2 protein language model), installing PyTorch with CUDA support, provisioning a GPU with at least 16 GB of VRAM, and debugging cryptic dependency conflicts between torch, fair-esm, and openfold. On a typical workstation without an NVIDIA A100 or equivalent, inference either fails outright or takes minutes per sequence instead of seconds.
For researchers who just want a PDB file from a sequence, this setup overhead is a barrier. For software engineers integrating structure prediction into a pipeline, it is a maintenance burden. And for AI agents that need to call protein folding as a tool, it is simply not practical.
The Solution: Call ESMFold as an API
SciRouter hosts ESMFold on dedicated A100 GPUs and exposes it as a simple REST endpoint. You send an amino acid sequence, the server runs inference, and you get back a PDB structure file with per-residue confidence scores. No GPU provisioning, no model downloads, no dependency management.
Here is what the full workflow looks like: install the SDK, set your API key, and call one function. Three lines of meaningful code.
Prerequisites
You need Python 3.7 or later and a SciRouter API key. Sign up at scirouter.ai/register to get 500 free credits per month with no credit card required.
Step 1: Install the SciRouter SDK
The SDK is a thin wrapper around the REST API that handles authentication, polling for async job results, and type-safe response parsing.
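The SDK means you never write this loop yourself, but it helps to know what it does. Folding requests run as asynchronous jobs: the client submits a sequence, then checks the job repeatedly until it reports completion or failure. Below is a minimal sketch of that polling step. It assumes a hypothetical `fetch_status` callable that returns a dict with a `"status"` key; the `"completed"` and `"failed"` values mirror the REST example later in this article. The loop backs off exponentially between checks and gives up after a deadline. It is an illustration, not the SDK's actual implementation:

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until the job completes, fails, or the deadline passes.

    fetch_status is a hypothetical stand-in for a GET on the job's status URL.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "Unknown error"))
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Job not finished after {timeout:.0f} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # capped exponential backoff


# Usage with a fake job that completes on the third check (no real waiting):
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "pdb": "ATOM ..."},
])
job = poll_until_done(lambda: next(responses), sleep=lambda s: None)
print(job["status"])  # completed
```

Passing `sleep` in as a parameter keeps the function testable without real waiting. The install itself is a single command: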
pip install scirouter
Step 2: Set Your API Key
Store your API key as an environment variable. The SDK reads it automatically from SCIROUTER_API_KEY so you never need to hardcode it.
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"
Step 3: Fold a Protein in Three Lines
This is the minimal example. It sends a hemoglobin alpha chain fragment to ESMFold and saves the predicted structure as a PDB file.
from scirouter import SciRouter
client = SciRouter() # reads SCIROUTER_API_KEY from env
result = client.proteins.fold(sequence="MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH")
print(f"Confidence (pLDDT): {result.mean_plddt:.1f}")
with open("structure.pdb", "w") as f:
    f.write(result.pdb)
What You Get Back
The result object contains everything you need to work with the predicted structure:
- result.pdb: A PDB-format string with 3D coordinates for every heavy atom in the predicted structure.
- result.mean_plddt: The average predicted Local Distance Difference Test score across all residues, ranging from 0 to 100.
- result.plddt_per_residue: A list of per-residue confidence scores, useful for identifying well-folded regions versus disordered loops.
- result.job_id: A unique identifier you can use to retrieve the result later.
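The `plddt_per_residue` list is easiest to read with the confidence bands popularized by AlphaFold: above 90 is very high, 70 to 90 is confident, 50 to 70 is low, and below 50 is very low. The minimal sketch below labels each score by band and finds contiguous low-confidence stretches, which are candidate disordered loops. It assumes the scores are on the 0 to 100 scale described above. The threshold of 70, the minimum run length of 3, and the function names are illustrative choices, not part of the SciRouter SDK:

```python
from typing import List, Optional, Tuple


def confidence_band(score: float) -> str:
    """Map a pLDDT score (0-100) to an AlphaFold-style confidence band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def low_confidence_segments(
    scores: List[float], threshold: float = 70.0, min_length: int = 3
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of residues scoring below threshold."""
    segments = []
    start: Optional[int] = None
    for i, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = i  # a low-confidence run begins here
        elif start is not None:
            # The run covered residues start..i-1; keep it only if long enough.
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments


# Made-up scores for illustration (not real ESMFold output):
scores = [45.0, 48.2, 52.1, 88.0, 91.5, 93.2, 90.8, 62.0, 55.3, 58.9, 85.0]
print([confidence_band(s) for s in scores[:4]])  # ['very low', 'very low', 'low', 'confident']
print(low_confidence_segments(scores))  # [(1, 3), (8, 10)]
```

In practice you would pass `result.plddt_per_residue` in place of the made-up list.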
Comparison: Local ESMFold vs API
To put the convenience in perspective, here is what running ESMFold locally requires compared to the API approach:
Local Setup
- NVIDIA GPU with 16 GB+ VRAM (A100 recommended; consumer GPUs may run out of memory on longer sequences)
- CUDA 11.7 or 12.x installed and configured
- PyTorch 2.x with matching CUDA version
- fair-esm library with ESMFold dependencies (openfold, biopython, einops)
- Several gigabytes of model weights downloaded on first run
- Docker or conda environment to isolate dependencies
- 15 to 60 minutes of setup time for an experienced engineer
API Setup
- Python 3.7+ on any machine (no GPU needed)
- One pip install command
- One environment variable
- Under 2 minutes from start to first prediction
Production-Ready Example with Error Handling
The minimal example works for quick experiments, but production code should validate input, handle errors, and manage timeouts. Here is a complete example:
import os
import sys
from scirouter import SciRouter
from scirouter.exceptions import SciRouterError, ValidationError
from scirouter.exceptions import TimeoutError as SciRouterTimeoutError  # avoid shadowing the builtin
# Validate environment
api_key = os.environ.get("SCIROUTER_API_KEY")
if not api_key:
    print("Error: Set the SCIROUTER_API_KEY environment variable")
    sys.exit(1)
client = SciRouter(api_key=api_key)
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MAX_LENGTH = 1024
def fold_protein(sequence: str) -> dict:
    """Fold a protein sequence and return structured results."""
    # Validate input locally before spending an API call
    sequence = sequence.strip().upper()
    invalid = set(sequence) - VALID_AA
    if invalid:
        raise ValueError(f"Invalid amino acids: {sorted(invalid)}")
    if len(sequence) > MAX_LENGTH:
        raise ValueError(f"Sequence too long: {len(sequence)} > {MAX_LENGTH}")
    if len(sequence) < 10:
        raise ValueError("Sequence too short: minimum 10 residues")
    # Call the API; catch the most specific errors first
    try:
        result = client.proteins.fold(
            sequence=sequence,
            model="esmfold",
            timeout=120,
        )
    except ValidationError as e:
        print(f"Input rejected by API: {e}")
        raise
    except SciRouterTimeoutError:
        print("Prediction timed out; try a shorter sequence")
        raise
    except SciRouterError as e:
        print(f"API error: {e}")
        raise
    return {
        "pdb": result.pdb,
        "mean_plddt": result.mean_plddt,
        "residue_scores": result.plddt_per_residue,
        "job_id": result.job_id,
    }
if __name__ == "__main__":
    seq = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
    output = fold_protein(seq)
    print(f"Mean pLDDT: {output['mean_plddt']:.1f}")
    with open("prediction.pdb", "w") as f:
        f.write(output["pdb"])
    print("Structure saved to prediction.pdb")
Batch Folding: Multiple Sequences
For batch processing, submit multiple fold requests concurrently. The API handles each as an independent job. Here is a pattern using concurrent futures:
from concurrent.futures import ThreadPoolExecutor, as_completed
from scirouter import SciRouter
client = SciRouter()
sequences = {
    "hemoglobin_alpha": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH",
    "insulin_b_chain": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",
    "lysozyme_fragment": "KVFGRCELAAALKRHGLDNYRGYSLGNWVCAAK",
}
def fold_one(name, seq):
    result = client.proteins.fold(sequence=seq)
    return name, result
results = {}
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = {pool.submit(fold_one, n, s): n for n, s in sequences.items()}
    for future in as_completed(futures):
        name = futures[future]
        try:
            _, result = future.result()
        except Exception as e:  # one failed job should not abort the whole batch
            print(f"{name}: failed ({e})")
            continue
        results[name] = result
        print(f"{name}: pLDDT = {result.mean_plddt:.1f}")
        with open(f"{name}.pdb", "w") as f:
            f.write(result.pdb)
print(f"Folded {len(results)} proteins")
Visualizing the Output
The PDB file you receive can be opened in any molecular viewer. For quick inspection in a Jupyter notebook, NGLview renders the structure inline:
# pip install nglview
import nglview as nv
view = nv.show_file("prediction.pdb")
view.add_representation("cartoon", color="bfactor")  # color by pLDDT, stored in the B-factor column
view
Coloring by B-factor maps the pLDDT values to a color gradient, so high-confidence and low-confidence regions are visually distinct (the exact colors depend on the viewer's color scale). This gives you immediate visual feedback on which parts of the structure are reliable.
Using the REST API Directly
If you prefer not to use the SDK, you can call the REST endpoint directly with any HTTP client. This is useful for non-Python environments or when integrating into existing infrastructure:
import os
import time
import requests
API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}
# Submit the folding job
resp = requests.post(
    f"{BASE}/proteins/fold",
    headers=headers,
    json={"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "model": "esmfold"},
    timeout=30,
)
resp.raise_for_status()  # fail fast on authentication or validation errors
job_id = resp.json()["job_id"]
# Poll until complete
while True:
    poll = requests.get(f"{BASE}/proteins/fold/{job_id}", headers=headers, timeout=30)
    poll.raise_for_status()
    result = poll.json()
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "Unknown error"))
    time.sleep(3)
print(f"pLDDT: {result['mean_plddt']:.1f}")
with open("structure.pdb", "w") as f:
    f.write(result["pdb"])
When to Use ESMFold vs Other Models
ESMFold is the right choice when you need fast, single-chain structure prediction and do not require multi-chain complex modeling. Here is a quick decision guide:
- ESMFold: Single chains, fast turnaround (5-15s), no MSA needed. Best for screening, pipelines, and quick checks.
- Boltz-2: Multi-chain complexes, protein-ligand interactions, protein-DNA/RNA. Slower but handles complex inputs.
- AlphaFold2: Highest accuracy single-chain prediction when you can wait for MSA computation. Not available via SciRouter yet.
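The decision guide above is simple enough to encode in a pipeline that routes inputs automatically. Here is a minimal sketch. The function name and its inputs are hypothetical. It returns display names rather than API model identifiers, since only `"esmfold"` appears in this article. AlphaFold2 is left out because it is not yet available via SciRouter:

```python
def choose_structure_model(
    num_chains: int,
    has_ligand: bool = False,
    has_nucleic_acid: bool = False,
) -> str:
    """Pick a structure predictor following the decision guide (hypothetical helper)."""
    if num_chains < 1:
        raise ValueError("num_chains must be at least 1")
    # A lone protein chain with no partners is ESMFold's sweet spot.
    if num_chains == 1 and not has_ligand and not has_nucleic_acid:
        return "ESMFold"
    # Complexes, protein-ligand, and protein-DNA/RNA inputs go to Boltz-2.
    return "Boltz-2"


print(choose_structure_model(1))                   # ESMFold
print(choose_structure_model(2))                   # Boltz-2
print(choose_structure_model(1, has_ligand=True))  # Boltz-2
```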
Next Steps
Now that you can predict protein structures programmatically, explore related tools on SciRouter. Use ESMFold for structure prediction, then feed the PDB output into molecular docking with DiffDock or design optimized sequences with ProteinMPNN.
Sign up for a free SciRouter API key at scirouter.ai/register and start predicting protein structures in under two minutes. No GPU required.