Two Paths to the Same Architecture
AlphaFold2 changed structural biology overnight when it won CASP14 in 2020. But the original codebase only included inference code and pre-trained weights. The training pipeline — the part that lets you build the model from scratch on your own data — was never released. OpenFold fills that gap. Developed at Columbia University, it is a faithful reimplementation of AlphaFold2 that includes the full training stack, released under a permissive open-source license.
This comparison breaks down the practical differences between OpenFold and AlphaFold for researchers and engineers deciding which to use in their protein structure prediction workflows.
What Is AlphaFold?
AlphaFold is Google DeepMind's protein structure prediction system. AlphaFold2, published in 2021, uses an Evoformer attention architecture combined with multiple sequence alignments (MSAs) to predict 3D protein structure from amino acid sequence with near-experimental accuracy.
- Developer: Google DeepMind
- Architecture: Evoformer with MSA and pair representations
- Training code: not released
- Inference code: open source (Apache 2.0)
- Pre-trained weights: available for download
- AlphaFold DB: 200M+ predicted structures freely available
AlphaFold2 vs AlphaFold3
AlphaFold3, published in May 2024, extends the system to predict biomolecular complexes including proteins, DNA, RNA, small molecules, and ions. It uses a diffusion-based architecture instead of the Evoformer structure module. However, AlphaFold3 weights are not publicly available, and the AlphaFold Server restricts usage to non-commercial academic research with daily prediction limits.
What Is OpenFold?
OpenFold is a complete, trainable reimplementation of AlphaFold2 built by researchers at Columbia University. It reproduces the Evoformer architecture in PyTorch (AlphaFold2 uses JAX) and includes the full training pipeline, data processing scripts, and pre-trained weights.
- Developer: Columbia University (Gustaf Ahdritz, Nazim Bouatta et al.)
- Architecture: Evoformer (same as AlphaFold2), implemented in PyTorch
- Training code: fully released and documented
- License: Apache 2.0 — no restrictions on commercial use
- Fine-tuning: supported on custom datasets
- Framework: PyTorch (vs AlphaFold2's JAX)
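To give a feel for what local OpenFold inference involves, the sketch below assembles a call to the repository's `run_pretrained_openfold.py` script. The paths are placeholders and the flag values shown are assumptions that may differ between OpenFold versions; check the project README for the exact arguments and required databases before running.

```python
# Placeholder paths -- substitute your own (assumptions, not OpenFold defaults)
FASTA_DIR = "fasta_dir/"                    # directory of .fasta files to fold
MMCIF_DIR = "data/pdb_mmcif/mmcif_files/"   # template structures
OUTPUT_DIR = "predictions/"

# Build the inference command for OpenFold's pretrained-model script.
# --config_preset selects which AlphaFold2 model configuration to mirror.
cmd = [
    "python3", "run_pretrained_openfold.py",
    FASTA_DIR,
    MMCIF_DIR,
    "--output_dir", OUTPUT_DIR,
    "--config_preset", "model_1_ptm",
    "--model_device", "cuda:0",
]

print(" ".join(cmd))
# To actually run it (requires an OpenFold install, weights, and databases):
# import subprocess; subprocess.run(cmd, check=True)
```

Because OpenFold mirrors AlphaFold2's model configurations, the same preset names map onto the corresponding AlphaFold2 model variants.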
Head-to-Head Comparison
Accuracy
OpenFold matches AlphaFold2 in prediction accuracy. Both achieve median GDT-TS scores above 90 on CASP14 targets and comparable lDDT scores on CAMEO continuous evaluation. Since OpenFold faithfully reproduces the Evoformer architecture, the accuracy differences are within noise for most practical applications.
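To make the GDT-TS number concrete: after superposing a prediction onto the experimental structure, GDT-TS averages the fraction of Cα atoms within 1, 2, 4, and 8 Å of their reference positions. The real metric searches for the best superposition per cutoff; the simplified sketch below assumes the structures are already superposed, so only per-residue deviations are needed.

```python
def gdt_ts(distances):
    """Simplified GDT-TS from per-residue C-alpha deviations (angstroms),
    assuming the two structures are already optimally superposed."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    # Fraction of residues within each cutoff, averaged over the four cutoffs
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)

# Toy example: four residues with deviations of 0.5, 1.5, 3.0, and 9.0 angstroms
print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # -> 56.25
```

A score above 90 on this scale means nearly every residue sits within a couple of angstroms of the experimental structure, which is why CASP14 results were described as near-experimental accuracy.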
Speed
Inference speed is similar for both models — typically 30 seconds to several minutes per protein depending on sequence length and MSA depth. Both require a GPU with at least 16 GB of VRAM for standard-length proteins. OpenFold's PyTorch implementation can take advantage of PyTorch-native optimizations and mixed-precision training.
Licensing and Commercial Use
This is where OpenFold has a clear advantage. The Apache 2.0 license permits unrestricted commercial use, modification, and redistribution. AlphaFold2's inference code is also now Apache 2.0, but the training pipeline remains proprietary. For organizations that need to retrain or fine-tune the model, OpenFold is the only viable option.
Training and Customization
OpenFold's full training stack means you can retrain the model from scratch on proprietary datasets, fine-tune on specific protein families, or experiment with architectural modifications. This is critical for pharmaceutical companies working with confidential structural data or researchers studying specialized protein classes like membrane proteins or disordered regions.
Community and Ecosystem
AlphaFold has the larger community, more published benchmarks, and the AlphaFold Protein Structure Database with over 200 million predicted structures. OpenFold benefits from being part of the broader PyTorch ecosystem and is actively maintained with contributions from academic and industry researchers.
When to Use OpenFold
- You need to fine-tune a protein structure model on proprietary or specialized data
- You are building a commercial product and need clear licensing terms
- You want to modify the architecture for research purposes
- Your team works in PyTorch and wants native integration
- You need to retrain from scratch on a curated training set
When to Use AlphaFold
- You need a quick prediction using pre-trained weights without customization
- You want to query the AlphaFold DB for pre-computed structures (no inference needed)
- You are doing non-commercial academic research and want the most widely cited tool
- You need complex prediction (AlphaFold3 via the AlphaFold Server, with restrictions)
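For the pre-computed-structure case, the AlphaFold DB serves per-entry downloads keyed by UniProt accession. The sketch below builds such a URL; the pattern shown matches model version 4 at the time of writing, but verify it against the AlphaFold DB documentation before relying on it.

```python
def afdb_pdb_url(uniprot_id, version=4):
    """Download URL for an AlphaFold DB predicted structure in PDB format.
    URL pattern is an assumption based on the v4 release -- verify before use."""
    return f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v{version}.pdb"

url = afdb_pdb_url("P69905")  # human hemoglobin subunit alpha
print(url)

# Fetch and save (network access required):
# import requests
# open("P69905.pdb", "w").write(requests.get(url, timeout=30).text)
```

If the structure you need is already in the database, this path avoids running any model at all.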
Other Alternatives Worth Knowing
OpenFold and AlphaFold are not the only options. Several other models offer different tradeoffs:
- ESMFold: uses a protein language model instead of MSAs, producing predictions in seconds rather than minutes. Less accurate than AlphaFold2 on difficult targets but dramatically faster for high-throughput screening
- Boltz-2: open-source complex prediction model comparable to AlphaFold3, supporting proteins, ligands, DNA, and RNA. Best choice for protein-ligand binding pose prediction
- OmegaFold: another single-sequence structure prediction model, similar in concept to ESMFold with competitive accuracy on well-folded proteins
Access Protein Folding via SciRouter API
Instead of managing GPU infrastructure and model installations locally, you can call protein structure prediction models through the SciRouter API. No local GPU, no Docker setup, no model weights to download.
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Predict structure from amino acid sequence
response = requests.post(
    f"{BASE}/proteins/fold",
    headers=headers,
    json={
        "model": "esmfold",
        "sequence": "MKFLILLFNILCLFPVLAADNHGVSLHCTTATAKALQE"
    }
)
result = response.json()

print(f"Model: {result['model']}")
print(f"Mean pLDDT: {result['mean_plddt']:.1f}")

# Save the predicted structure
with open("predicted.pdb", "w") as f:
    f.write(result["pdb_string"])

SciRouter provides API access to ESMFold for fast single-chain folding, Boltz-2 for complex prediction, and protein embeddings for downstream ML tasks. All models run on managed GPU infrastructure with no setup required.
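Whichever model produced it, a predicted PDB file conventionally stores per-residue confidence (pLDDT) in the B-factor column of each ATOM record. A minimal parser, assuming fixed-column PDB format and reading one Cα atom per residue:

```python
def plddt_per_residue(pdb_string):
    """Extract per-residue pLDDT from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_string.splitlines():
        # Fixed PDB columns: atom name in 13-16, resSeq in 23-26, B-factor in 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resseq = int(line[22:26])
            scores[resseq] = float(line[60:66])
    return scores

# Minimal two-residue example, padded to the PDB column spec
sample = (
    "ATOM      1  CA  MET A   1      11.104  13.207   2.100  1.00 92.50           C\n"
    "ATOM      2  CA  LYS A   2      12.560  14.110   3.900  1.00 68.30           C\n"
)
scores = plddt_per_residue(sample)
print(scores)  # -> {1: 92.5, 2: 68.3}
```

Per the usual AlphaFold-style bands, residues above 90 are very high confidence, 70–90 confident, 50–70 low, and below 50 very low — useful for filtering predictions before downstream analysis.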
The Bottom Line
OpenFold and AlphaFold2 produce equivalent results for protein structure prediction. The choice between them comes down to what you need beyond inference. If you need to retrain, fine-tune, or deploy commercially, OpenFold's Apache 2.0 license and full training pipeline make it the clear choice. If you just need a quick prediction from pre-trained weights and the AlphaFold DB has your protein already, AlphaFold is the path of least resistance.
For teams that want accurate protein folding without managing any infrastructure, SciRouter provides API access to ESMFold and Boltz-2 with a single API key. Check out the protein structure API tutorial to get started.