What Are Proteins?
Proteins are the molecular machines that do almost everything in living organisms. They catalyze chemical reactions, fight infections, carry oxygen through your blood, give structure to your muscles, and send signals between your cells. The human genome encodes roughly 20,000 different proteins, and each one has a specific job.
At the most basic level, a protein is a chain of small building blocks called amino acids. There are 20 standard amino acids, and a typical protein is a chain of 200 to 500 of them strung together in a specific order. This order – determined by your DNA – is called the protein's sequence. You can write a protein sequence as a string of letters, where each letter represents one amino acid. For example, the sequence MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH is the beginning of the alpha chain of human hemoglobin, the protein that carries oxygen in your red blood cells.
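Because a sequence is just a string, ordinary string tools are enough to inspect it. The short sketch below checks that a sequence uses only the 20 standard one-letter codes and counts how often each amino acid appears. The helper names `validate` and `composition` are made up for this illustration.

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# The hemoglobin fragment quoted above.
sequence = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"


def validate(seq: str) -> bool:
    """Return True if every letter is a standard amino acid code."""
    return all(residue in AMINO_ACIDS for residue in seq)


def composition(seq: str) -> Counter:
    """Count how many times each amino acid occurs in the sequence."""
    return Counter(seq)


print(f"Length: {len(sequence)} amino acids, valid: {validate(sequence)}")
for letter, count in composition(sequence).most_common(3):
    print(f"{letter} ({AMINO_ACIDS[letter]}): {count}")
```

A check like this is a cheap way to catch typos or stray characters before sending a sequence to a prediction tool.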
Why Shape Matters
Here is the crucial insight: a protein's function is determined not just by what amino acids it contains, but by the 3D shape it folds into. Think of it like origami – the same flat sheet of paper can become a crane, a frog, or a box depending on how you fold it. Similarly, a chain of amino acids only does its job once it has folded into the right 3D shape, and the same chain folded the wrong way can behave very differently.
This shape-function relationship is why proteins are at the center of drug discovery. Most drugs work by binding to a specific protein in the body and changing its behavior. To design a drug that fits a protein target, you need to know the protein's 3D structure – just like you need to know the shape of a lock before you can cut a key for it. Without the structure, drug design is largely guesswork.
Examples of Shape Determining Function
- Antibodies fold into a Y-shape that lets them grab onto viruses and bacteria. The tips of the Y are uniquely shaped to recognize specific threats
- Enzymes fold to create a pocket (the active site) with exactly the right shape and chemistry to catalyze a specific reaction
- Hemoglobin assembles from four folded subunits (its quaternary structure), each containing an iron-holding heme group positioned to bind oxygen
- Collagen folds into a triple helix that provides structural strength to skin, tendons, and bones
The Protein Folding Problem
If a protein's sequence determines its shape, and its shape determines its function, then predicting the shape from the sequence should be straightforward, right? Scientists have been able to read protein sequences since the 1950s. But predicting how those sequences fold into 3D structures turned out to be one of the hardest problems in all of biology.
The difficulty is captured by what is known as Levinthal's paradox, proposed by molecular biologist Cyrus Levinthal in 1969. Consider a small protein with just 100 amino acids. Each amino acid can rotate around two chemical bonds in the backbone, and each bond has roughly 3 favored orientations. That gives about 200 rotatable bonds, so the total number of possible configurations for the whole protein is approximately 3 raised to the power of 200 (about 10 to the power of 95) – a number so large that it dwarfs the roughly 10 to the power of 80 atoms in the observable universe.
If a protein tried one configuration every picosecond (a trillionth of a second), it would take longer than the age of the universe to sample them all. Yet real proteins fold into their correct shape in milliseconds to seconds. Nature has clearly found a shortcut that does not involve trying every possibility. For over 50 years, scientists tried to figure out what that shortcut is and how to replicate it computationally.
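The arithmetic behind the paradox is easy to check yourself. The sketch below reproduces the estimate above using round numbers; the age of the universe (about 13.8 billion years) and the atom count (about 10 to the power of 80) are commonly quoted orders of magnitude, not precise values.

```python
import math

RESIDUES = 100
BONDS_PER_RESIDUE = 2          # two rotatable backbone bonds per amino acid
STATES_PER_BOND = 3            # rough number of orientations per bond
SECONDS_PER_SAMPLE = 1e-12     # one configuration tried per picosecond
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10
LOG10_ATOMS_IN_UNIVERSE = 80   # order-of-magnitude estimate

# Total number of configurations: 3 ** 200
configurations = STATES_PER_BOND ** (RESIDUES * BONDS_PER_RESIDUE)
log10_configs = math.log10(configurations)

# Time needed to try every configuration once.
search_years = configurations * SECONDS_PER_SAMPLE / SECONDS_PER_YEAR
universe_lifetimes = search_years / AGE_OF_UNIVERSE_YEARS

print(f"Configurations: about 10^{log10_configs:.0f}")
print(f"Atoms in the universe: about 10^{LOG10_ATOMS_IN_UNIVERSE}")
print(f"Exhaustive search: about 10^{math.log10(search_years):.0f} years")
print(f"That is about 10^{math.log10(universe_lifetimes):.0f} times the age of the universe")
```

Changing `STATES_PER_BOND` or `RESIDUES` shows how quickly the search space explodes: every extra amino acid multiplies the count by 9.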
How AI Solved It: The AlphaFold Breakthrough
The breakthrough came in November 2020 at CASP14, a biennial competition where teams predict protein structures from sequence alone and are judged against experimentally determined structures. DeepMind's AlphaFold2 achieved accuracy comparable to experimental methods – essentially solving the structure prediction problem for most single-chain proteins.
AlphaFold2 works by combining two key ideas. First, it uses multiple sequence alignments (MSAs): by comparing a target protein's sequence with thousands of related sequences from other organisms, it identifies which amino acid positions evolve together, which reveals which positions are close together in the 3D structure. Second, it uses a deep neural network (specifically, a transformer architecture) trained on approximately 170,000 known protein structures from the Protein Data Bank.
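To get a feel for the "positions that evolve together" idea, here is a toy sketch that scores every pair of columns in a tiny alignment by mutual information, a classic and much simpler co-variation measure. This is not what AlphaFold2 computes internally (its network learns such signals directly from the alignment), and the alignment itself is invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Toy alignment (made up): each row is a related sequence,
# each column is one aligned position.
msa = [
    "AKDLE",
    "AKDLE",
    "ARELE",
    "ARELD",
    "GKDLD",
    "GRELE",
]


def column(alignment, i):
    """Return the letters found at position i across all sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_joint = joint / n
        p_independent = (count_a[a] / n) * (count_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


length = len(msa[0])
scores = {
    (i, j): mutual_information(column(msa, i), column(msa, j))
    for i, j in combinations(range(length), 2)
}
best_pair = max(scores, key=scores.get)
print(f"Most strongly co-varying positions: {best_pair}, MI = {scores[best_pair]:.2f} bits")
```

In this toy alignment, positions 1 and 2 always change together (K pairs with D, R pairs with E), so they receive the highest score. In a real protein, a pattern like that hints that the two positions touch in the folded structure.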
The result was transformative. Before AlphaFold2, determining a single protein structure experimentally could take months to years and cost tens of thousands of dollars. Now, a computational prediction of comparable quality takes minutes and costs essentially nothing.
The Tools: ESMFold, AlphaFold, and Boltz-2
Since AlphaFold2, several other tools have emerged, each with different strengths. Here are the three most important ones to know:
ESMFold – Speed Champion
ESMFold from Meta AI takes a fundamentally different approach. Instead of building MSAs (which is slow), it uses a protein language model – a neural network trained on millions of protein sequences that has learned the “grammar” of proteins, much as GPT learns the patterns of human language. Because it only needs the input sequence (no database search), ESMFold predicts structures in seconds, not minutes. The trade-off is somewhat lower accuracy – typically 5 to 15 percent below AlphaFold2.
- Speed: 1–5 seconds per protein
- Best for: Rapid screening, proteome-scale analysis, interactive exploration
- Limitation: Single chains only, somewhat less accurate
AlphaFold2 – Accuracy Champion
AlphaFold2 remains the gold standard for prediction accuracy on single-chain proteins. The AlphaFold Protein Structure Database now covers over 200 million proteins – essentially every known protein sequence. For custom sequences not in the database, you can run AlphaFold2 through ColabFold.
- Speed: Minutes to hours (MSA step is the bottleneck)
- Best for: High-stakes predictions where accuracy is paramount
- Limitation: Slow, heavy infrastructure requirements
Boltz-2 – Complex Prediction Champion
Boltz-2 from MIT extends prediction beyond single chains. It can predict the structures of protein-protein complexes, protein-ligand complexes (a protein bound to a drug molecule), protein-DNA complexes, and more. This fills the gap left by AlphaFold3, which is not fully open source.
- Speed: Minutes per complex
- Best for: Multi-chain complexes, protein-ligand structures, drug target analysis
- Limitation: Newer tool, requires significant GPU resources locally
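The trade-offs above can be condensed into a rule of thumb. The function below is only a sketch of that reasoning, with a made-up name and parameters; real projects will weigh accuracy, cost, and hardware in more detail.

```python
def pick_tool(num_chains: int = 1,
              has_ligand_or_dna: bool = False,
              need_speed: bool = False) -> str:
    """Suggest a structure prediction tool from the summaries above."""
    # Anything beyond a single protein chain needs a complex predictor.
    if num_chains > 1 or has_ligand_or_dna:
        return "Boltz-2"
    # Single chain, and turnaround time matters more than peak accuracy.
    if need_speed:
        return "ESMFold"
    # Single chain where accuracy is the priority.
    return "AlphaFold2"


print(pick_tool(need_speed=True))          # screening thousands of sequences
print(pick_tool())                         # one important single-chain target
print(pick_tool(has_ligand_or_dna=True))   # protein bound to a drug molecule
```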
Try It Yourself: Fold a Protein in 10 Seconds
You do not need a biology lab, a GPU, or even a deep understanding of protein science to fold a protein. With SciRouter's API, you can send an amino acid sequence and receive a 3D structure back in seconds. Here is a complete example:
```python
import requests

API_KEY = "sk-sci-your-api-key"  # replace with your own key
BASE = "https://api.scirouter.ai/v1"

# Human insulin B-chain (30 amino acids)
sequence = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

response = requests.post(
    f"{BASE}/proteins/fold",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "sequence": sequence,
        "model": "esmfold",
    },
    timeout=60,
)
response.raise_for_status()  # stop here if the API returned an HTTP error
result = response.json()

print("Structure predicted in seconds!")
print(f"Average confidence (pLDDT): {result['average_plddt']:.1f}/100")
print(f"Download PDB: {result['pdb_url']}")

# Interpret confidence
plddt = result["average_plddt"]
if plddt > 90:
    print("Excellent - high confidence prediction")
elif plddt > 70:
    print("Good - backbone likely correct")
elif plddt > 50:
    print("Fair - use with caution")
else:
    print("Low confidence - likely disordered region")
```

The pLDDT score (predicted Local Distance Difference Test) tells you how confident the model is in its prediction on a scale of 0 to 100. Scores above 90 indicate high confidence, 70–90 is good, 50–70 should be treated with caution, and below 50 suggests the region may be naturally disordered (floppy, without a fixed structure). For a detailed guide to interpreting these scores, see our ESMFold guide.
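The average score hides which parts of the protein are uncertain. AlphaFold-style predictors conventionally write the per-residue pLDDT into the B-factor column of the PDB file, so you can read it back with a few lines of Python. The sketch below assumes the downloaded file follows that convention on a 0 to 100 scale, which is worth verifying for your own output. The helper names and the four toy records are invented for illustration.

```python
def atom_line(serial, res_name, res_seq, plddt):
    """Build a toy fixed-width PDB ATOM record for one C-alpha atom."""
    return (
        f"ATOM  {serial:5d}  CA  {res_name:>3s} A{res_seq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def per_residue_plddt(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column."""
    scores = {}
    for line in pdb_text.splitlines():
        # Use one atom per residue: the C-alpha (atom name in columns 13-16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_seq = int(line[22:26])      # residue number, columns 23-26
            scores[res_seq] = float(line[60:66])  # B-factor, columns 61-66
    return scores


def low_confidence_residues(scores, threshold=50.0):
    """Residue numbers whose pLDDT falls below the threshold."""
    return [res for res, score in sorted(scores.items()) if score < threshold]


# Toy stand-in for a downloaded PDB file (values are made up).
toy_pdb = "\n".join([
    atom_line(1, "PHE", 1, 93.2),
    atom_line(2, "VAL", 2, 88.1),
    atom_line(3, "ASN", 3, 47.5),
    atom_line(4, "GLN", 4, 41.0),
])

scores = per_residue_plddt(toy_pdb)
print("Per-residue pLDDT:", scores)
print("Possibly disordered residues:", low_confidence_residues(scores))
```

With a real prediction, you would pass the text of the downloaded PDB file to `per_residue_plddt` instead of `toy_pdb`.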
What Happens When Folding Goes Wrong
When proteins misfold – adopting the wrong 3D shape – the consequences can be severe. Misfolded proteins can aggregate into clumps that damage cells and tissues. Several major diseases are caused by, or closely linked to, protein misfolding and aggregation:
- Alzheimer's disease: Amyloid-beta proteins misfold and aggregate into plaques in the brain
- Parkinson's disease: Alpha-synuclein proteins misfold and form Lewy bodies in neurons
- Cystic fibrosis: The most common disease-causing mutation deletes a single amino acid (F508del), causing the CFTR protein to misfold and be degraded before reaching the cell surface
- Sickle cell disease: A single amino acid change (glutamate to valine in the beta chain) makes hemoglobin molecules stick together into rigid fibers under low-oxygen conditions, distorting red blood cells
- Prion diseases: Normal prion proteins refold into a pathogenic conformation that is infectious and transmissible
Understanding protein folding is not just an academic exercise – it is directly relevant to understanding and treating disease.
What Comes Next
For most single-chain proteins, structure prediction is now largely a solved problem in practice, but the field is still advancing rapidly. Current frontiers include:
- Protein design: Instead of predicting the fold of a natural protein, design entirely new proteins with desired functions (pioneered by David Baker's lab)
- Complex prediction: Predicting how multiple proteins interact with each other and with small molecules (Boltz-2, AlphaFold3)
- Dynamics: Proteins are not static – they flex and move. Predicting protein motion and conformational changes is the next frontier
- Drug discovery integration: Using predicted structures directly for drug design, from target identification through lead optimization
Getting Started
Protein folding has gone from a seemingly unsolvable mystery to a tool you can use in seconds. Whether you are a biology student trying to understand your first protein, a developer building a biotech application, or a researcher screening drug targets, the tools are now accessible to everyone.
To fold your first protein, grab a free SciRouter API key and try the code example above. Sign up here – no credit card required. For a deeper dive into the tools, see our comparison of ESMFold and Boltz-2.