Protein Folding Explained: A Complete Beginner's Guide

What is protein folding, why it matters, and how AI solved one of biology's grand challenges. A friendly guide for non-scientists with interactive examples.

Ryan Bethencourt
May 3, 2026
10 min read

What Are Proteins?

Proteins are the molecular machines that do almost everything in living organisms. They catalyze chemical reactions, fight infections, carry oxygen through your blood, give structure to your muscles, and send signals between your cells. Your body contains roughly 20,000 different types of proteins, and each one has a specific job.

At the most basic level, a protein is a chain of small building blocks called amino acids. There are 20 different amino acids, and a typical protein is a chain of 200 to 500 of them strung together in a specific order. This order – determined by your DNA – is called the protein's sequence. You can write a protein sequence as a string of letters, where each letter represents one amino acid. For example, the sequence MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH is the beginning of human hemoglobin, the protein that carries oxygen in your red blood cells.
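Because a sequence is just a string of letters, ordinary string tools are enough to inspect it. The sketch below uses only the Python standard library to check that every letter is one of the 20 standard amino acids and to count which ones appear most often in the hemoglobin fragment above. The helper name `validate_sequence` is ours, chosen for illustration.

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str) -> str:
    """Return the sequence in upper case, or raise ValueError on unknown letters."""
    cleaned = sequence.strip().upper()
    unknown = sorted(set(cleaned) - AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Unknown amino acid letters: {unknown}")
    return cleaned


# The hemoglobin fragment quoted in the text
hemoglobin_start = validate_sequence(
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
)
composition = Counter(hemoglobin_start)

print(f"Length: {len(hemoglobin_start)} amino acids")
print(f"Most common: {composition.most_common(3)}")
```

A check like this is worth running before sending a sequence to any folding tool, since a stray character is one of the easiest mistakes to make when copying sequences by hand.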

Why Shape Matters

Here is the crucial insight: a protein's function is determined not just by what amino acids it contains, but by the 3D shape it folds into. Think of it like origami – the same flat sheet of paper can become a crane, a frog, or a box depending on how you fold it. Similarly, what a chain of amino acids can do depends on its final 3D shape, and a chain that ends up in the wrong shape usually cannot do its job.

This shape-function relationship is why proteins are at the center of drug discovery. Most drugs work by binding to a specific protein in the body and changing its behavior. To design a drug that fits a protein target, you need to know the protein's 3D structure – just like you need to know the shape of a lock before you can cut a key for it. Without the structure, drug design is largely guesswork.

Examples of Shape Determining Function

  • Antibodies fold into a Y-shape that lets them grab onto viruses and bacteria. The tips of the Y are uniquely shaped to recognize specific threats
  • Enzymes fold to create a pocket (the active site) with exactly the right shape and chemistry to catalyze a specific reaction
  • Hemoglobin folds into a quaternary structure with four subunits, each containing an iron-holding heme group perfectly positioned to bind oxygen
  • Collagen folds into a triple helix that provides structural strength to skin, tendons, and bones

The Protein Folding Problem

If a protein's shape determines its function, then predicting the shape from the sequence should be straightforward, right? Scientists have known protein sequences since the 1950s. But predicting how those sequences fold into 3D structures turned out to be one of the hardest problems in all of biology.

The difficulty is captured by what is known as Levinthal's paradox, proposed by molecular biologist Cyrus Levinthal in 1969. Consider a small protein with just 100 amino acids. Each amino acid can rotate around two chemical bonds, and each of those bonds has roughly 3 possible orientations. With 200 rotatable bonds in total, the number of possible configurations for the whole protein is approximately 3 raised to the power of 200 (about 10^95) – a number so large that it dwarfs the number of atoms in the observable universe.

If a protein tried one configuration every picosecond (a trillionth of a second), it would take longer than the age of the universe to sample them all. Yet real proteins fold into their correct shape in milliseconds to seconds. Nature has clearly found a shortcut that does not involve trying every possibility. For over 50 years, scientists tried to figure out what that shortcut is and how to replicate it computationally.
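The arithmetic behind the paradox is easy to check. The short sketch below works in base-10 logarithms to keep the numbers readable. The age of the universe (about 13.8 billion years) and the figure of 10^80 atoms are common order-of-magnitude estimates that we supply here as assumptions, not values taken from Levinthal's original argument.

```python
import math

BONDS = 200                    # 100 amino acids x 2 rotatable bonds each
STATES_PER_BOND = 3            # rough number of orientations per bond
SAMPLES_PER_SECOND = 1e12      # one configuration per picosecond
AGE_OF_UNIVERSE_S = 4.35e17    # about 13.8 billion years, in seconds (estimate)
ATOMS_IN_UNIVERSE_LOG10 = 80   # common order-of-magnitude estimate

# log10 of the number of configurations: 200 * log10(3)
log10_configs = BONDS * math.log10(STATES_PER_BOND)

# log10 of the time needed to try them all, in seconds and in universe ages
log10_seconds = log10_configs - math.log10(SAMPLES_PER_SECOND)
log10_universe_ages = log10_seconds - math.log10(AGE_OF_UNIVERSE_S)

print(f"Configurations: about 10^{log10_configs:.0f}")
print(f"Atoms in the universe: about 10^{ATOMS_IN_UNIVERSE_LOG10}")
print(f"Exhaustive search: about 10^{log10_universe_ages:.0f} ages of the universe")
```

The exact inputs matter very little: changing the number of orientations per bond or the sampling rate by a factor of ten barely dents a gap this large.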

How AI Solved It: The AlphaFold Breakthrough

The breakthrough came in November 2020 at CASP14, a biennial competition where teams predict protein structures from sequence alone and are judged against experimentally determined structures. DeepMind's AlphaFold2 achieved accuracy comparable to experimental methods – essentially solving the structure prediction problem for most single-chain proteins.

AlphaFold2 works by combining two key ideas. First, it uses multiple sequence alignments (MSAs): by comparing a target protein's sequence with thousands of related sequences from other organisms, it identifies which amino acid positions evolve together, which reveals which positions are close together in the 3D structure. Second, it uses a deep neural network (specifically, a transformer architecture) trained on approximately 170,000 known protein structures from the Protein Data Bank.
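To build intuition for what "positions that evolve together" means, here is a toy sketch using mutual information, a classic statistic for spotting co-varying columns in an alignment. This is not how AlphaFold2 itself processes an MSA (its neural network learns these relationships directly), and the six aligned sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy alignment: six related sequences, already aligned.
# Positions 1 and 3 change together (K pairs with E, R pairs with D).
msa = ["AKLEG", "AKLEG", "ARLDG", "ARLDG", "AKVEG", "ARVDG"]


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (count / n) * log2((count / n) / ((freq_a[a] / n) * (freq_b[b] / n)))
        for (a, b), count in freq_ab.items()
    )


# Turn the list of sequences into a list of columns
columns = ["".join(seq[i] for seq in msa) for i in range(len(msa[0]))]

# Score every pair of positions
scores = {
    (i, j): mutual_information(columns[i], columns[j])
    for i, j in combinations(range(len(columns)), 2)
}

best_pair = max(scores, key=scores.get)
print(f"Most strongly co-varying positions: {best_pair}")
print(f"Mutual information: {scores[best_pair]:.2f} bits")
```

In a real protein, a pair of positions that co-vary like this is a hint that the two amino acids touch in the folded structure: when one mutates, the other has to change to keep the contact intact.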

The result was transformative. Before AlphaFold2, determining a single protein structure experimentally could take months to years and cost tens of thousands of dollars. Now, a computational prediction of comparable quality takes minutes and costs essentially nothing.

Note
The AlphaFold breakthrough was recognized with the 2024 Nobel Prize in Chemistry, awarded to Demis Hassabis and John Jumper of DeepMind alongside David Baker of the University of Washington for their work on protein structure prediction and design.

The Tools: ESMFold, AlphaFold, and Boltz-2

Since AlphaFold2, several other tools have emerged, each with different strengths. Here are the three most important ones to know:

ESMFold – Speed Champion

ESMFold from Meta AI takes a fundamentally different approach. Instead of building MSAs (which is slow), it uses a protein language model – a neural network trained on millions of protein sequences that has learned to understand protein “grammar” the way GPT understands human language. Because it only needs the input sequence (no database search), ESMFold predicts structures in seconds, not minutes. The trade-off is slightly lower accuracy – typically 5 to 15 percent below AlphaFold2.

  • Speed: 1–5 seconds per protein
  • Best for: Rapid screening, proteome-scale analysis, interactive exploration
  • Limitation: Single chains only, somewhat less accurate

AlphaFold2 – Accuracy Champion

AlphaFold2 remains the gold standard for prediction accuracy on single-chain proteins. The AlphaFold Protein Structure Database now covers over 200 million proteins – essentially every known protein sequence. For custom sequences not in the database, you can run AlphaFold2 through ColabFold.

  • Speed: Minutes to hours (MSA step is the bottleneck)
  • Best for: High-stakes predictions where accuracy is paramount
  • Limitation: Slow, heavy infrastructure requirements

Boltz-2 – Complex Prediction Champion

Boltz-2 from MIT extends prediction beyond single chains. It can predict the structures of protein-protein complexes, protein-ligand complexes (a protein bound to a drug molecule), protein-DNA complexes, and more. This fills the gap left by AlphaFold3, which is not fully open source.

  • Speed: Minutes per complex
  • Best for: Multi-chain complexes, protein-ligand structures, drug target analysis
  • Limitation: Newer tool, requires significant GPU resources locally

Try It Yourself: Fold a Protein in 10 Seconds

You do not need a biology lab, a GPU, or even a deep understanding of protein science to fold a protein. With SciRouter's API, you can send an amino acid sequence and receive a 3D structure back in seconds. Here is a complete example:

Fold your first protein with ESMFold via SciRouter
import requests

API_KEY = "sk-sci-your-api-key"  # replace with your own key
BASE = "https://api.scirouter.ai/v1"

# Human insulin B-chain (30 amino acids)
sequence = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

# Send the sequence to the folding endpoint and ask for the ESMFold model
response = requests.post(
    f"{BASE}/proteins/fold",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "sequence": sequence,
        "model": "esmfold"
    },
    timeout=60,
)
response.raise_for_status()  # stop here if the API returned an HTTP error

result = response.json()
plddt = result["average_plddt"]
print("Structure predicted in seconds!")
print(f"Average confidence (pLDDT): {plddt:.1f}/100")
print(f"Download PDB: {result['pdb_url']}")

# Interpret confidence
if plddt > 90:
    print("Excellent - high confidence prediction")
elif plddt > 70:
    print("Good - backbone likely correct")
elif plddt > 50:
    print("Fair - use with caution")
else:
    print("Low confidence - likely disordered region")

The pLDDT score (predicted Local Distance Difference Test) tells you how confident the model is in its prediction on a scale of 0 to 100. Scores above 90 indicate high confidence, 70–90 is good, 50–70 is low confidence and should be used with caution, and below 50 suggests the region may be naturally disordered (floppy, without a fixed structure). For a detailed guide to interpreting these scores, see our ESMFold guide.
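An average score can hide weak spots, so it often helps to look at confidence residue by residue. AlphaFold and ESMFold conventionally write per-residue pLDDT into the B-factor column of the PDB file. The sketch below assumes the file you download follows that convention and uses a 0 to 100 scale, which you should verify for your own output. The helper names and the three-residue sample are made up for illustration.

```python
def per_residue_plddt(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26,
        # B-factor 61-66 (1-based), hence the 0-based slices below.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(scores: dict[int, float], threshold: float = 70.0) -> list[int]:
    """Return residue numbers whose pLDDT falls below the threshold."""
    return [res for res, value in sorted(scores.items()) if value < threshold]


def ca_line(serial: int, res_name: str, res_seq: int, plddt: float) -> str:
    """Build a synthetic, correctly aligned CA ATOM record for the demo."""
    return (
        f"ATOM  {serial:>5}  CA  {res_name} A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Synthetic three-residue example (coordinates are placeholders)
sample_pdb = "\n".join([
    ca_line(1, "PHE", 1, 45.2),
    ca_line(2, "VAL", 2, 82.5),
    ca_line(3, "ASN", 3, 93.1),
])

scores = per_residue_plddt(sample_pdb)
print(f"Per-residue pLDDT: {scores}")
print(f"Residues below 70: {low_confidence_residues(scores)}")
```

With a real prediction, you would pass in the text of the downloaded PDB file instead of `sample_pdb`. Runs of low-scoring residues at the ends of a chain are common and usually point to flexible tails rather than a failed prediction.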

What Happens When Folding Goes Wrong

When proteins misfold – adopting the wrong 3D shape – the consequences can be severe. Misfolded proteins can aggregate into clumps that damage cells and tissues. Several major diseases are caused by protein misfolding:

  • Alzheimer's disease: Amyloid-beta proteins misfold and aggregate into plaques in the brain
  • Parkinson's disease: Alpha-synuclein proteins misfold and form Lewy bodies in neurons
  • Cystic fibrosis: A single amino acid deletion causes the CFTR protein to misfold and be degraded before reaching the cell surface
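  • Sickle cell disease: A single amino acid change causes hemoglobin molecules to stick together into stiff fibers under low-oxygen conditions, distorting red blood cells (strictly a problem of abnormal aggregation rather than misfolding)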
  • Prion diseases: Normal prion proteins refold into a pathogenic conformation that is infectious and transmissible

Understanding protein folding is not just an academic exercise – it is directly relevant to understanding and treating disease.

What Comes Next

Protein structure prediction is a solved problem for most practical purposes, but the field is still advancing rapidly. Current frontiers include:

  • Protein design: Instead of predicting the fold of a natural protein, design entirely new proteins with desired functions (pioneered by David Baker's lab)
  • Complex prediction: Predicting how multiple proteins interact with each other and with small molecules (Boltz-2, AlphaFold3)
  • Dynamics: Proteins are not static – they flex and move. Predicting protein motion and conformational changes is the next frontier
  • Drug discovery integration: Using predicted structures directly for drug design, from target identification through lead optimization

Tip
If you are new to protein folding, the best way to build intuition is to try it yourself. Fold a few proteins, look at the structures, and compare the confidence scores. SciRouter's free tier gives you 5,000 API calls per month – enough to explore extensively.

Getting Started

Protein folding has gone from an unsolvable mystery to a tool you can use in seconds. Whether you are a biology student trying to understand your first protein, a developer building a biotech application, or a researcher screening drug targets, the tools are now accessible to everyone.

To fold your first protein, grab a free SciRouter API key and try the code example above. Sign up here – no credit card required. For a deeper dive into the tools, see our comparison of ESMFold and Boltz-2.

Frequently Asked Questions

What is protein folding in simple terms?

Protein folding is the process by which a long chain of amino acids (think of it like a string of beads) spontaneously crumples into a specific 3D shape. This shape determines what the protein does in your body. Just like a key must have the right shape to fit a lock, a protein must fold into the right shape to do its job. Misfolded proteins can cause diseases like Alzheimer’s and Parkinson’s.

Why was the protein folding problem so hard to solve?

The protein folding problem was hard because of the astronomical number of possible shapes a protein chain could take. Even a small protein with 100 amino acids could theoretically adopt more configurations than there are atoms in the universe, so even the fastest computers could not test all possibilities. This is known as Levinthal’s paradox: if a protein tried one configuration per picosecond, it would take longer than the age of the universe to find the right fold, yet real proteins fold in milliseconds to seconds.

How did AI solve protein folding?

AI solved protein folding by learning patterns from known protein structures rather than simulating physics. AlphaFold2, developed by DeepMind, was trained on approximately 170,000 experimentally determined protein structures from the Protein Data Bank. It learned to predict the 3D coordinates of every atom in a protein from its amino acid sequence alone. At the CASP14 competition in 2020, AlphaFold2 achieved accuracy comparable to experimental methods, effectively solving the structure prediction problem for most single-chain proteins.

What is the difference between ESMFold and AlphaFold?

The main difference is speed versus accuracy. AlphaFold2 uses multiple sequence alignments (MSAs) — it searches databases of related protein sequences to build an evolutionary profile, which takes minutes to hours. ESMFold uses a protein language model that encodes evolutionary information directly in its neural network weights, requiring only the single input sequence. This makes ESMFold 100 to 1000 times faster (seconds instead of minutes) but typically 5 to 15 percent less accurate than AlphaFold2.

Can I try protein folding myself without a science background?

Yes. You can fold a protein with just a few lines of Python code using SciRouter’s API. You do not need a biology degree, a GPU, or any special software. Just sign up for a free API key, send an amino acid sequence (which is just a string of letters), and receive a 3D structure back. The entire process takes under 10 seconds. The hardest part is interpreting the results, but confidence scores (pLDDT) make that straightforward too.
