What Is DiffDock?
DiffDock is a molecular docking method developed at MIT that uses diffusion generative models to predict how a small molecule (ligand) binds to a protein. Published in 2023 by Corso et al., it takes a fundamentally different approach from traditional docking software. Instead of systematically searching a predefined binding box and scoring poses with a physics-based energy function, DiffDock generates binding poses through a learned diffusion process over the space of ligand positions, orientations, and torsion angles.
The result is a blind docking method that does not require you to know where the binding site is. You provide a protein structure and a ligand, and DiffDock explores the entire protein surface to find likely binding modes.
How Traditional Docking Works
To understand why DiffDock is significant, it helps to know how conventional docking tools operate. Traditional methods like AutoDock Vina, GLIDE, and GOLD follow a two-step process:
- Search: Define a 3D box around the suspected binding site. The tool samples thousands of ligand poses within this box using stochastic or systematic search algorithms.
- Score: Each pose is evaluated using a scoring function that approximates binding free energy, typically combining van der Waals, electrostatic, and desolvation terms.
This approach works well when you already know where the ligand binds. But it has significant limitations: you must specify the search box, the scoring functions are approximate, and exploring large or novel binding sites is computationally expensive.
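The search-and-score loop described above can be sketched in a few lines of Python. This is only a toy illustration: `toy_score` and `search_and_score` are hypothetical names, the "ligand" is reduced to a single point, and the score is a plain distance to an assumed pocket center rather than a real energy function. It does show why the box matters: a box that misses the pocket can never return a good pose.

```python
import math
import random

def toy_score(pose, pocket_center):
    """Hypothetical stand-in for a scoring function (lower is better).
    Real tools combine van der Waals, electrostatic, and desolvation terms."""
    return math.dist(pose, pocket_center)

def search_and_score(box_min, box_max, pocket_center, n_samples=2000, seed=0):
    """Randomly sample positions inside the box and keep the best-scoring one."""
    rng = random.Random(seed)
    best_pose, best_score = None, float("inf")
    for _ in range(n_samples):
        pose = tuple(rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max))
        score = toy_score(pose, pocket_center)
        if score < best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score

pocket = (12.0, 7.5, 9.0)  # assumed pocket location, for illustration only

# A box that contains the pocket finds a pose close to it
best_pose, best_score = search_and_score((0.0, 0.0, 0.0), (20.0, 20.0, 20.0), pocket)
print(f"box covers pocket: best score = {best_score:.2f}")

# A box that misses the pocket cannot recover, however many poses it samples
_, missed_score = search_and_score((0.0, 0.0, 0.0), (5.0, 5.0, 5.0), pocket)
print(f"box misses pocket: best score = {missed_score:.2f}")
```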
How DiffDock Is Different
DiffDock reframes docking as a generative modeling problem. Rather than searching and scoring, it learns a probability distribution over ligand poses given a protein-ligand pair. The model is trained on thousands of experimentally determined protein-ligand complexes from the PDB.
The Diffusion Process
DiffDock applies a diffusion process over three types of degrees of freedom simultaneously:
- Translational diffusion: Where the ligand center of mass is located relative to the protein.
- Rotational diffusion: The overall orientation of the ligand.
- Torsional diffusion: The internal rotatable bond conformations of the ligand.
During inference, the model starts from random noise and iteratively denoises over a series of steps to produce realistic binding poses. It samples multiple poses by running this reverse diffusion from different random starting points, then ranks them using a learned confidence score.
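To make the idea of iterative denoising concrete, here is a minimal toy sketch. It is not DiffDock's implementation. The function `sample_pose` is hypothetical, the rotation is simplified to a single angle, and the update direction comes from a known target pose, whereas DiffDock uses a trained neural network for that step. What it does illustrate is the shape of the loop: start from noise in translation, rotation, and torsions, then take small corrective steps while the injected noise shrinks to zero.

```python
import math
import random

def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def sample_pose(target, steps=20, rate=0.3, seed=0):
    """Toy reverse diffusion: start from a random pose and denoise step by step.

    ASSUMPTION: the update direction comes from a known target pose. In
    DiffDock this role is played by a trained neural network, not an oracle.
    """
    rng = random.Random(seed)
    # Start from noise in all three components
    translation = [rng.gauss(0.0, 10.0) for _ in range(3)]
    rotation = rng.uniform(-math.pi, math.pi)  # simplified to one angle
    torsions = [rng.uniform(-math.pi, math.pi) for _ in target["torsions"]]

    for step in range(steps):
        noise = 1.0 - (step + 1) / steps  # noise level shrinks to zero
        translation = [
            x + rate * (t - x) + noise * rng.gauss(0.0, 1.0)
            for x, t in zip(translation, target["translation"])
        ]
        rotation = wrap_angle(
            rotation + rate * wrap_angle(target["rotation"] - rotation)
            + noise * rng.gauss(0.0, 0.3)
        )
        torsions = [
            wrap_angle(a + rate * wrap_angle(t - a) + noise * rng.gauss(0.0, 0.3))
            for a, t in zip(torsions, target["torsions"])
        ]
    return {"translation": translation, "rotation": rotation, "torsions": torsions}

# Illustrative target pose; the values are made up
target = {"translation": [12.0, 7.5, 9.0], "rotation": 0.8, "torsions": [1.2, -2.0]}

# Each seed gives a different random start, and therefore a different sample
poses = [sample_pose(target, seed=s) for s in range(5)]
for i, pose in enumerate(poses, start=1):
    error = math.dist(pose["translation"], target["translation"])
    print(f"sample {i}: translation error = {error:.2f} Å")
```

Because each sample starts from a different random pose, the five results differ slightly from one another. In the real model, different starting points can settle into genuinely different binding modes, which is why a ranking step is needed afterwards.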
Confidence Scoring
Each generated pose comes with a confidence score that predicts how close the pose is likely to be to the true binding mode. Unlike traditional scoring functions, which estimate binding affinity in kcal/mol, DiffDock's confidence model is trained to predict whether a pose falls within 2 Å RMSD of the experimental pose. Higher confidence means the model believes the pose is more likely to be geometrically correct. It says nothing about how tightly the ligand binds.
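For readers unfamiliar with RMSD, the quantity the confidence score targets can be computed as follows. This is a minimal sketch with hypothetical helper names (`rmsd`, `is_accurate`). It assumes both poses list the same atoms in the same order, applies no symmetry correction, and uses a 2 Å cutoff, a common convention for calling a docked pose successful.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two poses, in the units of the input.
    Assumes identical atom ordering and no alignment or symmetry correction."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def is_accurate(predicted, reference, threshold=2.0):
    """Label a pose as accurate if its RMSD is below the threshold (in Å)."""
    return rmsd(predicted, reference) < threshold

# Made-up coordinates for a three-atom fragment, in Å
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
shifted_1 = [(x + 1.0, y, z) for x, y, z in reference]  # every atom moved 1 Å
shifted_3 = [(x + 3.0, y, z) for x, y, z in reference]  # every atom moved 3 Å

print(f"RMSD = {rmsd(shifted_1, reference):.2f} Å, accurate: {is_accurate(shifted_1, reference)}")
print(f"RMSD = {rmsd(shifted_3, reference):.2f} Å, accurate: {is_accurate(shifted_3, reference)}")
```

A confidence score can only be computed this way when the true pose is known. At prediction time it is the model's estimate of how such a comparison would turn out.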
When DiffDock Excels
- Unknown binding site: When you do not know where on the protein the ligand binds, DiffDock's blind docking is invaluable.
- Novel targets: For proteins with no known ligand-bound structures, DiffDock can discover binding modes without prior knowledge.
- Pose diversity: DiffDock generates multiple distinct poses with confidence rankings, giving you a richer picture than a single best pose.
- Cryptic sites: Because it is not confined to a predefined search box, the generative approach may surface non-obvious binding pockets that grid-based search would miss.
When Traditional Docking May Be Better
- Known binding site: If you have a co-crystal structure, traditional docking in a focused search box can be faster and equally accurate.
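- Binding affinity ranking: Vina and GLIDE produce energy-based scores that correlate (imperfectly) with binding affinity. DiffDock's confidence score reflects pose accuracy, not binding strength, so it should not be used to rank compounds by affinity.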
- Large-scale virtual screening: Traditional tools can screen millions of compounds per day. DiffDock is slower per ligand.
- Regulatory workflows: Traditional docking is more established in pharmaceutical regulatory submissions.
Using DiffDock via the SciRouter API
SciRouter hosts DiffDock on GPU infrastructure so you can run docking predictions without installing anything locally. Here is a working example:
```python
import time

import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Read the protein structure, closing the file once it is loaded
with open("target.pdb") as f:
    protein_pdb = f.read()

# Submit a docking job
response = requests.post(
    f"{BASE}/docking/predict",
    headers=headers,
    json={
        "protein_pdb": protein_pdb,
        "ligand_smiles": "CC(=O)Oc1ccccc1C(=O)O",  # Aspirin
        "model": "diffdock",
        "num_poses": 5,
    },
)
response.raise_for_status()  # Stop early on HTTP errors such as a bad API key
job_id = response.json()["job_id"]

# Poll for results
while True:
    poll = requests.get(f"{BASE}/docking/predict/{job_id}", headers=headers)
    poll.raise_for_status()
    result = poll.json()
    if result["status"] == "completed":
        for i, pose in enumerate(result["poses"]):
            print(f"Pose {i+1}: confidence={pose['confidence']:.3f}")
            # Save each pose as an SDF file for viewing or rescoring
            with open(f"pose_{i+1}.sdf", "w") as f:
                f.write(pose["ligand_sdf"])
        break
    elif result["status"] == "failed":
        print(f"Docking failed: {result['error']}")
        break
    time.sleep(3)  # Wait before polling again
```

DiffDock in the Broader Docking Landscape
DiffDock is part of a wave of AI-powered approaches transforming computational chemistry. For a detailed head-to-head analysis, read our DiffDock vs AutoDock Vina comparison. You can access both DiffDock and AutoDock Vina through the SciRouter API with a single API key.
Ready to try AI-powered molecular docking? Sign up for a free SciRouter API key and run your first DiffDock prediction in minutes. No GPU setup required, and you get 500 free credits to experiment with.