What is Molecular Docking?
Molecular docking predicts how a small molecule (a drug candidate) binds to a protein target. The output is a 3D binding pose — the position, orientation, and conformation of the ligand within the protein binding site. This is a fundamental step in computational drug discovery, used to screen compounds, understand binding mechanisms, and guide lead optimization.
Traditional docking tools like AutoDock Vina require you to define a search box, prepare receptor files in PDBQT format, and install complex software stacks. DiffDock replaces this entire pipeline with a diffusion generative model that predicts binding poses end-to-end, without a predefined search box.
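To make the search box requirement concrete, here is a minimal sketch of how such a box is commonly derived: take the coordinates of a known ligand or of pocket atoms, compute their bounding box, and pad it. The helper name `search_box` and the 4 Å padding are illustrative assumptions, not part of any tool's API. The printed keys follow the `center_*` / `size_*` naming that AutoDock Vina configuration files use.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def search_box(coords: Sequence[Coord], padding: float = 4.0) -> Tuple[Coord, Coord]:
    """Return (center, size) of an axis-aligned box enclosing coords plus padding.

    Hypothetical helper for illustration; padding is in the same units
    as the coordinates (angstroms for PDB files).
    """
    if not coords:
        raise ValueError("need at least one coordinate")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Made-up coordinates standing in for atoms of a reference ligand
reference_atoms = [(10.0, 2.0, -3.0), (14.0, 6.0, 1.0), (12.0, 4.0, -1.0)]
center, size = search_box(reference_atoms)

for axis, c, s in zip("xyz", center, size):
    print(f"center_{axis} = {c:.2f}")
    print(f"size_{axis} = {s:.2f}")
```

Choosing this box well requires already knowing roughly where the ligand binds, which is exactly the prior knowledge a blind-docking model such as DiffDock does not need.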
Why Use the DiffDock API?
Running DiffDock locally requires cloning the repository, setting up a specific conda environment with PyTorch Geometric, torch-scatter, torch-sparse, and dozens of other dependencies, plus a GPU with at least 8 GB of VRAM. The environment setup alone takes 30 to 60 minutes and is notoriously fragile across different CUDA versions.
The SciRouter API eliminates all of this. You send a protein structure and a SMILES string over HTTPS, and you get back predicted binding poses with confidence scores. The inference runs on A100 GPUs in the cloud.
Prerequisites
You need Python 3.7 or later and a SciRouter API key. Sign up at scirouter.ai/register to get 500 free credits per month. Install the SDK and export your API key:
pip install scirouter
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"

Step 1: Prepare Your Inputs
DiffDock needs two inputs: a protein structure in PDB format and a ligand as a SMILES string. If you already have a PDB file from experiment or another prediction tool, read it from disk. If you only have a protein sequence, fold it first with ESMFold:
from scirouter import SciRouter

client = SciRouter()

# Option A: Load PDB from file
with open("target_protein.pdb") as f:
    protein_pdb = f.read()

# Option B: Fold the protein first (if you only have a sequence)
fold_result = client.proteins.fold(
    sequence="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSY"  # KRAS fragment
)
protein_pdb = fold_result.pdb

# Define the ligand as a SMILES string.
# Intended as sotorasib (Lumakras), a KRAS G12C inhibitor. Confirm the SMILES
# against a reference such as PubChem before using it in real work.
ligand_smiles = "C=CC(=O)N1CCN(CC1)c1c(F)cc(NC(=O)c2cc(OC)c(N3CCN(C)CC3)nc2)c(F)c1"

Step 2: Submit the Docking Job
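Before submitting, a quick sanity check on both inputs can catch obvious mistakes without spending credits. The sketch below uses only the standard library. The function names are hypothetical, and the SMILES check is purely structural (balanced parentheses and brackets), so it says nothing about chemical validity. If RDKit is installed, parsing with `Chem.MolFromSmiles` is a stronger test.

```python
def count_pdb_atoms(pdb_text: str) -> int:
    """Return the number of ATOM records; raise ValueError if there are none."""
    n_atoms = sum(1 for line in pdb_text.splitlines() if line.startswith("ATOM"))
    if n_atoms == 0:
        raise ValueError("no ATOM records found; is this PDB-formatted text?")
    return n_atoms


def check_smiles_syntax(smiles: str) -> str:
    """Structural check only: non-empty, no whitespace, balanced () and []."""
    s = smiles.strip()
    if not s or any(ch.isspace() for ch in s):
        raise ValueError("SMILES must be non-empty and contain no whitespace")
    depth = 0            # current nesting depth of branch parentheses
    in_bracket = False   # True while inside a [ ... ] atom expression
    for ch in s:
        if ch == "[":
            if in_bracket:
                raise ValueError("nested '[' in SMILES")
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                raise ValueError("unmatched ']' in SMILES")
            in_bracket = False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' in SMILES")
    if depth != 0 or in_bracket:
        raise ValueError("unclosed '(' or '[' in SMILES")
    return s


# Usage with inline stand-in data
print(check_smiles_syntax("CC(=O)Oc1ccccc1C(=O)O"))
try:
    check_smiles_syntax("CC(=O")  # missing closing parenthesis
except ValueError as err:
    print(f"rejected: {err}")
```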
With inputs prepared, submit the docking job. The SDK handles the async job lifecycle internally — it submits the request, polls for completion, and returns the result:
# Run DiffDock docking
docking_result = client.docking.diffdock(
    protein_pdb=protein_pdb,
    ligand_smiles=ligand_smiles,
    num_poses=5,  # number of binding pose predictions to return
)
print(f"Returned {len(docking_result.poses)} binding poses")
print(f"Top pose confidence: {docking_result.poses[0].confidence:.3f}")

Step 3: Interpret the Results
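DiffDock returns multiple predicted binding poses, each with a confidence score and a ligand structure in PDB format. The poses are ranked by confidence, with the most likely binding mode listed first. The confidence score estimates how likely a pose is to be correct; it is not a binding affinity.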
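As a rough guide to reading these scores, the sketch below buckets a confidence value into bands and reports the margin between the two best poses. The cutoffs (above 0 high, between -1.5 and 0 moderate, below -1.5 low) follow the guidance published with the open-source DiffDock model. Whether the API returns the same raw score is an assumption, and the `Pose` class here is only a stand-in for the SDK's pose objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pose:
    """Stand-in for an SDK pose object; only the confidence field is modeled."""
    confidence: float


def confidence_band(score: float, high: float = 0.0, low: float = -1.5) -> str:
    """Map a raw confidence score to a coarse label (assumed cutoffs)."""
    if score > high:
        return "high"
    if score > low:
        return "moderate"
    return "low"


def summarize(poses: List[Pose]) -> Tuple[str, Optional[float]]:
    """Return the band of the best pose and its margin over the runner-up."""
    if not poses:
        raise ValueError("no poses to summarize")
    ranked = sorted(poses, key=lambda p: p.confidence, reverse=True)
    margin = ranked[0].confidence - ranked[1].confidence if len(ranked) > 1 else None
    return confidence_band(ranked[0].confidence), margin


# Made-up scores for illustration
band, margin = summarize([Pose(-1.0), Pose(0.5), Pose(-2.25)])
print(f"top pose band: {band}, margin over runner-up: {margin}")
```

A small margin between the top poses suggests the model sees several comparably plausible binding modes, which is a good reason to inspect more than the first pose.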
# Examine each predicted pose
for i, pose in enumerate(docking_result.poses):
    print(f"Pose {i+1}: confidence = {pose.confidence:.3f}")
    # Save the docked ligand pose as a PDB file
    with open(f"pose_{i+1}.pdb", "w") as f:
        f.write(pose.ligand_pdb)

# Save the top-ranked pose alongside the protein.
# Drop any END record from the protein first: readers that stop parsing
# at END would otherwise never reach the ligand.
protein_lines = [line for line in protein_pdb.splitlines() if line.strip() != "END"]
with open("complex_top_pose.pdb", "w") as f:
    f.write("\n".join(protein_lines) + "\n")
    f.write(docking_result.poses[0].ligand_pdb)
print("Docking complete. Open complex_top_pose.pdb in PyMOL or ChimeraX.")

Full Working Example: End-to-End Docking Pipeline
Here is a complete script that takes a protein sequence and a ligand SMILES, folds the protein, docks the ligand, and saves the results:
import os
import sys

from scirouter import SciRouter
from scirouter.exceptions import SciRouterError

api_key = os.environ.get("SCIROUTER_API_KEY")
if not api_key:
    print("Error: Set SCIROUTER_API_KEY environment variable")
    sys.exit(1)

client = SciRouter(api_key=api_key)

# Step 1: Fold the target protein
print("Folding protein...")
fold = client.proteins.fold(
    sequence="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSY",
    model="esmfold",
)
print(f"Fold complete. pLDDT: {fold.mean_plddt:.1f}")

# Step 2: Dock a ligand
print("Docking ligand...")
try:
    dock = client.docking.diffdock(
        protein_pdb=fold.pdb,
        ligand_smiles="C=CC(=O)N1CCN(CC1)c1c(F)cc(NC(=O)c2cc(OC)c(N3CCN(C)CC3)nc2)c(F)c1",
        num_poses=5,
    )
except SciRouterError as e:
    print(f"Docking failed: {e}")
    sys.exit(1)

# Step 3: Save results
print(f"Got {len(dock.poses)} poses")
for i, pose in enumerate(dock.poses):
    with open(f"pose_{i+1}.pdb", "w") as f:
        f.write(pose.ligand_pdb)
    print(f"  Pose {i+1}: confidence = {pose.confidence:.3f}")

with open("protein.pdb", "w") as f:
    f.write(fold.pdb)
print("Done. Visualize protein.pdb + pose_1.pdb together in PyMOL.")

Virtual Screening: Docking Multiple Ligands
The real power of API-based docking is throughput. Instead of docking one molecule at a time, you can screen hundreds of compounds against the same target using concurrent requests:
from concurrent.futures import ThreadPoolExecutor, as_completed

from scirouter import SciRouter

client = SciRouter()

# Load protein structure
with open("target.pdb") as f:
    protein_pdb = f.read()

# List of candidate ligands
ligands = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "caffeine": "Cn1c(=O)c2c(ncn2C)n(C)c1=O",
    "acetaminophen": "CC(=O)Nc1ccc(O)cc1",
}

def dock_one(name, smiles):
    """Dock a single ligand and return its name and top-pose confidence."""
    result = client.docking.diffdock(
        protein_pdb=protein_pdb,
        ligand_smiles=smiles,
        num_poses=3,
    )
    return name, result.poses[0].confidence

results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(dock_one, n, s): n for n, s in ligands.items()}
    for future in as_completed(futures):
        ligand_name = futures[future]
        try:
            _, confidence = future.result()
        except Exception as exc:
            # One failed job should not abort the whole screen
            print(f"{ligand_name}: docking failed ({exc})")
            continue
        results[ligand_name] = confidence
        print(f"{ligand_name}: top-pose confidence = {confidence:.3f}")

# Rank by top-pose confidence
ranked = sorted(results.items(), key=lambda x: x[1], reverse=True)
print("\nRanking:")
for rank, (name, conf) in enumerate(ranked, 1):
    print(f"  {rank}. {name} ({conf:.3f})")

Comparison: Local DiffDock vs API
Local Setup
- Clone the DiffDock repository and resolve submodules
- Create a conda environment with Python 3.9 and specific PyTorch version
- Install PyTorch Geometric, torch-scatter, torch-sparse, torch-cluster (version-locked to CUDA)
- Download pretrained model weights
- Prepare input files in the exact expected directory structure
- GPU with 8 GB+ VRAM required
- 30 to 60 minutes of setup time
API Setup
- pip install scirouter (30 seconds)
- One environment variable for authentication
- Send JSON, receive JSON — no file format gymnastics
- Works from any machine, including serverless functions
Visualizing Docking Results
The best way to evaluate docking results is visual inspection. Open the protein PDB and the top ligand pose PDB together in PyMOL:
pymol protein.pdb pose_1.pdb
In PyMOL, show the protein as a cartoon representation and the ligand as sticks. Check whether the ligand sits in a plausible binding pocket and whether key interactions (hydrogen bonds, hydrophobic contacts) are present. For Jupyter notebooks, use NGLview or py3Dmol for inline 3D rendering.
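To avoid repeating those display steps by hand for every pose, one option is to generate a small PyMOL command script. The sketch below writes standard PyMOL commands to a `.pml` file. The function name, the output file name `view_pose.pml`, and the 5 Å pocket cutoff are illustrative choices, and the input file names assume the outputs saved earlier in this tutorial.

```python
from pathlib import Path


def build_pymol_script(
    protein_path: str = "protein.pdb",
    ligand_path: str = "pose_1.pdb",
    pocket_cutoff: float = 5.0,
) -> str:
    """Return PyMOL commands that show a docked pose in its binding pocket."""
    commands = [
        f"load {protein_path}, protein",
        f"load {ligand_path}, ligand",
        "hide everything",
        "show cartoon, protein",
        "show sticks, ligand",
        # Residues with any atom within the cutoff of the ligand
        f"select pocket, byres (protein within {pocket_cutoff} of ligand)",
        "show lines, pocket",
        # mode=2 restricts the distance objects to polar contacts
        "distance polar_contacts, ligand, pocket, mode=2",
        "zoom ligand, 8",
    ]
    return "\n".join(commands) + "\n"


script = build_pymol_script()
Path("view_pose.pml").write_text(script)
print(script)
```

Running `pymol view_pose.pml` then opens the complex with the pocket residues and polar contacts already displayed.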
Next Steps
Now that you can dock molecules programmatically, combine docking with other SciRouter tools for a complete drug discovery workflow. Predict protein structures with ESMFold, dock candidates with DiffDock, and screen compounds for drug-likeness with ADMET screening.
Sign up at scirouter.ai/register for 500 free credits and start docking molecules in minutes.