Why Molecular Docking Matters
Most drugs work by binding to a protein target. A cancer drug binds to an overactive kinase. An anti-inflammatory binds to cyclooxygenase. An antiviral binds to a viral protease. Before a drug candidate ever reaches a test tube, computational molecular docking can predict whether and how a small molecule will fit into a protein's binding pocket. This calculation sits at the heart of modern drug discovery.
Molecular docking takes two inputs – a protein structure and a small molecule – and predicts the 3D binding pose: exactly where the molecule sits in the protein, how it is oriented, and how tightly it binds. This information is used to screen thousands of compounds virtually before synthesizing any of them in a lab, saving months of work and millions of dollars in a typical drug discovery campaign.
Beyond drug discovery, docking is used in agrochemical design, toxicology risk assessment, enzyme engineering, and academic research on protein-ligand recognition. If you work with proteins and small molecules, docking is one of the most practically useful computational tools available.
The challenge has always been accessibility. Until recently, running docking required installing specialized software, downloading and preparing protein structures, converting file formats, defining search boxes, and having a GPU or large compute cluster. That is no longer the case. In this tutorial, you will learn how to dock any molecule to any protein in your browser or from a Python script, in about two minutes.
The Traditional Approach: Hours of Setup
To understand why online docking is such a leap forward, consider what the traditional workflow looks like. If you wanted to dock ibuprofen to COX-2 using AutoDock Vina locally, here is what you would need to do:
- Install AutoDock Vina, AutoDockTools (ADT), Open Babel, and Python 2.7 (yes, some scripts still require Python 2)
- Download the COX-2 crystal structure (PDB: 4PH9) from the RCSB Protein Data Bank
- Remove water molecules, ions, and co-crystallized ligands manually using ADT or PyMOL
- Add hydrogens and compute Gasteiger partial charges on the receptor
- Convert the receptor from PDB to PDBQT format
- Draw or download the ibuprofen structure and convert it to PDBQT format
- Define a search box around the active site (you need to know the coordinates of the binding pocket)
- Write a configuration file specifying the grid center, dimensions, and exhaustiveness
- Run the docking job from the command line
- Parse the output log file to extract binding affinities and pose coordinates
This process takes 30 minutes to an hour for someone who has done it before, and potentially an entire day for a beginner encountering file format issues, missing dependencies, or coordinate system confusion. For DiffDock, the local setup is even more involved – it requires PyTorch, PyTorch Geometric, torch-scatter, torch-sparse, a specific CUDA version, and a GPU with at least 8 GB of VRAM.
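To make the configuration-file step of the local workflow concrete, here is a minimal sketch that builds a Vina config in Python. The keys (`receptor`, `ligand`, `center_x`, `size_x`, `exhaustiveness`, `num_modes`) are standard AutoDock Vina options; the helper name `build_vina_config`, the file names, and the box coordinates are placeholders chosen for illustration, not the real COX-2 pocket.

```python
def build_vina_config(receptor, ligand, center, size,
                      exhaustiveness=8, num_modes=9):
    """Return the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {sx:.1f}",
        f"size_y = {sy:.1f}",
        f"size_z = {sz:.1f}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder box: replace with the pocket coordinates of your own receptor.
config_text = build_vina_config(
    receptor="cox2_receptor.pdbqt",   # hypothetical file name
    ligand="ibuprofen.pdbqt",         # hypothetical file name
    center=(0.0, 0.0, 0.0),           # NOT the real 4PH9 pocket center
    size=(20.0, 20.0, 20.0),          # box edge lengths in angstroms
)
print(config_text)
```

Saved as `vina_config.txt`, a file like this would be passed to a local run with `vina --config vina_config.txt --out docked.pdbqt`.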
The SciRouter approach reduces this to a single API call. You provide a SMILES string and a PDB structure, and you get back binding poses with scores. The entire infrastructure – file preparation, format conversion, GPU inference, and result parsing – is handled server-side.
The SciRouter Approach: Two Minutes, No Installs
SciRouter exposes both DiffDock (AI-powered docking) and AutoDock Vina (physics-based docking) through a unified API. You send a JSON request with your protein and ligand, and you receive structured results with binding poses, scores, and downloadable structure files. There is nothing to install beyond the Python SDK, and the free tier includes 500 credits per month.
Here is the setup. It takes about 30 seconds:
pip install scirouter
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"
If you do not have an API key yet, sign up at scirouter.ai/register – the free tier requires no credit card.
Step-by-Step: Docking Ibuprofen to COX-2 with DiffDock
Let us start with a concrete, real-world example. Ibuprofen (Advil, Motrin) is a non-steroidal anti-inflammatory drug (NSAID) that works by inhibiting cyclooxygenase-2 (COX-2), an enzyme that produces prostaglandins involved in inflammation and pain. The crystal structure of COX-2 with a bound inhibitor is available as PDB entry 4PH9. The SMILES string for ibuprofen is CC(C)Cc1ccc(cc1)C(C)C(=O)O.
DiffDock uses a diffusion generative model to predict binding poses. Unlike traditional docking, it does not require you to define a search box around the binding site. The model learns to place the ligand in the correct pocket directly from training data. This makes it ideal for blind docking, exploratory studies, or situations where you are not certain where the ligand binds.
import os
import time

import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Ibuprofen SMILES
ligand_smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"

# Fetch COX-2 structure from RCSB PDB
pdb_response = requests.get("https://files.rcsb.org/download/4PH9.pdb", timeout=30)
pdb_response.raise_for_status()  # stop early if the download failed
protein_pdb = pdb_response.text

# Submit DiffDock docking job
submit = requests.post(f"{BASE}/docking/diffdock", headers=HEADERS, json={
    "protein_pdb": protein_pdb,
    "ligand_smiles": ligand_smiles,
    "num_poses": 5,
})
submit.raise_for_status()  # surface authentication or quota errors
job = submit.json()
print(f"Job submitted: {job['job_id']}")

# Poll for results
while True:
    result = requests.get(
        f"{BASE}/docking/diffdock/{job['job_id']}", headers=HEADERS
    ).json()
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "Job failed"))
    time.sleep(5)

# Display results
print(f"\nDiffDock returned {len(result['poses'])} binding poses:\n")
for i, pose in enumerate(result["poses"]):
    print(f"  Pose {i+1}: confidence = {pose['confidence']:.3f}")

# Save the top pose
with open("ibuprofen_cox2_diffdock_pose1.pdb", "w") as f:
    f.write(result["poses"][0]["ligand_pdb"])
print("\nTop pose saved to ibuprofen_cox2_diffdock_pose1.pdb")

The DiffDock result includes multiple ranked binding poses. The confidence score indicates how certain the model is about each pose. For ibuprofen binding to COX-2, you should expect the top pose to place the carboxylic acid group near Arg120 and Tyr355, which are the key residues for NSAID binding in the COX-2 active site. You can visualize the result by loading both the protein PDB and the docked ligand PDB into PyMOL, ChimeraX, or the free NGL Viewer.
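The polling loop in the script runs until the job reports `completed` or `failed`, so a stalled job would keep it waiting indefinitely. One possible refinement is a small helper with a deadline. This is a sketch only: `poll_until_done` and its parameters are hypothetical names, and the intermediate `"running"` status used in the demonstration is an assumption, since the tutorial only shows `completed` and `failed`.

```python
import time


def poll_until_done(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error", "Job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout:.0f} s")
        sleep(interval)


# Demonstration with a fake job that completes on the third poll.
_responses = iter([
    {"status": "running"},   # assumed intermediate status
    {"status": "running"},
    {"status": "completed", "poses": []},
])
demo = poll_until_done(lambda: next(_responses), interval=0.0)
print(demo["status"])
```

In the docking script, `fetch_status` would be a function that performs the `requests.get(...)` call for the job and returns the parsed JSON.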
Step-by-Step: Docking Ibuprofen to COX-2 with AutoDock Vina
AutoDock Vina is the industry-standard physics-based docking engine. It uses an empirical scoring function that estimates binding free energy from steric shape complementarity, hydrophobic contacts, hydrogen bonds, and a penalty for rotatable bonds. Vina requires a search box that defines the region of the protein to explore, but the SciRouter API can auto-detect the binding pocket for you.
The Vina approach is complementary to DiffDock. Where DiffDock excels at blind docking and novel binding mode discovery, Vina provides physically interpretable binding energy scores in kcal/mol that are widely used in published literature. Running both tools on the same target-ligand pair and comparing results is a best practice in computational drug discovery.
import os
import time

import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

ligand_smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"

# Fetch COX-2 structure
pdb_response = requests.get("https://files.rcsb.org/download/4PH9.pdb", timeout=30)
pdb_response.raise_for_status()  # stop early if the download failed
protein_pdb = pdb_response.text

# Submit AutoDock Vina docking job
# The API auto-detects the binding pocket if no box is specified
submit = requests.post(f"{BASE}/docking/vina", headers=HEADERS, json={
    "protein_pdb": protein_pdb,
    "ligand_smiles": ligand_smiles,
    "exhaustiveness": 16,  # higher = more thorough search (default 8)
    "num_poses": 5,
})
submit.raise_for_status()  # surface authentication or quota errors
job = submit.json()
print(f"Job submitted: {job['job_id']}")

# Poll for results
while True:
    result = requests.get(
        f"{BASE}/docking/vina/{job['job_id']}", headers=HEADERS
    ).json()
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "Job failed"))
    time.sleep(5)

# Display results with binding energies
print(f"\nVina returned {len(result['poses'])} binding poses:\n")
for i, pose in enumerate(result["poses"]):
    print(f"  Pose {i+1}: binding affinity = {pose['affinity_kcal_mol']:.1f} kcal/mol")

# Save the top pose
with open("ibuprofen_cox2_vina_pose1.pdb", "w") as f:
    f.write(result["poses"][0]["ligand_pdb"])
print("\nTop pose saved to ibuprofen_cox2_vina_pose1.pdb")

For ibuprofen docking to COX-2, you should expect binding affinities in the range of -6.5 to -8.0 kcal/mol. The experimental IC50 of ibuprofen for COX-2 is approximately 13 micromolar. If that IC50 is treated as a dissociation constant at room temperature, it corresponds to a binding free energy of about -6.7 kcal/mol. A Vina score in that range indicates the docking is producing physically reasonable results.
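The conversion from 13 micromolar to about -6.7 kcal/mol follows from the relation ΔG = RT ln(Kd). The sketch below reproduces the arithmetic under two simplifying assumptions: the IC50 is used as if it were a dissociation constant (the two are related but not identical), and the temperature is 298.15 K. The function name `delta_g_from_kd` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Return the binding free energy dG = RT ln(Kd) in kcal/mol."""
    return R_KCAL * temperature_k * math.log(kd_molar)


dg = delta_g_from_kd(13e-6)  # 13 micromolar, treated as if it were a Kd
print(f"{dg:.1f} kcal/mol")  # prints "-6.7 kcal/mol"
```

Each tenfold change in Kd shifts ΔG by about 1.4 kcal/mol at this temperature, which is a useful yardstick when comparing docking scores.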
Interpreting Docking Scores
Docking tools produce scores that estimate how well a ligand fits into a protein binding site, but interpreting these scores correctly requires understanding what they measure and what they do not. Here is a guide to the key metrics.
Binding Affinity (kcal/mol) – AutoDock Vina
Vina reports binding affinity in kilocalories per mole. More negative values indicate stronger predicted binding. As a rough guide: scores less negative than -5.0 kcal/mol suggest weak binding, -5.0 to -7.0 indicates moderate binding, -7.0 to -9.0 indicates strong binding, and scores more negative than -9.0 indicate very strong binding. These thresholds are approximate – the scoring function's reported standard error against experimental affinities is close to 3 kcal/mol, so Vina scores are best used for relative ranking rather than absolute affinity prediction.
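The rough guide above can be written as a small lookup for labeling screening output. This is a sketch of the tutorial's approximate thresholds, not a calibrated classifier, and the function name `classify_vina_score` is hypothetical.

```python
def classify_vina_score(affinity_kcal_mol):
    """Map a Vina score to the rough binding categories used in this guide."""
    if affinity_kcal_mol > -5.0:
        return "weak"
    if affinity_kcal_mol > -7.0:
        return "moderate"
    if affinity_kcal_mol > -9.0:
        return "strong"
    return "very strong"


for score in (-4.2, -6.7, -8.1, -10.3):
    print(f"{score:6.1f} kcal/mol -> {classify_vina_score(score)}")
```

Because the score uncertainty is larger than the width of a single band, compounds near a boundary should be treated as belonging to either category.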
Confidence Score – DiffDock
DiffDock returns a confidence score for each predicted pose. This is not a binding energy but a model confidence that the predicted pose is geometrically correct. Higher confidence means the model is more certain that the ligand placement is accurate. Use confidence to rank poses and identify the most reliable prediction. A high confidence score does not guarantee tight binding – it means the pose geometry is likely correct.
RMSD (Root Mean Square Deviation)
When you have a known experimental binding pose (from a co-crystal structure), RMSD measures how far the heavy atoms of the predicted pose deviate from the experimental one. An RMSD below 2.0 angstroms is generally considered a successful prediction. Below 1.0 angstroms is excellent. If you do not have an experimental reference, RMSD can instead be calculated between alternative predicted poses to assess pose diversity.
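As an illustration of the metric, here is a minimal RMSD sketch. It assumes both poses list the same atoms in the same order and are already in the receptor's coordinate frame, so no superposition is applied, which is the usual convention for docking. It does not handle symmetry-equivalent atoms, which dedicated tools do. The function name `pose_rmsd` and the coordinates are illustrative.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("Poses must contain the same non-zero number of atoms")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: the predicted pose is the reference shifted by 1 angstrom in x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in reference]
rmsd = pose_rmsd(reference, predicted)
print(f"RMSD = {rmsd:.2f} angstroms")  # prints "RMSD = 1.00 angstroms"
```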
Protein-Ligand Contacts
Beyond scores, examine the specific interactions between the ligand and protein residues. Key contacts to look for include hydrogen bonds (especially to backbone NH and CO groups), salt bridges (charged residue to charged ligand group), hydrophobic packing (aromatic rings stacking or alkyl groups in hydrophobic pockets), and pi-cation interactions. A pose with a good score but no sensible contacts is likely an artifact.
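A quick first pass at contact analysis is to list the protein residues that have any atom within a distance cutoff of the ligand. The sketch below parses the fixed-width coordinate columns of PDB `ATOM` and `HETATM` records using only the standard library. The 4.0 angstrom cutoff, the function names, and the synthetic coordinates are assumptions for illustration; the sketch does not classify contact types such as hydrogen bonds or salt bridges.

```python
import math


def parse_atoms(pdb_text):
    """Return (residue_label, (x, y, z)) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width PDB columns: residue name, residue number, chain ID
            label = f"{line[17:20].strip()}{line[22:26].strip()}:{line[21]}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((label, xyz))
    return atoms


def residues_in_contact(protein_pdb, ligand_pdb, cutoff=4.0):
    """List protein residues with any atom within `cutoff` angstroms of the ligand."""
    ligand_xyz = [xyz for _, xyz in parse_atoms(ligand_pdb)]
    close = set()
    for label, xyz in parse_atoms(protein_pdb):
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            close.add(label)
    return sorted(close)


def pdb_line(record, serial, name, resname, chain, resseq, x, y, z):
    """Format one fixed-width PDB coordinate record (demo helper)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}"
            f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


# Synthetic coordinates for illustration only, not taken from 4PH9.
protein = "\n".join([
    pdb_line("ATOM", 1, "NH1", "ARG", "A", 120, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "OH", "TYR", "A", 355, 3.0, 0.0, 0.0),
    pdb_line("ATOM", 3, "OG", "SER", "A", 530, 12.0, 0.0, 0.0),
])
ligand = pdb_line("HETATM", 1, "O1", "LIG", "L", 1, 1.5, 0.0, 0.0)
contacts = residues_in_contact(protein, ligand)
print(contacts)  # prints ['ARG120:A', 'TYR355:A']
```

With real data, `protein` would be the downloaded receptor PDB text and `ligand` the `ligand_pdb` string of a docked pose.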
DiffDock vs AutoDock Vina: When to Use Each
Both tools are available through the SciRouter API, and in practice you should often run both. Here is a comparison to help you decide which to prioritize for a given task.
Comparison Table
- Approach: DiffDock uses a diffusion generative model trained on protein-ligand complexes. Vina uses physics-based empirical scoring functions with stochastic local search.
- Search box required: DiffDock does not require a search box – it performs blind docking over the entire protein surface. Vina requires a defined search box, though the SciRouter API can auto-detect the binding pocket.
- Best for blind docking: DiffDock outperforms search-based methods when the binding site is unknown. The DiffDock paper reports a 38% top-1 success rate (RMSD below 2 angstroms) on the PDBBind blind docking benchmark, compared with about 23% for the best traditional docking method it evaluated.
- Best for known pockets: Vina is highly competitive when you know where the ligand binds and can define a tight search box. It is the more established method for focused docking.
- Score interpretation: Vina gives binding affinity in kcal/mol, which can be compared with experimental binding constants. DiffDock gives a confidence score that reflects pose reliability, not binding strength.
- Speed: Both complete in 15 to 60 seconds through the API. Locally, Vina is faster (seconds on CPU) while DiffDock requires a GPU and takes 30 to 90 seconds.
- Literature support: Vina has been cited in over 15,000 publications and is the most widely used docking tool in academic research. DiffDock is newer (published 2023) but rapidly gaining adoption.
Batch Docking for Virtual Screening with the Python SDK
The real power of API-based docking is throughput. In a virtual screening campaign, you test hundreds or thousands of compounds against a target protein to identify the best binders. The SciRouter API handles this naturally – each docking job is an independent API call, so you can parallelize across as many compounds as your rate limit allows.
Here is a complete example that screens a small library of known NSAIDs against COX-2 using DiffDock, ranks them by confidence, and saves the top hits:
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Fetch COX-2 structure once
pdb_response = requests.get("https://files.rcsb.org/download/4PH9.pdb", timeout=30)
pdb_response.raise_for_status()  # stop early if the download failed
protein_pdb = pdb_response.text

# Compound library: known NSAIDs and related molecules
library = {
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "naproxen": "COc1ccc2cc(C(C)C(=O)O)ccc2c1",
    "celecoxib": "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1",
    "diclofenac": "OC(=O)Cc1ccccc1Nc1c(Cl)cccc1Cl",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "indomethacin": "COc1ccc2c(c1)c(CC(=O)O)c(C)n2C(=O)c1ccc(Cl)cc1",
    "piroxicam": "OC1=C(C(=O)Nc2ccccn2)N(C)S(=O)(=O)c2ccccc21",
    "meloxicam": "Cc1cnc(NC(=O)C2=C(O)c3ccccc3S(=O)(=O)N2C)s1",
}


def dock_compound(name, smiles):
    """Submit a DiffDock job and wait for the result."""
    job = requests.post(f"{BASE}/docking/diffdock", headers=HEADERS, json={
        "protein_pdb": protein_pdb,
        "ligand_smiles": smiles,
        "num_poses": 3,
    }).json()
    # Poll until complete (60 polls x 3 s = up to 3 minutes per compound)
    for _ in range(60):
        result = requests.get(
            f"{BASE}/docking/diffdock/{job['job_id']}", headers=HEADERS
        ).json()
        if result["status"] == "completed":
            top_conf = result["poses"][0]["confidence"]
            return name, smiles, top_conf, result["poses"][0]["ligand_pdb"]
        if result["status"] == "failed":
            return name, smiles, None, None
        time.sleep(3)
    return name, smiles, None, None  # timed out


# Run docking jobs in parallel (4 at a time)
results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {
        pool.submit(dock_compound, name, smi): name
        for name, smi in library.items()
    }
    for future in as_completed(futures):
        name, smiles, confidence, ligand_pdb = future.result()
        if confidence is not None:
            results.append((name, smiles, confidence, ligand_pdb))
            print(f"  {name}: confidence = {confidence:.3f}")
        else:
            print(f"  {name}: FAILED")

# Rank by confidence
results.sort(key=lambda x: x[2], reverse=True)
print("\n--- Virtual Screening Results (ranked) ---\n")
for rank, (name, smiles, conf, _) in enumerate(results, 1):
    print(f"  {rank}. {name:15s} confidence={conf:.3f} SMILES={smiles}")

# Save top 3 poses
for i, (name, _, _, ligand_pdb) in enumerate(results[:3]):
    with open(f"hit_{i+1}_{name}.pdb", "w") as f:
        f.write(ligand_pdb)
print(f"\nTop {min(3, len(results))} poses saved to disk.")

Run time for this workflow scales linearly with library size. To screen 1,000 compounds, load your SMILES from a CSV file and adjust the worker count. At 4 concurrent workers and 15 to 60 seconds per job, a library of 1,000 compounds completes in roughly 1 to 4 hours through the API. For larger campaigns, contact SciRouter for higher rate limits and batch pricing.
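Loading a larger library from a CSV file could look like the sketch below, which returns the same name-to-SMILES dictionary the screening script iterates over. The column names `name` and `smiles` and the function name `load_library` are assumptions; adjust them to match your file.

```python
import csv
import io


def load_library(csv_file, name_col="name", smiles_col="smiles"):
    """Read compound names and SMILES from an open CSV file into a dict."""
    library = {}
    for row in csv.DictReader(csv_file):
        name = (row.get(name_col) or "").strip()
        smiles = (row.get(smiles_col) or "").strip()
        if name and smiles:  # skip blank or incomplete rows
            library[name] = smiles
    return library


# In-memory stand-in for a real file, including one empty row to be skipped.
sample = io.StringIO(
    "name,smiles\n"
    "ibuprofen,CC(C)Cc1ccc(cc1)C(C)C(=O)O\n"
    "aspirin,CC(=O)Oc1ccccc1C(=O)O\n"
    ",\n"
)
library = load_library(sample)
print(len(library), "compounds loaded")  # prints "2 compounds loaded"
```

For a file on disk, open it with `open("library.csv", newline="")` and pass the file object to `load_library`.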
Real Example: Docking Ibuprofen to COX-2 – What to Expect
COX-2 (cyclooxygenase-2, PDB: 4PH9) is one of the most-studied drug targets in history. The 4PH9 crystal structure was solved at 1.81 angstrom resolution and contains the murine COX-2 homodimer with ibuprofen bound in the cyclooxygenase channel. The active site is a long hydrophobic channel leading to a catalytic pocket containing Arg120, Tyr355, Ser530, and the selectivity pocket unique to COX-2 (Val523, Arg513).
Ibuprofen binds in the main hydrophobic channel with its carboxylic acid group forming hydrogen bonds to Arg120 and Tyr355. The isobutyl group extends into a hydrophobic sub-pocket. When you dock ibuprofen to 4PH9, the top DiffDock pose should place the molecule in this channel with the carboxylate oriented toward the arginine. The Vina binding energy should be in the -6.5 to -8.0 kcal/mol range.
To validate your docking result, compare the predicted pose to the known binding mode from crystal structures of ibuprofen bound to COX enzymes. The 4PH9 entry itself contains co-crystallized ibuprofen in this channel, so its ligand coordinates can serve as the experimental reference. If the RMSD between your predicted pose and the experimental reference is below 2.0 angstroms, the docking was successful.
You can also dock selective COX-2 inhibitors like celecoxib (SMILES: Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1) and compare the binding mode. Celecoxib is larger and extends into the COX-2 selectivity pocket, which explains its COX-2 selectivity over COX-1. This kind of comparative docking analysis is a standard approach in medicinal chemistry.
Beyond Two Molecules: Building a Full Drug Discovery Pipeline
Molecular docking is rarely used in isolation. In a real drug discovery workflow, docking is one step in a multi-stage pipeline. SciRouter provides all the tools you need to build this pipeline through a single API:
- Target preparation: Predict protein structure from sequence with ESMFold if no experimental structure is available
- Pocket detection: Identify druggable binding sites on the protein surface automatically
- Docking: Dock compounds with DiffDock or AutoDock Vina
- Scoring and filtering: Calculate molecular properties, drug-likeness, and ADMET predictions for top hits
- Lead generation: Generate novel molecules around your best scaffolds using REINVENT4
- Complex prediction: Refine top candidates with Chai-1 or Boltz-2 for higher-accuracy complex structures
The Drug Discovery Studio on SciRouter chains these tools together in a visual interface, or you can build custom pipelines using the Python SDK. Every tool in the pipeline is accessible through the same API key and the same authentication flow.
Tips for Better Docking Results
The following practical tips, drawn from hundreds of docking jobs, matter most for getting reliable results from online molecular docking.
Choose the Right Protein Structure
The quality of your protein structure directly affects docking accuracy. Prefer experimental crystal structures over predicted models when available. Use structures solved at 2.5 angstroms or better. If the protein has been co-crystallized with a similar ligand, that structure will have the binding site in a relevant conformation. Search the RCSB PDB at rcsb.org for structures of your target.
Validate with Known Binders
Before docking unknown compounds, dock a known binder first as a positive control. If your docking tool cannot reproduce the known binding mode of a well-characterized ligand, the results for unknown molecules will not be reliable. The ibuprofen-to-COX-2 example in this tutorial serves as exactly this kind of validation.
Run Both DiffDock and Vina
Each tool has different strengths and failure modes. Consensus results – where both tools agree – are more reliable than single-tool predictions. When the tools disagree, investigate both poses carefully.
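One simple way to check whether two tools agree is to compare where each places the ligand. The sketch below measures the distance between pose centroids, which works even when the two tools order atoms differently. Both the centroid criterion and the 2.0 angstrom threshold are illustrative assumptions rather than an established standard, and the function names are hypothetical.

```python
import math


def centroid(coords):
    """Return the geometric center of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def poses_agree(coords_a, coords_b, max_centroid_distance=2.0):
    """True if the two pose centroids lie within the given distance (angstroms)."""
    return math.dist(centroid(coords_a), centroid(coords_b)) <= max_centroid_distance


# Toy coordinates standing in for top poses from two docking tools.
pose_one = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pose_two = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.5, 0.0)]
pose_far = [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(poses_agree(pose_one, pose_two))  # prints True
print(poses_agree(pose_one, pose_far))  # prints False
```

Centroid agreement only shows that both tools chose the same pocket; the orientation within the pocket still needs visual inspection or an atom-matched RMSD.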
Always Inspect Poses Visually
Scores are useful for ranking, but they are not infallible. A high-scoring pose that places the ligand in a physically impossible orientation (clashing with protein atoms, or burying polar groups in hydrophobic regions) is a false positive. Spend 60 seconds looking at each top pose in a molecular viewer before trusting the score.
Next Steps
You now know how to run molecular docking online with both DiffDock and AutoDock Vina, interpret the results, and scale to virtual screening campaigns. To go deeper, explore these related guides:
- DiffDock vs AutoDock Vina: Full Comparison – detailed benchmarks and accuracy analysis
- Virtual Screening Tutorial – scaling docking to thousands of compounds
- De Novo Drug Design with REINVENT4 – generating novel molecules to dock
- Predict Protein Structure via API – fold targets from sequence before docking
Sign up for a free SciRouter API key and dock your first molecule in under two minutes. No GPU required, no software to install, no file format headaches. Just molecules and proteins.