
How to Screen 1,000 Molecules in 10 Minutes

Tutorial for virtual screening at scale using SciRouter. Batch dock molecules, filter by drug-likeness, rank by binding affinity — all through API calls.

Ryan Bethencourt
April 8, 2026

What Is Virtual Screening?

Drug discovery starts with a simple but expensive problem: you have a protein target implicated in a disease, and you need to find small molecules that bind to it. The brute-force approach – synthesizing and testing every candidate in the lab – is prohibitively slow and costly. A typical high-throughput screen tests hundreds of thousands of compounds over weeks at a cost of hundreds of thousands of dollars.

Virtual screening flips the economics. Instead of testing molecules physically, you evaluate them computationally – filtering by drug-likeness, predicting binding through molecular docking, and assessing safety with ADMET models. The result is a short list of high-probability candidates that you then validate experimentally. A well-designed virtual screen can reach hit rates of 5–20% among the shortlisted compounds, compared to 0.01–0.1% for random experimental screening.

This tutorial walks you through a complete virtual screening pipeline using SciRouter's API. By the end, you will have working Python code that screens 1,000 molecules through property filters, molecular docking, and ADMET prediction – all in about 10 minutes.

Two Approaches: Ligand-Based vs Structure-Based

Before diving into code, it helps to understand the two main strategies for virtual screening:

Ligand-Based Screening

If you know molecules that are active against your target, you can use their properties to find similar compounds. This approach compares molecular fingerprints, physicochemical properties, and pharmacophore features to rank candidates by similarity to known actives. It does not require a 3D protein structure.

  • Strengths: Fast, simple, works without a protein structure
  • Weaknesses: Biased toward known chemical scaffolds, may miss novel chemotypes
  • SciRouter tools: Molecular Properties for filtering, similarity search for ranking
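
To make the similarity ranking described above concrete, here is a minimal sketch that scores candidates against a known active using the Tanimoto coefficient. It assumes fingerprints are already available as sets of on-bit indices; the bit values and compound names are toy data invented for illustration, and this is not a description of how SciRouter's similarity search works internally.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(reference_fp, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(reference_fp, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints (hypothetical bit indices, for illustration only)
known_active = {1, 4, 7, 9, 12}
candidates = {
    "cand_A": {1, 4, 7, 9, 13},
    "cand_B": {2, 3, 5},
    "cand_C": {1, 4, 7, 9, 12},
}

ranking = rank_by_similarity(known_active, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real screen the fingerprints would come from a cheminformatics toolkit, and you would keep only candidates above a similarity cutoff that suits your target.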

Structure-Based Screening

If you have a 3D structure of the target protein (experimental or predicted), you can computationally dock each candidate molecule into the binding site and score the predicted interaction. This can discover entirely novel scaffolds because it evaluates the physical complementarity between molecule and target.

  • Strengths: Can find novel scaffolds, physically motivated scoring
  • Weaknesses: Requires a protein structure, more computationally expensive
  • SciRouter tools: DiffDock for AI-powered docking
Note
The most effective virtual screening pipelines combine both approaches: ligand-based filters to rapidly eliminate unsuitable compounds, followed by structure-based docking on the survivors. This is the approach we will build in this tutorial.

The Pipeline: Five Steps to Screen 1,000 Molecules

Our screening funnel has five stages, each progressively more expensive but more informative:

  • Step 1: Prepare your molecule library as SMILES strings
  • Step 2: Filter by drug-likeness using molecular properties
  • Step 3: Dock surviving candidates against the target protein
  • Step 4: Rank by predicted binding affinity
  • Step 5: Run ADMET prediction on top hits

Each stage eliminates compounds, so by the time you reach the expensive docking step, your library is already significantly reduced. This cascading filter design is what makes the entire pipeline fast.

Step 1: Prepare Your Molecule Library

Your starting point is a list of SMILES strings representing the molecules you want to screen. These can come from public databases like ZINC (over 200 million purchasable compounds), ChEMBL (bioactive molecules with assay data), or your own proprietary library.

Load a molecule library from a CSV file
import csv

def load_library(filepath):
    """Load SMILES from a CSV file with columns: name, smiles"""
    molecules = []
    # newline="" lets the csv module handle line endings itself
    with open(filepath, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            molecules.append({
                "name": row["name"],
                "smiles": row["smiles"]
            })
    return molecules

# Load your library
library = load_library("compound_library.csv")
print(f"Loaded {len(library)} molecules")
Tip
If your library is in SDF or MOL2 format, use SciRouter's /v1/chemistry/convert endpoint to convert it to SMILES first. For details on SMILES syntax, see our DiffDock tutorial, which covers input preparation.
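
Before spending API calls, it can help to tidy the loaded list. The sketch below is one possible cleanup pass, not part of the SciRouter API: the helper name clean_library is hypothetical, and it only drops blank rows and exact duplicate strings. Two different SMILES strings for the same molecule would still both survive unless you canonicalize them first.

```python
def clean_library(molecules):
    """Drop rows with empty SMILES and exact duplicate SMILES, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for mol in molecules:
        smiles = (mol.get("smiles") or "").strip()
        if not smiles or smiles in seen:
            continue  # skip blanks and strings we have already kept
        seen.add(smiles)
        cleaned.append({"name": mol.get("name", ""), "smiles": smiles})
    return cleaned


# Small illustrative input in the same shape load_library() returns
raw = [
    {"name": "aspirin", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"},
    {"name": "aspirin_dup", "smiles": " CC(=O)OC1=CC=CC=C1C(=O)O "},
    {"name": "blank", "smiles": ""},
    {"name": "caffeine", "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},
]

library_clean = clean_library(raw)
print(f"Kept {len(library_clean)} of {len(raw)} rows")
```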

Step 2: Filter by Drug-Likeness

The first filter eliminates molecules that are unlikely to become oral drugs. We use Lipinski's Rule of Five and related property filters via the molecular properties endpoint. This step is fast (sub-second per molecule) and typically eliminates 30–50% of a random library.

Filter molecules by drug-likeness properties
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"

def check_properties(molecule):
    """Calculate properties and check drug-likeness."""
    resp = requests.post(
        f"{BASE}/chemistry/properties",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"smiles": molecule["smiles"]}
    )
    if resp.status_code != 200:
        return None

    props = resp.json()
    molecule["properties"] = props

    # Lipinski's Rule of Five + extra filters
    passes = (
        props["molecular_weight"] <= 500
        and props["logp"] <= 5
        and props["h_bond_donors"] <= 5
        and props["h_bond_acceptors"] <= 10
        and props["tpsa"] <= 140
        and props["rotatable_bonds"] <= 10
    )
    molecule["passes_druglikeness"] = passes
    return molecule

# Screen in parallel (20 concurrent requests)
druglike = []
with ThreadPoolExecutor(max_workers=20) as executor:
    futures = {
        executor.submit(check_properties, mol): mol
        for mol in library
    }
    for future in as_completed(futures):
        result = future.result()
        if result and result["passes_druglikeness"]:
            druglike.append(result)

print(f"Drug-like compounds: {len(druglike)}/{len(library)} "
      f"({100*len(druglike)/len(library):.0f}%)")

With 20 concurrent threads, 1,000 molecules need 50 rounds of requests, so at sub-second latency per request this step finishes in under a minute. The property calculation is CPU-bound and returns in milliseconds per compound.
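
Note that check_properties returns None for any non-200 response, so a molecule that hits a temporary rate limit is silently dropped from the screen. One way to soften that is a retry wrapper with exponential backoff, sketched below. The helper name call_with_retry and the set of retried status codes are assumptions for illustration; 429 is the standard HTTP "Too Many Requests" code, but check which codes the API actually returns under load.

```python
import time


def call_with_retry(fn, max_attempts=4, base_delay=0.5,
                    retry_on=(429, 500, 502, 503), sleep=time.sleep):
    """Call fn() until its status is not in retry_on, doubling the wait each time.

    fn must return a (status_code, payload) tuple; max_attempts must be at least 1.
    """
    for attempt in range(max_attempts):
        status, payload = fn()
        if status not in retry_on:
            return status, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
    return status, payload  # still failing after the last attempt


# Demo with a stand-in for the HTTP call: two transient failures, then success
responses = iter([(429, None), (503, None), (200, {"logp": 1.2})])
delays = []  # record waits instead of actually sleeping
status, payload = call_with_retry(lambda: next(responses), sleep=delays.append)
print(status, payload, delays)
```

To use it with the real endpoint, wrap the requests.post call in a small function that returns resp.status_code together with the parsed body.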

Step 3: Dock Candidates Against the Target

Now we dock the surviving compounds against our protein target using DiffDock. Unlike traditional docking tools that require you to define a search box, DiffDock uses a diffusion model to explore the entire protein surface – finding binding sites automatically.

Dock molecules with DiffDock via SciRouter
def dock_molecule(molecule, protein_pdb_path):
    """Dock a single molecule against the target protein."""
    with open(protein_pdb_path) as f:
        pdb_content = f.read()

    resp = requests.post(
        f"{BASE}/docking/diffdock",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "protein_pdb": pdb_content,
            "ligand_smiles": molecule["smiles"],
            "num_poses": 5
        }
    )
    if resp.status_code != 200:
        return None

    result = resp.json()
    molecule["docking"] = result
    # Best pose score: DiffDock confidence is higher-is-better,
    # so keep the maximum across the returned poses
    molecule["best_score"] = max(
        pose["confidence"] for pose in result["poses"]
    )
    return molecule

# Dock all drug-like compounds (10 concurrent to manage GPU load)
target_pdb = "target_protein.pdb"
docked = []

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = {
        executor.submit(dock_molecule, mol, target_pdb): mol
        for mol in druglike
    }
    for future in as_completed(futures):
        result = future.result()
        if result is not None:
            docked.append(result)

print(f"Successfully docked: {len(docked)}/{len(druglike)}")
Warning
Docking is the most time-consuming step. Each DiffDock prediction takes 5–30 seconds depending on protein and ligand size. With 10 concurrent requests, 500 compounds run as 50 sequential rounds, so expect approximately 5–8 minutes when predictions average 6–10 seconds, and up to about 25 minutes at the slow end of that range. Adjust max_workers based on your API tier rate limits.
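
The timing in this warning can be reproduced with a back-of-the-envelope model. The sketch below assumes every worker processes requests back to back with no queueing, retries, or network overhead, so treat the numbers as rough estimates only; the function name estimate_docking_minutes is hypothetical.

```python
import math


def estimate_docking_minutes(n_compounds, workers, seconds_per_prediction):
    """Rough wall-clock estimate: requests run in ceil(n / workers) sequential rounds."""
    rounds = math.ceil(n_compounds / workers)
    return rounds * seconds_per_prediction / 60


# 500 compounds and 10 workers, across the quoted 5-30 s range per prediction
for secs in (5, 10, 30):
    minutes = estimate_docking_minutes(500, 10, secs)
    print(f"{secs:>2} s per prediction -> about {minutes:.1f} min")
```

Under this model the per-prediction latency dominates: halving the number of compounds that reach docking, or doubling the workers your tier allows, halves the estimate.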

Step 4: Rank by Binding Affinity

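With docking complete, we rank all compounds by their best docking score and select the top candidates for the final ADMET assessment. DiffDock's confidence reflects how reliable a predicted pose is, with higher values being better, so treat it as a ranking proxy rather than a measured binding affinity:

Rank compounds by docking score
# Sort by best docking score, highest confidence first
docked.sort(key=lambda x: x["best_score"], reverse=True)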

# Take top 50 candidates
top_candidates = docked[:50]

print("Top 10 candidates by docking score:")
print(f"{'Rank':<6}{'Name':<25}{'Score':<10}{'MW':<8}{'LogP':<6}")
print("-" * 55)
for i, mol in enumerate(top_candidates[:10], 1):
    print(f"{i:<6}{mol['name']:<25}"
          f"{mol['best_score']:<10.3f}"
          f"{mol['properties']['molecular_weight']:<8.1f}"
          f"{mol['properties']['logp']:<6.2f}")

Step 5: ADMET Prediction on Top Hits

The final filter checks whether your top-ranked binders have acceptable pharmacokinetic and safety profiles. A compound that binds perfectly but is toxic or not absorbed is useless as a drug. We use the ADMET prediction endpoint to flag potential liabilities.

Run ADMET prediction on top candidates
def predict_admet(molecule):
    """Get ADMET profile for a compound."""
    resp = requests.post(
        f"{BASE}/pharma/adme",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"smiles": molecule["smiles"]}
    )
    if resp.status_code != 200:
        return None

    admet = resp.json()
    molecule["admet"] = admet

    # Flag critical safety issues
    molecule["safety_flags"] = []
    if admet["toxicity"]["herg_inhibitor"]:
        molecule["safety_flags"].append("hERG")
    if admet["toxicity"]["ames_mutagenicity"]:
        molecule["safety_flags"].append("Ames")
    if admet["toxicity"]["hepatotoxicity"] == "high_risk":
        molecule["safety_flags"].append("hepatotox")
    if admet["absorption"]["caco2_class"] == "low":
        molecule["safety_flags"].append("low_absorption")

    return molecule

# Run ADMET on top 50 candidates
with ThreadPoolExecutor(max_workers=20) as executor:
    futures = {
        executor.submit(predict_admet, mol): mol
        for mol in top_candidates
    }
    admet_results = []
    for future in as_completed(futures):
        result = future.result()
        if result is not None:
            admet_results.append(result)

# Filter: keep compounds with no critical safety flags
clean_hits = [m for m in admet_results if len(m["safety_flags"]) == 0]

print(f"\nFinal hits (no safety flags): {len(clean_hits)}/{len(admet_results)}")
print(f"\nFinal shortlist:")
for i, mol in enumerate(clean_hits[:10], 1):
    print(f"  {i}. {mol['name']} "
          f"(dock: {mol['best_score']:.3f}, "
          f"bioavail: {mol['admet']['absorption']['bioavailability']:.2f})")
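
When few candidates survive this filter, it is useful to see which liability is responsible. The sketch below tallies the safety_flags lists built above; the helper name tally_safety_flags and the example records are made up for illustration.

```python
from collections import Counter


def tally_safety_flags(molecules):
    """Count how often each safety flag appears across a list of screened molecules."""
    counts = Counter()
    for mol in molecules:
        counts.update(mol.get("safety_flags", []))
    return counts


# Illustrative records in the same shape predict_admet() produces
example_results = [
    {"name": "cmpd_01", "safety_flags": ["hERG"]},
    {"name": "cmpd_02", "safety_flags": []},
    {"name": "cmpd_03", "safety_flags": ["hERG", "low_absorption"]},
]

flag_counts = tally_safety_flags(example_results)
for flag, count in flag_counts.most_common():
    print(f"{flag}: {count}")
```

If one flag accounts for most of the rejections, it may be worth screening for it earlier in the funnel or revisiting the chemical series.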

Putting It All Together

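Here is the complete pipeline assembled as a single reusable script. It calls load_library, check_properties, dock_molecule, and predict_admet as defined in the steps above, so include those definitions in the same file: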

Complete virtual screening pipeline
import requests
import csv
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def screen_library(library_path, protein_pdb_path, top_n=50):
    """Full virtual screening pipeline."""
    start = time.time()

    # Step 1: Load library
    library = load_library(library_path)
    print(f"[1/5] Loaded {len(library)} molecules")

    # Step 2: Drug-likeness filter
    druglike = parallel_map(check_properties, library, workers=20)
    druglike = [m for m in druglike if m and m["passes_druglikeness"]]
    print(f"[2/5] Drug-like: {len(druglike)}")

    # Step 3: Dock against target
    dock_fn = lambda mol: dock_molecule(mol, protein_pdb_path)
    docked = parallel_map(dock_fn, druglike, workers=10)
    docked = [m for m in docked if m is not None]
    print(f"[3/5] Docked: {len(docked)}")

    # Step 4: Rank by docking confidence (higher is better)
    docked.sort(key=lambda x: x["best_score"], reverse=True)
    top = docked[:top_n]
    print(f"[4/5] Top {len(top)} selected")

    # Step 5: ADMET filter
    admet_done = parallel_map(predict_admet, top, workers=20)
    hits = [m for m in admet_done if m and len(m["safety_flags"]) == 0]
    print(f"[5/5] Clean hits: {len(hits)}")

    elapsed = time.time() - start
    print(f"\nDone in {elapsed/60:.1f} minutes")
    return hits

def parallel_map(fn, items, workers=10):
    """Run fn on items in parallel and collect results."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(fn, item): item for item in items}
        for future in as_completed(futures):
            try:
                results.append(future.result())
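            except Exception as exc:
                # Skip items that raised (e.g. network errors), but report them
                print(f"Skipped one item: {exc}")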
    return results

# Run the pipeline
hits = screen_library(
    "compound_library.csv",
    "target_protein.pdb",
    top_n=50
)

With a 1,000-molecule library, this pipeline typically completes in 8–12 minutes depending on molecule sizes and API tier. The bottleneck is the docking step; the property and ADMET calculations are fast.

Optimizing Your Screen

Several strategies can improve both the speed and quality of your virtual screening results:

  • Pre-filter aggressively – The more compounds you eliminate before docking, the faster the pipeline runs. Consider adding filters for PAINS (pan-assay interference compounds), reactive functional groups, and synthetic accessibility.
  • Use diversity selection – If your library has many similar molecules, cluster them by fingerprint similarity and pick representatives. This avoids docking redundant compounds.
  • Adjust concurrency to your tier – The free tier supports 5 concurrent requests; the Pro tier supports 50. Scale max_workers accordingly to maximize throughput without hitting rate limits.
  • Cache property results – If you screen the same library against multiple targets, cache the drug-likeness and ADMET results. Only the docking step changes with different targets.
  • Consider predicted structures – If no experimental structure exists for your target, fold it first with ESMFold and use the predicted structure for docking. Check that binding site residues have pLDDT above 70.
Tip
For more on interpreting docking results, see our DiffDock tutorial. For understanding the ADMET output, check our ADMET prediction guide.
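
The caching suggestion above could look something like the following sketch. It keeps results in a JSON file keyed by SMILES string; the helper names and the file name are hypothetical, and fake_properties merely stands in for the real property request so the example runs offline.

```python
import json
import os
import tempfile


def load_cache(path):
    """Load a SMILES -> result mapping from disk, or start empty."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_cache(cache, path):
    """Persist the mapping so later screens can reuse it."""
    with open(path, "w") as f:
        json.dump(cache, f)


def cached_call(cache, key, compute):
    """Return cache[key] if present; otherwise compute it, store it, and return it."""
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]


# Demo with a stand-in for the property endpoint
calls = []

def fake_properties(smiles):
    calls.append(smiles)
    return {"molecular_weight": 46.07}

cache_path = os.path.join(tempfile.mkdtemp(), "properties_cache.json")
cache = load_cache(cache_path)
cached_call(cache, "CCO", fake_properties)
cached_call(cache, "CCO", fake_properties)  # served from the cache, no second call
save_cache(cache, cache_path)
print(f"Property lookups performed: {len(calls)}")
```

Keying on the raw SMILES string is the simplest choice; if the same molecule can appear under different SMILES spellings, canonicalize before using it as a key.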

What Comes After Virtual Screening

Virtual screening produces a ranked shortlist, not a confirmed drug. The next steps in a typical workflow are:

  • Visual inspection – Examine the top binding poses in a molecular viewer (PyMOL, ChimeraX) to check whether the predicted interactions make chemical sense
  • Compound procurement – Order the top hits from chemical vendors (most ZINC compounds are purchasable from Enamine, MolPort, or other suppliers)
  • Experimental validation – Test binding experimentally with biophysical assays (SPR, ITC, thermal shift) or functional assays
  • Hit expansion – For confirmed hits, search for analogs and repeat the screening cycle to optimize potency, selectivity, and drug-likeness
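
To hand the shortlist over for inspection or procurement, it helps to export it in a form others can open. The sketch below writes the name, SMILES, and docking score of each hit as CSV; the helper name write_shortlist and the example hits are assumptions for illustration.

```python
import csv
import io


def write_shortlist(hits, handle):
    """Write name, SMILES and docking score for each hit to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["name", "smiles", "best_score"])
    writer.writeheader()
    for mol in hits:
        # Ignore the bulky nested results and keep only the summary columns
        writer.writerow({key: mol.get(key, "") for key in writer.fieldnames})


# Illustrative hits in the same shape the pipeline returns
example_hits = [
    {"name": "cmpd_17", "smiles": "CCO", "best_score": -0.42, "admet": {}},
    {"name": "cmpd_03", "smiles": "c1ccccc1O", "best_score": -0.87, "admet": {}},
]

buffer = io.StringIO()
write_shortlist(example_hits, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with open("shortlist.csv", "w", newline="") and pass that handle to the function.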

Next Steps

You now have a complete, working virtual screening pipeline that takes a molecular library from raw SMILES to a prioritized shortlist of drug candidates. The combination of property filters, AI-powered docking, and ADMET prediction creates a multi-layered funnel that catches different failure modes at each stage.

Start by screening your own compound library. If you do not have one, download a subset from ZINC15 or ChEMBL and use the code above to identify potential hits against your target of interest.

Ready to run your first screen? Sign up for a free SciRouter API key and start screening today – 5,000 API calls per month on the free tier.

Frequently Asked Questions

What is virtual screening in drug discovery?

Virtual screening is a computational approach to identify promising drug candidates from large molecular libraries without synthesizing and testing every compound in the lab. It uses computational filters — such as drug-likeness rules, molecular docking, and ADMET prediction — to narrow thousands or millions of molecules down to a short list of candidates for experimental validation. It dramatically reduces the time and cost of early-stage drug discovery.

What is the difference between ligand-based and structure-based virtual screening?

Ligand-based screening uses properties of known active molecules to find similar compounds. It relies on molecular fingerprints and similarity metrics without needing a 3D protein structure. Structure-based screening uses the 3D structure of the target protein to computationally dock candidate molecules and predict binding affinity. Structure-based methods are more computationally expensive but can discover novel scaffolds that ligand-based approaches might miss.

How many molecules can I screen through the SciRouter API?

The SciRouter API supports batch processing of up to 1,000 molecules per request for property calculation and ADMET prediction endpoints. For molecular docking (DiffDock), you can submit molecules individually or in small batches. Using concurrent API calls as shown in this tutorial, you can realistically screen 1,000 molecules through the full pipeline — properties, docking, and ADMET — in approximately 10 minutes on the Pro tier.

Do I need a protein structure for virtual screening?

It depends on your approach. Ligand-based screening (filtering by molecular properties, similarity, and ADMET) does not require a protein structure. Structure-based screening (molecular docking) does require a 3D structure of the target protein. If no experimental structure is available, you can predict one using ESMFold or AlphaFold2 via the SciRouter API and use that predicted structure for docking.

How accurate is virtual screening compared to wet-lab screening?

Virtual screening typically achieves hit rates of 5–20% in the final shortlist, compared to 0.01–0.1% for random experimental screening. It is not a replacement for wet-lab validation but a powerful filter that concentrates your experimental resources on the most promising candidates. The combination of property filters, docking scores, and ADMET predictions creates a multi-layered funnel that captures diverse failure modes.

What molecular file formats does SciRouter accept?

SciRouter accepts SMILES strings as the primary input format for all chemistry and docking endpoints. SMILES is a compact text representation of molecular structures. If your library is in SDF, MOL2, or InChI format, you can convert to SMILES using the /v1/chemistry/convert endpoint before screening. Most chemical databases (ZINC, ChEMBL, PubChem) provide SMILES exports directly.
