Frequently Asked Questions

What does 'lead compound' actually mean?

A lead compound is a molecule that has demonstrated enough activity, selectivity, and developability to justify further investment. It is not a drug yet — many more steps lie between a lead and a clinical candidate — but it is a real starting point. A generative run that produces lead-like candidates gives a medicinal chemist something to build on.

What is Boltz-2 doing in this pipeline?

Boltz-2 is a structure prediction model that takes a protein sequence and predicts its 3D structure, often including bound ligand poses. In this tutorial we use it to identify the binding pocket geometry on our target protein. That pocket is then the conditioning input to DiffSBDD. You could substitute a crystal structure or a pocket-detection tool if you prefer, but Boltz-2 gives you a fast path when no experimental structure exists.

Why filter with ADMET after generation?

DiffSBDD does not know about pharmacokinetics. It generates molecules that fit the pocket but may be insoluble, metabolically unstable, or toxic. A post-generation ADMET filter removes candidates that would fail in vivo no matter how well they bind. This is standard hygiene in any generative drug design pipeline.

How many candidates should I generate?

A few hundred is a good starting point for a single pocket. That gives you enough diversity to filter aggressively and still have dozens of survivors. If you need more, run multiple seeds or multiple pocket definitions. If you only have a few minutes of GPU budget, generate fifty and review them carefully.
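Pooling several seeds only helps if repeated molecules are collapsed afterwards. The following is a minimal sketch, assuming each candidate is a dict with "smiles" and "score" keys (as returned by the DiffSBDD step in this tutorial) and that a higher score is better. The helper name merge_candidate_pools is hypothetical and not part of any SciRouter client.

```python
def merge_candidate_pools(pools: list[list[dict]]) -> list[dict]:
    """Merge candidate lists from several runs, keeping one entry per SMILES.

    Duplicates are detected by exact string match, so two different SMILES
    spellings of the same molecule are NOT merged; canonicalize first with a
    cheminformatics toolkit if that matters. Assumes higher "score" is better.
    """
    best: dict[str, dict] = {}
    for pool in pools:
        for cand in pool:
            smiles = cand["smiles"]
            if smiles not in best or cand["score"] > best[smiles]["score"]:
                best[smiles] = cand
    return list(best.values())


# Toy pools standing in for two runs with different seeds.
run_a = [{"smiles": "CCO", "score": 0.4}, {"smiles": "c1ccccc1", "score": 0.7}]
run_b = [{"smiles": "CCO", "score": 0.6}, {"smiles": "CCN", "score": 0.5}]
merged = merge_candidate_pools([run_a, run_b])
print(f"{len(merged)} unique candidates")
```

Keeping the best-scoring copy is one possible policy; keeping the first occurrence would work equally well if scores are not comparable across runs.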

Do I need to run all four steps?

For a proper pipeline, yes. You can skip Boltz-2 if you already have a pocket from a crystal structure. You can skip the ADMET filter if you are only exploring pocket fit. But for candidates that could actually become leads, run the full chain.

Is this ready for drug discovery in practice?

It is ready for hypothesis generation and triage. The candidates are starting points, not finished molecules. A chemist still has to review them, propose refinements, and send them to the wet lab. What this pipeline changes is the throughput: you can go from target to curated candidate list in an afternoon instead of weeks.

Does SciRouter handle the GPU orchestration?

Yes. Every model called in this tutorial (Boltz-2, DiffSBDD, the ADMET predictor, and TxGemma) runs on SciRouter's side, so no GPU work happens on your machine. Your client makes a plain HTTP call for each step. You do not provision anything.

From Protein Pocket to Lead Compound: A Tutorial with DiffSBDD

This tutorial walks through a full structure-based drug design pipeline end to end. We start with a protein target sequence, use Boltz-2 to predict its structure and identify the binding pocket, use DiffSBDD to generate 3D lead candidates conditioned on the pocket, and apply an ADMET filter to narrow the list. Every step runs on SciRouter, so you do not need any local GPU or model-weight management.

The goal is not to produce a drug (that is still a years-long wet-lab effort) but to produce a curated list of candidates that a medicinal chemist could start working on tomorrow.

Note

This pipeline produces hypotheses, not clinical candidates. Every output needs chemist review and experimental validation before it means anything.

The pipeline at a glance

Step 1 — Pocket. Boltz-2 predicts the protein structure and identifies the binding pocket.
Step 2 — Generate. DiffSBDD produces 3D candidates conditioned on the pocket.
Step 3 — Filter. An ADMET predictor removes candidates that would fail in vivo.
Step 4 — Review. A chemistry LLM like TxGemma writes a rationale for each survivor, and a human reviews the shortlist.
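The way the steps chain together can be sketched as a single function that takes each step as a callable. This is an illustration of the data flow only: run_pipeline and its parameter names are hypothetical, and the candidate fields ("smiles", "score") are assumed to match the ones used later in this tutorial.

```python
from typing import Callable


def run_pipeline(
    sequence: str,
    find_pocket: Callable[[str], dict],
    generate: Callable[[dict], list[dict]],
    profile: Callable[[str], dict],
    passes_filter: Callable[[dict], bool],
    top_k: int = 10,
) -> list[dict]:
    """Chain pocket -> generate -> filter -> rank and return the shortlist.

    Human review (the second half of Step 4) happens outside this function.
    """
    pocket = find_pocket(sequence)              # Step 1: pocket
    candidates = generate(pocket)               # Step 2: generate
    survivors = []
    for cand in candidates:                     # Step 3: filter
        admet = profile(cand["smiles"])
        if passes_filter(admet):
            survivors.append({**cand, "admet": admet})
    # Step 4 (ranking part): assumes a higher score is better.
    survivors.sort(key=lambda c: c["score"], reverse=True)
    return survivors[:top_k]
```

Passing the steps in as callables keeps the flow testable with stubs before any real API call is made.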

Step 1: Pocket detection with Boltz-2

We start by calling SciRouter's Boltz-2 endpoint with the target sequence. Boltz-2 returns the predicted structure along with pocket definitions.

python

import os
import httpx

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://scirouter-gateway-production.up.railway.app"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

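def predict_structure_and_pocket(sequence: str) -> dict:
    """Predict the structure for a sequence and return it with detected pockets."""
    resp = httpx.post(
        f"{BASE}/v1/complexes/boltz2",
        headers=HEADERS,
        json={"sequence": sequence, "detect_pockets": True},
        timeout=600.0,  # structure prediction can take several minutes
    )
    resp.raise_for_status()  # raise on HTTP error responses
    return resp.json()

target_sequence = "MKTIIALSYIFCLVFADYKDDDDK..."  # your sequence
result = predict_structure_and_pocket(target_sequence)
if not result["pockets"]:
    raise RuntimeError("Boltz-2 returned no pockets for this sequence")
pocket = result["pockets"][0]  # take the top pocket
print(f"Pocket residues: {pocket['residues']}")
print(f"Pocket center: {pocket['center']}")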

The pocket object includes the set of residues forming the pocket, the predicted pocket center, and the atoms the generator will use for conditioning. If Boltz-2 finds multiple pockets, pick the one that matches the biology of your target — the active site, the allosteric site, or whichever cavity you care about.
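When the residues of the site you care about are known from the literature, the choice between several pockets can be made by residue overlap instead of by list position. This is a minimal sketch, assuming pocket["residues"] is a list of residue numbers; the actual format of that field is not specified here, and pick_pocket is a hypothetical helper.

```python
def pick_pocket(pockets: list[dict], known_site: set) -> dict:
    """Return the pocket whose residues overlap most with a known site.

    Overlap is measured with the Jaccard index (intersection over union).
    Raises ValueError if no pocket shares any residue with the site, so a
    wrong cavity is never chosen silently.
    """
    def jaccard(pocket: dict) -> float:
        residues = set(pocket["residues"])
        union = residues | known_site
        return len(residues & known_site) / len(union) if union else 0.0

    if not pockets:
        raise ValueError("no pockets to choose from")
    best = max(pockets, key=jaccard)
    if jaccard(best) == 0.0:
        raise ValueError("no pocket overlaps the known site")
    return best


# Toy data: residue numbers and centers are made up for illustration.
pockets = [
    {"residues": [10, 11, 12, 40], "center": [1.0, 2.0, 3.0]},
    {"residues": [101, 102, 145, 189], "center": [9.0, 8.0, 7.0]},
]
catalytic_residues = {102, 145, 189}
chosen = pick_pocket(pockets, catalytic_residues)
print(chosen["center"])
```

Raising on zero overlap is deliberate: it is better to stop than to condition the generator on a cavity unrelated to the biology.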

Step 2: Generating candidates with DiffSBDD

Next we pass the pocket to DiffSBDD and ask it to generate candidate molecules. The generator produces 3D atoms directly, and we return a list of candidate structures.

python

def generate_candidates(pocket: dict, n_candidates: int = 200, seed: int = 42) -> list[dict]:
    """Generate pocket-conditioned candidates; change the seed to get a different pool."""
    resp = httpx.post(
        f"{BASE}/v1/design/diffsbdd",
        headers=HEADERS,
        json={
            "pocket": pocket,
            "n_candidates": n_candidates,
            "seed": seed,
        },
        timeout=900.0,  # generation is the slowest step
    )
    resp.raise_for_status()
    return resp.json()["candidates"]

candidates = generate_candidates(pocket, n_candidates=200)
print(f"Got {len(candidates)} candidates from DiffSBDD")

Each candidate includes a SMILES string, the 3D atom positions, and a generator score. At this stage you have a pool of pocket-aware molecules, but nothing in that pool has been evaluated for developability.
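Before spending one ADMET call per molecule, it can be worth dropping candidates that are unusable on their face. The sketch below assumes the candidate dicts carry a "smiles" key as described above; drop_unusable is a hypothetical helper. It relies on one SMILES fact: a "." in the string separates disconnected fragments, which for generator output usually means the molecule fell apart.

```python
def drop_unusable(candidates: list[dict]) -> list[dict]:
    """Keep only candidates with a non-empty, single-fragment SMILES."""
    kept = []
    for cand in candidates:
        smiles = (cand.get("smiles") or "").strip()
        if not smiles:
            continue  # nothing to profile
        if "." in smiles:
            continue  # "." separates disconnected fragments in SMILES
        kept.append(cand)
    return kept


# Toy pool for illustration.
raw = [
    {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "score": 0.8},  # aspirin, one connected molecule
    {"smiles": "CCO.O", "score": 0.9},                  # two fragments
    {"smiles": "", "score": 0.5},                       # empty string
    {"score": 0.3},                                     # key missing entirely
]
usable = drop_unusable(raw)
print(f"{len(usable)} of {len(raw)} candidates are usable")
```

This is a string-level check only. Validating valence or ring closures needs a cheminformatics toolkit.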

Step 3: Filtering with ADMET

This is the step that separates a generative dump from a curated candidate list. We send each candidate through SciRouter's ADMET panel and keep the ones that look developable.

python

def admet_profile(smiles: str) -> dict:
    resp = httpx.post(
        f"{BASE}/v1/chemistry/admet",
        headers=HEADERS,
        json={"smiles": smiles},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()

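def is_drug_like(profile: dict) -> bool:
    """Illustrative developability gate; tune the thresholds per program."""
    return (
        profile["qed"] > 0.5                                # overall drug-likeness
        and 0 < profile["logp"] < 5                         # lipophilicity window
        and profile["herg"]["verdict"] != "high"            # cardiac liability
        and profile["cyp_inhibition"]["verdict"] != "high"  # drug-drug interaction risk
    )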

survivors = []
for cand in candidates:
    profile = admet_profile(cand["smiles"])
    if is_drug_like(profile):
        cand["admet"] = profile
        survivors.append(cand)

print(f"{len(survivors)} candidates survived ADMET filtering")

The thresholds above are illustrative. In practice you tune them to what you want from the leads: more permissive filters early, more restrictive filters late. For CNS targets you would add a blood-brain barrier (BBB) penetration check. For oral drugs you would add Lipinski's rule of five and a bioavailability check.
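One way to keep those choices explicit is to move the thresholds into a small config object. This is a sketch under assumptions: the profile keys "mw", "hbd", "hba", and "bbb" are hypothetical names for molecular weight, hydrogen-bond donors, hydrogen-bond acceptors, and a boolean BBB prediction, and may not match what the ADMET endpoint returns. The rule-of-five limits themselves (MW 500, logP 5, 5 donors, 10 acceptors) are the standard ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    min_qed: float = 0.5
    logp_range: tuple[float, float] = (0.0, 5.0)
    require_lipinski: bool = False  # turn on for oral programs
    require_bbb: bool = False       # turn on for CNS programs


def lipinski_violations(profile: dict) -> int:
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    return sum([
        profile["mw"] > 500,
        profile["logp"] > 5,
        profile["hbd"] > 5,
        profile["hba"] > 10,
    ])


def passes(profile: dict, cfg: FilterConfig) -> bool:
    """Apply the base checks, then the optional oral and CNS checks."""
    low, high = cfg.logp_range
    if profile["qed"] <= cfg.min_qed or not (low < profile["logp"] < high):
        return False
    if profile["herg"]["verdict"] == "high":
        return False
    if profile["cyp_inhibition"]["verdict"] == "high":
        return False
    # A single rule-of-five violation is commonly tolerated.
    if cfg.require_lipinski and lipinski_violations(profile) > 1:
        return False
    if cfg.require_bbb and not profile["bbb"]:
        return False
    return True


# Made-up profile for illustration.
example = {
    "qed": 0.7, "logp": 2.1, "mw": 320.0, "hbd": 2, "hba": 5, "bbb": False,
    "herg": {"verdict": "low"}, "cyp_inhibition": {"verdict": "low"},
}
print(passes(example, FilterConfig()))                  # base filter only
print(passes(example, FilterConfig(require_bbb=True)))  # CNS program
```

A frozen dataclass makes each filter stage (loose early, strict late) a named, comparable value instead of numbers scattered through the code.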

Step 4: Ranking and rationale

From the survivors, we pick the top candidates by generator score (the "score" field from Step 2, used here as a proxy for pocket fit, with higher assumed to be better) and ask TxGemma for a rationale on each one. This produces a shortlist that a chemist can review with context, not just a list of numbers.

python

def txgemma_rationale(smiles: str, target: str) -> str:
    resp = httpx.post(
        f"{BASE}/v1/interpret/txgemma",
        headers=HEADERS,
        json={
            "smiles": smiles,
            "question": (
                f"This molecule was generated as a candidate for {target}. "
                "Comment on its pharmacophore, likely ADMET profile, and "
                "any structural liabilities. Keep the answer under 120 words."
            ),
        },
        timeout=90.0,
    )
    resp.raise_for_status()
    return resp.json()["answer"]

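# Rank survivors by generator score (assumes higher is better) and keep the top 10.
survivors.sort(key=lambda c: c["score"], reverse=True)
shortlist = survivors[:10]
for cand in shortlist:
    # One TxGemma call per shortlisted candidate.
    cand["rationale"] = txgemma_rationale(cand["smiles"], target="your target")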

The shortlist now contains 3D coordinates, SMILES, an ADMET profile, a generator score, and a written rationale for each candidate. This is the format a chemist can actually work with.
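To hand the shortlist over, a flat file is often easier to review than nested JSON. The sketch below uses only the standard library and assumes the candidate keys used in this tutorial ("smiles", "score", "admet", "rationale"); shortlist_to_csv is a hypothetical helper, and the column layout is one possible choice.

```python
import csv
import io
import json


def shortlist_to_csv(shortlist: list[dict]) -> str:
    """Flatten the shortlist into CSV text, one row per candidate.

    The nested ADMET profile is stored as a JSON string so nothing is lost.
    3D coordinates are left out because they belong in a structure file.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["rank", "smiles", "score", "admet", "rationale"]
    )
    writer.writeheader()
    for rank, cand in enumerate(shortlist, start=1):
        writer.writerow({
            "rank": rank,
            "smiles": cand["smiles"],
            "score": cand["score"],
            "admet": json.dumps(cand.get("admet", {}), sort_keys=True),
            "rationale": cand.get("rationale", ""),
        })
    return buffer.getvalue()


# Toy shortlist for illustration.
demo = [{"smiles": "CCO", "score": 0.9, "admet": {"qed": 0.7}, "rationale": "ok, small"}]
print(shortlist_to_csv(demo))
```

The csv module quotes fields containing commas, so free-text rationales survive a round trip through a spreadsheet.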

Common pitfalls

Bad pocket input

A wrong pocket leads to wrong candidates. If your target has multiple cavities, make sure you are conditioning on the right one. When in doubt, inspect the pocket definition visually before handing it to DiffSBDD.

Over-aggressive filtering

Filter for developability, not for perfection. If your ADMET filter cuts 99% of candidates, you have lost the benefit of exploration. Start with loose filters and tighten them based on what the surviving set looks like.
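A quick way to see whether a filter is too aggressive is to report the survival rate together with how many candidates each individual check rejects. This is a minimal sketch that reuses the illustrative thresholds and field names from is_drug_like in Step 3; filter_report and CHECKS are hypothetical names.

```python
from collections import Counter

# Each check returns True when the profile PASSES.
CHECKS = {
    "qed": lambda p: p["qed"] > 0.5,
    "logp": lambda p: 0 < p["logp"] < 5,
    "herg": lambda p: p["herg"]["verdict"] != "high",
    "cyp": lambda p: p["cyp_inhibition"]["verdict"] != "high",
}


def filter_report(profiles: list[dict]) -> tuple[float, Counter]:
    """Return (survival rate, failures per check) for a batch of profiles.

    A profile can fail several checks, so the counts may sum to more than
    the number of rejected candidates.
    """
    failures: Counter = Counter()
    survived = 0
    for profile in profiles:
        failed = [name for name, check in CHECKS.items() if not check(profile)]
        failures.update(failed)
        if not failed:
            survived += 1
    rate = survived / len(profiles) if profiles else 0.0
    return rate, failures


# Made-up profiles for illustration.
ok = {"qed": 0.7, "logp": 2.0, "herg": {"verdict": "low"}, "cyp_inhibition": {"verdict": "low"}}
profiles = [
    ok,
    dict(ok, qed=0.3),
    dict(ok, logp=6.0, herg={"verdict": "high"}),
    ok,
]
rate, failures = filter_report(profiles)
print(f"survival rate: {rate:.0%}")
print(failures.most_common())
```

If one check accounts for most of the rejections, loosening that single threshold is usually a better first move than relaxing everything at once.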

Skipping chemist review

LLM rationales and ADMET predictions are triage tools. They are not a substitute for a human chemist reviewing the top candidates and asking the hard questions. Always end with a review step.

Treating the shortlist as a hit list

The candidates are hypotheses. They still need synthesis, biophysical assays, and cell-based validation before they mean anything. This pipeline moves you from zero to “worth testing.” It does not move you from zero to drug.

Warning

Every molecule that comes out of a generative pipeline is a starting point, not a finished product. Real drug discovery still happens in the wet lab, with careful medicinal chemistry and iterative optimization. This pipeline makes the starting point dramatically better; it does not make the road from there any shorter.

Bottom line

A few years ago, a pipeline like this one — pocket detection, pocket-conditioned 3D generation, ADMET filtering, LLM rationale — would have required a team of specialists and a lot of infrastructure. Today it is a single Python script against SciRouter's hosted API. What the team used to produce in weeks can be produced in an afternoon, with a chemist then taking over for the interesting part of the work.

