This tutorial walks through a full structure-based drug design pipeline end to end. We start with a protein target sequence, use Boltz-2 to predict its structure and identify the binding pocket, use DiffSBDD to generate 3D lead candidates conditioned on the pocket, and apply an ADMET filter to narrow the list. Every step runs on SciRouter, so you do not need any local GPU or model-weight management.
The goal is not to produce a drug, which is still a years-long wet-lab effort, but to produce a curated list of candidates that a medicinal chemist could start working on tomorrow.
The pipeline at a glance
- Step 1: Pocket. Boltz-2 predicts the protein structure and identifies the binding pocket.
- Step 2: Generate. DiffSBDD produces 3D candidates conditioned on the pocket.
- Step 3: Filter. An ADMET predictor removes candidates that are predicted to have poor developability.
- Step 4: Review. A chemistry LLM like TxGemma writes a rationale for each survivor, and a human reviews the shortlist.
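The four steps compose into one small function. The sketch below is an illustration under assumptions, not SciRouter's own orchestration: run_pipeline and its parameters are hypothetical names, and each stage is passed in as a callable so the flow can be exercised with stand-ins before any API call is made.

```python
from typing import Callable


def run_pipeline(
    sequence: str,
    predict: Callable[[str], dict],
    generate: Callable[[dict], list[dict]],
    profile: Callable[[str], dict],
    keep: Callable[[dict], bool],
    top_n: int = 10,
) -> list[dict]:
    """Pocket -> generate -> filter -> rank, with every stage injected."""
    pocket = predict(sequence)["pockets"][0]  # Step 1: take the top pocket
    survivors = []
    for cand in generate(pocket):  # Step 2: pocket-conditioned candidates
        admet = profile(cand["smiles"])  # Step 3: ADMET profile
        if keep(admet):
            survivors.append({**cand, "admet": admet})
    # Step 4 starts from a ranked shortlist; a higher score is assumed better.
    survivors.sort(key=lambda c: c["score"], reverse=True)
    return survivors[:top_n]


# Stand-in stages with made-up values, for illustration only.
def fake_predict(seq: str) -> dict:
    return {"pockets": [{"residues": [], "center": [0.0, 0.0, 0.0]}]}


def fake_generate(pocket: dict) -> list[dict]:
    return [
        {"smiles": "CCO", "score": 0.2},
        {"smiles": "CCN", "score": 0.9},
        {"smiles": "CCC", "score": 0.5},
    ]


def fake_profile(smiles: str) -> dict:
    return {"qed": 0.1 if smiles == "CCC" else 0.8}


shortlist = run_pipeline(
    "SEQ", fake_predict, fake_generate, fake_profile,
    keep=lambda admet: admet["qed"] > 0.5, top_n=2,
)
print([c["smiles"] for c in shortlist])
```

Swapping the stand-ins for functions that call the hosted endpoints leaves the control flow unchanged.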
Step 1: Pocket detection with Boltz-2
We start by calling SciRouter's Boltz-2 endpoint with the target sequence. Boltz-2 returns the predicted structure along with pocket definitions.

import os

import httpx

# The API key is read from the environment so it never lands in source control.
API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://scirouter-gateway-production.up.railway.app"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def predict_structure_and_pocket(sequence: str) -> dict:
    """Predict the structure for a sequence and request pocket detection."""
    resp = httpx.post(
        f"{BASE}/v1/complexes/boltz2",
        headers=HEADERS,
        json={"sequence": sequence, "detect_pockets": True},
        timeout=600.0,  # allow up to 10 minutes for the prediction
    )
    resp.raise_for_status()  # raise on any 4xx/5xx response
    return resp.json()


target_sequence = "MKTIIALSYIFCLVFADYKDDDDK..."  # your sequence
result = predict_structure_and_pocket(target_sequence)
pocket = result["pockets"][0]  # take the top pocket
print(f"Pocket residues: {pocket['residues']}")
print(f"Pocket center: {pocket['center']}")

The pocket object includes the set of residues forming the pocket, the predicted pocket center, and the atoms the generator will use for conditioning. If Boltz-2 finds multiple pockets, pick the one that matches the biology of your target: the active site, an allosteric site, or whichever cavity you care about.
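When the residues of the site you care about are already known, for example from the literature, pocket choice can be scripted instead of done by index. The following is a minimal sketch under assumptions: pick_pocket is a hypothetical helper, and residue labels are assumed to be strings such as "A:HIS57", which may not match what the API returns.

```python
def pick_pocket(pockets: list[dict], known_site: set[str]) -> dict:
    """Return the pocket whose residues overlap most with a known site."""
    if not pockets:
        raise ValueError("no pockets to choose from")

    def jaccard(pocket: dict) -> float:
        residues = set(pocket["residues"])
        union = residues | known_site
        return len(residues & known_site) / len(union) if union else 0.0

    return max(pockets, key=jaccard)


# Toy data with an assumed label format, for illustration only.
toy_pockets = [
    {"residues": ["A:GLY12", "A:LYS16"], "center": [1.0, 2.0, 3.0]},
    {"residues": ["A:HIS57", "A:ASP102", "A:SER195"], "center": [9.0, 8.0, 7.0]},
]
active_site = {"A:HIS57", "A:ASP102", "A:SER195", "A:GLY193"}
chosen = pick_pocket(toy_pockets, active_site)
print(chosen["center"])
```

Jaccard overlap is only one possible score. If a reference ligand position is available, distance from the pocket center to that ligand would be another reasonable criterion.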
Step 2: Generating candidates with DiffSBDD
Next we pass the pocket to DiffSBDD and ask it to generate candidate molecules. The generator places atoms directly in 3D inside the pocket, and the endpoint returns a list of candidate structures.

def generate_candidates(pocket: dict, n_candidates: int = 200) -> list[dict]:
    """Ask DiffSBDD for candidate molecules conditioned on the pocket."""
    resp = httpx.post(
        f"{BASE}/v1/design/diffsbdd",
        headers=HEADERS,
        json={
            "pocket": pocket,
            "n_candidates": n_candidates,
            "seed": 42,  # fixed seed
        },
        timeout=900.0,  # allow up to 15 minutes for generation
    )
    resp.raise_for_status()
    return resp.json()["candidates"]


candidates = generate_candidates(pocket, n_candidates=200)
print(f"Got {len(candidates)} candidates from DiffSBDD")

Each candidate includes a SMILES string, the 3D atom positions, and a generator score. At this stage you have a pool of pocket-aware molecules, but nothing in that pool has been evaluated for developability.
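Before spending ADMET calls on the pool, it can help to drop entries without a SMILES string and collapse repeats. The sketch below assumes each candidate is a dict with the smiles and score fields described above and that a higher score is better. dedupe_candidates is a hypothetical helper, and it compares SMILES as plain strings, so two different spellings of the same molecule are not merged.

```python
def dedupe_candidates(candidates: list[dict]) -> list[dict]:
    """Keep one candidate per SMILES string, preferring the higher score."""
    best: dict[str, dict] = {}
    for cand in candidates:
        smiles = (cand.get("smiles") or "").strip()
        if not smiles:
            continue  # nothing to profile without a SMILES string
        kept = best.get(smiles)
        if kept is None or cand.get("score", float("-inf")) > kept.get(
            "score", float("-inf")
        ):
            best[smiles] = cand
    return list(best.values())


# Toy pool with made-up scores, for illustration only.
toy_pool = [
    {"smiles": "CCO", "score": 0.4},
    {"smiles": "CCO", "score": 0.9},
    {"smiles": "", "score": 1.0},
    {"smiles": "c1ccccc1", "score": 0.2},
]
unique = dedupe_candidates(toy_pool)
print(len(unique))
```

Canonicalizing SMILES with a cheminformatics toolkit before comparing would catch more duplicates. That step is left out here to keep the example dependency-free.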
Step 3: Filtering with ADMET
This is the step that separates a generative dump from a curated candidate list. We send each candidate through SciRouter's ADMET panel and keep the ones that look developable.

def admet_profile(smiles: str) -> dict:
    """Fetch the ADMET panel for a single molecule."""
    resp = httpx.post(
        f"{BASE}/v1/chemistry/admet",
        headers=HEADERS,
        json={"smiles": smiles},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()


def is_drug_like(profile: dict) -> bool:
    """Apply the illustrative developability thresholds."""
    return (
        profile["qed"] > 0.5
        and 0 < profile["logp"] < 5
        and profile["herg"]["verdict"] != "high"
        and profile["cyp_inhibition"]["verdict"] != "high"
    )


survivors = []
for cand in candidates:
    profile = admet_profile(cand["smiles"])  # one request per candidate
    if is_drug_like(profile):
        cand["admet"] = profile  # keep the profile for later review
        survivors.append(cand)
print(f"{len(survivors)} candidates survived ADMET filtering")

The thresholds above are illustrative. In practice you tune them based on what you want from the leads: more permissive filters early, more restrictive filters late. For CNS targets you would add BBB penetration. For oral drugs you would add Lipinski and bioavailability.
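One way to make that tuning explicit is to move the thresholds into a small configuration object, so loose and strict passes share a single code path. This is a sketch under assumptions: FilterConfig and passes are hypothetical names, and the bbb field with a verdict key is invented for illustration and may not exist in the real ADMET response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    min_qed: float = 0.5
    logp_range: tuple[float, float] = (0.0, 5.0)
    blocked_verdicts: frozenset[str] = frozenset({"high"})
    require_bbb: bool = False  # hypothetical switch for CNS programmes


def passes(profile: dict, cfg: FilterConfig) -> bool:
    """Return True if the profile clears every threshold in cfg."""
    low, high = cfg.logp_range
    if profile["qed"] <= cfg.min_qed:
        return False
    if not (low < profile["logp"] < high):
        return False
    for key in ("herg", "cyp_inhibition"):
        if profile[key]["verdict"] in cfg.blocked_verdicts:
            return False
    # ASSUMPTION: a "bbb" entry with a "verdict" key; not confirmed by the API.
    if cfg.require_bbb and profile.get("bbb", {}).get("verdict") != "high":
        return False
    return True


STRICT = FilterConfig()  # same thresholds as is_drug_like
LOOSE = FilterConfig(min_qed=0.3, logp_range=(-1.0, 6.0))
CNS = FilterConfig(min_qed=0.3, require_bbb=True)

# Toy profile with made-up values, for illustration only.
toy_profile = {
    "qed": 0.45,
    "logp": 3.2,
    "herg": {"verdict": "low"},
    "cyp_inhibition": {"verdict": "medium"},
}
print(passes(toy_profile, LOOSE), passes(toy_profile, STRICT))
```

An early exploratory pass could run with LOOSE and a later pass with STRICT, without touching the filtering logic.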
Step 4: Ranking and rationale
From the survivors, we rank by the pocket-fit score returned by the generator (the score field on each candidate), keep the top ten, and ask TxGemma for a rationale on each one. This produces a shortlist that a chemist can review with context, not just a list of numbers.

def txgemma_rationale(smiles: str, target: str) -> str:
    """Ask TxGemma for a short written assessment of one candidate."""
    resp = httpx.post(
        f"{BASE}/v1/interpret/txgemma",
        headers=HEADERS,
        json={
            "smiles": smiles,
            "question": (
                f"This molecule was generated as a candidate for {target}. "
                "Comment on its pharmacophore, likely ADMET profile, and "
                "any structural liabilities. Keep the answer under 120 words."
            ),
        },
        timeout=90.0,
    )
    resp.raise_for_status()
    return resp.json()["answer"]


# Highest score first, then keep the top ten.
survivors.sort(key=lambda c: c["score"], reverse=True)
shortlist = survivors[:10]
for cand in shortlist:
    cand["rationale"] = txgemma_rationale(cand["smiles"], target="your target")

The shortlist now contains 3D coordinates, SMILES, an ADMET profile, a pocket-fit score, and a written rationale for each candidate. This is the format a chemist can actually work with.
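For hand-off, the shortlist can be flattened into a table that opens in a spreadsheet. The sketch below uses only the standard library. shortlist_to_csv and the column selection are assumptions made for illustration, and the 3D coordinates are left out because they do not fit a flat table.

```python
import csv
import io

FIELDS = ["rank", "smiles", "score", "qed", "logp", "rationale"]


def shortlist_to_csv(shortlist: list[dict]) -> str:
    """Render the ranked shortlist as CSV text, one row per candidate."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for rank, cand in enumerate(shortlist, start=1):
        admet = cand.get("admet", {})
        writer.writerow(
            {
                "rank": rank,
                "smiles": cand["smiles"],
                "score": cand["score"],
                "qed": admet.get("qed", ""),
                "logp": admet.get("logp", ""),
                "rationale": cand.get("rationale", ""),
            }
        )
    return buffer.getvalue()


# Toy shortlist with made-up values, for illustration only.
demo_shortlist = [
    {
        "smiles": "CCO",
        "score": 0.9,
        "admet": {"qed": 0.6, "logp": 1.2},
        "rationale": "Small, polar fragment.",
    },
    {"smiles": "CCN", "score": 0.7},
]
table = shortlist_to_csv(demo_shortlist)
print(table)
```

The csv module quotes any rationale that contains commas, so free-text answers stay in a single cell.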
Common pitfalls
Bad pocket input
A wrong pocket leads to wrong candidates. If your target has multiple cavities, make sure you are conditioning on the right one. When in doubt, inspect the pocket definition visually before handing it to DiffSBDD.
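For the visual check, one option is to turn the pocket residues into a PyMOL selection expression and highlight them on the predicted structure. This is a sketch under assumptions: pymol_selection is a hypothetical helper, and residue labels are assumed to look like "A:HIS57" (chain, three-letter name, residue number), which may differ from the actual pocket format.

```python
import re
from collections import defaultdict

# ASSUMED label format: chain, colon, three-letter residue name, number.
_LABEL = re.compile(r"^(?P<chain>\w+):(?P<name>[A-Za-z]{3})(?P<num>\d+)$")


def pymol_selection(residues: list[str]) -> str:
    """Build a PyMOL selection expression covering the given residues."""
    by_chain: dict[str, set[int]] = defaultdict(set)
    for label in residues:
        match = _LABEL.match(label)
        if match is None:
            raise ValueError(f"unrecognised residue label: {label!r}")
        by_chain[match["chain"]].add(int(match["num"]))
    parts = []
    for chain in sorted(by_chain):
        numbers = "+".join(str(n) for n in sorted(by_chain[chain]))
        parts.append(f"(chain {chain} and resi {numbers})")
    return " or ".join(parts)


selection = pymol_selection(["A:HIS57", "A:SER195", "A:ASP102", "B:GLY12"])
print(f"select pocket, {selection}")
```

Pasting the printed command into PyMOL with the predicted structure loaded selects the residues, which can then be shown as sticks or a surface to confirm they line the intended cavity.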
Over-aggressive filtering
Filter for developability, not for perfection. If your ADMET filter cuts 99% of candidates, you have lost the benefit of exploration. Start with loose filters and tighten them based on what the surviving set looks like.
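To see which criterion is doing the cutting, it helps to measure the rejection rate of each check separately before tightening anything. The sketch below reuses the illustrative thresholds from Step 3. attrition and CHECKS are hypothetical names, and the profiles are toy data.

```python
CHECKS = {
    "qed > 0.5": lambda p: p["qed"] > 0.5,
    "0 < logp < 5": lambda p: 0 < p["logp"] < 5,
    "herg not high": lambda p: p["herg"]["verdict"] != "high",
    "cyp not high": lambda p: p["cyp_inhibition"]["verdict"] != "high",
}


def attrition(profiles: list[dict]) -> dict[str, float]:
    """Fraction of profiles rejected by each check, evaluated independently."""
    if not profiles:
        return {name: 0.0 for name in CHECKS}
    total = len(profiles)
    return {
        name: sum(not check(p) for p in profiles) / total
        for name, check in CHECKS.items()
    }


def _profile(qed: float, logp: float, herg: str, cyp: str) -> dict:
    return {
        "qed": qed,
        "logp": logp,
        "herg": {"verdict": herg},
        "cyp_inhibition": {"verdict": cyp},
    }


# Toy profiles with made-up values, for illustration only.
toy_profiles = [
    _profile(0.7, 2.0, "low", "low"),
    _profile(0.3, 2.0, "low", "low"),
    _profile(0.4, 6.0, "high", "low"),
    _profile(0.6, 3.0, "low", "high"),
]
report = attrition(toy_profiles)
for name, rate in report.items():
    print(f"{name}: {rate:.0%} rejected")
```

A check that rejects most of the pool on its own is the first one to loosen or re-examine.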
Skipping chemist review
LLM rationales and ADMET predictions are triage tools. They are not a substitute for a human chemist reviewing the top candidates and asking the hard questions. Always end with a review step.
Treating the shortlist as a hit list
The candidates are hypotheses. They still need synthesis, biophysical assays, and cell-based validation before they mean anything. This pipeline moves you from zero to “worth testing.” It does not move you from zero to drug.
Bottom line
A few years ago, a pipeline like this one — pocket detection, pocket-conditioned 3D generation, ADMET filtering, LLM rationale — would have required a team of specialists and a lot of infrastructure. Today it is a single Python script against SciRouter's hosted API. What the team used to produce in weeks can be produced in an afternoon, with a chemist then taking over for the interesting part of the work.