Three-dimensional drug design is among the hardest problems in computational chemistry, and for years the dominant approach was “generate a SMILES, then dock it.” That approach is now being replaced by generative models that work directly in 3D — specifically, diffusion models that operate on atoms and coordinates in a binding pocket. This guide is an overview of the state of the field in 2026: the models, the training data, the evaluation, and the practical integration patterns.
The 3D generative landscape
The leading 3D pocket-aware generators are mostly diffusion models with equivariant backbones. The most widely cited are DiffSBDD and TargetDiff; the field also includes autoregressive generators such as Pocket2Mol and GraphBP, plus variants built for specific sub-problems like fragment growing or scaffold hopping. The diffusion models all share the same basic loop: fix the pocket, start from noise in the ligand coordinates, and iteratively denoise.
The reason diffusion won this generational race is the combination of three properties:
- Native 3D support. The denoising target is 3D coordinates, so the generator never has to defer geometry to a downstream step.
- Clean symmetry handling. Combined with E(3)-equivariant networks, diffusion handles rotation and translation in a principled way.
- Scalability. Diffusion models scale well with training data, which matters as pocket-ligand datasets grow.
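The denoising loop these models share can be made concrete with a toy sketch. The snippet below is a minimal DDPM-style reverse-diffusion sampler over ligand coordinates using NumPy; the denoiser is a stand-in lambda, where a real model like DiffSBDD would use a pocket-conditioned E(3)-equivariant network, and the schedule values are illustrative, not taken from any published model.

```python
import numpy as np

def sample_ligand_coords(n_atoms, denoise_fn, n_steps=50, seed=0):
    """Toy reverse-diffusion loop over 3D ligand coordinates.

    denoise_fn(x, t) predicts the noise in x at step t; in a real
    pocket-aware generator this is an equivariant network conditioned
    on the pocket atoms.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_atoms, 3))       # start from pure noise
    betas = np.linspace(1e-4, 0.02, n_steps)    # linear noise schedule (illustrative)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    for t in reversed(range(n_steps)):
        eps_hat = denoise_fn(x, t)
        # DDPM posterior mean for one reverse step
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                               # add noise on all but the last step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

# Dummy denoiser that predicts zero noise, as a placeholder for a trained model.
coords = sample_ligand_coords(12, lambda x, t: np.zeros_like(x))
```

The point of the sketch is the structure, not the numbers: the pocket stays fixed, only ligand coordinates are denoised, and geometry is produced directly rather than deferred to a docking step.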
Training data
Good pocket-ligand training data is the bottleneck for the entire field. The Protein Data Bank is the ultimate source of truth, but the number of high-quality protein-ligand complexes in it is small compared to what you would need for a large generative model.
- CrossDocked2020. The de facto pretraining dataset. It augments PDB complexes with docking poses across related pockets, which produces a much larger training set at the cost of introducing some docking artifacts.
- PDBbind. A curated subset of the PDB with binding affinities, widely used for affinity prediction and as a cleaner training source.
- Binding MOAD. Another curated complex dataset with a focus on biologically relevant interactions.
Building a better training dataset is one of the quiet infrastructure improvements that would move the field the most. Every generator is partly a reflection of the biases in its training data.
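One concrete way those biases bite is pocket leakage: if complexes sharing a pocket land on both sides of a train/test split, evaluation numbers look better than they are. Below is a minimal sketch of a pocket-grouped split; the `"pocket"` key and the data layout are illustrative, not a standard from CrossDocked or PDBbind.

```python
import random

def pocket_split(complexes, test_frac=0.2, seed=0):
    """Split pocket-ligand complexes so that every complex sharing a
    pocket lands on the same side of the split, avoiding leakage."""
    pockets = sorted({c["pocket"] for c in complexes})
    rng = random.Random(seed)
    rng.shuffle(pockets)
    n_test = max(1, int(len(pockets) * test_frac))
    test_pockets = set(pockets[:n_test])
    train = [c for c in complexes if c["pocket"] not in test_pockets]
    test = [c for c in complexes if c["pocket"] in test_pockets]
    return train, test

# 20 complexes spread over 5 pockets (toy data)
data = [{"pocket": f"P{i % 5}", "ligand": f"L{i}"} for i in range(20)]
train, test = pocket_split(data)
```

In practice groups are usually defined by sequence or pocket-structure similarity clustering rather than an exact ID match, but the splitting logic is the same.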
Evaluation is hard
3D generators are notoriously difficult to evaluate. There is no single metric that captures “this is a good drug design generator.” The standard evaluation protocol pulls together several complementary metrics:
- Validity. Does the generated structure parse into a valid molecule with reasonable geometry?
- Drug-likeness (QED). Is it in the chemical space associated with real drugs?
- Synthetic accessibility (SA score). Could a chemist actually make it?
- Docking score. When redocked into the pocket, does it bind? This is the closest thing to a task-specific metric.
- Diversity. Do repeated runs explore different chemistry or do they collapse to the same mode?
- Uniqueness from training. Are the generated molecules novel, or is the model just recalling its training set?
A model can do well on validity and QED while doing poorly on diversity and novelty, or vice versa. When you read a paper that reports a single number, ask what else it is hiding. When you run your own evaluation, compute all of the metrics above and look at them together.
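Validity, QED, SA score, and docking all require external tooling (RDKit, a docking engine), but the set-based metrics in the list above are simple enough to sketch directly. Here is a minimal, dependency-free version of uniqueness, novelty, and pairwise diversity, where fingerprints are represented as sets of on-bit indices; a real evaluation would use canonicalized SMILES and RDKit fingerprints.

```python
def uniqueness(smiles):
    """Fraction of generated SMILES that are distinct."""
    return len(set(smiles)) / len(smiles)

def novelty(generated, training):
    """Fraction of generated SMILES not present in the training set."""
    train = set(training)
    return sum(s not in train for s in generated) / len(generated)

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints (sets of bit indices)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def mean_pairwise_diversity(fingerprints):
    """Average Tanimoto distance (1 - similarity) over all pairs."""
    n = len(fingerprints)
    dists = [1.0 - tanimoto(fingerprints[i], fingerprints[j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

gen = ["CCO", "CCO", "c1ccccc1", "CCN"]       # toy generated set
print(uniqueness(gen))                         # 0.75 (one duplicate)
print(novelty(gen, ["CCO"]))                   # 0.5 (both CCO copies are in training)
```

Note that uniqueness and novelty only make sense on canonical SMILES: without canonicalization, the same molecule written two ways would be counted as two distinct, novel compounds.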
How to integrate diffusion into a real pipeline
Diffusion generators work best as the first stage of a longer pipeline. On their own they produce interesting candidates but do not account for developability, synthesis, or specific property targets. The pipeline pattern that works in practice:
- Step 1: Define the pocket. Use a crystal structure if you have one, or a prediction from a structural model like Boltz-2 if you do not.
- Step 2: Generate with the diffusion model. Sample a few hundred candidates per pocket.
- Step 3: Filter aggressively on drug-likeness, synthesis score, and forbidden motifs.
- Step 4: Dock survivors with a traditional docking program to validate pose.
- Step 5: Rank with an ML binding predictor or a physics-based scoring function.
- Step 6: Pass the top 10 through a chemistry LLM like TxGemma for a written rationale.
- Step 7: Human chemist review.
For a hands-on walkthrough of this exact pipeline on SciRouter, see the pocket-to-lead tutorial.
Open problems
Property control
Diffusion generators do not let you say “please produce molecules with logP between 2 and 4.” Classifier-guided diffusion offers a path, but it is computationally expensive and still an active research area.
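The mechanics of classifier guidance are simple even though making it work well is not: at each reverse step, the denoising mean is shifted along the gradient of a property model's log-likelihood. The sketch below shows one guided step with a toy scalar property (the mean coordinate, pushed toward a target under a Gaussian likelihood); the property, gradient, and schedule values are all illustrative assumptions.

```python
import numpy as np

def posterior_mean(x, eps_hat, alpha, beta, alpha_bar):
    """Unguided DDPM posterior mean for one reverse step."""
    return (x - beta / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)

def guided_mean(x, eps_hat, alpha, beta, alpha_bar, grad_logp, scale=1.0):
    """Classifier-guided mean: shift the unguided mean along the
    gradient of log p(property | x)."""
    return posterior_mean(x, eps_hat, alpha, beta, alpha_bar) + scale * beta * grad_logp(x)

def grad_logp(x, target=0.0):
    """Toy property model: Gaussian likelihood on the mean coordinate,
    so the gradient points from the current mean toward the target."""
    return -(x.mean() - target) * np.ones_like(x) / x.size

x = np.ones((4, 3))              # current noisy coordinates, mean = 1.0 > target
eps = np.zeros_like(x)           # dummy noise prediction
beta, alpha_bar = 0.02, 0.5
m_plain = posterior_mean(x, eps, 1.0 - beta, beta, alpha_bar)
m_guided = guided_mean(x, eps, 1.0 - beta, beta, alpha_bar, grad_logp)
# The guided mean is pulled toward the target relative to the plain mean.
```

The expense mentioned above comes from the real version of `grad_logp`: a property predictor must be differentiated with respect to atomic coordinates at every one of the hundreds of denoising steps.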
Explicit multi-objective optimization
Real drug discovery has many competing objectives (potency, selectivity, solubility, metabolism). Diffusion models do not handle multi-objective optimization as naturally as RL methods do.
Synthesis-aware generation
Building synthetic accessibility into the generator itself rather than filtering after the fact is an active research area. Early results suggest it is possible but not yet production-ready.
Rare and novel chemistries
Generators are biased toward the training distribution. If your target needs unusual chemistry that is under-represented in CrossDocked, the generator will under-serve you.
Where SciRouter fits
SciRouter's job is to take all of this infrastructure off your plate. You call DiffSBDD through a managed endpoint, pass a pocket, and get candidates. You chain into Boltz-2, an ADMET predictor, TxGemma, and whatever else you need — all through a single API. You do not provision GPUs, you do not manage weights, and you do not write glue code between four different research repositories.
For a broader overview, see our structure-based drug discovery AI playbook.
Bottom line
3D drug design with diffusion models is real, mature enough to be useful in production, and still rapidly improving. The winning pattern is to treat the generator as the first stage of a longer pipeline, to evaluate across multiple metrics rather than chasing a single number, and to keep humans and physics-based models in the loop for validation.