Three-dimensional drug design is among the hardest problems in computational chemistry, and for years the dominant approach was “generate a SMILES, then dock it.” That approach is now being replaced by generative models that work directly in 3D — specifically, diffusion models that operate on atoms and coordinates in a binding pocket. This guide is an overview of the state of the field in 2026: the models, the training data, the evaluation, and the practical integration patterns.
The 3D generative landscape
The leading 3D pocket-aware generators are mostly diffusion models with equivariant backbones. The most widely cited are DiffSBDD and TargetDiff; the field also includes autoregressive generators such as Pocket2Mol and GraphBP, plus variants built for specific sub-problems like fragment growing or scaffold hopping. The diffusion models all share the same basic loop: fix the pocket, start from noise in the ligand coordinates, and iteratively denoise.
The reason diffusion won this generational race is the combination of three properties:
- Native 3D support. The denoising target is 3D coordinates, so the generator never has to defer geometry to a downstream step.
- Clean symmetry handling. Combined with E(3)-equivariant networks, diffusion handles rotation and translation in a principled way.
- Scalability. Diffusion models scale well with training data, which matters as pocket-ligand datasets grow.
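The denoising loop these models share can be made concrete with a toy sketch. The snippet below is a minimal DDPM-style reverse-diffusion sampler over ligand coordinates using NumPy; the denoiser is a stand-in lambda, where a real model like DiffSBDD would use a pocket-conditioned E(3)-equivariant network, and the schedule values are illustrative, not taken from any published model.

```python
import numpy as np

def sample_ligand_coords(n_atoms, denoise_fn, n_steps=50, seed=0):
    """Toy reverse-diffusion loop over 3D ligand coordinates.

    denoise_fn(x, t) predicts the noise in x at step t; in a real
    pocket-aware generator this is an equivariant network conditioned
    on the pocket atoms.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_atoms, 3))       # start from pure noise
    betas = np.linspace(1e-4, 0.02, n_steps)    # linear noise schedule (illustrative)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    for t in reversed(range(n_steps)):
        eps_hat = denoise_fn(x, t)
        # DDPM posterior mean for one reverse step
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                               # add noise on all but the last step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

# Dummy denoiser that predicts zero noise, as a placeholder for a trained model.
coords = sample_ligand_coords(12, lambda x, t: np.zeros_like(x))
```

The point of the sketch is the structure, not the numbers: the pocket stays fixed, only ligand coordinates are denoised, and geometry is produced directly rather than deferred to a docking step.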
Training data
Good pocket-ligand training data is the bottleneck for the entire field. The Protein Data Bank is the ultimate source of truth, but the number of high-quality protein-ligand complexes in it is small compared to what you would need for a large generative model.
- CrossDocked2020. The de facto pretraining dataset. It augments PDB complexes with docking poses across related pockets, which produces a much larger training set at the cost of introducing some docking artifacts.
- PDBbind. A curated subset of the PDB with binding affinities, widely used for affinity prediction and as a cleaner training source.
- Binding MOAD. Another curated complex dataset with a focus on biologically relevant interactions.
Building a better training dataset is one of the quiet infrastructure improvements that would move the field the most. Every generator is partly a reflection of the biases in its training data.
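One concrete way those biases bite is pocket leakage: if complexes sharing a pocket land on both sides of a train/test split, evaluation numbers look better than they are. Below is a minimal sketch of a pocket-grouped split; the `"pocket"` key and the data layout are illustrative, not a standard from CrossDocked or PDBbind.

```python
import random

def pocket_split(complexes, test_frac=0.2, seed=0):
    """Split pocket-ligand complexes so that every complex sharing a
    pocket lands on the same side of the split, avoiding leakage."""
    pockets = sorted({c["pocket"] for c in complexes})
    rng = random.Random(seed)
    rng.shuffle(pockets)
    n_test = max(1, int(len(pockets) * test_frac))
    test_pockets = set(pockets[:n_test])
    train = [c for c in complexes if c["pocket"] not in test_pockets]
    test = [c for c in complexes if c["pocket"] in test_pockets]
    return train, test

# 20 complexes spread over 5 pockets (toy data)
data = [{"pocket": f"P{i % 5}", "ligand": f"L{i}"} for i in range(20)]
train, test = pocket_split(data)
```

In practice groups are usually defined by sequence or pocket-structure similarity clustering rather than an exact ID match, but the splitting logic is the same.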
Evaluation is hard
3D generators are notoriously difficult to evaluate. There is no single metric that captures “this is a good drug design generator.” The standard evaluation protocol pulls together several complementary metrics:
- Validity. Does the generated structure parse into a valid molecule with reasonable geometry?
- Drug-likeness (QED). Is it in the chemical space associated with real drugs?
- Synthetic accessibility (SA score). Could a chemist actually make it?
- Docking score. When redocked into the pocket, does it bind? This is the closest thing to a task-specific metric.
- Diversity. Do repeated runs explore different chemistry or do they collapse to the same mode?
- Uniqueness from training. Are the generated molecules novel, or is the model just recalling its training set?
A model can do well on validity and QED while doing poorly on diversity and novelty, or vice versa. When you read a paper that reports a single number, ask what else it is hiding. When you run your own evaluation, compute all of the metrics above and look at them together.
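Validity, QED, SA score, and docking all require external tooling (RDKit, a docking engine), but the set-based metrics in the list above are simple enough to sketch directly. Here is a minimal, dependency-free version of uniqueness, novelty, and pairwise diversity, where fingerprints are represented as sets of on-bit indices; a real evaluation would use canonicalized SMILES and RDKit fingerprints.

```python
def uniqueness(smiles):
    """Fraction of generated SMILES that are distinct."""
    return len(set(smiles)) / len(smiles)

def novelty(generated, training):
    """Fraction of generated SMILES not present in the training set."""
    train = set(training)
    return sum(s not in train for s in generated) / len(generated)

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints (sets of bit indices)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def mean_pairwise_diversity(fingerprints):
    """Average Tanimoto distance (1 - similarity) over all pairs."""
    n = len(fingerprints)
    dists = [1.0 - tanimoto(fingerprints[i], fingerprints[j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

gen = ["CCO", "CCO", "c1ccccc1", "CCN"]       # toy generated set
print(uniqueness(gen))                         # 0.75 (one duplicate)
print(novelty(gen, ["CCO"]))                   # 0.5 (both CCO copies are in training)
```

Note that uniqueness and novelty only make sense on canonical SMILES: without canonicalization, the same molecule written two ways would be counted as two distinct, novel compounds.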
How to integrate diffusion into a real pipeline
Diffusion generators work best as the first stage of a longer pipeline. On their own they produce interesting candidates but do not account for developability, synthesis, or specific property targets. The pipeline pattern that works in practice:
- Step 1: Define the pocket. Use a crystal structure if you have one, or a prediction from a structural model like Boltz-2 if you do not.
- Step 2: Generate with the diffusion model. Sample a few hundred candidates per pocket.
- Step 3: Filter aggressively on drug-likeness, synthesis score, and forbidden motifs.
- Step 4: Dock survivors with a traditional docking program to validate pose.
- Step 5: Rank with an ML binding predictor or a physics-based scoring function.
- Step 6: Pass the top 10 through a chemistry LLM like TxGemma for a written rationale.
- Step 7: Human chemist review.
For a hands-on walkthrough of this exact pipeline on SciRouter, see the pocket-to-lead tutorial.
Open problems
Property control
Diffusion generators do not let you say “please produce molecules with logP between 2 and 4.” Classifier-guided diffusion offers a path, but it is computationally expensive and still an active research area.
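The mechanics of classifier guidance are simple even though making it work well is not: at each reverse step, the denoising mean is shifted along the gradient of a property model's log-likelihood. The sketch below shows one guided step with a toy scalar property (the mean coordinate, pushed toward a target under a Gaussian likelihood); the property, gradient, and schedule values are all illustrative assumptions.

```python
import numpy as np

def posterior_mean(x, eps_hat, alpha, beta, alpha_bar):
    """Unguided DDPM posterior mean for one reverse step."""
    return (x - beta / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)

def guided_mean(x, eps_hat, alpha, beta, alpha_bar, grad_logp, scale=1.0):
    """Classifier-guided mean: shift the unguided mean along the
    gradient of log p(property | x)."""
    return posterior_mean(x, eps_hat, alpha, beta, alpha_bar) + scale * beta * grad_logp(x)

def grad_logp(x, target=0.0):
    """Toy property model: Gaussian likelihood on the mean coordinate,
    so the gradient points from the current mean toward the target."""
    return -(x.mean() - target) * np.ones_like(x) / x.size

x = np.ones((4, 3))              # current noisy coordinates, mean = 1.0 > target
eps = np.zeros_like(x)           # dummy noise prediction
beta, alpha_bar = 0.02, 0.5
m_plain = posterior_mean(x, eps, 1.0 - beta, beta, alpha_bar)
m_guided = guided_mean(x, eps, 1.0 - beta, beta, alpha_bar, grad_logp)
# The guided mean is pulled toward the target relative to the plain mean.
```

The expense mentioned above comes from the real version of `grad_logp`: a property predictor must be differentiated with respect to atomic coordinates at every one of the hundreds of denoising steps.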
Explicit multi-objective optimization
Real drug discovery has many competing objectives (potency, selectivity, solubility, metabolism). Diffusion models do not handle multi-objective optimization as naturally as RL methods do.
Synthesis-aware generation
Building synthetic accessibility into the generator itself rather than filtering after the fact is an active research area. Early results suggest it is possible but not yet production-ready.
Rare and novel chemistries
Generators are biased toward the training distribution. If your target needs unusual chemistry that is under-represented in CrossDocked, the generator will under-serve you.
Where SciRouter fits
SciRouter's job is to take all of this infrastructure off your plate. You call DiffSBDD through a managed endpoint, pass a pocket, and get candidates. You chain into Boltz-2, an ADMET predictor, TxGemma, and whatever else you need — all through a single API. You do not provision GPUs, you do not manage weights, and you do not write glue code between four different research repositories.
For a broader overview, see our structure-based drug discovery AI playbook.
Bottom line
3D drug design with diffusion models is real, mature enough to be useful in production, and still rapidly improving. The winning pattern is to treat the generator as the first stage of a longer pipeline, to evaluate across multiple metrics rather than chasing a single number, and to keep humans and physics-based models in the loop for validation.