What Is Generative Chemistry?
Traditional drug discovery starts with a known active molecule and makes incremental modifications — changing a methyl group here, adding a fluorine there. This works, but it is limited to the chemical neighborhood of your starting point.
Generative chemistry takes a fundamentally different approach. AI models learn the grammar of molecular structure from millions of known compounds and then generate new molecules optimized for specific objectives. Instead of searching near a known compound, generative models can explore regions of chemical space far from any starting point, including structures a medicinal chemist would be unlikely to propose.
The result is a shift from molecule optimization to molecule invention. De novo drug design can produce novel scaffolds, unexpected chemotypes, and chemical matter that may fall outside existing patent claims, all driven by computational objectives rather than human intuition alone.
How REINVENT4 Works
REINVENT4, developed by AstraZeneca, is one of the most widely used generative chemistry platforms in pharmaceutical research. Its de novo generator is a recurrent neural network (RNN) trained on SMILES strings, the text-based molecular representation, and it is optimized through reinforcement learning. REINVENT4 also includes transformer-based generators for related tasks; this article focuses on the RNN workflow.
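To make "trained on SMILES strings" concrete, here is a minimal sketch of how a SMILES string can be split into tokens before a sequence model reads it. The regular expression and the `tokenize_smiles` and `build_vocabulary` helpers are illustrative assumptions, not REINVENT4's own tokenizer or vocabulary.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration, not REINVENT4's vocabulary):
# bracket atoms, two-letter halogens, two-digit ring closures, then single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#\-+\\/:~@.*$]")


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into the tokens a sequence model would read one by one."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify the round trip.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocabulary(smiles_list: list[str]) -> dict[str, int]:
    """Map each distinct token to an integer index, reserving 0 and 1 for start/end markers."""
    vocab = {"<start>": 0, "<end>": 1}
    for smiles in smiles_list:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


example = "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"
tokens = tokenize_smiles(example)
vocab = build_vocabulary([example, "CCl", "C[NH3+]"])
print(tokens[:8])
print(f"{len(tokens)} tokens, vocabulary size {len(vocab)}")
```

The RNN is then trained to predict the next token index given the tokens so far, which is how it picks up both SMILES syntax and the statistics of drug-like structures.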
The Three Phases
- Pre-training: An RNN learns the syntax and statistics of valid SMILES strings from a large dataset of known drug-like molecules (typically ChEMBL or ZINC).
- Transfer learning (optional): The model is fine-tuned on a focused set of molecules related to your target or chemical series to bias generation toward relevant chemical space.
- Reinforcement learning: A multi-objective scoring function rewards molecules with desired properties. The model iteratively generates, scores, and updates to produce increasingly optimized candidates
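The reinforcement learning phase can be sketched with the augmented-likelihood idea described in the REINVENT publications: the pre-trained prior's log-likelihood for a sampled molecule is shifted upward in proportion to its score, and the agent is trained toward that target. The version below is a simplified scalar sketch; the function names and the `sigma` value are illustrative assumptions, and the real implementation works on batched tensors with details omitted here.

```python
def augmented_log_likelihood(prior_ll: float, score: float, sigma: float = 128.0) -> float:
    """Target log-likelihood: the prior's value shifted upward in proportion to the score."""
    return prior_ll + sigma * score


def agent_loss(agent_ll: float, prior_ll: float, score: float, sigma: float = 128.0) -> float:
    """Squared gap between the agent's log-likelihood and the augmented target."""
    return (augmented_log_likelihood(prior_ll, score, sigma) - agent_ll) ** 2


# Made-up values for three sampled molecules: (agent log-likelihood, prior log-likelihood, score).
batch = [
    (-32.0, -32.0, 0.0),  # score 0: the agent already matches the target, so the loss is 0
    (-32.0, -32.0, 0.5),  # good score: the target sits above the agent, so the loss is large
    (-28.0, -35.0, 0.9),  # high score: the agent is pushed to make this molecule more likely
]
losses = [agent_loss(agent, prior, score) for agent, prior, score in batch]
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

Minimizing this loss raises the agent's likelihood for high-scoring molecules while the prior term keeps it anchored to the chemistry learned in pre-training.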
Using REINVENT4 via the SciRouter API
SciRouter provides REINVENT4 as a managed API service. You define your optimization objectives and the platform handles model execution, scoring, and filtering. No GPU setup or software installation required.
import os
import time

import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Define generation objectives
response = requests.post(f"{BASE}/chemistry/generate", headers=HEADERS, timeout=30, json={
    "model": "reinvent4",
    "num_molecules": 50,
    "objectives": {
        "drug_likeness": {"weight": 1.0, "method": "lipinski"},
        "similarity": {
            "weight": 0.5,
            "reference_smiles": "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1",
            "min_similarity": 0.3,
            "max_similarity": 0.7,
        },
        "synthetic_accessibility": {"weight": 0.8, "max_sa_score": 4.0},
        "molecular_weight": {"weight": 0.3, "min": 200, "max": 500},
    },
})
response.raise_for_status()  # surface HTTP errors here rather than as a missing key later
job = response.json()
print(f"Job ID: {job['job_id']}")

# Poll for results, giving up after 10 minutes (adjust as needed)
deadline = time.monotonic() + 600
while time.monotonic() < deadline:
    poll = requests.get(
        f"{BASE}/chemistry/generate/{job['job_id']}", headers=HEADERS, timeout=30
    )
    poll.raise_for_status()
    result = poll.json()
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "Unknown error"))
    time.sleep(5)
else:
    raise TimeoutError("Generation job did not finish before the deadline")

print(f"Generated {len(result['molecules'])} molecules\n")
for mol in result["molecules"][:5]:
    print(f"SMILES: {mol['smiles']}")
    print(f"  Drug-likeness: {mol['scores']['drug_likeness']:.2f}")
    print(f"  SA Score: {mol['scores']['synthetic_accessibility']:.1f}")
    print(f"  Similarity: {mol['scores']['similarity']:.2f}")
    print()

The Molecular Design Lab on SciRouter
SciRouter's Molecular Design Lab provides a visual interface for generative chemistry. You configure your objectives through the dashboard, launch generation runs, and browse results with interactive molecular viewers. Each generated molecule is automatically profiled with molecular properties, drug-likeness metrics, and ADMET predictions.
The lab also supports iterative design. Take your best candidates from one generation round, use them as starting points for the next, and progressively converge on molecules that meet all your design criteria.
From Generation to Lead Candidate
Generating molecules is the first step. A complete de novo design workflow integrates generation with evaluation and filtering:
Step 1: Generate
Use REINVENT4 to produce 50 to 500 candidate molecules optimized for your target profile. The scoring function enforces drug-likeness, synthetic accessibility, and structural novelty constraints during generation.
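As a rough illustration of how weighted objectives can be folded into one score, the sketch below compares a weighted arithmetic mean with a weighted geometric mean. The weights mirror the API example earlier in this article, but how SciRouter actually combines them is not stated here, so the aggregation functions and the assumption that each component score is normalized to the range 0 to 1 are hypothetical.

```python
import math

# Weights taken from the generation request shown earlier in this article.
WEIGHTS = {
    "drug_likeness": 1.0,
    "similarity": 0.5,
    "synthetic_accessibility": 0.8,
    "molecular_weight": 0.3,
}


def weighted_arithmetic_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Average of component scores; a strong component can compensate for a weak one."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


def weighted_geometric_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Multiplicative average; any component at zero drives the whole score to zero."""
    if any(scores[name] <= 0.0 for name in weights):
        return 0.0
    total = sum(weights.values())
    return math.exp(sum(weights[name] * math.log(scores[name]) for name in weights) / total)


# Made-up component scores in [0, 1] for two hypothetical molecules.
balanced = {"drug_likeness": 0.8, "similarity": 0.7, "synthetic_accessibility": 0.8, "molecular_weight": 0.9}
one_failure = {"drug_likeness": 1.0, "similarity": 1.0, "synthetic_accessibility": 0.0, "molecular_weight": 1.0}

for name, scores in [("balanced", balanced), ("one_failure", one_failure)]:
    print(name, weighted_arithmetic_mean(scores, WEIGHTS), weighted_geometric_mean(scores, WEIGHTS))
```

The geometric mean behaves more like a set of constraints: a molecule that fails one objective outright scores zero regardless of how well it does on the others, whereas the arithmetic mean lets it through with a reduced score.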
Step 2: Profile
Calculate detailed molecular properties and ADMET predictions for each candidate. Filter out molecules that violate hard constraints on toxicity, metabolic stability, or permeability.
# Profile top candidates with molecular properties and ADMET
# (reuses BASE, HEADERS, and result from the generation example above)
top_molecules = result["molecules"][:10]
for mol in top_molecules:
    payload = {"smiles": mol["smiles"]}

    # Calculate detailed properties
    props = requests.post(f"{BASE}/chemistry/properties",
                          headers=HEADERS, json=payload, timeout=30).json()

    # Predict ADMET properties
    admet = requests.post(f"{BASE}/chemistry/admet",
                          headers=HEADERS, json=payload, timeout=30).json()

    print(f"SMILES: {mol['smiles']}")
    print(f"  MW: {props['molecular_weight']:.1f}")
    print(f"  LogP: {props['logp']:.2f}")
    print(f"  hERG risk: {admet['herg_inhibition']}")
    print(f"  Hepatotoxicity: {admet['hepatotoxicity']}")
    print()

Step 3: Dock
For candidates that pass the property filters, predict binding to your target protein using DiffDock or Chai-1. This adds structural context: you can inspect how each candidate is predicted to sit in the binding site and which interactions it is predicted to make.
Step 4: Rank and Select
Rank surviving candidates by a composite score combining predicted binding affinity, drug-likeness, ADMET profile, and synthetic accessibility. Select the top 3 to 10 molecules for synthesis and experimental testing.
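A minimal sketch of the final filter-and-rank step follows. The `Candidate` fields, the ranking weights, and the assumption that every score is already normalized to the range 0 to 1 (higher is better) are all hypothetical, and the numbers are placeholders rather than real predictions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    smiles: str
    binding: float           # normalized predicted binding score (assumed scale)
    drug_likeness: float
    admet: float
    synthesizability: float
    toxicity_flag: bool      # True if a hard toxicity constraint was violated


# Illustrative weights; tune these to your project's priorities.
RANK_WEIGHTS = {"binding": 0.4, "drug_likeness": 0.2, "admet": 0.2, "synthesizability": 0.2}


def composite_score(candidate: Candidate) -> float:
    """Weighted sum of the normalized component scores."""
    return sum(weight * getattr(candidate, name) for name, weight in RANK_WEIGHTS.items())


def select_for_synthesis(candidates: list[Candidate], top_n: int = 5) -> list[Candidate]:
    """Drop candidates that fail hard constraints, then keep the best-scoring ones."""
    survivors = [c for c in candidates if not c.toxicity_flag]
    return sorted(survivors, key=composite_score, reverse=True)[:top_n]


# Placeholder molecules and scores, for illustration only.
candidates = [
    Candidate("CCO", 0.90, 0.80, 0.70, 0.60, toxicity_flag=False),
    Candidate("c1ccccc1", 0.95, 0.90, 0.90, 0.90, toxicity_flag=True),
    Candidate("CC(=O)O", 0.50, 0.90, 0.90, 0.90, toxicity_flag=False),
    Candidate("CCN", 0.70, 0.60, 0.50, 0.40, toxicity_flag=False),
]

for picked in select_for_synthesis(candidates, top_n=2):
    print(f"{picked.smiles}: {composite_score(picked):.2f}")
```

Applying the hard filter before ranking keeps a flagged molecule out of the shortlist even when its composite score would otherwise be the highest.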
When to Use De Novo Design
- New target, no known actives: When you have a protein target but no known small molecule binders to start from
- Patent busting: Generate novel scaffolds that achieve the same binding mode as a patented compound but with distinct chemical structure
- Escaping local optima: When traditional medicinal chemistry modifications are not improving potency, generative models can jump to entirely different chemotypes
- Library design: Create focused compound libraries for high-throughput screening campaigns
Next Steps
De novo drug design works best as part of a multi-tool pipeline. Pair REINVENT4 with Molecular Properties for drug-likeness filtering, ADMET Prediction for safety profiling, and DiffDock for binding pose prediction.
To learn about the protein side of drug discovery, see our ProteinMPNN tutorial for protein design or the Chai-1 guide for protein-ligand complex prediction.
Sign up for a free SciRouter API key and start generating novel drug candidates today. With 500 free credits per month and no infrastructure to manage, it is a fast way to go from target to candidate lead molecules.