De Novo Drug Design with AI: Generating Novel Molecules

How generative chemistry and REINVENT4 design novel drug-like molecules from scratch. SciRouter Molecular Design Lab walkthrough with working API examples.

Ryan Bethencourt
May 7, 2026

What Is Generative Chemistry?

Traditional drug discovery starts with a known active molecule and makes incremental modifications — changing a methyl group here, adding a fluorine there. This works, but it is limited to the chemical neighborhood of your starting point.

Generative chemistry takes a fundamentally different approach. AI models learn the grammar of molecular structure from millions of known compounds and then generate entirely new molecules optimized for specific objectives. Instead of searching near a known compound, generative models can explore vast regions of chemical space that medicinal chemists would never consider.

The result is a shift from molecule optimization to molecule invention. De novo drug design can produce novel scaffolds, unexpected chemotypes, and chemical matter that falls outside existing patent claims, all driven by computational objectives rather than human intuition alone.

How REINVENT4 Works

REINVENT4 is an open-source generative chemistry platform developed by AstraZeneca and widely used in pharmaceutical research. Its de novo generator, the focus of this article, is a recurrent neural network (RNN) trained on SMILES strings, the text-based molecular representation, and then optimized through reinforcement learning. REINVENT4 also includes transformer-based generators for other design modes.
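
To make "trained on SMILES strings" concrete, the sketch below splits a SMILES string into the kind of tokens a sequence model reads one at a time. The regular expression is a simplified pattern chosen for illustration, not REINVENT4's own tokenizer, and `tokenize_smiles` is a hypothetical helper name. The input is the reference SMILES from the API example later in this article.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration, not
# REINVENT4's actual vocabulary): bracket atoms, the two-letter halogens,
# two-digit ring closures (%NN), then any remaining single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for a sequence model."""
    return SMILES_TOKEN.findall(smiles)

# Reference SMILES taken from the API example in this article
reference = "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"
tokens = tokenize_smiles(reference)

print(len(tokens), "tokens")
print(tokens[:8])
```

Joining the tokens back together reproduces the input string, which is a quick sanity check that the pattern drops nothing.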

The Three Phases

  • Pre-training: An RNN learns the syntax and statistics of valid SMILES strings from a large dataset of known drug-like molecules (typically ChEMBL or ZINC)
  • Transfer learning (optional): The model is fine-tuned on a focused set of molecules related to your target or chemical series to bias generation toward relevant chemical space
  • Reinforcement learning: A multi-objective scoring function rewards molecules with desired properties. The model iteratively generates, scores, and updates to produce increasingly optimized candidates
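
The reinforcement learning phase can be sketched with the augmented-likelihood objective described in the original REINVENT publication: the agent is pulled toward a target equal to the prior's log-likelihood plus a scaled score. This is a minimal sketch with made-up numbers. The function names and the `sigma` value are assumptions, and REINVENT4's available learning strategies may differ in detail.

```python
def augmented_log_likelihood(prior_logp: float, score: float, sigma: float) -> float:
    """Prior log-likelihood shifted upward in proportion to the score."""
    return prior_logp + sigma * score

def reinvent_style_loss(agent_logp: float, prior_logp: float,
                        score: float, sigma: float = 128.0) -> float:
    """Squared gap between the augmented target and the agent's log-likelihood."""
    target = augmented_log_likelihood(prior_logp, score, sigma)
    return (target - agent_logp) ** 2

# Made-up log-likelihoods for one sampled SMILES string
prior_logp = -30.0  # log P(SMILES) under the frozen pre-trained prior
agent_logp = -30.0  # log P(SMILES) under the agent being trained

low = reinvent_style_loss(agent_logp, prior_logp, score=0.0)
high = reinvent_style_loss(agent_logp, prior_logp, score=0.5)
print(low, high)
```

A molecule with a score of zero leaves the agent where the prior put it, while a higher score produces a larger loss that gradient descent reduces by raising the agent's likelihood of that molecule.
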
Note: The scoring function is where you encode your drug design objectives. Common components include predicted binding affinity, Lipinski drug-likeness, synthetic accessibility, novelty (dissimilarity from known compounds), and ADMET property predictions.
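
One common way to combine several scoring components into a single reward is a weighted geometric mean, which penalizes a molecule heavily if any one component is poor. The sketch below assumes each component has already been scaled to the range 0 to 1. The helper name and the example scores are hypothetical, and this article does not state how SciRouter aggregates objectives internally.

```python
import math

def weighted_geometric_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine component scores in [0, 1] into one value in [0, 1]."""
    if any(value <= 0.0 for value in scores.values()):
        return 0.0  # a single failed component zeroes the whole score
    total_weight = sum(weights[name] for name in scores)
    log_sum = sum(weights[name] * math.log(value) for name, value in scores.items())
    return math.exp(log_sum / total_weight)

# Weights mirror three of the objectives in the API example; scores are made up
weights = {"drug_likeness": 1.0, "similarity": 0.5, "synthetic_accessibility": 0.8}
balanced = {"drug_likeness": 0.8, "similarity": 0.8, "synthetic_accessibility": 0.8}
lopsided = {"drug_likeness": 1.0, "similarity": 1.0, "synthetic_accessibility": 0.1}

balanced_score = weighted_geometric_mean(balanced, weights)
lopsided_score = weighted_geometric_mean(lopsided, weights)
print(round(balanced_score, 3), round(lopsided_score, 3))
```

The lopsided molecule scores lower than the balanced one even though two of its three components are perfect, which is the behavior you usually want from a multi-objective reward.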

Using REINVENT4 via the SciRouter API

SciRouter provides REINVENT4 as a managed API service. You define your optimization objectives and the platform handles model execution, scoring, and filtering. No GPU setup or software installation required.

Generate novel molecules with REINVENT4
import os
import time

import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Define generation objectives
resp = requests.post(f"{BASE}/chemistry/generate", headers=HEADERS, timeout=30, json={
    "model": "reinvent4",
    "num_molecules": 50,
    "objectives": {
        "drug_likeness": {"weight": 1.0, "method": "lipinski"},
        "similarity": {
            "weight": 0.5,
            "reference_smiles": "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1",
            "min_similarity": 0.3,
            "max_similarity": 0.7,
        },
        "synthetic_accessibility": {"weight": 0.8, "max_sa_score": 4.0},
        "molecular_weight": {"weight": 0.3, "min": 200, "max": 500},
    },
})
resp.raise_for_status()  # surface auth or validation errors before reading the body
job = resp.json()

print(f"Job ID: {job['job_id']}")

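# Poll for results, giving up after 10 minutes (an arbitrary client-side limit)
deadline = time.monotonic() + 600
while True:
    poll = requests.get(
        f"{BASE}/chemistry/generate/{job['job_id']}", headers=HEADERS, timeout=30
    )
    poll.raise_for_status()
    result = poll.json()
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "Unknown error"))
    if time.monotonic() > deadline:
        raise TimeoutError("Generation job did not finish within 10 minutes")
    time.sleep(5)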

print(f"Generated {len(result['molecules'])} molecules\n")
for mol in result["molecules"][:5]:
    print(f"SMILES: {mol['smiles']}")
    print(f"  Drug-likeness: {mol['scores']['drug_likeness']:.2f}")
    print(f"  SA Score: {mol['scores']['synthetic_accessibility']:.1f}")
    print(f"  Similarity: {mol['scores']['similarity']:.2f}")
    print()

Tip: The similarity objective with min and max bounds creates a "Goldilocks zone": generated molecules are related enough to your reference compound to be relevant, but different enough to represent novel chemical matter.
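
The similarity band can be illustrated with Tanimoto similarity, a standard measure for comparing molecular fingerprints. The sketch below uses toy fingerprints written as sets of "on" bit indices. In practice the bits would come from a cheminformatics toolkit, and the fingerprint type SciRouter uses is not stated here, so the bit values and function names are assumptions. The default bounds match `min_similarity` and `max_similarity` in the API example.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def in_similarity_band(sim: float, lo: float = 0.3, hi: float = 0.7) -> bool:
    """True when similarity falls inside the inclusive [lo, hi] window."""
    return lo <= sim <= hi

# Toy fingerprints: indices of "on" bits (hypothetical values)
reference = {1, 4, 7, 9, 12, 15}
near_copy = {1, 4, 7, 9, 12, 16}   # shares 5 of 7 distinct bits
related = {1, 4, 7, 20, 21, 22}    # shares 3 of 9 distinct bits
unrelated = {30, 31, 32, 33}       # shares no bits

for name, fp in [("near_copy", near_copy), ("related", related), ("unrelated", unrelated)]:
    sim = tanimoto(reference, fp)
    print(f"{name}: similarity={sim:.2f} in_band={in_similarity_band(sim)}")
```

Only the related fingerprint lands inside the band: the near copy is too close to count as novel, and the unrelated one shares nothing with the reference.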

The Molecular Design Lab on SciRouter

SciRouter's Molecular Design Lab provides a visual interface for generative chemistry. You configure your objectives through the dashboard, launch generation runs, and browse results with interactive molecular viewers. Each generated molecule is automatically profiled with molecular properties, drug-likeness metrics, and ADMET predictions.

The lab also supports iterative design. Take your best candidates from one generation round, use them as starting points for the next, and progressively converge on molecules that meet all your design criteria.

From Generation to Lead Candidate

Generating molecules is the first step. A complete de novo design workflow integrates generation with evaluation and filtering:

Step 1: Generate

Use REINVENT4 to produce 50-500 candidate molecules optimized for your target profile. The scoring function enforces drug-likeness, synthetic accessibility, and structural novelty constraints during generation.

Step 2: Profile

Calculate detailed molecular properties and ADMET predictions for each candidate. Filter out molecules that violate hard constraints on toxicity, metabolic stability, or permeability.

Profile generated molecules
# Profile top candidates with molecular properties and ADMET
# (reuses BASE, HEADERS, and result from the generation example above)
top_molecules = result["molecules"][:10]

for mol in top_molecules:
    # Calculate detailed properties
    props = requests.post(f"{BASE}/chemistry/properties",
        headers=HEADERS, json={"smiles": mol["smiles"]}).json()

    # Predict ADMET properties
    admet = requests.post(f"{BASE}/chemistry/admet",
        headers=HEADERS, json={"smiles": mol["smiles"]}).json()

    print(f"SMILES: {mol['smiles']}")
    print(f"  MW: {props['molecular_weight']:.1f}")
    print(f"  LogP: {props['logp']:.2f}")
    print(f"  hERG risk: {admet['herg_inhibition']}")
    print(f"  Hepatotoxicity: {admet['hepatotoxicity']}")
    print()
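
The hard-constraint filter in Step 2 can be sketched as a plain function over the profiled properties. The example below applies Lipinski's rule of five and tolerates at most one violation, a common rule of thumb. The `molecular_weight` and `logp` keys match the response fields printed above, while `hbd`, `hba`, and all the property values are assumptions made for illustration. The ADMET fields are left out because their value format is not shown in this article.

```python
def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])

def passes_lipinski(props: dict, max_violations: int = 1) -> bool:
    """Keep a molecule that breaks at most max_violations rules."""
    count = lipinski_violations(
        props["molecular_weight"], props["logp"], props["hbd"], props["hba"]
    )
    return count <= max_violations

# Made-up property records standing in for profiled candidates
candidates = [
    {"id": "mol-a", "molecular_weight": 381.4, "logp": 3.5, "hbd": 1, "hba": 5},
    {"id": "mol-b", "molecular_weight": 612.7, "logp": 6.2, "hbd": 2, "hba": 8},
    {"id": "mol-c", "molecular_weight": 520.0, "logp": 4.1, "hbd": 3, "hba": 9},
]

kept = [c["id"] for c in candidates if passes_lipinski(c)]
print(kept)
```

Here `mol-b` is dropped for two violations (weight and logP), while `mol-c` survives with a single violation.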

Step 3: Dock

For candidates that pass property filters, predict binding to your target protein using DiffDock or Chai-1. This adds structural context — you can see exactly how each candidate is predicted to interact with the binding site.

Step 4: Rank and Select

Rank surviving candidates by a composite score combining predicted binding affinity, drug-likeness, ADMET profile, and synthetic accessibility. Select the top 3 to 10 molecules for synthesis and experimental testing.
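
A composite ranking can be as simple as a weighted average followed by a sort. The sketch below assumes every criterion has already been normalized to the range 0 to 1 with higher meaning better. The weights, candidate names, and scores are all made up for illustration and should be tuned to your own project priorities.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    binding: float           # normalized predicted binding, higher is better
    drug_likeness: float
    admet: float
    synthesizability: float  # normalized so that easier to make is higher

# Hypothetical weights that sum to 1.0
WEIGHTS = {"binding": 0.4, "drug_likeness": 0.2, "admet": 0.2, "synthesizability": 0.2}

def composite_score(c: Candidate) -> float:
    """Weighted average of the normalized criteria."""
    return sum(weight * getattr(c, field) for field, weight in WEIGHTS.items())

def select_top(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=composite_score, reverse=True)[:k]

pool = [
    Candidate("cand-1", binding=0.90, drug_likeness=0.6, admet=0.5, synthesizability=0.7),
    Candidate("cand-2", binding=0.70, drug_likeness=0.9, admet=0.9, synthesizability=0.8),
    Candidate("cand-3", binding=0.95, drug_likeness=0.4, admet=0.3, synthesizability=0.5),
]

shortlist = select_top(pool, k=2)
for c in shortlist:
    print(f"{c.name}: {composite_score(c):.2f}")
```

In this toy pool the strongest binder, `cand-3`, misses the shortlist because its weak ADMET and drug-likeness scores drag the composite down.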

When to Use De Novo Design

  • New target, no known actives: When you have a protein target but no known small molecule binders to start from
  • Patent busting: Generate novel scaffolds that achieve the same binding mode as a patented compound but with distinct chemical structure
  • Escaping local optima: When traditional medicinal chemistry modifications are not improving potency, generative models can jump to entirely different chemotypes
  • Library design: Create focused compound libraries for high-throughput screening campaigns

Next Steps

De novo drug design works best as part of a multi-tool pipeline. Pair REINVENT4 with Molecular Properties for drug-likeness filtering, ADMET Prediction for safety profiling, and DiffDock for binding pose prediction.

To learn about the protein side of drug discovery, see our ProteinMPNN tutorial for protein design or the Chai-1 guide for protein-ligand complex prediction.

Sign up for a free SciRouter API key and start generating novel drug candidates today. With 500 free credits per month and no infrastructure to manage, it is the fastest way to go from target to lead molecule.

Frequently Asked Questions

What is de novo drug design?

De novo drug design is the process of generating entirely new molecular structures from scratch, rather than modifying existing compounds. AI generative models learn the rules of chemistry from large datasets and can propose novel molecules with desired properties like target binding affinity, drug-likeness, and synthesizability.

What is REINVENT4?

REINVENT4 is an open-source generative chemistry platform developed by AstraZeneca. It uses reinforcement learning to train a molecular generator that produces SMILES strings for novel molecules optimized toward user-defined objectives like binding affinity, ADMET properties, and synthetic accessibility.

How does reinforcement learning apply to molecule generation?

A pre-trained generative model proposes candidate molecules as SMILES strings. A scoring function evaluates each molecule on desired properties (binding, drug-likeness, novelty). The model is then updated to increase the probability of generating high-scoring molecules. This cycle repeats, gradually steering the generator toward the desired chemical space.

Are AI-generated molecules actually synthesizable?

Modern generative models like REINVENT4 include synthetic accessibility scoring as part of their optimization objectives. However, not every generated molecule is trivially synthesizable. The SciRouter pipeline includes a synthetic accessibility score to help prioritize candidates that are practical to make in the lab.

How many molecules can REINVENT4 generate?

A typical REINVENT4 run generates hundreds to thousands of candidate molecules in a single session. Through the SciRouter API, you can generate batches of 100 molecules per request, filter them by properties, and iterate. The chemical space of drug-like molecules is often estimated at around 10^60 compounds, far too many to enumerate, so generative approaches that sample it selectively are much more efficient than brute-force enumeration.
