The Synthesizability Problem in Drug Discovery
You have designed the perfect molecule on your computer. It binds the target with nanomolar affinity, passes every ADMET filter, and sits squarely in drug-like chemical space. There is just one problem – nobody can actually make it in a lab.
This is one of the most common and costly failures in computational drug discovery. Teams spend weeks optimizing virtual molecules only to discover that their top candidates require twenty-step synthesis routes, exotic reagents, or reactions that do not reliably work at scale. The disconnect between computational design and synthetic reality can stall a program just as surely as poor binding affinity.
Synthetic accessibility scoring exists to catch this problem early. Before you invest in docking studies, ADMET profiling, or lead optimization, you can check whether your molecule is realistically synthesizable. A two-second computation can save months of wasted effort.
In this guide, we will explain how synthetic accessibility scores work, walk through real examples from approved drugs, and show you how to check any molecule using the SciRouter API. Whether you are screening a generative chemistry output of 500 molecules or evaluating a single lead candidate, SA scoring should be one of your first filters.
What Is a Synthetic Accessibility Score?
The synthetic accessibility (SA) score was introduced by Peter Ertl and Ansgar Schuffenhauer at Novartis in 2009. It estimates how difficult a molecule would be to synthesize using conventional organic chemistry, producing a single number on a scale from 1 (trivial to make) to 10 (essentially impossible).
The algorithm works by combining two components. The first is a fragment score that measures how common the molecule's substructures are in known compounds. Molecules built from frequently occurring fragments – the kind of pieces that appear in commercial building block catalogs – score well. Unusual or unprecedented fragment combinations score poorly. The fragment frequencies are typically derived from large databases like PubChem or ChEMBL, giving the model a statistical picture of what chemists actually make.
The second component is a complexity penalty that accounts for structural features known to make synthesis harder. This includes the number of stereocenters (each one doubles the number of possible stereoisomers the route must control), macrocyclic rings (notoriously hard to close), spiro and bridged ring systems, and the overall size of the molecule. Large, complex molecules with many chiral centers receive higher (worse) scores.
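To make the two components concrete, here is a minimal sketch of how a fragment score and a complexity penalty could be combined and rescaled to the 1 to 10 range. The penalty terms and constants are modeled on the open-source RDKit contrib implementation of the Ertl and Schuffenhauer score, but this toy version omits some of its corrections and takes the fragment score and structural counts as plain inputs, so treat the numbers as illustrative. The function name `toy_sa_score` and its parameters are assumptions made for this example.

```python
import math


def toy_sa_score(fragment_score, n_atoms, n_stereo=0, n_spiro=0,
                 n_bridgehead=0, has_macrocycle=False):
    """Combine a fragment score with complexity penalties (simplified sketch).

    fragment_score: average fragment contribution; higher means more common
    fragments. In a real implementation this comes from a database lookup.
    """
    # Complexity penalties grow with size and with "hard" structural features
    size_penalty = n_atoms ** 1.005 - n_atoms
    stereo_penalty = math.log10(n_stereo + 1)
    spiro_penalty = math.log10(n_spiro + 1)
    bridge_penalty = math.log10(n_bridgehead + 1)
    macro_penalty = math.log10(2) if has_macrocycle else 0.0

    raw = fragment_score - (size_penalty + stereo_penalty + spiro_penalty
                            + bridge_penalty + macro_penalty)

    # Rescale so that a high raw value maps to a low (easy) SA score
    lo, hi = -4.0, 2.5  # assumed raw-score range used for rescaling
    scaled = 11.0 - (raw - lo + 1.0) / (hi - lo) * 9.0
    return min(10.0, max(1.0, scaled))


# A small molecule built from common fragments versus a large, chiral one
simple = toy_sa_score(fragment_score=1.0, n_atoms=13)
complex_ = toy_sa_score(fragment_score=-1.0, n_atoms=60,
                        n_stereo=11, n_bridgehead=2)
print(f"simple={simple:.2f} complex={complex_:.2f}")
```

The useful property to notice is monotonicity: adding stereocenters, bridgeheads, or a macrocycle can only push the score up, while more common fragments pull it down.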
The SA Score Scale in Practice
- 1.0 – 2.0 (Very Easy): Simple molecules with common functional groups. Think aspirin, ibuprofen, and most commodity chemicals. Typically one to three synthetic steps from commercial starting materials.
- 2.0 – 3.0 (Easy): Straightforward drug-like molecules. Most fragment-based and HTS-derived leads fall here. A competent medicinal chemistry CRO can make these without difficulty.
- 3.0 – 4.0 (Moderate): Requires some planning but uses well-established reactions. Many approved drugs sit in this range. Five to ten synthetic steps are typical.
- 4.0 – 6.0 (Difficult): Challenging synthesis requiring specialist expertise. May involve air-sensitive reactions, difficult ring closures, or tricky stereochemistry. Budget for multiple attempts and route scouting.
- 6.0 – 8.0 (Very Difficult): Often requires novel reaction development or lengthy linear routes. Natural product-inspired molecules frequently land here. Synthesis campaigns can take months.
- 8.0 – 10.0 (Impractical): Theoretical molecules that would be extremely difficult to synthesize with current technology. If your generative model produces molecules in this range, filter them out.
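The bands above are easy to encode as a lookup, which is handy when you want a label in a report or a spreadsheet. This is a small sketch; the name `sa_band` is made up for this example, and because the ranges share their endpoints, it assumes a boundary value such as 2.0 belongs to the harder band.

```python
# Upper bound of each band (exclusive) and its label, taken from the list above
SA_BANDS = [
    (2.0, "Very Easy"),
    (3.0, "Easy"),
    (4.0, "Moderate"),
    (6.0, "Difficult"),
    (8.0, "Very Difficult"),
    (10.0, "Impractical"),
]


def sa_band(score):
    """Map an SA score in [1, 10] to its descriptive band."""
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"SA score out of range: {score}")
    for upper, label in SA_BANDS:
        if score < upper:
            return label
    return SA_BANDS[-1][1]  # score == 10.0


for s in (1.2, 2.8, 4.2, 7.8):
    print(f"{s:.1f} -> {sa_band(s)}")
```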
Real Drug Examples: SA Scores Across the Spectrum
To make the SA scale concrete, let us look at actual marketed drugs and their approximate SA scores. These numbers illustrate the range of synthetic complexity that the pharmaceutical industry has successfully tackled.
Aspirin (SA Score ~1.2)
Aspirin (acetylsalicylic acid, SMILES: CC(=O)Oc1ccccc1C(=O)O) is about as easy as synthesis gets. It is a single acetylation of salicylic acid – one reaction, cheap reagents, near-quantitative yield. Its SA score of roughly 1.2 reflects this simplicity. The molecule has no stereocenters, no unusual rings, and every fragment is abundant in chemical databases.
Ibuprofen (SA Score ~1.6)
Ibuprofen (CC(C)Cc1ccc(cc1)C(C)C(=O)O) is another straightforward synthesis. The original Boots process takes six steps from isobutylbenzene. The BHC process, developed later, cut that to three catalytic steps with much better atom economy. Despite being a blockbuster drug, ibuprofen's molecular structure is simple enough that it is often used as a teaching example in undergraduate organic chemistry courses.
Celecoxib (SA Score ~2.8)
Celecoxib (Celebrex) is a COX-2 inhibitor with a diaryl pyrazole core. Its synthesis is more involved than aspirin but still manageable – the commonly described route needs only two to three steps: a Claisen condensation to form a 1,3-diketone, then cyclocondensation with an arylhydrazine that already carries the sulfonamide. Both reactions are well precedented. An SA score around 2.8 correctly identifies this as a molecule that any pharmaceutical chemistry group could produce without significant difficulty.
Atorvastatin (SA Score ~4.2)
Atorvastatin (Lipitor) starts to show real synthetic complexity. The molecule has two stereocenters in the dihydroxy acid side chain that must be set with high stereoselectivity. The original synthesis, developed at Parke-Davis (Warner-Lambert, later acquired by Pfizer), required about twelve steps. An SA score around 4.2 puts it in the "difficult but achievable" category, which matches reality – Pfizer invested heavily in process chemistry to make manufacturing economical.
Paclitaxel (SA Score ~7.8)
Paclitaxel (Taxol, CC1=C2C(C(=O)C3(C(CC4C(C3C(C(C2(C)C)(CC1OC(=O)C(C(C5=CC=CC=C5)NC(=O)C6=CC=CC=C6)O)O)OC(=O)C7=CC=CC=C7)(CO4)OC(=O)C)O)C)OC(=O)C) is the poster child for synthetic difficulty. The first total synthesis by Robert Holton required over 40 linear steps. Its SA score near 7.8 reflects the molecule's terrifying complexity: four fused rings, eleven stereocenters, and multiple sensitive functional groups. In practice, paclitaxel is produced by semi-synthesis from 10-deacetylbaccatin III extracted from yew tree needles, not by total synthesis.
Retrosynthesis: Beyond the Single Score
While SA scores tell you how hard a molecule is to make, retrosynthetic analysis tells you how to make it. Retrosynthesis works backward from the target molecule, identifying strategic bond disconnections that break it into simpler precursors. Each disconnection corresponds to a known chemical reaction run in reverse. The process continues recursively until all precursors are commercially available starting materials.
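The recursive idea can be sketched in a few lines. The example below is a toy: the disconnection table, the compound names, and the stock list are all hypothetical, and each target has exactly one disconnection, whereas a real retrosynthesis engine scores and searches many alternatives per step.

```python
# Hypothetical one-step disconnections: target -> precursors it is made from
DISCONNECTIONS = {
    "amide_target": ["carboxylic_acid", "amine"],
    "carboxylic_acid": ["methyl_ester"],
}

# Hypothetical catalog of commercially available starting materials
IN_STOCK = {"amine", "methyl_ester"}


def plan_route(target, depth=0, max_depth=10):
    """Return forward-ordered (product, precursors) steps, or None if unsolved."""
    if target in IN_STOCK:
        return []  # nothing to make, it can be purchased
    if depth >= max_depth or target not in DISCONNECTIONS:
        return None  # no known way to break this molecule down further
    precursors = DISCONNECTIONS[target]
    steps = []
    for precursor in precursors:
        sub_route = plan_route(precursor, depth + 1, max_depth)
        if sub_route is None:
            return None
        steps.extend(sub_route)
    steps.append((target, precursors))  # make the target once precursors exist
    return steps


route = plan_route("amide_target")
for product, precursors in route:
    print(f"{' + '.join(precursors)} -> {product}")
```

The recursion stops in one of two ways: every branch reaches a purchasable compound (a complete route), or some branch hits a molecule with no known disconnection (no route found).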
E.J. Corey formalized retrosynthetic analysis in the 1960s, earning the 1990 Nobel Prize in Chemistry for his work. Today, AI-powered retrosynthesis tools like ASKCOS, IBM RXN, and Spaya automate this process. They use neural networks trained on millions of published reactions to propose synthetic routes, estimate yields, and flag problematic steps.
For drug discovery workflows, the recommended approach is to use SA scores as a fast first-pass filter and then run retrosynthetic analysis on your top candidates. Screen 1,000 molecules by SA score in seconds, shortlist the ones that score below 4.0 (perhaps 50 of them), and then invest the computational time to generate full synthetic routes for that shortlist only.
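As a sketch of that funnel, the helper below takes molecules that already carry an SA score and returns the easiest ones for route planning. The function name `shortlist_for_retrosynthesis`, the dictionary keys, and the sample scores are assumptions for illustration; the 4.0 cutoff and the shortlist size of 50 come from the paragraph above.

```python
def shortlist_for_retrosynthesis(scored, sa_cutoff=4.0, top_n=50):
    """Keep molecules below the SA cutoff, easiest first, capped at top_n."""
    passing = [m for m in scored if m["sa_score"] < sa_cutoff]
    return sorted(passing, key=lambda m: m["sa_score"])[:top_n]


# Made-up scores standing in for the output of a first-pass SA screen
scored = [
    {"name": "mol_a", "sa_score": 5.1},
    {"name": "mol_b", "sa_score": 2.2},
    {"name": "mol_c", "sa_score": 3.9},
    {"name": "mol_d", "sa_score": 1.5},
]

shortlist = shortlist_for_retrosynthesis(scored, top_n=2)
print([m["name"] for m in shortlist])
```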
When SA Scores and Retrosynthesis Disagree
Occasionally, a molecule will have a moderate SA score (say 3.5) but retrosynthetic analysis reveals that the most obvious route requires a reaction with notoriously low yield or selectivity. Conversely, a molecule with a higher SA score (say 5.0) might have a clever three-step route that a retrosynthesis engine discovers. This is why both tools are complementary. The SA score is your rapid screening heuristic; retrosynthesis is your detailed route planning tool.
Checking Synthetic Accessibility with SciRouter
SciRouter provides a dedicated synthesis-check endpoint that returns the SA score, a categorical feasibility label, and additional details about molecular complexity. You can call it from the Python SDK, the REST API, or through the MCP server for agent-based workflows. No local software installation is required.
import os, requests
API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Check synthetic accessibility for aspirin
result = requests.post(f"{BASE}/generate/synthesis-check", headers=HEADERS, json={
    "smiles": "CC(=O)Oc1ccccc1C(=O)O"
}).json()
print("Molecule: Aspirin")
print(f"SA Score: {result['sa_score']:.2f}")
print(f"Feasibility: {result['feasibility']}")
print(f"Stereocenters: {result['stereocenters']}")
print(f"Ring systems: {result['ring_systems']}")
The response includes the raw SA score (1–10 float), a human-readable feasibility category, and structural descriptors that contribute to the score. This gives you enough information to make a quick go/no-go decision on each molecule.
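In a pipeline you will usually want the call wrapped in a function that fails loudly on HTTP errors rather than indexing into an error body. The sketch below assumes the same endpoint and `sa_score` field used in the example above; the function name `check_sa` and the `post` parameter are hypothetical, and the latter exists only so the function can be exercised without network access.

```python
BASE = "https://api.scirouter.ai/v1"  # base URL taken from the example above


def check_sa(smiles, api_key, post=None, timeout=30):
    """Call the synthesis-check endpoint and return the parsed JSON body.

    `post` is a hypothetical injection point: any callable with the same
    signature as requests.post, so the function can be tested offline.
    """
    if post is None:
        import requests  # imported lazily; only needed for real calls
        post = requests.post
    resp = post(
        f"{BASE}/generate/synthesis-check",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"smiles": smiles},
        timeout=timeout,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing them
    data = resp.json()
    if "sa_score" not in data:
        # Assumption: a successful response always carries sa_score
        raise ValueError(f"No sa_score in response for {smiles!r}")
    return data
```

A real call would look like `check_sa("CC(=O)Oc1ccccc1C(=O)O", os.environ["SCIROUTER_API_KEY"])`. Setting a timeout keeps a slow request from hanging a batch job indefinitely.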
Batch Screening: Filtering Generative Chemistry Output
The real power of SA scoring comes when you apply it to large sets of molecules. After running a generative chemistry model like REINVENT4, you might have hundreds of candidates. SA scoring lets you immediately discard the molecules that would be impractical to synthesize.
import os, requests
API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Suppose we have candidate molecules from a generative run
candidates = [
    {"name": "Aspirin", "smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    {"name": "Ibuprofen", "smiles": "CC(C)Cc1ccc(cc1)C(C)C(=O)O"},
    {"name": "Caffeine", "smiles": "Cn1c(=O)c2c(ncn2C)n(C)c1=O"},
    {"name": "Celecoxib", "smiles": "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"},
    {"name": "Candidate_A", "smiles": "O=C(NC1CCCCC1)c1ccc(F)cc1"},
    {"name": "Candidate_B", "smiles": "CC(=O)Nc1ccc(O)cc1"},
]
synthesizable = []
for mol in candidates:
    result = requests.post(
        f"{BASE}/generate/synthesis-check",
        headers=HEADERS,
        json={"smiles": mol["smiles"]}
    ).json()
    sa = result["sa_score"]
    label = result["feasibility"]
    # Below 4.0 passes, 4.0 to 6.0 needs a chemist's review, the rest is rejected
    status = "PASS" if sa < 4.0 else "REVIEW" if sa < 6.0 else "REJECT"
    print(f"{mol['name']:20s} SA={sa:.2f} ({label:12s}) [{status}]")
    if sa < 4.0:
        synthesizable.append({**mol, "sa_score": sa})
print(f"\n{len(synthesizable)} of {len(candidates)} molecules passed SA filter (< 4.0)")
Combining SA Scores with Molecular Properties
SA scoring is most powerful when combined with other property filters. A molecule that is easy to synthesize but fails Lipinski's rules is just as useless as one that is drug-like but impossible to make. The SciRouter API lets you chain multiple endpoints to build a comprehensive filter.
import os, requests
API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
smiles_list = [
    "CC(=O)Oc1ccccc1C(=O)O",  # Aspirin
    "CC(C)Cc1ccc(cc1)C(C)C(=O)O",  # Ibuprofen
    "Cn1c(=O)c2c(ncn2C)n(C)c1=O",  # Caffeine
    "O=C(NC1CCCCC1)c1ccc(F)cc1",  # Candidate_A
]
for smi in smiles_list:
    # Get the SA score first; it decides whether the property call is worth making
    sa = requests.post(f"{BASE}/generate/synthesis-check",
                       headers=HEADERS, json={"smiles": smi}).json()
    sa_score = sa["sa_score"]
    sa_ok = sa_score < 4.0
    print(f"SMILES: {smi}")
    if not sa_ok:
        # Skip the property lookup for molecules that fail the SA filter
        print(f"  SA={sa_score:.2f}")
        print("  SA: FAIL | Overall: FAIL")
        print()
        continue
    # Get molecular properties only for molecules that passed the SA filter
    props = requests.post(f"{BASE}/chemistry/properties",
                          headers=HEADERS, json={"smiles": smi}).json()
    mw = props["molecular_weight"]
    logp = props["logp"]
    hbd = props["hbd"]
    hba = props["hba"]
    # Apply Lipinski's rule of five (the limits themselves are allowed)
    lipinski_ok = mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10
    verdict = "PASS" if lipinski_ok else "FAIL"
    print(f"  MW={mw:.1f} LogP={logp:.2f} HBD={hbd} HBA={hba} SA={sa_score:.2f}")
    print(f"  Lipinski: {'PASS' if lipinski_ok else 'FAIL'} | SA: PASS | Overall: {verdict}")
    print()
This pattern – screen by SA score first, then calculate drug-likeness properties only for the survivors – is a common workflow in computational medicinal chemistry. By eliminating unsynthesizable compounds early, you save API calls on the more detailed property analysis and focus your resources on molecules that have a realistic path to the lab.
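The Lipinski check is worth pulling into its own function so you can see which criterion failed. The sketch below is one way to do that; the function names are made up for this example. It also reflects a commonly applied reading of the rule of five that tolerates a single violation, which is looser than requiring every criterion to pass, so pick the setting that matches your program.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Return the names of the rule-of-five criteria a molecule violates."""
    checks = {
        "MW > 500": mw > 500,
        "LogP > 5": logp > 5,
        "HBD > 5": hbd > 5,
        "HBA > 10": hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(mw, logp, hbd, hba, max_violations=1):
    """True if the molecule has at most max_violations violations."""
    return len(lipinski_violations(mw, logp, hbd, hba)) <= max_violations


# Illustrative property values, not API output
print(lipinski_violations(mw=180.2, logp=1.2, hbd=1, hba=4))
print(lipinski_violations(mw=853.9, logp=3.0, hbd=4, hba=14))
```

Pass `max_violations=0` to require every criterion to hold.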
When to Trust (and When to Question) SA Scores
SA scores are powerful screening tools, but understanding their limitations will make you a better computational chemist. Here are the situations where SA scores work well and where you should exercise caution.
SA Scores Work Well For
- Drug-like small molecules: The fragment statistics behind the SA score come from large collections of known, mostly drug-like compounds. For typical drug discovery molecules (MW 200–600, 0–3 rings, common heteroatoms), SA scores are reliable guides.
- Comparative ranking: Even when absolute SA values are debatable, the relative ranking of molecules by SA score is usually correct. If molecule A scores 2.5 and molecule B scores 5.5, molecule A is almost certainly easier to make.
- Filtering generative output: When a generative model produces 500 molecules, SA scoring reliably separates the synthesizable candidates from the fantastical ones. This is the highest-value use case.
- Early-stage triage: During hit identification, SA scores help prioritize which virtual hits to attempt in the lab first.
SA Scores Can Be Misleading For
- Natural products: Many natural products have high SA scores because their de novo synthesis is genuinely hard. But they may be available by extraction or semi-synthesis, which the SA algorithm does not consider.
- Peptides and macrocycles: SA scoring was not designed for peptide-like molecules or large macrocycles. These compound classes have specialized synthesis strategies (solid-phase peptide synthesis, ring-closing metathesis) that the fragment-based algorithm does not capture.
- Reagent availability: A molecule might score well on SA but require a reagent that is out of stock, restricted, or prohibitively expensive. SA scores do not account for supply chain realities.
- Scale-up considerations: A reaction that works at milligram scale in a research lab may fail at kilogram scale in manufacturing. SA scores reflect bench-scale feasibility only.
Integrating SA Scoring into Your Drug Discovery Pipeline
The most effective drug discovery pipelines check synthetic accessibility at multiple stages. Here is how SA scoring fits into a complete computational workflow.
Stage 1: Virtual Library Filtering
Before running expensive docking or binding affinity predictions, filter your virtual library by SA score. Remove anything above 5.0 (or 4.0 for conservative programs). This can eliminate 30–50% of candidates, dramatically reducing downstream computation costs.
Stage 2: Post-Generation Triage
After running a generative model like REINVENT4, immediately score all output molecules for SA. Generative models sometimes produce exotic structures that score well on binding objectives but are synthetic nightmares. Catch these before investing in further profiling.
Stage 3: Lead Optimization Guardrails
During lead optimization, medicinal chemists propose analogs to improve potency, selectivity, or ADMET properties. Each proposed analog should be checked against SA thresholds to ensure the optimization is not drifting toward unsynthesizable chemical space. Set an SA ceiling and flag any designs that exceed it.
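One way to implement such a guardrail is sketched below. It flags an analog when its SA score exceeds a fixed ceiling or when it has drifted too far above the parent compound. The function name `flag_analogs`, the ceiling of 4.5, and the drift limit of 1.0 are all assumptions chosen for illustration, as are the sample scores.

```python
def flag_analogs(analogs, sa_ceiling=4.5, parent_sa=None, max_drift=1.0):
    """Return (name, reasons) for analogs that breach the SA guardrails.

    analogs: mapping of analog name -> SA score.
    """
    flagged = []
    for name, sa in analogs.items():
        reasons = []
        if sa > sa_ceiling:
            reasons.append(f"SA {sa:.1f} exceeds ceiling {sa_ceiling:.1f}")
        if parent_sa is not None and sa - parent_sa > max_drift:
            reasons.append(f"SA rose {sa - parent_sa:.1f} above parent")
        if reasons:
            flagged.append((name, reasons))
    return flagged


# Made-up SA scores for three proposed analogs of a parent with SA 3.1
analogs = {"analog_1": 3.0, "analog_2": 4.8, "analog_3": 4.3}
for name, reasons in flag_analogs(analogs, parent_sa=3.1):
    print(name, "->", "; ".join(reasons))
```

The drift check catches a series that is creeping toward harder chemistry even while each individual design still sits under the ceiling.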
Stage 4: Candidate Selection
When choosing which compounds to advance to synthesis, use SA scores alongside predicted activity, ADMET profiles, and intellectual property landscape. A compound with slightly lower predicted affinity but an SA score of 2.0 may be a better investment than one with marginally better affinity and an SA score of 5.5 – because you will have it in hand weeks sooner.
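A simple way to make that trade-off explicit is a priority score that discounts predicted potency by synthetic difficulty. The sketch below is only illustrative: the field names `pred_pic50` and `sa_score`, the linear form, and the weight of 0.3 are assumptions, and a real program would tune or replace them.

```python
def rank_candidates(candidates, sa_weight=0.3):
    """Sort candidates by predicted potency minus an SA penalty, best first."""
    def priority(c):
        return c["pred_pic50"] - sa_weight * c["sa_score"]
    return sorted(candidates, key=priority, reverse=True)


# Made-up values mirroring the scenario described above
candidates = [
    {"name": "slightly_more_potent", "pred_pic50": 8.2, "sa_score": 5.5},
    {"name": "easier_to_make", "pred_pic50": 8.0, "sa_score": 2.0},
]

for c in rank_candidates(candidates):
    print(c["name"])
```

With these numbers the easier compound ranks first. Setting `sa_weight` to zero recovers a pure potency ranking, which makes it easy to see how much the synthesis penalty changes the order.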
The Molecular Design Lab on SciRouter
SciRouter's Molecular Design Lab provides a visual interface that integrates SA scoring directly into the molecular design workflow. Generate molecules, view their SA scores alongside other properties, and filter interactively without writing any code.
The lab displays SA scores with color-coded indicators: green for easy (below 3.0), yellow for moderate (3.0–5.0), and red for difficult (above 5.0). You can sort and filter by SA score, combine it with Lipinski filters, and export your shortlisted molecules for further analysis or synthesis ordering.
For programmatic access, the same data is available through the REST API and Python SDK. Whether you prefer a graphical interface or a scripted pipeline, SciRouter gives you fast, reliable SA scoring without any local software installation.
Next Steps
Synthetic accessibility scoring is one piece of a larger molecular evaluation toolkit. Combine it with Molecular Properties for drug-likeness assessment, ADMET Prediction for safety and pharmacokinetic profiling, and Molecule Generator to create novel candidates that are optimized for both activity and synthesizability from the start.
To learn more about the molecular property calculations that complement SA scoring, see our Lipinski Rule of Five Calculator guide or the ADMET Prediction Explained deep dive.
Sign up for a free SciRouter API key and start checking synthetic accessibility today. With 500 free API calls per month, you can screen entire virtual libraries before committing a single dollar to synthesis.