The RDKit Installation Problem
If you have ever tried to install RDKit, you know the pain. RDKit is the gold standard open-source toolkit for cheminformatics – it powers molecular property calculations, substructure searches, fingerprinting, and more across thousands of research labs. But installing it is a notorious hurdle.
RDKit is written in C++ with Python bindings. For years it did not install with a simple pip install rdkit; prebuilt wheels are now published on PyPI for common platforms, but many setups still rely on conda, a specific Python version, and platform-dependent C++ libraries. A from-scratch installation can take 10–30 minutes, can break across OS updates, and can conflict with other scientific Python packages. On Apple Silicon Macs, Windows machines without Visual Studio, or cloud environments without root access, the installation can be difficult or even impractical without Docker.
For many tasks – particularly calculating standard molecular properties from SMILES strings – you do not need RDKit installed locally at all. You need the computation that RDKit performs, not RDKit itself. That is exactly what an API provides.
One API Call to Calculate Everything
SciRouter's /v1/chemistry/properties endpoint takes a SMILES string and returns all standard molecular descriptors in a single call. No installation, no imports, no conda environments. Here is how it works:
import requests
API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Calculate properties for aspirin
response = requests.post(f"{BASE}/chemistry/properties",
                         headers=HEADERS,
                         json={"smiles": "CC(=O)Oc1ccccc1C(=O)O"})
props = response.json()
print(f"Molecular Weight: {props['molecular_weight']:.2f} g/mol")
print(f"LogP: {props['logp']:.2f}")
print(f"TPSA: {props['tpsa']:.2f} Ų")
print(f"H-Bond Donors: {props['hbd']}")
print(f"H-Bond Acceptors: {props['hba']}")
print(f"Rotatable Bonds: {props['rotatable_bonds']}")
print(f"Lipinski Compliant: {props['lipinski_pass']}")

Output:
Molecular Weight: 180.16 g/mol
LogP: 1.31
TPSA: 63.60 Ų
H-Bond Donors: 1
H-Bond Acceptors: 4
Rotatable Bonds: 3
Lipinski Compliant: True

That is it. Three lines of meaningful code (import, request, parse) give you the same property calculations that would otherwise require setting up an entire RDKit environment. The response time is typically under 200 milliseconds.
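Before formatting a response, it can help to confirm that the fields the print statements rely on are actually present. The following is a minimal sketch of such a check. EXPECTED_KEYS and missing_keys are hypothetical names introduced here, and the key list is simply the set of fields read in the example above, not a documented schema.

```python
# Fields the example above reads from the response.
# Assumption: this mirrors the example, it is not an official schema.
EXPECTED_KEYS = {
    "molecular_weight", "logp", "tpsa", "hbd", "hba",
    "rotatable_bonds", "lipinski_pass",
}


def missing_keys(props: dict) -> list:
    """Return the expected fields that are absent from a response dict, sorted."""
    return sorted(EXPECTED_KEYS - set(props))


# A hand-written dict standing in for response.json()
sample = {"molecular_weight": 180.16, "logp": 1.31, "tpsa": 63.6}
print(missing_keys(sample))  # ['hba', 'hbd', 'lipinski_pass', 'rotatable_bonds']
```

Calling missing_keys(props) right after response.json() and stopping when the list is non-empty turns a confusing KeyError into an explicit message.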
Understanding the Properties
Each property returned by the API has specific meaning in the context of drug design. Here is what each one tells you and why it matters:
Molecular Weight (MW)
The sum of atomic weights for all atoms in the molecule. Drug-like small molecules typically fall between 150 and 500 g/mol. Higher molecular weight generally means lower oral absorption and cell permeability. Most approved oral drugs are under 500.
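As a worked example of that sum, the sketch below recomputes the molecular weight of aspirin (C9H8O4) by hand from standard atomic weights. The helper name molecular_weight and the three-element table are illustrative only; the API derives the value from the SMILES string for you.

```python
# Standard atomic weights in g/mol, rounded to three decimals.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molecular_weight(atom_counts: dict) -> float:
    """Sum atomic weights for a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHTS[element] * count
               for element, count in atom_counts.items())


aspirin = {"C": 9, "H": 8, "O": 4}  # C9H8O4
print(f"{molecular_weight(aspirin):.2f} g/mol")  # 180.16 g/mol
```

The result, 9 × 12.011 + 8 × 1.008 + 4 × 15.999 = 180.159, agrees with the 180.16 g/mol shown in the aspirin output earlier.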
LogP (Lipophilicity)
The base-10 logarithm of the octanol-water partition coefficient. It measures how strongly a molecule prefers an oily phase (a stand-in for lipid membranes) over water. A LogP between 1 and 3 is generally considered favorable for oral drugs. Below 0, the molecule is usually too hydrophilic to cross cell membranes by passive diffusion. Above 5, it tends to be poorly soluble and can accumulate in fatty tissue, which raises the risk of toxicity.
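To make the definition concrete, here is a small sketch that computes LogP from two equilibrium concentrations. The function name log_p and the example concentrations are made up for illustration. Property calculators typically estimate LogP from structure rather than measure it, so treat this only as an illustration of what the number means.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """Base-10 log of the octanol/water concentration ratio at equilibrium."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# 100x more compound in octanol than in water: lipophilic
print(f"{log_p(100.0, 1.0):.1f}")  # 2.0
# 10x more compound in water than in octanol: hydrophilic
print(f"{log_p(1.0, 10.0):.1f}")   # -1.0
```

Each unit of LogP is a factor of ten in the partition ratio, so the gap between a LogP of 1 and a LogP of 5 is a 10,000-fold difference in preference for the oily phase.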
TPSA (Topological Polar Surface Area)
The surface area contributed by nitrogen, oxygen, and their attached hydrogens. TPSA correlates with membrane permeability. For oral absorption, TPSA should be below 140 angstroms squared. For blood-brain barrier penetration (CNS drugs), it should be below 90. The sweet spot for most oral drugs is 60–120.
HBD and HBA (Hydrogen Bond Donors and Acceptors)
Hydrogen bond donors are NH and OH groups. Acceptors are N and O atoms, which carry the lone pairs that accept hydrogen bonds. Each hydrogen bond a molecule forms with water has to be broken before it can cross a membrane, so more donors and acceptors mean a higher desolvation cost. Lipinski's rules set the limits at 5 donors and 10 acceptors.
Rotatable Bonds
The number of freely rotating single bonds (excluding terminal bonds and ring bonds). More rotatable bonds means more conformational flexibility, which increases the entropic penalty of binding. Oral drugs typically have 10 or fewer rotatable bonds.
Lipinski's Rule of Five
In 1997, Christopher Lipinski at Pfizer analyzed thousands of orally active drugs and identified four property ranges that 90% of them fell within. These became the Rule of Five, named because each cutoff is a multiple of five:
- Molecular Weight ≤ 500 g/mol
- LogP ≤ 5
- Hydrogen Bond Donors ≤ 5
- Hydrogen Bond Acceptors ≤ 10
A molecule that violates two or more of these rules is unlikely to be orally bioavailable. The API returns a lipinski_pass boolean that checks all four criteria for you. Note that passing Lipinski does not make a molecule a good drug – it is a first-pass filter, not a guarantee. Many molecules pass Lipinski but fail for other reasons (toxicity, metabolic instability, poor selectivity), and some successful drugs violate it.
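If you want to see which rules a molecule breaks rather than a single boolean, the four cutoffs are easy to apply yourself. The sketch below assumes a dict with the same keys as the API response shown earlier. LIPINSKI_LIMITS and lipinski_violations are hypothetical names, and how lipinski_pass itself treats a single violation is not specified here.

```python
# Rule of Five cutoffs as listed above, keyed by response field name.
LIPINSKI_LIMITS = {
    "molecular_weight": 500,
    "logp": 5,
    "hbd": 5,
    "hba": 10,
}


def lipinski_violations(props: dict) -> list:
    """Return the Rule of Five properties that exceed their cutoff."""
    return [key for key, limit in LIPINSKI_LIMITS.items() if props[key] > limit]


# Aspirin values from the earlier output; "heavy" uses made-up values.
aspirin = {"molecular_weight": 180.16, "logp": 1.31, "hbd": 1, "hba": 4}
heavy = {"molecular_weight": 620.0, "logp": 5.8, "hbd": 2, "hba": 9}

print(lipinski_violations(aspirin))  # []
print(lipinski_violations(heavy))    # ['molecular_weight', 'logp']
```

Checking len(lipinski_violations(props)) >= 2 reproduces the "two or more violations" reading of the rule described above.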
Batch Processing Multiple Molecules
Real screening workflows involve hundreds or thousands of molecules, not just one. The API supports batch requests where you send multiple SMILES strings and get all properties back in a single call:
import requests
API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# A small compound library
compounds = [
    {"name": "Aspirin", "smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    {"name": "Ibuprofen", "smiles": "CC(C)Cc1ccc(C(C)C(=O)O)cc1"},
    {"name": "Celecoxib", "smiles": "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"},
    {"name": "Metformin", "smiles": "CN(C)C(=N)NC(=N)N"},
    {"name": "Atorvastatin", "smiles": "CC(C)c1n(CC[C@@H](O)C[C@@H](O)CC(=O)O)c(c2ccccc2)c(C(=O)Nc2ccccc2)c1-c1ccc(F)cc1"},
]
# Send batch request
smiles_list = [c["smiles"] for c in compounds]
response = requests.post(f"{BASE}/chemistry/properties",
                         headers=HEADERS,
                         json={"smiles": smiles_list})
results = response.json()["results"]
# Display as a table
print(f"{'Name':<15} {'MW':>8} {'LogP':>6} {'TPSA':>7} {'HBD':>4} {'HBA':>4} {'Lipinski':>9}")
print("-" * 60)
for compound, props in zip(compounds, results):
    print(f"{compound['name']:<15} "
          f"{props['molecular_weight']:>8.1f} "
          f"{props['logp']:>6.2f} "
          f"{props['tpsa']:>7.1f} "
          f"{props['hbd']:>4} "
          f"{props['hba']:>4} "
          f"{'PASS' if props['lipinski_pass'] else 'FAIL':>9}")

Output:
Name                  MW   LogP    TPSA  HBD  HBA  Lipinski
------------------------------------------------------------
Aspirin            180.2   1.31    63.6    1    4      PASS
Ibuprofen          206.3   3.50    37.3    1    2      PASS
Celecoxib          381.4   3.53    86.2    1    7      PASS
Metformin          129.2  -1.36   103.7    3    5      PASS
Atorvastatin       558.6   5.39   111.8    4    7      FAIL

Atorvastatin (Lipitor) fails Lipinski – its molecular weight is 558 and its LogP is 5.39. Yet it is one of the best-selling drugs in history. This illustrates why Lipinski is a guideline, not a law. Atorvastatin is actively transported into hepatocytes by OATP transporters, bypassing the passive diffusion that Lipinski's rules model.
Building a Compound Screening Script
Let's put it all together. Here is a complete script that reads a CSV of molecules, calculates properties via the API, filters by drug-likeness criteria, and writes the results to a new CSV:
import requests
import csv
API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
def screen_compounds(input_csv: str, output_csv: str,
                     max_mw=500, max_logp=5, max_hbd=5,
                     max_hba=10, max_tpsa=140):
    """Read compounds from CSV, filter by drug-likeness, write results."""
    # Read input CSV (expects columns: name, smiles)
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        compounds = list(reader)
    print(f"Loaded {len(compounds)} compounds from {input_csv}")
    # Process in batches of 100 (API limit)
    all_results = []
    for i in range(0, len(compounds), 100):
        batch = compounds[i:i+100]
        smiles_batch = [c["smiles"] for c in batch]
        resp = requests.post(f"{BASE}/chemistry/properties",
                             headers=HEADERS,
                             json={"smiles": smiles_batch})
        # Fail loudly on HTTP errors instead of parsing an error body
        resp.raise_for_status()
        batch_results = resp.json()["results"]
        for compound, props in zip(batch, batch_results):
            if "error" in props:
                print(f"  Skipping {compound['name']}: {props['error']}")
                continue
            # Apply drug-likeness filters
            passes = (
                props["molecular_weight"] <= max_mw
                and props["logp"] <= max_logp
                and props["hbd"] <= max_hbd
                and props["hba"] <= max_hba
                and props["tpsa"] <= max_tpsa
            )
            all_results.append({
                "name": compound["name"],
                "smiles": compound["smiles"],
                "mw": f"{props['molecular_weight']:.1f}",
                "logp": f"{props['logp']:.2f}",
                "tpsa": f"{props['tpsa']:.1f}",
                "hbd": props["hbd"],
                "hba": props["hba"],
                "rotatable_bonds": props["rotatable_bonds"],
                "lipinski": "PASS" if props["lipinski_pass"] else "FAIL",
                "screen_pass": "PASS" if passes else "FAIL",
            })
        print(f"  Processed batch {i//100 + 1} "
              f"({min(i+100, len(compounds))}/{len(compounds)})")
    # Nothing to write if the input was empty or every compound errored
    if not all_results:
        print("No compounds were processed; nothing written")
        return
    # Write results
    passed = [r for r in all_results if r["screen_pass"] == "PASS"]
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(all_results[0].keys()))
        writer.writeheader()
        writer.writerows(all_results)
    print(f"\nResults: {len(passed)}/{len(all_results)} compounds "
          f"passed screening")
    print(f"Written to {output_csv}")

# Run the screen
screen_compounds("my_library.csv", "screened_results.csv")

Beyond Lipinski: Modern Drug-Likeness Criteria
While Lipinski's Rule of Five is the most widely known drug-likeness filter, modern medicinal chemistry uses additional criteria that provide more nuanced filtering:
Veber's Rules (Oral Bioavailability)
Daniel Veber at GSK found that two properties predict oral bioavailability in rats better than Lipinski alone: TPSA ≤ 140 angstroms squared, and rotatable bonds ≤ 10. These are already in the API response, so you can add them to your screening filters.
Lead-Likeness (Fragment-Based Screening)
When choosing starting points for lead optimization, tighter criteria are useful. Optimization usually adds atoms and lipophilicity, so the starting point should be smaller and simpler: MW ≤ 350, LogP ≤ 3, rotatable bonds ≤ 7. This gives medicinal chemists room to add functional groups without leaving drug-like space.
CNS Drug Space
Drugs targeting the central nervous system must cross the blood-brain barrier, which is far more restrictive than intestinal absorption. CNS drugs typically require MW ≤ 400, LogP between 1 and 3, TPSA ≤ 90, HBD ≤ 3, HBA ≤ 7, and no more than 7 rotatable bonds.
def apply_filters(props: dict, target: str = "oral") -> bool:
    """Apply drug-likeness filters based on target route."""
    if target == "oral":
        # Lipinski + Veber
        return (
            props["molecular_weight"] <= 500
            and props["logp"] <= 5
            and props["hbd"] <= 5
            and props["hba"] <= 10
            and props["tpsa"] <= 140
            and props["rotatable_bonds"] <= 10
        )
    elif target == "cns":
        # CNS-penetrant criteria
        return (
            props["molecular_weight"] <= 400
            and 1 <= props["logp"] <= 3
            and props["hbd"] <= 3
            and props["hba"] <= 7
            and props["tpsa"] <= 90
            and props["rotatable_bonds"] <= 7
        )
    elif target == "lead":
        # Lead-like (room to grow)
        return (
            props["molecular_weight"] <= 350
            and props["logp"] <= 3
            and props["rotatable_bonds"] <= 7
        )
    return False

Combining Properties with Similarity Search
Property calculations tell you about individual molecules. Combining them with similarity search lets you find analogs of promising compounds in your library. Here is how to use both endpoints together:
import requests
API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Reference compound: celecoxib
reference = "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"
# Your compound library (list of SMILES)
library = [
"Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(=O)(=O)N)cc2)cc1F",
"CC(=O)Oc1ccccc1C(=O)O",
"Cc1ccc(-c2cc(CF)nn2-c2ccc(S(N)(=O)=O)cc2)cc1",
"O=C(O)c1ccccc1O",
"Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)c(Cl)c1",
]
# Step 1: Find similar compounds (Tanimoto > 0.6)
sim_resp = requests.post(f"{BASE}/chemistry/similarity",
                         headers=HEADERS,
                         json={"reference": reference, "targets": library,
                               "threshold": 0.6})
similar = sim_resp.json()["results"]
similar_smiles = [s["smiles"] for s in similar]
print(f"Found {len(similar_smiles)} analogs above 0.6 similarity")
# Step 2: Calculate properties for analogs only
if similar_smiles:
    props_resp = requests.post(f"{BASE}/chemistry/properties",
                               headers=HEADERS,
                               json={"smiles": similar_smiles})
    for sim, props in zip(similar, props_resp.json()["results"]):
        status = "PASS" if props["lipinski_pass"] else "FAIL"
        print(f"  Tanimoto={sim['similarity']:.2f} "
              f"MW={props['molecular_weight']:.0f} "
              f"LogP={props['logp']:.1f} "
              f"Lipinski={status}")

Next Steps
You can now calculate molecular properties for any compound from a single SMILES string, screen entire libraries against drug-likeness criteria, and combine property calculations with similarity search – all without installing RDKit or managing conda environments.
To go deeper into molecular representations, read our SMILES notation guide. To predict ADMET properties for your top candidates, see the ADMET prediction guide. And to explore individual tools interactively:
- Molecular Properties – Calculate descriptors for any SMILES string
- Molecule Similarity – Find analogs using Tanimoto fingerprint similarity
- ADMET Prediction – Predict absorption, metabolism, and toxicity profiles
Ready to profile your compounds? Open the Compound Profiler Studio or get a free API key to start building with the API directly.