
Calculate Molecular Properties from SMILES (No RDKit Needed)

Skip the painful RDKit installation. Calculate molecular weight, LogP, TPSA, and drug-likeness from SMILES strings with one API call. Includes batch processing examples.

Ryan Bethencourt
April 5, 2026
8 min read

The RDKit Installation Problem

If you have ever tried to install RDKit, you know the pain. RDKit is the gold-standard open-source toolkit for cheminformatics – it powers molecular property calculations, substructure searches, fingerprinting, and more across thousands of research labs. But installing it is a notorious hurdle.

RDKit is written in C++ with Python bindings, and that has historically made installation the hard part. For years it could not be installed with a simple pip install; the standard route was conda, a specific Python version, and platform-dependent C++ libraries. Official wheels are now published on PyPI (pip install rdkit), which removes much of that pain, but a compiled dependency still brings friction: it has to match your Python version and platform, it can conflict with other scientific Python packages, and locked-down environments such as some cloud notebooks, serverless functions, or CI images may not let you install it at all.

For many tasks – particularly calculating standard molecular properties from SMILES strings – you do not need RDKit installed locally at all. You need the computation that RDKit performs, not RDKit itself. That is exactly what an API provides.

Note
If you need advanced RDKit features like custom fingerprints, reaction SMARTS, or 3D conformer generation, a local installation is still worthwhile. But for molecular property calculations, drug-likeness screening, and format conversion, the API approach saves hours of setup time and works identically across all platforms.

One API Call to Calculate Everything

SciRouter's /v1/chemistry/properties endpoint takes a SMILES string and returns the standard drug-likeness descriptors in a single call. There is no RDKit installation and no conda environment to manage; the only client-side dependency is an HTTP library such as requests. Here is how it works:

Calculate molecular properties from SMILES
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Calculate properties for aspirin
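response = requests.post(f"{BASE}/chemistry/properties",
    headers=HEADERS,
    json={"smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    timeout=30)
response.raise_for_status()  # surface HTTP errors (bad key, invalid SMILES) instead of a KeyError below

props = response.json()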
print(f"Molecular Weight:    {props['molecular_weight']:.2f} g/mol")
print(f"LogP:                {props['logp']:.2f}")
print(f"TPSA:                {props['tpsa']:.2f} Å²")
print(f"H-Bond Donors:       {props['hbd']}")
print(f"H-Bond Acceptors:    {props['hba']}")
print(f"Rotatable Bonds:     {props['rotatable_bonds']}")
print(f"Lipinski Compliant:  {props['lipinski_pass']}")
Output
Molecular Weight:    180.16 g/mol
LogP:                1.31
TPSA:                63.60 Å²
H-Bond Donors:       1
H-Bond Acceptors:    4
Rotatable Bonds:     3
Lipinski Compliant:  True

That is it. Three lines of meaningful code (import, request, parse) give you the same property calculations that would otherwise require setting up an entire RDKit environment. The response time is typically under 200 milliseconds.

Understanding the Properties

Each property returned by the API has specific meaning in the context of drug design. Here is what each one tells you and why it matters:

Molecular Weight (MW)

The sum of the atomic weights of all atoms in the molecule, including the hydrogens that SMILES usually leaves implicit. Drug-like small molecules typically fall between 150 and 500 g/mol. Higher molecular weight generally means lower oral absorption and cell permeability. Most approved oral drugs are under 500.
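As a worked example of this definition, the sketch below sums standard atomic weights over aspirin's molecular formula, C9H8O4, written out by hand. The API derives the formula from the SMILES for you; the ATOMIC_WEIGHTS table and the molecular_weight helper are illustrative names, and the table is a rounded, partial one covering only a few common elements.

```python
# Standard atomic weights in g/mol, rounded; only a few common elements are listed.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998, "S": 32.06}

def molecular_weight(formula: dict[str, int]) -> float:
    """Sum atomic weight x atom count over every element in the formula."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in formula.items())

# Aspirin, C9H8O4: the 8 hydrogens are implicit in the SMILES but still count.
aspirin = {"C": 9, "H": 8, "O": 4}
print(f"{molecular_weight(aspirin):.2f} g/mol")  # 180.16 g/mol
```

The result matches the 180.16 g/mol returned by the API for aspirin above.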

LogP (Lipophilicity)

The base-10 logarithm of the octanol-water partition coefficient, that is, the ratio of a compound's concentration in octanol to its concentration in water at equilibrium. It measures how much a molecule prefers oil (lipid membranes) over water. A LogP between 1 and 3 is generally considered ideal for oral drugs. Below 0, a molecule is usually too hydrophilic to cross cell membranes by passive diffusion. Above 5, it tends to be poorly soluble and to accumulate in fatty tissue, which raises the risk of toxicity.
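LogP is an experimental quantity; the API returns a calculated estimate (the FAQ below names the Wildman-Crippen method). The definition and the rule-of-thumb bands from this section can be sketched as follows. The helper names log_p and lipophilicity_band are hypothetical, and labeling everything between the stated ranges as "acceptable" is this sketch's own choice, not a published cutoff.

```python
import math

def log_p(conc_octanol: float, conc_water: float) -> float:
    """LogP = log10 of the octanol/water concentration ratio at equilibrium."""
    return math.log10(conc_octanol / conc_water)

def lipophilicity_band(logp: float) -> str:
    """Classify a LogP value using the rules of thumb described above."""
    if logp < 0:
        return "hydrophilic: passive membrane crossing is hard"
    if logp > 5:
        return "highly lipophilic: solubility and toxicity risks"
    if 1 <= logp <= 3:
        return "ideal range for oral drugs"
    return "acceptable"

# A compound 100x more concentrated in octanol than in water has LogP = 2.
print(log_p(100.0, 1.0))           # 2.0
print(lipophilicity_band(1.31))    # ideal range for oral drugs (aspirin)
```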

TPSA (Topological Polar Surface Area)

The surface area contributed by nitrogen and oxygen atoms and their attached hydrogens, computed from the 2D structure rather than a 3D model. TPSA correlates with membrane permeability. For oral absorption, TPSA should be at or below 140 Å². For blood-brain barrier penetration (CNS drugs), it should be below 90 Å². A common sweet spot for oral drugs is roughly 60–120 Å².
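TPSA is "topological" because it comes from the molecular graph, not a 3D surface: Ertl's method (Ertl, Rohde and Selzer, 2000) sums tabulated contributions for each polar fragment. The sketch below hand-assigns aspirin's polar oxygens and uses the three matching values from that table. A real implementation, such as RDKit's, perceives the fragments automatically; the fragment labels and the tpsa helper here are hypothetical names for illustration.

```python
# Polar fragment contributions in square angstroms (Ertl et al., 2000).
# Only the three oxygen types that occur in aspirin are listed.
FRAGMENT_PSA = {
    "O=": 17.07,   # carbonyl oxygen
    "-O-": 9.23,   # ether/ester oxygen with two heavy-atom neighbors
    "-OH": 20.23,  # hydroxyl oxygen, including the OH of a carboxylic acid
}

def tpsa(fragments: list[str]) -> float:
    """Sum the tabulated contribution of every polar fragment."""
    return round(sum(FRAGMENT_PSA[f] for f in fragments), 2)

# Aspirin, CC(=O)Oc1ccccc1C(=O)O: two C=O oxygens, one ester oxygen, one acid OH.
aspirin_fragments = ["O=", "O=", "-O-", "-OH"]
print(tpsa(aspirin_fragments))  # 63.6
```

The total agrees with the 63.60 Å² the API reports for aspirin.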

HBD and HBA (Hydrogen Bond Donors and Acceptors)

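Hydrogen bond donors are NH and OH groups; hydrogen bond acceptors are N and O atoms with lone pairs. Each hydrogen bond a molecule can form with water makes it harder to desolvate for membrane crossing. Lipinski's rules set the limits at 5 donors and 10 acceptors, using a deliberately simple count: donors are the sum of OH and NH groups, and acceptors are the sum of N and O atoms.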
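Lipinski's acceptor count can be read straight off the molecular formula, while donors need the structure. A minimal sketch with hypothetical helper names; for aspirin (C9H8O4) it reproduces the HBA of 4 and HBD of 1 from the API output above. The API may use RDKit's more refined donor and acceptor definitions, so for some molecules its counts can differ from these simple sums.

```python
def lipinski_hba(formula: dict[str, int]) -> int:
    """Lipinski's original acceptor count: the number of N and O atoms."""
    return formula.get("N", 0) + formula.get("O", 0)

def lipinski_hbd(oh_groups: int, nh_groups: int) -> int:
    """Lipinski's original donor count: OH groups plus NH groups.

    These must be counted from the structure; the formula alone cannot tell
    an OH oxygen from a carbonyl or ether oxygen.
    """
    return oh_groups + nh_groups

# Aspirin, C9H8O4: four oxygens, and the carboxylic acid OH is the only donor.
print(lipinski_hba({"C": 9, "H": 8, "O": 4}))  # 4
print(lipinski_hbd(oh_groups=1, nh_groups=0))  # 1
```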

Rotatable Bonds

The number of freely rotating single bonds (excluding terminal bonds and ring bonds). More rotatable bonds means more conformational flexibility, which increases the entropic penalty of binding. Oral drugs typically have 10 or fewer rotatable bonds.

Lipinski's Rule of Five

In 1997, Christopher Lipinski and colleagues at Pfizer analyzed thousands of clinical drug candidates and identified four property cutoffs that about 90% of them fell within. These became the Rule of Five, named because each cutoff is a multiple of five:

  • Molecular Weight ≤ 500 g/mol
  • LogP ≤ 5
  • Hydrogen Bond Donors ≤ 5
  • Hydrogen Bond Acceptors ≤ 10

A molecule that violates two or more of these rules is unlikely to be orally bioavailable. The API returns a lipinski_pass boolean that checks all four criteria for you. Note that passing Lipinski does not make a molecule a good drug – it is a necessary but not sufficient condition. Many molecules pass Lipinski but fail for other reasons (toxicity, metabolic instability, poor selectivity).
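The lipinski_pass flag is a single boolean, and it is not stated whether it tolerates one violation. If you want the violation count itself, or Lipinski's original "two or more violations" reading, you can compute it from the returned properties. A minimal sketch, assuming the response field names used in the examples in this article; lipinski_violations and likely_oral are hypothetical helpers.

```python
# Rule-of-Five cutoffs, keyed by the property names used in the API response.
RULE_OF_FIVE = {
    "molecular_weight": 500,
    "logp": 5,
    "hbd": 5,
    "hba": 10,
}

def lipinski_violations(props: dict) -> list[str]:
    """Return the names of the properties that exceed their Rule-of-Five cutoff."""
    return [name for name, limit in RULE_OF_FIVE.items() if props[name] > limit]

def likely_oral(props: dict, max_violations: int = 1) -> bool:
    """Lipinski's reading: two or more violations flag likely poor oral absorption."""
    return len(lipinski_violations(props)) <= max_violations

# Atorvastatin values from the batch example in the next section.
atorvastatin = {"molecular_weight": 558.6, "logp": 5.39, "hbd": 4, "hba": 7}
print(lipinski_violations(atorvastatin))  # ['molecular_weight', 'logp']
print(likely_oral(atorvastatin))          # False
```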

Tip
Lipinski's rules were derived from small molecules intended for oral administration. They do not apply to injectable biologics or antibodies, and Lipinski himself noted that natural products and compounds that are substrates for active transporters are common exceptions. Use them as guidelines, not absolute filters.

Batch Processing Multiple Molecules

Real screening workflows involve hundreds or thousands of molecules, not just one. The API supports batch requests where you send multiple SMILES strings and get all properties back in a single call:

Batch property calculation
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# A small compound library
compounds = [
    {"name": "Aspirin",      "smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    {"name": "Ibuprofen",    "smiles": "CC(C)Cc1ccc(C(C)C(=O)O)cc1"},
    {"name": "Celecoxib",    "smiles": "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"},
    {"name": "Metformin",    "smiles": "CN(C)C(=N)NC(=N)N"},
    {"name": "Atorvastatin", "smiles": "CC(C)c1n(CC[C@@H](O)C[C@@H](O)CC(=O)O)c(-c2ccc(F)cc2)c(-c2ccccc2)c1C(=O)Nc2ccccc2"},
]

# Send batch request
smiles_list = [c["smiles"] for c in compounds]
response = requests.post(f"{BASE}/chemistry/properties",
    headers=HEADERS,
    json={"smiles": smiles_list},
    timeout=60)
response.raise_for_status()

results = response.json()["results"]

# Display as a table
print(f"{'Name':<15} {'MW':>8} {'LogP':>6} {'TPSA':>7} {'HBD':>4} {'HBA':>4} {'Lipinski':>9}")
print("-" * 60)
for compound, props in zip(compounds, results):
    print(f"{compound['name']:<15} "
          f"{props['molecular_weight']:>8.1f} "
          f"{props['logp']:>6.2f} "
          f"{props['tpsa']:>7.1f} "
          f"{props['hbd']:>4} "
          f"{props['hba']:>4} "
          f"{'PASS' if props['lipinski_pass'] else 'FAIL':>9}")
Output
Name                  MW   LogP    TPSA  HBD  HBA  Lipinski
------------------------------------------------------------
Aspirin            180.2   1.31    63.6    1    4      PASS
Ibuprofen          206.3   3.50    37.3    1    2      PASS
Celecoxib          381.4   3.53    86.2    1    7      PASS
Metformin          129.2  -1.36   103.7    3    5      PASS
Atorvastatin       558.6   5.39   111.8    4    7      FAIL

Atorvastatin (Lipitor) fails Lipinski on two counts: its molecular weight is 558.6 and its LogP is 5.39. Yet it is one of the best-selling drugs in history. This illustrates why Lipinski is a guideline, not a law. Atorvastatin is actively taken up into hepatocytes, its site of action, by OATP transporters, a route that the passive-diffusion assumptions behind Lipinski's rules do not capture.

Building a Compound Screening Script

Let's put it all together. Here is a complete script that reads a CSV of molecules, calculates properties via the API, filters by drug-likeness criteria, and writes the results to a new CSV:

Screen a CSV compound library
import requests
import csv

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def screen_compounds(input_csv: str, output_csv: str,
                     max_mw=500, max_logp=5, max_hbd=5,
                     max_hba=10, max_tpsa=140):
    """Read compounds from CSV, filter by drug-likeness, write results."""

    # Read input CSV (expects columns: name, smiles)
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        compounds = list(reader)

    print(f"Loaded {len(compounds)} compounds from {input_csv}")

    # Process in batches of 100 (API limit)
    all_results = []
    for i in range(0, len(compounds), 100):
        batch = compounds[i:i+100]
        smiles_batch = [c["smiles"] for c in batch]

        resp = requests.post(f"{BASE}/chemistry/properties",
            headers=HEADERS,
            json={"smiles": smiles_batch},
            timeout=60)
        resp.raise_for_status()  # stop on auth, quota, or server errors
        batch_results = resp.json()["results"]

        for compound, props in zip(batch, batch_results):
            if "error" in props:
                print(f"  Skipping {compound['name']}: {props['error']}")
                continue

            # Apply drug-likeness filters
            passes = (
                props["molecular_weight"] <= max_mw
                and props["logp"] <= max_logp
                and props["hbd"] <= max_hbd
                and props["hba"] <= max_hba
                and props["tpsa"] <= max_tpsa
            )

            all_results.append({
                "name": compound["name"],
                "smiles": compound["smiles"],
                "mw": f"{props['molecular_weight']:.1f}",
                "logp": f"{props['logp']:.2f}",
                "tpsa": f"{props['tpsa']:.1f}",
                "hbd": props["hbd"],
                "hba": props["hba"],
                "rotatable_bonds": props["rotatable_bonds"],
                "lipinski": "PASS" if props["lipinski_pass"] else "FAIL",
                "screen_pass": "PASS" if passes else "FAIL",
            })

        print(f"  Processed batch {i//100 + 1} "
              f"({min(i+100, len(compounds))}/{len(compounds)})")

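    # Write results (DictWriter needs at least one row to infer the columns)
    passed = [r for r in all_results if r["screen_pass"] == "PASS"]
    if not all_results:
        print("No compounds could be processed; nothing written.")
        return
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=all_results[0].keys())
        writer.writeheader()
        writer.writerows(all_results)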

    print(f"\nResults: {len(passed)}/{len(all_results)} compounds "
          f"passed screening")
    print(f"Written to {output_csv}")

# Run the screen
screen_compounds("my_library.csv", "screened_results.csv")
Tip
For large libraries, add a small delay between batches to stay within rate limits, and budget your credits. Each molecule in a batch request costs 1 credit, and the free tier allows 500 credits per month, so it covers 500 molecules a month. A 10,000-compound library needs a paid plan with higher throughput.
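A minimal sketch of one way to handle rate limits, assuming the API signals them with HTTP 429 (the status code is not documented here, so check the API reference). The helper post_with_backoff is hypothetical; it wraps any zero-argument callable, which keeps it testable without network access.

```python
import time
from typing import Any, Callable

def post_with_backoff(send: Callable[[], Any], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, such as a lambda wrapping requests.post(...).
    If every attempt is rate limited, the last response is returned.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:  # assumption: 429 means "rate limited"
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    return response

# Possible use inside the batch loop of screen_compounds:
# resp = post_with_backoff(lambda: requests.post(
#     f"{BASE}/chemistry/properties", headers=HEADERS,
#     json={"smiles": smiles_batch}, timeout=60))
```

Injecting `sleep` as a parameter lets you swap in a no-op or a recorder in tests instead of actually waiting.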

Beyond Lipinski: Modern Drug-Likeness Criteria

While Lipinski's Rule of Five is the most widely known drug-likeness filter, modern medicinal chemistry uses additional criteria that provide more nuanced filtering:

Veber's Rules (Oral Bioavailability)

Daniel Veber at GSK found that two properties predict oral bioavailability in rats better than Lipinski alone: TPSA ≤ 140 angstroms squared, and rotatable bonds ≤ 10. These are already in the API response, so you can add them to your screening filters.

Lead-Likeness

For lead optimization, tighter criteria are useful. Leads should have room to grow during optimization, so the starting point should be smaller and simpler: MW ≤ 350, LogP ≤ 3, rotatable bonds ≤ 7. This gives medicinal chemists room to add functional groups without exceeding drug-like space.

CNS Drug Space

Drugs targeting the central nervous system must cross the blood-brain barrier, which is far more restrictive than intestinal absorption. CNS drugs typically require MW ≤ 400, LogP between 1 and 3, TPSA ≤ 90, HBD ≤ 3, HBA ≤ 7, and no more than 7 rotatable bonds.

Multi-criteria screening filters
def apply_filters(props: dict, target: str = "oral") -> bool:
    """Apply drug-likeness filters based on target route."""

    if target == "oral":
        # Lipinski + Veber
        return (
            props["molecular_weight"] <= 500
            and props["logp"] <= 5
            and props["hbd"] <= 5
            and props["hba"] <= 10
            and props["tpsa"] <= 140
            and props["rotatable_bonds"] <= 10
        )
    elif target == "cns":
        # CNS-penetrant criteria
        return (
            props["molecular_weight"] <= 400
            and 1 <= props["logp"] <= 3
            and props["hbd"] <= 3
            and props["hba"] <= 7
            and props["tpsa"] <= 90
            and props["rotatable_bonds"] <= 7
        )
    elif target == "lead":
        # Lead-like (room to grow)
        return (
            props["molecular_weight"] <= 350
            and props["logp"] <= 3
            and props["rotatable_bonds"] <= 7
        )
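    # An unknown profile name is a caller error, not a silent "fail"
    raise ValueError(f"Unknown target {target!r}; expected 'oral', 'cns', or 'lead'")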

Combining Properties with Similarity Search

Property calculations tell you about individual molecules. Combining them with similarity search lets you find analogs of promising compounds in your library. Here is how to use both endpoints together:

Find drug-like analogs of a reference compound
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Reference compound: celecoxib
reference = "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1"

# Your compound library (list of SMILES)
library = [
    "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(=O)(=O)N)cc2)cc1F",
    "CC(=O)Oc1ccccc1C(=O)O",
    "Cc1ccc(-c2cc(CF)nn2-c2ccc(S(N)(=O)=O)cc2)cc1",
    "O=C(O)c1ccccc1O",
    "Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)c(Cl)c1",
]

# Step 1: Find similar compounds (Tanimoto > 0.6)
sim_resp = requests.post(f"{BASE}/chemistry/similarity",
    headers=HEADERS,
    json={"reference": reference, "targets": library,
          "threshold": 0.6},
    timeout=60)
sim_resp.raise_for_status()

similar = sim_resp.json()["results"]
similar_smiles = [s["smiles"] for s in similar]
print(f"Found {len(similar_smiles)} analogs above 0.6 similarity")

# Step 2: Calculate properties for analogs only
if similar_smiles:
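    props_resp = requests.post(f"{BASE}/chemistry/properties",
        headers=HEADERS,
        json={"smiles": similar_smiles},
        timeout=60)
    props_resp.raise_for_status()

    for sim, props in zip(similar, props_resp.json()["results"]):
        if "error" in props:  # skip molecules the API could not parse
            continue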
        status = "PASS" if props["lipinski_pass"] else "FAIL"
        print(f"  Tanimoto={sim['similarity']:.2f}  "
              f"MW={props['molecular_weight']:.0f}  "
              f"LogP={props['logp']:.1f}  "
              f"Lipinski={status}")
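The threshold in the similarity request is a Tanimoto score. Which fingerprint the API compares is not stated (Morgan/ECFP-style fingerprints are a common choice with RDKit, but that is an assumption), but the score itself is simple: the number of shared on-bits divided by the number of distinct on-bits in either fingerprint. A sketch on toy bit sets, with a hypothetical tanimoto helper; returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)

# Toy fingerprints: 3 shared bits out of 5 distinct bits gives 0.6.
print(tanimoto({1, 2, 3, 4}, {2, 3, 4, 5}))  # 0.6
```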

Next Steps

You can now calculate molecular properties for any compound from a single SMILES string, screen entire libraries against drug-likeness criteria, and combine property calculations with similarity search – all without installing RDKit or managing conda environments.

To go deeper into molecular representations, read our SMILES notation guide. To predict ADMET properties for your top candidates, see the ADMET prediction guide.

Ready to profile your compounds? Open the Compound Profiler Studio or get a free API key to start building with the API directly.

Frequently Asked Questions

What molecular properties does the API return?

The /v1/chemistry/properties endpoint returns molecular weight, LogP (octanol-water partition coefficient), topological polar surface area (TPSA), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds, ring count, aromatic ring count, and heavy atom count. It also returns a Lipinski Rule of Five pass/fail assessment.

How accurate are the calculated properties compared to local RDKit?

The API uses RDKit under the hood, so the results should match a local installation of the same RDKit version. Molecular weight, TPSA, HBD, HBA, and rotatable bond counts are deterministic calculations on the molecular graph. LogP uses the Wildman-Crippen method, which is also what RDKit's Crippen.MolLogP (exposed as Descriptors.MolLogP) computes.

Can I process molecules in batch?

Yes. The /v1/chemistry/properties endpoint accepts a list of SMILES strings in the 'smiles' field. You can send up to 100 molecules per request. Each molecule in the batch counts as one API credit. Batch requests are significantly faster than individual calls due to reduced HTTP overhead.
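The screening script above already slices its input into groups of 100. If you want that logic, plus a credit estimate, as standalone helpers, here is a minimal sketch. The names chunked and plan are hypothetical; the 100-molecule limit and the 1-credit-per-molecule cost come from this answer, and the 500-credit default is the free-tier allowance mentioned earlier.

```python
import math

MAX_BATCH = 100  # per-request limit stated above

def chunked(items: list[str], size: int = MAX_BATCH) -> list[list[str]]:
    """Split a SMILES list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def plan(n_molecules: int, credits_available: int = 500) -> dict:
    """Requests and credits needed at 1 credit per molecule."""
    return {
        "requests": math.ceil(n_molecules / MAX_BATCH),
        "credits": n_molecules,
        "fits_budget": n_molecules <= credits_available,
    }

print(plan(10_000))                            # {'requests': 100, 'credits': 10000, 'fits_budget': False}
print([len(b) for b in chunked(["C"] * 250)])  # [100, 100, 50]
```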

What happens if I send an invalid SMILES string?

The API validates all SMILES inputs using RDKit's MolFromSmiles parser before processing. For a single-molecule request, an invalid SMILES returns a 422 error with a descriptive message. In batch mode, valid molecules are processed normally and invalid ones are flagged in the response with an error field, so one bad SMILES does not invalidate the entire batch.
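In batch mode you therefore need to separate good results from flagged ones. A minimal sketch, assuming (as the screening script above does) that results come back in input order, one per SMILES, and that a failed entry carries an error key. split_batch is a hypothetical helper, and the sample response below is made up for illustration.

```python
def split_batch(smiles: list[str], results: list[dict]):
    """Pair each SMILES with its result and return (successes, failures)."""
    if len(smiles) != len(results):
        raise ValueError(f"expected {len(smiles)} results, got {len(results)}")
    ok, failed = [], []
    for smi, res in zip(smiles, results):
        if "error" in res:
            failed.append((smi, res["error"]))
        else:
            ok.append((smi, res))
    return ok, failed

# Made-up response: "C1CC" never closes its ring, so a parser would reject it.
smiles = ["CCO", "C1CC"]
results = [{"molecular_weight": 46.07}, {"error": "invalid SMILES"}]
ok, failed = split_batch(smiles, results)
print(len(ok), failed)  # 1 [('C1CC', 'invalid SMILES')]
```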

Is there a size limit on molecules?

The API processes molecules up to 500 heavy atoms. This covers virtually all drug-like small molecules, most natural products, and many peptides. For larger biomolecules like proteins, use the protein-specific endpoints instead.

How does this compare to PubChem or ChEMBL lookups?

PubChem and ChEMBL are databases that store pre-computed properties for known compounds. If your molecule is in their database, a lookup is instant. But for novel compounds, analogs, or custom libraries, there is nothing to look up — you need to calculate properties from the structure. The SciRouter API calculates on demand for any valid SMILES string, known or novel.
