Chemistry | CPU | 1 credit

ChemBERTa — SMILES Embeddings

Generate 384-dim molecular embeddings from SMILES strings

Convert any valid SMILES string into a dense 384-dimensional vector using the ChemBERTa molecular language model. The embeddings can stand in for molecular fingerprints in ML pipelines: they are more expressive, capture substructure context, and apply across drug-like chemistry. A cosine-similarity endpoint is included for use where you would otherwise compute Tanimoto similarity on fingerprints.

Price: $0.01 per API call
Credits: 1 credit per call
API endpoint: /v1/chemistry/embed

Features

384-dimensional molecular embeddings
Works on any valid SMILES string
CPU-only — no GPU needed
Deterministic hash fallback if the local model isn't loaded
Cosine similarity endpoint for drop-in Tanimoto replacement
L2-normalized vectors — ready for cosine similarity
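
Because the returned vectors are described as L2-normalized, cosine similarity between two of them reduces to a plain dot product. The sketch below illustrates this in pure Python with toy 3-dimensional vectors standing in for real 384-dim embeddings. The helper names (`l2_normalize`, `cosine_similarity`, `dot`) are illustrative and not part of the API.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (the API is described as doing this already)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """General cosine similarity; valid whether or not the inputs are normalized."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for embeddings (illustrative values only).
u = l2_normalize([1.0, 2.0, 2.0])
v = l2_normalize([2.0, 1.0, 2.0])

# For unit vectors the two computations agree up to floating-point error.
assert math.isclose(cosine_similarity(u, v), dot(u, v), rel_tol=1e-9)
```

One caveat when swapping this in for Tanimoto: Tanimoto similarity on bit fingerprints lies in [0, 1], whereas cosine similarity on real-valued vectors can range over [-1, 1], so cutoffs tuned for fingerprints may need to be re-calibrated.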

Quick Start

ChemBERTa Python Example
import requests

API_KEY = "sk-sci-your-key-here"
url = "https://scirouter.ai/v1/chemistry/embed"

response = requests.post(
    url,
    json={"sequence": "CC(=O)OC1=CC=CC=C1C(=O)O"},  # aspirin
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()  # fail early on HTTP errors such as an invalid key
data = response.json()["data"]
print(f"Dimension: {data['dimension']}")  # 384
print(f"Model: {data['model']}")          # molecule_chemberta
print(f"Backend: {data['backend']}")      # cpu or hash
print(f"First 5: {data['embedding'][:5]}")

# Compute similarity between two molecules
sim = requests.post(
    "https://scirouter.ai/v1/chemistry/embed/similarity",
    json={
        "sequence_a": "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
        "sequence_b": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",  # ibuprofen
    },
    headers={"Authorization": f"Bearer {API_KEY}"},  # same key as the embed call
).json()["data"]
print(f"Aspirin vs Ibuprofen: {sim['similarity']:.3f}")
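
Since the `backend` field can report `hash` when the local model isn't loaded, it may be worth checking a response before storing its vector. The following is a minimal sketch, assuming the response fields shown in the Quick Start (`dimension`, `backend`, `embedding`). The function name `check_embedding` and the norm tolerance are hypothetical choices, not part of the API.

```python
import math

EXPECTED_DIM = 384  # dimension stated on this page

def check_embedding(data, tol=1e-3):
    """Return a list of problems found in an embed response payload.

    `data` is the dict under the "data" key of the JSON response.
    `tol` is an assumed tolerance for the unit-norm check.
    """
    problems = []
    embedding = data.get("embedding", [])

    if data.get("dimension") != EXPECTED_DIM:
        problems.append(f"unexpected dimension field: {data.get('dimension')}")
    if len(embedding) != EXPECTED_DIM:
        problems.append(f"embedding has {len(embedding)} values")

    # The page describes the vectors as L2-normalized, so the norm should be ~1.
    norm = math.sqrt(sum(x * x for x in embedding))
    if not math.isclose(norm, 1.0, abs_tol=tol):
        problems.append(f"vector norm is {norm:.4f}, expected 1.0")

    # "hash" signals the deterministic fallback rather than the model itself.
    if data.get("backend") == "hash":
        problems.append("hash fallback backend was used")

    return problems

# Synthetic payload for illustration (not real API output).
example = {
    "dimension": 384,
    "backend": "cpu",
    "embedding": [1.0 / math.sqrt(384)] * 384,
}
issues = check_embedding(example)
```

An empty list means the payload matches what this page describes. How to react to a `hash` result (retry, skip, or flag the record) is left to the caller.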

Use Cases

1. Molecular similarity search at scale
2. Clustering compound libraries by scaffold
3. Transfer learning for custom ADMET / property models
4. Replacing Morgan fingerprints with context-aware embeddings
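
As an illustration of the first use case, similarity search over stored embeddings can be sketched as a brute-force top-k ranking. This assumes the vectors were fetched earlier and are L2-normalized, so a dot product serves as the cosine score. The function `top_k_similar`, the molecule identifiers, and the toy 2-dimensional vectors are all hypothetical.

```python
import heapq

def top_k_similar(query, library, k=3):
    """Return the k library entries most similar to the query.

    Assumes every vector is L2-normalized, so the dot product equals
    cosine similarity. `library` maps an identifier (for example a SMILES
    string) to its embedding. Returns (identifier, score) pairs, best first.
    """
    scored = (
        (sum(q * x for q, x in zip(query, vec)), name)
        for name, vec in library.items()
    )
    return [(name, score) for score, name in heapq.nlargest(k, scored)]

# Toy unit vectors standing in for stored 384-dim embeddings.
library = {
    "mol_a": [1.0, 0.0],
    "mol_b": [0.6, 0.8],
    "mol_c": [0.0, 1.0],
}
hits = top_k_similar([0.8, 0.6], library, k=2)
```

A linear scan like this touches every stored vector per query. For large libraries, an approximate nearest-neighbour index is a common alternative, though that choice is outside what this page specifies.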

Start Using ChemBERTa

500 free credits every month. No credit card required.