Generate 384-dim molecular embeddings from SMILES strings
Convert any SMILES string into a dense 384-dimensional vector using the ChemBERTa molecular language model. The embeddings can stand in for molecular fingerprints in ML pipelines: they are more expressive, capture substructure context, and work across drug-like chemistry. A cosine-similarity endpoint is included as the counterpart to Tanimoto similarity on fingerprints.
/v1/chemistry/embed

import requests

API_KEY = "sk-sci-your-key-here"
url = "https://scirouter.ai/v1/chemistry/embed"

# Embed a single molecule
response = requests.post(
    url,
    json={"sequence": "CC(=O)OC1=CC=CC=C1C(=O)O"},  # aspirin
    headers={"Authorization": f"Bearer {API_KEY}"},
)
data = response.json()["data"]
print(f"Dimension: {data['dimension']}")  # 384
print(f"Model: {data['model']}")  # molecule_chemberta
print(f"Backend: {data['backend']}")  # cpu or hash
print(f"First 5: {data['embedding'][:5]}")

# Compute similarity between two molecules
sim = requests.post(
    "https://scirouter.ai/v1/chemistry/embed/similarity",
    json={
        "sequence_a": "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
        "sequence_b": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",  # ibuprofen
    },
    # Assumed to need the same bearer token as /embed
    headers={"Authorization": f"Bearer {API_KEY}"},
).json()["data"]
print(f"Aspirin vs Ibuprofen: {sim['similarity']:.3f}")

Molecular similarity search at scale
Clustering compound libraries by scaffold
Transfer learning for custom ADMET / property models
Replacing Morgan fingerprints with context-aware embeddings
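
For the similarity-search use case, one option is to fetch each embedding once from /v1/chemistry/embed, cache it, and rank candidates locally instead of calling the similarity endpoint for every pair. The following is a minimal sketch under a few assumptions: the `embedding` field is a plain list of floats, as the example above suggests, and local cosine similarity is a reasonable proxy for what the similarity endpoint returns (not confirmed here). The names `cosine_similarity` and `rank_by_similarity` and the toy 3-dimensional vectors are hypothetical and not part of the API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, library, top_k=3):
    """Return the top_k (name, score) pairs from library, best match first.

    library maps a molecule name to its cached embedding vector.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dim vectors stand in for real 384-dim embeddings (hypothetical data).
toy_library = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.9, 0.1, 0.0],
    "mol_c": [0.0, 1.0, 0.0],
}
ranked = rank_by_similarity([1.0, 0.0, 0.0], toy_library, top_k=2)
print(ranked)
```

In practice the values in `toy_library` would be replaced by the `embedding` lists returned by the embed endpoint. For large libraries a vectorized or approximate nearest-neighbor index would likely be a better fit than this pure-Python loop.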