Performance Data

API Benchmarks

Published P50 and P95 latencies for every SciRouter endpoint, measured from request receipt at the API gateway to response send, excluding network transit.

P50 = median latency; P95 = tail latency. GPU figures assume a warm worker (cold starts excluded).

Chemistry (5 endpoints)

| Endpoint | Hardware | P50 | P95 |
| --- | --- | --- | --- |
| Molecular Properties | CPU | 45ms | 120ms |
| Format Conversion | CPU | 30ms | 85ms |
| Similarity Search | CPU | 55ms | 150ms |
| Substructure Search | CPU | 60ms | 180ms |
| Batch Properties (10 mol) | CPU | 200ms | 450ms |

Pharma (1 endpoint)

| Endpoint | Hardware | P50 | P95 |
| --- | --- | --- | --- |
| ADMET Predictions | CPU | 350ms | 800ms |

Proteins (5 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| ESMFold (100 aa) | GPU A100 | 8s | 15s | Async job; cold start adds ~30s |
| BioReason-Pro | GPU | 12s | 25s | Async job |
| Sequence Alignment | CPU | 150ms | 400ms | |
| Pocket Detection | CPU | 800ms | 2000ms | |
| UniProt Annotation | CPU | 500ms | 1200ms | External API dependency |

Docking (3 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| DiffDock | GPU A100 | 25s | 60s | Async job; cold start adds ~45s |
| Boltz-2 Complex | GPU A100 80GB | 45s | 120s | Async job; cold start adds ~60s |
| Chai-1 Complex | GPU A100 80GB | 50s | 130s | Async job; cold start adds ~60s |

Design (3 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| ProteinMPNN | GPU A24 | 6s | 15s | Async job |
| Stability Prediction | CPU | 200ms | 500ms | |
| Solubility Prediction | CPU | 180ms | 450ms | |

Antibodies (2 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| ImmuneBuilder | GPU A24 | 10s | 25s | Async job |
| AntiFold CDR Design | GPU A24 | 8s | 20s | Async job |

Generation (2 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| REINVENT4 MolGen | GPU A24 | 15s | 40s | Async job |
| Synthesis Check | CPU | 100ms | 250ms | |

Labs (4 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| Drug Discovery Pipeline | Mixed | 90s | 180s | Multi-model pipeline |
| Protein Engineering | Mixed | 60s | 150s | Multi-model pipeline |
| Antibody Discovery | Mixed | 75s | 160s | Multi-model pipeline |
| Molecular Design | Mixed | 70s | 150s | Multi-model pipeline |

GPU Fleet

NVIDIA A100 80GB for heavy inference (ESMFold, Boltz-2, Chai-1, DiffDock). A24 / A5000 for lighter models (ProteinMPNN, ImmuneBuilder, AntiFold, REINVENT4). Hosted on RunPod serverless.

API Gateway

FastAPI on Railway with auto-scaling. PostgreSQL for state, Redis for rate limiting and caching. Sub-50ms overhead for CPU endpoints.
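The exact rate-limiting algorithm behind the Redis layer isn't specified here; a common choice for API gateways is a token bucket. A minimal in-memory sketch of that idea (in the real gateway, per-key state would live in Redis, keyed by API key):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens refill continuously at a fixed
    rate, and each request spends one token. An in-memory stand-in for
    a Redis-backed limiter; all names here are illustrative."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec          # refill speed
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity            # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 8 back-to-back requests against a bucket of capacity 5:
bucket = TokenBucket(rate_per_sec=10, capacity=5)
results = [bucket.allow() for _ in range(8)]
```

With capacity 5, the first five requests in a tight burst pass and the rest are rejected until tokens refill; the capacity/rate split is what lets a limiter tolerate short bursts while enforcing a long-run average.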

Cold Starts

GPU models use serverless workers. First request after idle may add 30-60s. Pro and Agentic tiers get priority GPU queue for faster warm-up.

Methodology

Latencies are measured server-side from request receipt to response send, excluding network transit time. GPU benchmarks assume a warm worker (model already loaded in VRAM).

P50 = median latency (50th percentile). P95 = tail latency (95th percentile). Benchmarks are collected over a rolling 7-day window from production traffic.

CPU endpoints (Chemistry, ADMET, alignment) run on the API gateway itself. GPU endpoints dispatch to RunPod serverless workers via async job queue.
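Percentiles like those published above can be reproduced from raw latency samples with a nearest-rank calculation. A minimal sketch (the sample values are made up, not production data):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample that is greater
    than or equal to p percent of the data."""
    s = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(s)))  # 1-based rank
    return s[rank - 1]

# Hypothetical per-request latencies for one endpoint, in milliseconds.
latencies_ms = [40, 42, 45, 47, 50, 55, 60, 90, 110, 120]
p50 = percentile(latencies_ms, 50)  # -> 50 (median)
p95 = percentile(latencies_ms, 95)  # -> 120 (tail)
```

The spread between the two is the point of publishing both: a P95 far above P50 (here 120ms vs 50ms) means a small fraction of requests are much slower than the typical one.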

Want to test latency yourself? Try the API with a free account.