Performance Data

API Benchmarks

Published P50 and P95 latencies for every SciRouter endpoint, measured from request receipt at the API gateway to response send, excluding network transit.

P50 = median latency; P95 = tail latency. GPU figures assume a warm worker (cold starts excluded).

Chemistry (5 endpoints)

| Endpoint | Hardware | P50 | P95 |
| --- | --- | --- | --- |
| Molecular Properties | CPU | 45ms | 120ms |
| Format Conversion | CPU | 30ms | 85ms |
| Similarity Search | CPU | 55ms | 150ms |
| Substructure Search | CPU | 60ms | 180ms |
| Batch Properties (10 mol) | CPU | 200ms | 450ms |

Pharma (1 endpoint)

| Endpoint | Hardware | P50 | P95 |
| --- | --- | --- | --- |
| ADMET Predictions | CPU | 350ms | 800ms |

Proteins (5 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| ESMFold (100 aa) | GPU A100 | 8s | 15s | Async job; cold start adds ~30s |
| BioReason-Pro | GPU | 12s | 25s | Async job |
| Sequence Alignment | CPU | 150ms | 400ms | |
| Pocket Detection | CPU | 800ms | 2000ms | |
| UniProt Annotation | CPU | 500ms | 1200ms | External API dependency |

Docking (3 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| DiffDock | GPU A100 | 25s | 60s | Async job; cold start adds ~45s |
| Boltz-2 Complex | GPU A100 80GB | 45s | 120s | Async job; cold start adds ~60s |
| Chai-1 Complex | GPU A100 80GB | 50s | 130s | Async job; cold start adds ~60s |

Design (3 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| ProteinMPNN | GPU A24 | 6s | 15s | Async job |
| Stability Prediction | CPU | 200ms | 500ms | |
| Solubility Prediction | CPU | 180ms | 450ms | |

Antibodies (2 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| ImmuneBuilder | GPU A24 | 10s | 25s | Async job |
| AntiFold CDR Design | GPU A24 | 8s | 20s | Async job |

Generation (2 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| REINVENT4 MolGen | GPU A24 | 15s | 40s | Async job |
| Synthesis Check | CPU | 100ms | 250ms | |

Labs (4 endpoints)

| Endpoint | Hardware | P50 | P95 | Notes |
| --- | --- | --- | --- | --- |
| Drug Discovery Pipeline | Mixed | 90s | 180s | Multi-model pipeline |
| Protein Engineering | Mixed | 60s | 150s | Multi-model pipeline |
| Antibody Discovery | Mixed | 75s | 160s | Multi-model pipeline |
| Molecular Design | Mixed | 70s | 150s | Multi-model pipeline |

GPU Fleet

NVIDIA A100 80GB for heavy inference (ESMFold, Boltz-2, Chai-1, DiffDock). A24 / A5000 for lighter models (ProteinMPNN, ImmuneBuilder, AntiFold, REINVENT4). Hosted on RunPod serverless.

API Gateway

FastAPI on Railway with auto-scaling. PostgreSQL for state, Redis for rate limiting and caching. Sub-50ms overhead for CPU endpoints.
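The exact rate-limiting algorithm behind the Redis layer isn't specified here; a common choice for API gateways is a token bucket. A minimal in-memory sketch of that idea (in the real gateway, per-key state would live in Redis, keyed by API key):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens refill continuously at a fixed
    rate, and each request spends one token. An in-memory stand-in for
    a Redis-backed limiter; all names here are illustrative."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec          # refill speed
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity            # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 8 back-to-back requests against a bucket of capacity 5:
bucket = TokenBucket(rate_per_sec=10, capacity=5)
results = [bucket.allow() for _ in range(8)]
```

With capacity 5, the first five requests in a tight burst pass and the rest are rejected until tokens refill; the capacity/rate split is what lets a limiter tolerate short bursts while enforcing a long-run average.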

Cold Starts

GPU models use serverless workers. First request after idle may add 30-60s. Pro and Agentic tiers get priority GPU queue for faster warm-up.

Methodology

Latencies are measured server-side from request receipt to response send, excluding network transit time. GPU benchmarks assume a warm worker (model already loaded in VRAM).

P50 = median latency (50th percentile). P95 = tail latency (95th percentile). Benchmarks are collected over a rolling 7-day window from production traffic.

CPU endpoints (Chemistry, ADMET, alignment) run on the API gateway itself. GPU endpoints dispatch to RunPod serverless workers via async job queue.
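Percentiles like those published above can be reproduced from raw latency samples with a nearest-rank calculation. A minimal sketch (the sample values are made up, not production data):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample that is greater
    than or equal to p percent of the data."""
    s = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(s)))  # 1-based rank
    return s[rank - 1]

# Hypothetical per-request latencies for one endpoint, in milliseconds.
latencies_ms = [40, 42, 45, 47, 50, 55, 60, 90, 110, 120]
p50 = percentile(latencies_ms, 50)  # -> 50 (median)
p95 = percentile(latencies_ms, 95)  # -> 120 (tail)
```

The spread between the two is the point of publishing both: a P95 far above P50 (here 120ms vs 50ms) means a small fraction of requests are much slower than the typical one.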

Want to test latency yourself? Try the API with a free account.