SciRouterVet

Public benchmarks · R&D engine performance

How the engine actually scores.

Three benchmarks — two small-molecule (TDC ADMET + MoleculeNet), one for the literature-confirmation engine itself (PubMed Q&A consistency). Methodology and the full audit trail are published alongside the scores. Failures and gaps are surfaced, not hidden.

v1 shows the benchmark cards with placeholder scores. The actual runners land in the P7-F follow-up: TDC and MoleculeNet require local held-out evaluation, and PubMed Q&A consistency needs the literature-confirmation engine to accumulate enough dossiers to measure against the expert key.

TDC ADMET (Caco-2, BBB, hERG, AMES)

Awaiting v1 run

Four small-molecule ADMET tasks from the Therapeutics Data Commons. Tests the gateway's `predict_admet` primitive against held-out TDC sets.

Metric: Mean AUROC

SciRouter: awaiting v1 run

Reference baseline: 0.82 (TDC reference)

The benchmark runner lands in the P7-F follow-up. It will publish per-task AUROCs and failure cases (where the primitive is wrong) alongside the headline number.

Source: Therapeutics Data Commons
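As a rough illustration of how a headline number and its per-task breakdown relate, the sketch below computes a rank-based AUROC per task and then averages across tasks. It assumes every task is framed as binary classification; the task names, labels, and scores are hypothetical and do not come from TDC or from `predict_admet`.

```python
from statistics import mean


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U), counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out data: task name -> (true labels, predicted scores).
held_out = {
    "task_a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "task_b": ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
}

# Per-task AUROCs are kept so they can be published next to the mean.
per_task = {task: auroc(y, s) for task, (y, s) in held_out.items()}
headline = mean(per_task.values())
```

Keeping `per_task` alongside `headline` means a weak task stays visible instead of being averaged away.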

MoleculeNet (BACE / BBBP / ToxCast)

Awaiting v1 run

Three small-molecule activity prediction tasks from MoleculeNet. Serves as a lower bound for the `generate_molecules` + `mol_properties` pipeline.

Metric: ROC-AUC (mean over 3 splits)

SciRouter: awaiting v1 run

Reference baseline: 0.74 (MoleculeNet GNN ref)

Same runner pattern as TDC. Per-task breakdowns and failed predictions will be surfaced on the detail page.

Source: MoleculeNet
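One way failed predictions might be collected for a detail page is sketched below: threshold each score, keep the records where the predicted class disagrees with the label, and list the most confidently wrong ones first. The record fields, the 0.5 threshold, and the sample data are all assumptions made for illustration, not the runner's actual format.

```python
def failed_predictions(records, threshold=0.5):
    """Return misclassified records, most confidently wrong first."""
    failures = []
    for rec in records:
        predicted = 1 if rec["score"] >= threshold else 0
        if predicted != rec["label"]:
            # Distance from the threshold approximates how confident the miss was.
            margin = abs(rec["score"] - threshold)
            failures.append({**rec, "predicted": predicted, "margin": margin})
    return sorted(failures, key=lambda f: f["margin"], reverse=True)


# Hypothetical held-out records; ids, labels, and scores are made up.
records = [
    {"id": "mol-1", "label": 1, "score": 0.91},
    {"id": "mol-2", "label": 0, "score": 0.85},
    {"id": "mol-3", "label": 1, "score": 0.40},
    {"id": "mol-4", "label": 0, "score": 0.10},
]

failures = failed_predictions(records)
```

Sorting by margin puts high-confidence misses at the top, which are usually the most informative cases to publish.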

PubMed Q&A consistency

Awaiting v1 run

100 cross-species oncology questions. The literature-confirmation engine builds a dossier per question, and each dossier is scored against an expert-curated answer key.

Metric: Status agreement (confirmed / contested / unsupported)

SciRouter: awaiting v1 run

Reference baseline: none listed

This benchmarks the literature-confirmation engine itself, not a primitive. It lands in the P7-F follow-up once the daily-cron + paper-ingest hook has produced enough dossiers to measure.

Source: SciRouter (internal expert-curated)
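A minimal sketch of how status agreement could be scored, assuming each dossier resolves to exactly one of the three statuses and that the expert key uses the same question ids. The function name, the dictionary layout, and the sample questions are hypothetical.

```python
from collections import Counter

STATUSES = ("confirmed", "contested", "unsupported")


def status_agreement(engine, expert):
    """Fraction of shared questions where the engine status matches the expert key."""
    shared = sorted(set(engine) & set(expert))
    if not shared:
        raise ValueError("no questions in common")
    for qid in shared:
        if engine[qid] not in STATUSES or expert[qid] not in STATUSES:
            raise ValueError(f"unknown status for {qid}")
    matches = sum(engine[qid] == expert[qid] for qid in shared)
    # (expert, engine) pairs show which statuses get confused with which.
    confusion = Counter((expert[qid], engine[qid]) for qid in shared)
    return matches / len(shared), confusion


# Hypothetical statuses for four questions.
engine_status = {"q1": "confirmed", "q2": "contested",
                 "q3": "unsupported", "q4": "confirmed"}
expert_key = {"q1": "confirmed", "q2": "unsupported",
              "q3": "unsupported", "q4": "contested"}

agreement, confusion = status_agreement(engine_status, expert_key)
```

The confusion counts make disagreements reportable by type, for example questions the expert key marks unsupported but the engine marks contested.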

Benchmarks v1 · placeholders with methodology. Real scores publish as runners land. Both passes and failures are included; no cherry-picking is a hard product rule.