SciRouterVet
Public benchmarks · R&D engine performance

How the engine actually scores.

Three standard benchmarks: two small-molecule (TDC ADMET and MoleculeNet) and one for the literature-confirmation engine itself (PubMed Q&A consistency), plus one vet-specific ADMET holdout. Methodology and the full audit trail are published alongside the scores. Failures and gaps are surfaced, not hidden.

v1 shows the benchmark cards with placeholder scores. The runners for the three standard benchmarks land in the P7-F follow-up: TDC and MoleculeNet require local held-out evaluation, and PubMed Q&A consistency needs the literature-confirmation engine to accumulate enough dossiers to measure against the expert key. The vet-ADMET holdout runner lands in the P7-L follow-up.

TDC ADMET (Caco-2, BBB, hERG, AMES)

Awaiting v1 run

Four small-molecule ADMET tasks from the Therapeutics Data Commons. Tests the gateway's `predict_admet` primitive against held-out TDC sets.

Metric

Mean AUROC

SciRouter

Reference baseline

0.82 (TDC reference)

Benchmark runner lands in P7-F follow-up. Will publish per-task AUROCs + failure cases (where the primitive is wrong) alongside the headline number.

Source: Therapeutics Data Commons
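
As a minimal sketch of how the headline number and the failure cases could be derived once the runner exists: the snippet below computes a rank-based AUROC per task, averages the per-task values, and lists the held-out items where a thresholded prediction disagrees with the label. It assumes each task is scored as binary classification; the function names, the 0.5 threshold, and the toy numbers are illustrative assumptions, not the gateway's actual runner or real results.

```python
from statistics import mean

def auroc(labels, scores):
    """Rank-based AUROC: chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def failure_cases(ids, labels, scores, threshold=0.5):
    """IDs whose thresholded prediction disagrees with the held-out label."""
    return [i for i, y, s in zip(ids, labels, scores) if (s >= threshold) != (y == 1)]

# Toy, made-up (labels, scores) per task; a real run would use held-out TDC sets.
tasks = {
    "BBB": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    "hERG": ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
}

per_task = {name: auroc(y, s) for name, (y, s) in tasks.items()}
headline = mean(per_task.values())  # the "Mean AUROC" shown on the card
bbb_failures = failure_cases(["a", "b", "c", "d"], *tasks["BBB"])

print(per_task, headline, bbb_failures)
```

Reporting `per_task` next to `headline` keeps a weak task from hiding behind a strong average.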

MoleculeNet (BACE / BBBP / ToxCast)

Awaiting v1 run

Three small-molecule activity-prediction tasks from MoleculeNet. Serves as a lower bound for the `generate_molecules` + `mol_properties` pipeline.

Metric

ROC-AUC (mean over 3 splits)

SciRouter

Reference baseline

0.74 (MoleculeNet GNN ref)

Same runner pattern as TDC. Per-task breakdowns and failed predictions are surfaced on the detail page.

Source: MoleculeNet
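
One possible reading of "mean over 3 splits" is that each task is evaluated on three data splits and the ROC-AUC values are averaged. The sketch below aggregates under that assumption and reports the spread across splits and the gap to the reference baseline. The per-split numbers are invented purely for illustration and are not benchmark results.

```python
from statistics import mean, stdev

REFERENCE = 0.74  # reference baseline shown on the card

# Hypothetical ROC-AUC values for three splits per task (illustrative only).
split_scores = {
    "BACE": [0.80, 0.78, 0.82],
    "BBBP": [0.70, 0.72, 0.68],
    "ToxCast": [0.65, 0.63, 0.64],
}

def summarize(scores_by_task):
    """Per-task mean and sample standard deviation across splits."""
    return {
        task: {"mean": mean(scores), "std": stdev(scores)}
        for task, scores in scores_by_task.items()
    }

summary = summarize(split_scores)
overall = mean(row["mean"] for row in summary.values())
delta = overall - REFERENCE  # negative means below the reference baseline

for task, row in summary.items():
    print(f"{task}: {row['mean']:.3f} +/- {row['std']:.3f}")
print(f"overall {overall:.3f} ({delta:+.3f} vs reference)")
```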

PubMed Q&A consistency

Awaiting v1 run

100 cross-species oncology questions. The literature-confirmation engine builds a dossier per question, and each dossier is scored against an expert-curated answer key. Eval-set composition: ≥15 veterinary-oncology questions from COTC + Morris Animal Foundation + VetCompass cohort studies.

Metric

Status agreement (confirmed / contested / unsupported)

SciRouter

Reference baseline

This benchmarks the literature-confirmation engine itself, not a primitive. Lands in the P7-F follow-up after the daily-cron and paper-ingest hooks produce enough dossiers to measure.

Source: SciRouter (internal expert-curated)
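
A minimal sketch of how status agreement could be scored: the engine's status for each question is compared with the expert key, giving a raw agreement rate and a confusion count per (expert, engine) pair. Cohen's kappa is included as an optional chance-corrected companion. That is an assumption of this sketch, as the card only names status agreement, and the example labels are made up.

```python
from collections import Counter

STATUSES = ("confirmed", "contested", "unsupported")

def status_agreement(engine, expert):
    """Return (agreement rate, Cohen's kappa, confusion counts) for two status lists."""
    if len(engine) != len(expert) or not engine:
        raise ValueError("need two non-empty lists of equal length")
    n = len(engine)
    confusion = Counter(zip(expert, engine))  # keys are (expert, engine) pairs
    observed = sum(confusion[(s, s)] for s in STATUSES) / n
    expert_freq, engine_freq = Counter(expert), Counter(engine)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(expert_freq[s] * engine_freq[s] for s in STATUSES) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa, confusion

# Made-up statuses for four questions.
expert = ["confirmed", "confirmed", "contested", "unsupported"]
engine = ["confirmed", "contested", "contested", "unsupported"]

rate, kappa, confusion = status_agreement(engine, expert)
print(rate, round(kappa, 3), dict(confusion))
```

The off-diagonal confusion entries show which kind of disagreement dominates, for example expert "confirmed" dossiers that the engine marks "contested".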

Vet-ADMET holdout (canine + feline PK)

Awaiting v1 run (50-compound holdout in P7-L follow-up)

50 compounds with published canine and/or feline PK parameters, drawn from Morris Animal Foundation and COTC trial literature and from veterinary pharmacology references. Tests how well `predict_admet` (human-trained underneath) predicts canine and feline Vd, Cl, T₁/₂, and oral bioavailability against observed vet PK. This is the comparative-oncology moat benchmark: the value proposition is being honest about where human-trained ADMET breaks down for veterinary medicine.

Metric

Per-species R² + bias direction

SciRouter

Reference baseline

v1 ships the card and the methodology, which surfaces per-species PK gaps with P-gp / UGT1A6 / CYP2D15 risk chips per compound. The runner lands in the P7-L follow-up alongside the curated 50-compound holdout set.

Source: SciRouter (internal curated)
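
The "per-species R² + bias direction" metric can be sketched as follows: predictions are grouped by species, R² is computed against the observed values, and the sign of the mean signed error indicates whether the human-trained predictor tends to over- or under-predict. The record layout, function names, and numbers here are assumptions for illustration and are not drawn from the holdout set.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed values."""
    avg = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - avg) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot

def bias_direction(observed, predicted):
    """Direction of the mean signed error (predicted minus observed)."""
    err = mean(p - o for o, p in zip(observed, predicted))
    if err > 0:
        return "over-predicts"
    if err < 0:
        return "under-predicts"
    return "unbiased"

def per_species(records):
    """records: iterable of (species, observed, predicted) -> {species: (R2, bias)}."""
    grouped = {}
    for species, obs, pred in records:
        o, p = grouped.setdefault(species, ([], []))
        o.append(obs)
        p.append(pred)
    return {sp: (r_squared(o, p), bias_direction(o, p)) for sp, (o, p) in grouped.items()}

# Made-up clearance values in arbitrary units.
records = [
    ("canine", 1.0, 1.5), ("canine", 2.0, 2.5), ("canine", 3.0, 3.5),
    ("feline", 2.0, 1.0), ("feline", 4.0, 3.0),
]
report = per_species(records)
print(report)
```

R² can go negative when predictions are worse than simply predicting the species mean, which is itself a useful signal for a holdout like this.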

🐾 Species-specific PK risks the vet-ADMET holdout flags

When human-trained ADMET predictors are applied to canine or feline patients, a handful of species-specific metabolic differences cause systematic prediction errors. The vet-ADMET holdout above will surface these three risk classes as per-compound chips so it's clear where the human prediction can and can't be trusted for vet medicine.

  • canine P-gp

    MDR1 (canine P-glycoprotein)

    Collies and related herding breeds commonly carry the MDR1-1Δ mutation; in affected dogs, P-gp substrates can accumulate in the CNS at toxic levels. Substrate examples: ivermectin, loperamide, vincristine, doxorubicin.

  • feline UGT1A6

    Feline UGT1A6 glucuronidation deficiency

    Cats glucuronidate phenolic compounds far more slowly than humans, so acetaminophen and some NSAIDs can be toxic at 'safe' human doses. Affects compounds cleared primarily via UGT, particularly phenolic substrates.

  • canine CYP2D15

    Canine CYP2D15 (homolog of human CYP2D6)

    Canine-specific isoform with different substrate affinities than CYP2D6. Drugs metabolized primarily by CYP2D6 in humans may have unpredictable clearance in dogs.
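
The three risk classes above could be attached to compounds as chips roughly as follows. This is a sketch under stated assumptions: the `CompoundProfile` fields and the `risk_chips` function are hypothetical names, and the boolean flags would have to come from curated annotations, not from this code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompoundProfile:
    """Hypothetical per-compound annotation record (field names are illustrative)."""
    name: str
    species: str                          # "canine" or "feline"
    pgp_substrate: bool = False           # known P-gp substrate
    phenolic_ugt_clearance: bool = False  # cleared mainly by phenolic glucuronidation
    human_cyp2d6_cleared: bool = False    # cleared mainly by CYP2D6 in humans

def risk_chips(profile):
    """Return the species-specific risk chips that apply to one compound."""
    chips = []
    if profile.species == "canine" and profile.pgp_substrate:
        chips.append("canine P-gp")
    if profile.species == "feline" and profile.phenolic_ugt_clearance:
        chips.append("feline UGT1A6")
    if profile.species == "canine" and profile.human_cyp2d6_cleared:
        chips.append("canine CYP2D15")
    return chips

# Examples taken from the substrate lists in the text above.
ivermectin = CompoundProfile("ivermectin", "canine", pgp_substrate=True)
acetaminophen = CompoundProfile("acetaminophen", "feline", phenolic_ugt_clearance=True)

print(risk_chips(ivermectin))     # ['canine P-gp']
print(risk_chips(acetaminophen))  # ['feline UGT1A6']
```

Keeping the rules as plain per-species conditions makes it easy to audit why a chip appears on a given compound.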

Benchmarks v1 · 3 standard + 1 vet-specific holdout · placeholders with methodology. Real scores publish as runners land. Both passes and failures are included: no cherry-picking is a hard product rule.