Single-Cell Omics

Annotate Single-Cell Data Without Training a Model (2026 Guide)

How to annotate scRNA-seq data zero-shot with foundation models. Marker gene signatures, confidence scoring, and practical tips.

SciRouter Team
April 11, 2026
11 min read

Cell-type annotation used to be the slowest part of any scRNA-seq project. You would cluster, score marker genes, argue with your collaborators about which markers were canonical, hand-label each cluster, and then redo it all when the next batch of data arrived. Foundation models have flipped that workflow on its head. In 2026 you can get reasonable cell-type labels for a fresh dataset in a single API call, with no training loop in sight.

This guide walks through how zero-shot annotation actually works, what the confidence scores mean, when to trust them, and how to run the full pipeline through SciRouter's Cell Atlas without touching a GPU.

Note
Zero-shot annotation does not replace careful biology. It replaces the mechanical parts of labeling. You still have to review marker genes, sanity-check rare populations, and follow up anything surprising with wet-lab validation.

How zero-shot annotation works

Three ingredients combine to make zero-shot annotation reliable:

  • A pretrained foundation model. Geneformer or scGPT embeds each cell into a latent space where biologically similar cells are close together. This space was learned from tens of millions of cells during pretraining, so it already encodes most common cell types.
  • A labeled reference atlas. A curated atlas of pre-embedded cells with human-reviewed labels provides the lookup table. When a new cell arrives, its nearest neighbors in the reference drive the label vote.
  • Marker-gene signature scoring. Each candidate label comes with a canonical marker set. The endpoint scores your cell against those markers as a second, independent signal, and the agreement between embedding-based and marker-based scores becomes the confidence score.
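As a rough sketch of how the first two ingredients interact (illustrative only, not SciRouter's internal implementation), a nearest-neighbor label vote over reference embeddings looks like this. The function name, embeddings, labels, and `k` are all placeholders:

```python
import numpy as np

def knn_label_vote(query_emb, ref_embs, ref_labels, k=5):
    """Vote for a label among the k most cosine-similar reference cells.

    Illustrative only: the hosted endpoint's actual neighbor search,
    vote weighting, and confidence formula are internal details.
    """
    query_emb = np.asarray(query_emb, dtype=float)
    ref_embs = np.asarray(ref_embs, dtype=float)
    sims = ref_embs @ query_emb / (
        np.linalg.norm(ref_embs, axis=1) * np.linalg.norm(query_emb)
    )
    top = np.argsort(sims)[-k:]                    # indices of k nearest cells
    labels, counts = np.unique(np.asarray(ref_labels)[top], return_counts=True)
    winner = labels[np.argmax(counts)]
    vote_fraction = counts.max() / k               # agreement among neighbors
    return str(winner), float(vote_fraction)
```

In the full pipeline, a vote like this is then reconciled with the marker-gene signature score, and the agreement between the two signals becomes the confidence value.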

What the API actually returns

The Cell Atlas annotation endpoint takes a sparse expression matrix plus a gene list and returns, per cell:

  • The top-ranked cell-type label.
  • Up to three runner-up labels with their scores.
  • A confidence value between 0 and 1.
  • The top marker genes that drove the call, so you can sanity-check the biology.

Everything is returned in a single response. No polling, no async jobs, no waiting hours for a classifier to train.
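A response shaped like the bullet list above might be unpacked as follows. The `calls`, `label`, and `confidence` fields match the Python example later in this guide; `runners_up` and `top_markers` are assumed field names for illustration, so check the API reference for the exact schema:

```python
# Illustrative response -- "runners_up" and "top_markers" are assumed
# field names for the runner-up labels and driving marker genes; they
# are not guaranteed to match the real schema.
example_response = {
    "calls": [
        {
            "label": "CD8+ T cell",
            "confidence": 0.91,
            "runners_up": [
                {"label": "NK cell", "score": 0.42},
                {"label": "CD4+ T cell", "score": 0.31},
            ],
            "top_markers": ["CD8A", "CD3D", "GZMK"],
        }
    ]
}

for call in example_response["calls"]:
    markers = ", ".join(call["top_markers"])
    print(f"{call['label']} (confidence {call['confidence']:.2f}; markers: {markers})")
```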

A minimal Python example

Here is what a zero-shot annotation call looks like in practice. The example assumes you already have a sparse matrix in CSR format and a list of gene symbols.

python
import requests
import numpy as np
from scipy.sparse import csr_matrix

API_URL = "https://scirouter-gateway-production.up.railway.app/v1/singlecell/annotate"
API_KEY = "sk-sci-your-api-key-here"

# X is a (cells, genes) sparse expression matrix in CSR format
# gene_ids is a list of gene symbols matching X columns
def annotate_cells(X: csr_matrix, gene_ids: list[str]):
    payload = {
        "model": "geneformer",
        "genes": gene_ids,
        "matrix": {
            "indptr": X.indptr.tolist(),
            "indices": X.indices.tolist(),
            "data": X.data.tolist(),
            "shape": list(X.shape),
        },
        "reference_atlas": "human-core-2026",
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    r = requests.post(API_URL, json=payload, headers=headers, timeout=300)
    r.raise_for_status()
    return r.json()

result = annotate_cells(X, gene_ids)
for cell_id, call in enumerate(result["calls"][:5]):
    print(
        f"cell {cell_id}: {call['label']} "
        f"(confidence {call['confidence']:.2f})"
    )

That is the whole thing. One request, one response, labels in hand.

Reading the confidence score

The confidence score is the agreement between the embedding-based nearest-neighbor vote and the marker-gene signature score. High agreement gives you a confident call, low agreement flags a cell that needs review.

Above 0.7 — trust and move on

These are confident calls. The embedding and marker-gene signals agree. For common cell types in well-studied tissues this will be the bulk of your dataset.

0.4 to 0.7 — review manually

These are the interesting cells. Look at the top three candidate labels and the marker genes that drove the call. Often you will find transitional states, doublets you missed at QC, or rare subtypes that split a canonical category.

Below 0.4 — low confidence

These are cells the model is not sure about. Options: relabel as “unknown” and leave them out of downstream analysis, or treat them as candidate novel populations and cluster them separately.
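The three bands above translate directly into a small triage helper. The thresholds are the conventions from this guide, not hard API constants, so tune them to your dataset:

```python
def confidence_bucket(conf: float) -> str:
    """Triage a per-cell confidence score into the bands described above."""
    if conf > 0.7:
        return "confident"       # trust and move on
    if conf >= 0.4:
        return "review"          # check runner-up labels and marker genes
    return "low_confidence"      # relabel as unknown or cluster separately
```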

Note
A surprisingly useful workflow: keep every annotation plus its confidence, and use confidence as a weight in downstream statistical tests. Low-confidence cells contribute less to effect estimates without being dropped entirely.
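A minimal version of that weighting idea is a confidence-weighted mean per group. The helper name and interface here are mine, not part of any SciRouter SDK:

```python
import numpy as np

def weighted_group_mean(values, confidences, groups):
    """Confidence-weighted mean of a per-cell value within each group.

    Low-confidence cells pull less weight instead of being dropped.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    groups = np.asarray(groups)
    return {
        g: float(np.average(values[groups == g], weights=weights[groups == g]))
        for g in np.unique(groups)
    }
```

The same pattern extends to regression: pass the confidence column as sample weights instead of filtering on a hard threshold.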

When zero-shot falls short

Zero-shot annotation is not a cure-all. Three failure modes are worth knowing about:

  • Rare or under-represented populations. If a cell type was not in the reference atlas, there is nothing for the model to match against. Confidence scores will be low, but the top label may still be misleading.
  • Non-human species. Foundation models are predominantly trained on human data. Mouse and other organisms work via ortholog mapping, but expect some loss of resolution.
  • Disease-specific states. Tumor cells, exhausted T cells, and other disease-altered states may map to the closest healthy reference rather than a disease-specific label. Review the marker gene output carefully for these cases.
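For the mouse case, ortholog mapping is often as simple as a symbol lookup with an uppercase fallback. The table below is a hypothetical stand-in; in practice you would pull a curated homology table (e.g. from Ensembl BioMart) rather than hard-coding pairs:

```python
# Hypothetical ortholog table -- replace with a curated homology resource.
MOUSE_TO_HUMAN = {
    "Cd8a": "CD8A",
    "Ms4a1": "MS4A1",
    "Ncr1": "NCR1",
}

def map_mouse_genes(mouse_genes):
    """Map mouse symbols to human symbols before annotation.

    Falls back to naive uppercasing, which works for many one-to-one
    orthologs but silently misses renamed or many-to-many genes.
    """
    return [MOUSE_TO_HUMAN.get(g, g.upper()) for g in mouse_genes]
```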

Combining zero-shot with your own labels

If you have some labeled data from a previous study, you do not have to choose between zero-shot and supervised approaches. A common pattern is to run zero-shot first, manually review the uncertain cells, and then use the resulting labels as a warm start for a small classification head if you need sharper decision boundaries.
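One way to sketch that warm start, using a nearest-centroid head as a stand-in for whatever small classifier you prefer, is to keep only the confident zero-shot calls as training data:

```python
import numpy as np

def fit_centroid_head(embeddings, labels, confidences, threshold=0.7):
    """Fit a nearest-centroid classifier on confident zero-shot labels only."""
    E = np.asarray(embeddings, dtype=float)
    L = np.asarray(labels)
    keep = np.asarray(confidences, dtype=float) >= threshold
    classes = sorted(set(L[keep]))
    centroids = np.stack([E[keep & (L == c)].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroid(classes, centroids, query):
    """Assign the class whose centroid is nearest to the query embedding."""
    dists = np.linalg.norm(centroids - np.asarray(query, dtype=float), axis=1)
    return classes[int(np.argmin(dists))]
```

Low-confidence cells are excluded from fitting, then classified by the head, which gives them sharper boundaries than the raw zero-shot vote.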

Workflow tips

  • Run QC first. Zero-shot does not fix bad cells. Drop empty droplets and high-mito cells before calling the API.
  • Batch by donor, not by technical lot. The rank-based foundation model representation is robust to chemistry differences, so grouping by biological batch is usually fine.
  • Always eyeball the markers. Even a confident call deserves a glance at the top marker genes the model surfaced.
  • Save the confidence column. It is one of the most useful artifacts for later QC, and costs nothing to keep.
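The first tip, QC before the API call, might be sketched like this with plain numpy and scipy. The thresholds are common illustrative defaults, not prescriptions:

```python
import numpy as np
from scipy.sparse import csr_matrix

def basic_qc_mask(X: csr_matrix, gene_ids, min_counts=500, max_mito_frac=0.2):
    """Boolean mask of cells passing simple count and mito-fraction filters.

    Thresholds are illustrative defaults; tune them per tissue and chemistry.
    """
    totals = np.asarray(X.sum(axis=1)).ravel().astype(float)
    mito_cols = [i for i, g in enumerate(gene_ids) if g.upper().startswith("MT-")]
    if mito_cols:
        mito = np.asarray(X[:, mito_cols].sum(axis=1)).ravel().astype(float)
    else:
        mito = np.zeros_like(totals)
    frac = np.divide(mito, totals, out=np.zeros_like(totals), where=totals > 0)
    return (totals >= min_counts) & (frac <= max_mito_frac)
```

Filter with `X[basic_qc_mask(X, gene_ids)]` and pass the result to the annotation call.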

Running the workflow end to end

The Cell Atlas workspace wraps all of this in a browser-friendly UI. You can upload a 10x .h5ad-style matrix, pick a foundation model, and get an annotated output back in seconds. For scripted workflows, the same API backs the Geneformer and scGPT tool pages.

Bottom line

Zero-shot annotation has quietly become the default way to label a new scRNA-seq dataset. It is fast, it is cheap, it scales to millions of cells, and it frees you up to focus on the biology instead of the bookkeeping. Start there, review the uncertain cells, and only reach for a supervised classifier when you need the extra precision.

Try zero-shot annotation in Cell Atlas →

Frequently Asked Questions

What does “zero-shot annotation” actually mean?

It means assigning cell-type labels without training a classifier on your dataset. A pretrained foundation model embeds each cell, and labels come from nearest-neighbor lookups against a reference atlas plus marker-gene signature scoring.

How reliable are zero-shot labels compared to supervised classifiers?

For well-represented cell types in common tissues, zero-shot labels from a strong foundation model land within a few percentage points of a purpose-built classifier. The gap widens for rare or tissue-specific types where the reference atlas is thin.

Do I have to know the expected cell types in advance?

No. The reference atlas provides a built-in vocabulary of cell types, so the model will return the closest matches even if you had no prior expectations. That said, reviewing the top marker genes is always a good sanity check.

What is a sensible confidence threshold?

The Cell Atlas endpoint returns a confidence score per cell along with the top three candidate labels. A common convention is to treat anything above 0.7 as a confident call, 0.4 to 0.7 as ambiguous and worth manual review, and below 0.4 as low-confidence.

Can I run this on my laptop?

Yes, because the heavy lifting happens on the hosted GPU. Your laptop just posts the sparse expression matrix and receives labels back, so any machine that can send an HTTPS request works.

Do I still need normalization and QC before calling the API?

Keep your standard QC — remove empty droplets, filter out dying cells, drop doublets. But the rank-based representation used by Geneformer means you can skip heavy normalization and batch correction steps before annotation.

Try this yourself

500 free credits. No credit card required.