Single-Cell Omics

scGPT vs Geneformer vs scFoundation: Single-Cell Foundation Models Compared

Comparison of the three leading single-cell foundation models. Training data, architecture, benchmarks, and when to use each.

SciRouter Team
April 11, 2026
12 min read

Three single-cell foundation models dominate the conversation right now: Geneformer, scGPT, and scFoundation. They all promise the same thing in principle — a reusable transformer backbone for scRNA-seq analysis — but they make meaningfully different design choices that affect how well they handle different tasks. This article lines them up side by side so you can pick the right tool for your project.

The short version: Geneformer is the strongest zero-shot generalist, scGPT wins on generative and cross-modal use cases, and scFoundation fights hardest to preserve absolute expression magnitudes. All three are available through the SciRouter Cell Atlas so you can run the same benchmark on your own data before committing.

Note
A warning up front: public benchmarks for single-cell foundation models are still immature. Different papers report numbers on different tasks, with different preprocessing. Take any bold “model X beats model Y” claim with an appropriate grain of salt.

1. Training data: how much and from where

Geneformer

Geneformer was pretrained on roughly 30 million human single cells assembled into the Genecorpus-30M dataset. The corpus spans dozens of tissues, donor states, and disease contexts, with heavy representation of developmental and immune cells. It is human-only and scRNA-seq-only.

scGPT

scGPT was trained on larger aggregated collections assembled from the CELLxGENE catalog and related repositories, with the published variants exceeding 30 million cells and later updates pushing further. The authors explicitly set out to build a generative backbone that could be extended to additional modalities, so the data story is broader rather than deeper.

scFoundation

scFoundation was pretrained on a corpus in the tens of millions of cells with a particular emphasis on preserving raw count information. The curation philosophy leans toward “keep the magnitude signal,” which matters for downstream drug-response tasks.

2. Input representation

This is where the three models diverge most sharply, and the choice propagates through everything else:

  • Geneformer: rank tokens. Each cell becomes an ordered list of its top expressed genes, with each gene's expression normalized against that gene's median across the corpus before ranking. Library size is automatically controlled for, but absolute magnitudes are discarded.
  • scGPT: binned expression. Genes are tokenized and paired with binned expression levels. A cell becomes a sequence of (gene, bin) pairs that the model can attend over, which preserves some magnitude information while still keeping the input discrete.
  • scFoundation: continuous expression. The model accepts continuous expression values and uses a custom decoder designed to recover absolute counts. This is the most magnitude-preserving of the three.
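To make the contrast concrete, here is a toy sketch of the three encodings applied to the same cell. The gene names, the corpus medians, and the bin count are made up for illustration; the real models use large vocabularies and their own binning and normalization schemes.

```python
import numpy as np

# Toy expression vector for one cell (raw counts per gene).
genes = np.array(["CD3D", "LYZ", "ACTB", "MALAT1", "NKG7"])
counts = np.array([4.0, 0.0, 25.0, 60.0, 2.0])

# Geneformer-style rank tokens: normalize each gene against a
# (hypothetical) corpus-wide median, then keep the nonzero genes
# ordered from most to least over-represented.
corpus_median = np.array([2.0, 5.0, 30.0, 80.0, 1.0])  # made-up values
norm = np.divide(counts, corpus_median,
                 out=np.zeros_like(counts), where=corpus_median > 0)
rank_tokens = genes[np.argsort(-norm)][: np.count_nonzero(counts)]

# scGPT-style (gene, bin) pairs: discretize expression into a small
# number of bins so magnitude survives in coarse, categorical form.
n_bins = 5
bins = np.digitize(counts, np.linspace(0, counts.max(), n_bins))
gene_bin_pairs = list(zip(genes, bins))

# scFoundation-style: the continuous values themselves are the input.
continuous_input = counts.astype(np.float32)
```

Note how much information each step throws away: the rank tokens drop `LYZ` (zero counts) and all magnitudes, the bins keep a coarse magnitude per gene, and the continuous input keeps everything.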

3. Architecture and objective

All three models are transformer-based, but the training objectives differ:

Geneformer — masked language modeling

A BERT-style masked language model over ranked gene tokens. Pretraining predicts masked genes from the surrounding context, which forces the model to learn co-expression structure and gene dependencies.

scGPT — generative pretraining

scGPT uses a generative masked-prediction objective: conditioned on the genes it has already observed, the model iteratively predicts the expression of the remaining genes in the cell. This generative framing is what makes scGPT good at cross-modal extensions and at producing plausible synthetic cells.

scFoundation — count-aware regression

scFoundation uses a regression-style objective that predicts masked expression values rather than categorical tokens. This keeps the quantitative information in the loss function, which is why it tends to do well on downstream tasks that need absolute levels.
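The difference between the two kinds of objective is easiest to see as loss functions. The sketch below contrasts a masked-token loss (cross-entropy over a gene vocabulary, as in Geneformer's BERT-style pretraining) with a masked-value loss (squared error on a continuous expression estimate, in the spirit of scFoundation). The vocabulary size, target gene, and predicted values are arbitrary stand-ins, and the random logits stand in for model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Masked-token objective: the model emits a distribution over the gene
# vocabulary at a masked position; the loss is cross-entropy against
# the true gene identity. Magnitude never enters the loss.
vocab_size = 1000
true_gene = 42
logits = rng.normal(size=vocab_size)      # stand-in for model output
probs = np.exp(logits - logits.max())
probs /= probs.sum()
token_loss = -np.log(probs[true_gene])

# Masked-value objective: the model emits a continuous expression
# estimate; squared error keeps the magnitude signal in the gradient.
true_value = 7.3
predicted_value = 6.8                     # stand-in for model output
value_loss = (predicted_value - true_value) ** 2
```

The practical consequence: a token-based model is only ever penalized for naming the wrong gene, while a value-based model is penalized in proportion to how far off its quantitative estimate is.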

4. Zero-shot capabilities

All three models support the basics: cell embedding, nearest-neighbor annotation, and batch integration. Where they pull ahead is in more specialized zero-shot tasks.

  • Geneformer wins at in-silico perturbation. Its masked-gene pretraining objective transfers naturally to “what happens to the cell if I delete this gene?” queries. This is why it is the go-to model for target nomination workflows.
  • scGPT wins at generation and integration. The generative objective makes it good at producing synthetic cells, denoising noisy inputs, and integrating across chemistries and donors.
  • scFoundation wins at drug-response transfer. The count-aware pretraining preserves the quantitative fingerprint of perturbations, which transfers to drug-response and dose-prediction tasks.
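The in-silico perturbation pattern behind the first bullet is simple to state: embed the cell, delete one gene from the input, re-embed, and score the gene by how far the embedding moves. The sketch below illustrates only that pattern; the hash-based `embed` function is a deterministic stub standing in for a real pretrained encoder such as Geneformer, and the gene list is arbitrary.

```python
import numpy as np

# Stand-in for a pretrained encoder that maps a rank-token list to an
# embedding. A real workflow would call Geneformer here; this stub just
# gives each gene a fixed, rank-weighted contribution so the
# deletion-and-compare loop has something to measure.
def embed(tokens, dim=16):
    vec = np.zeros(dim)
    for i, t in enumerate(tokens):
        g = np.random.default_rng(abs(hash(t)) % (2**32))
        vec += g.normal(size=dim) / (i + 1)  # earlier ranks weigh more
    return vec

cell = ["TP53", "MYC", "ACTB", "GAPDH"]
baseline = embed(cell)

# Delete each gene in turn and score it by the embedding shift.
shifts = {
    gene: float(np.linalg.norm(
        embed([t for t in cell if t != gene]) - baseline))
    for gene in cell
}
ranked = sorted(shifts, key=shifts.get, reverse=True)
```

With a real model, `ranked` is a first-pass gene-importance list for this cell, which is essentially what target-nomination workflows consume.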

5. Fine-tuning cost and complexity

A practical question for most teams is how expensive it is to adapt these models to a narrow downstream task.

  • Geneformer. A frozen backbone plus a small linear or MLP head is often enough. Fine-tuning on a labeled dataset of a few hundred thousand cells can finish on a single GPU.
  • scGPT. The generative objectives give you more to tune, which is both a feature and a cost. Expect more hyperparameter sensitivity and longer training runs.
  • scFoundation. The regression-style head is straightforward to extend, but because the model is designed to track absolute levels, you may want more careful normalization upstream of the training loop.
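The "frozen backbone plus small head" recipe from the first bullet amounts to ordinary supervised learning on precomputed embeddings. The sketch below uses random vectors with a planted linear signal in place of real Geneformer embeddings, and trains a logistic-regression head with plain gradient descent; a real pipeline would swap in cached model embeddings and labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are frozen backbone embeddings for labeled cells:
# 200 cells, 32 dims, two cell types separable along the first axis.
X = rng.normal(size=(200, 32))
y = (X[:, 0] > 0).astype(float)

# Linear head trained with gradient descent. The backbone is never
# touched, which is what keeps this recipe cheap on a single GPU
# (or, as here, no GPU at all).
w = np.zeros(32)
b = 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * (p - y).mean()

accuracy = float(((p > 0.5) == y).mean())
```

Because only `w` and `b` are trained, the expensive part of the pipeline is computing the embeddings once; everything after that is seconds of CPU time.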

6. Benchmarks, honestly

There is no single accepted benchmark suite for single-cell foundation models yet. Each paper reports wins on its own favored tasks. A reasonable rule of thumb from published comparisons and community bake-offs:

  • On basic cell-type classification with plenty of labels, all three models land in the same ballpark as strong supervised baselines.
  • On zero-shot cell-type transfer across datasets, Geneformer tends to be the most robust out of the box.
  • On in-silico perturbation benchmarks, Geneformer leads on qualitative ranking of gene importance, while scFoundation is stronger when absolute expression changes matter.
  • On generative tasks and cross-modal integration, scGPT has the most mature story.

Warning
The only benchmark that really matters is the one you run on your own data. The cheapest way to do that is to point the same API at all three backbones and compare their outputs on a held-out slice of your project.

7. Which one should you pick?

Pick Geneformer if…

  • You need strong zero-shot cell-type and gene-importance outputs.
  • You plan to run in-silico perturbation workflows.
  • You want the simplest fine-tuning recipe (frozen backbone + head).

Pick scGPT if…

  • You want to generate synthetic cells, denoise, or impute dropouts explicitly.
  • You care about batch integration across many donors, chemistries, and tissues.
  • You expect to extend to ATAC-seq or other non-RNA modalities down the road.

Pick scFoundation if…

  • You need absolute expression magnitudes in the embedding — for example, for drug-response modeling.
  • You want a regression-style head on top of the pretrained model.
  • Your downstream task is about quantitative changes rather than categorical labels.

Running the comparison through SciRouter

The easiest way to evaluate these models on your own data is through the Cell Atlas workspace. Upload your expression matrix once, then rerun the same annotation and embedding job with each backbone. The tool pages for Geneformer and scGPT document the request and response schemas if you prefer to call the API directly from a script.
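If you do script it, the whole comparison reduces to sending the same job three times with a different model name. The sketch below builds the three requests without sending them; the endpoint URL, field names, and model identifiers are illustrative placeholders, not the documented SciRouter schema, so check the tool pages for the real request format before wiring this up.

```python
# Hypothetical request builder for a shared annotation endpoint.
BACKBONES = ["geneformer", "scgpt", "scfoundation"]

def build_request(matrix_id: str, backbone: str) -> dict:
    """Assemble one annotation job; only the model name varies."""
    return {
        # Placeholder URL -- substitute the real endpoint from the docs.
        "url": "https://api.scirouter.example/v1/cell-atlas/annotate",
        "json": {
            "matrix_id": matrix_id,   # the matrix you uploaded once
            "model": backbone,
            "task": "annotation",
        },
    }

# Same job, three backbones: upload once, rerun per model, then
# compare the outputs on a held-out slice of your data.
jobs = [build_request("my-matrix-001", b) for b in BACKBONES]
```

Sending each payload with your HTTP client of choice and diffing the three result sets against your held-out labels is the pilot benchmark the rest of this article keeps recommending.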

Bottom line

Geneformer, scGPT, and scFoundation are complementary models, not competitors. The right question is not “which one is best” but “which one fits this task?” Keep all three available behind a common API and switch between them based on what your current experiment needs.

Compare all three in Cell Atlas →

Frequently Asked Questions

Which single-cell foundation model is best overall?

There is no single winner. Geneformer is the strongest choice for gene-perturbation reasoning, scGPT excels at generative tasks and cross-batch integration, and scFoundation holds up best when you care about preserving absolute expression levels. A pilot on your own data is the only reliable way to pick.

Can I use all three from one API?

Yes. SciRouter exposes Geneformer, scGPT, and scFoundation as named backbones behind the same endpoint. You can rerun the same annotation or embedding job with a different model name and compare the outputs directly.

How do they differ in input representation?

Geneformer uses a rank-based token sequence of the top expressed genes. scGPT uses a binned-expression representation that keeps a discretized magnitude per gene. scFoundation operates on continuous expression values and tries to preserve dynamic range more faithfully. Those choices drive the rest of the design.

Do any of them handle multiple modalities?

scGPT has the most explicit multi-modal story so far, with published extensions for ATAC-seq and perturbation data. Geneformer and scFoundation are currently scRNA-seq focused, though adapter approaches can extend them.

Which one is cheapest to fine-tune?

Geneformer's classification-head recipe is the simplest and cheapest, since the backbone can usually be frozen. scGPT fine-tuning involves more moving parts because of its generative objectives. scFoundation sits somewhere in between.

Which one should I start with for a typical scRNA-seq project?

If you want cell-type calls, embeddings, and marker genes with minimal fuss, start with Geneformer through the Cell Atlas workspace. Once your workflow is stable, rerun the same steps with scGPT and scFoundation on a small subset to compare.

Try this yourself

500 free credits. No credit card required.