Three single-cell foundation models dominate the conversation right now: Geneformer, scGPT, and scFoundation. They all promise the same thing in principle — a reusable transformer backbone for scRNA-seq analysis — but they make meaningfully different design choices that affect how well they handle different tasks. This article lines them up side by side so you can pick the right tool for your project.
The short version: Geneformer is the strongest zero-shot generalist, scGPT wins on generative and cross-modal use cases, and scFoundation fights hardest to preserve absolute expression magnitudes. All three are available through the SciRouter Cell Atlas so you can run the same benchmark on your own data before committing.
1. Training data: how much and from where
Geneformer
Geneformer was pretrained on roughly 30 million human single cells assembled into the Genecorpus-30M dataset. The corpus spans dozens of tissues, donor states, and disease contexts, with heavy representation of developmental and immune cells. It is human-only and scRNA-seq-only.
scGPT
scGPT was trained on larger aggregated collections assembled from the CELLxGENE catalog and related repositories, with the published variants exceeding 30 million cells and later updates pushing further. The authors explicitly set out to build a generative backbone that could be extended to additional modalities, so the corpus prioritizes breadth of tissues, donors, and assays over depth in any single biological context.
scFoundation
scFoundation was pretrained on a corpus in the tens of millions of cells with a particular emphasis on preserving raw count information. The curation philosophy leans toward “keep the magnitude signal,” which matters for downstream drug-response tasks.
2. Input representation
This is where the three models diverge most sharply, and the choice propagates through everything else:
- Geneformer: rank tokens. Each cell becomes an ordered list of genes: every gene's expression is divided by that gene's median expression across the pretraining corpus, and genes are then ranked by the result. Because ranks are invariant to uniform scaling, library size is controlled for automatically, but absolute magnitudes are discarded.
- scGPT: binned expression. Genes are tokenized and paired with binned expression levels. A cell becomes a sequence of (gene, bin) pairs that the model can attend over, which preserves some magnitude information while still keeping the input discrete.
- scFoundation: continuous expression. The model accepts continuous expression values and uses a custom decoder designed to recover absolute counts. This is the most magnitude-preserving of the three.
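The three input schemes are easiest to see on a toy cell. The sketch below is illustrative only: the gene names, counts, corpus medians, and bin edges are made-up placeholders, not values from any of the models' actual preprocessing pipelines.

```python
import numpy as np

# Toy expression vector for one cell: counts for five genes.
genes = np.array(["CD3D", "GAPDH", "MS4A1", "NKG7", "ACTB"])
counts = np.array([12.0, 40.0, 0.0, 7.0, 25.0])

# Geneformer-style rank tokens: divide each gene by a corpus-wide median
# (placeholder values here), then order genes by the normalized result.
corpus_median = np.array([3.0, 50.0, 2.0, 4.0, 30.0])
norm = counts / corpus_median
rank_tokens = genes[np.argsort(-norm)]      # highest normalized gene first

# scGPT-style binned expression: discretize counts into a few bins so the
# input stays a sequence of discrete (gene, bin) tokens.
bin_edges = [1, 10, 30]
bins = np.digitize(counts, bin_edges)       # 0 = zero, 3 = highest bin

# scFoundation-style continuous input: keep the values themselves.
continuous = counts
```

Note how housekeeping genes with high corpus medians (GAPDH, ACTB here) drop down the Geneformer ranking even though their raw counts are the highest, while the binned and continuous views keep them on top.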
3. Architecture and objective
All three models are transformer-based, but the training objectives differ:
Geneformer — masked language modeling
A BERT-style masked language model over ranked gene tokens. Pretraining predicts masked genes from the surrounding context, which forces the model to learn co-expression structure and gene dependencies.
scGPT — generative pretraining
scGPT uses a generative masked-prediction objective: conditioned on a subset of observed genes, it iteratively predicts the expression of the remaining, unobserved genes, with auxiliary reconstruction losses on the cell as a whole. This generative framing is what makes scGPT good at cross-modal extensions and at producing plausible synthetic cells.
scFoundation — count-aware regression
scFoundation uses a regression-style objective that tries to predict masked expression values rather than categorical tokens. This keeps the quantitative information in the loss function, which is why it tends to dominate on downstream tasks that need absolute levels.
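The practical difference between the token-style and regression-style objectives shows up directly in the loss function. The sketch below contrasts the two on a fake masked cell; the model outputs are random stand-ins, and the bin count and mask positions are arbitrary choices for illustration, not anything from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 8
expression = rng.poisson(5.0, size=n_genes).astype(float)

# Mask two positions, as masked pretraining would.
mask = np.zeros(n_genes, dtype=bool)
mask[[1, 5]] = True

# Token-style loss (Geneformer, scGPT): cross-entropy over discrete bins.
# The quantitative error between nearby bins is invisible to this loss.
n_bins = 4
targets = np.minimum(expression.astype(int), n_bins - 1)
logits = rng.normal(size=(n_genes, n_bins))           # stand-in model output
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
token_loss = -log_probs[mask, targets[mask]].mean()

# Regression-style loss (scFoundation): squared error on the masked raw
# values, so being off by a lot costs more than being off by a little.
predicted = rng.normal(loc=5.0, size=n_genes)         # stand-in model output
regression_loss = ((predicted[mask] - expression[mask]) ** 2).mean()
```

The regression loss is what keeps magnitude information in the gradient: two predictions that land in the same bin are equivalent under cross-entropy but not under squared error.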
4. Zero-shot capabilities
All three models support the basics: cell embedding, nearest-neighbor annotation, and batch integration. Where they pull ahead is in more specialized zero-shot tasks.
- Geneformer wins at in-silico perturbation. Its masked-gene pretraining objective transfers naturally to “what happens to the cell if I delete this gene?” queries. This is why it is the go-to model for target nomination workflows.
- scGPT wins at generation and integration. The generative objective makes it good at producing synthetic cells, denoising noisy inputs, and integrating across chemistries and donors.
- scFoundation wins at drug-response transfer. The count-aware pretraining preserves the quantitative fingerprint of perturbations, which transfers to drug-response and dose-prediction tasks.
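The shared zero-shot workflow under all three bullets is the same: embed cells, then transfer labels from the nearest labeled neighbors. The sketch below fakes the embeddings with random vectors (a real run would call whichever backbone you chose) and shows only the nearest-neighbor annotation step itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for backbone embeddings: 100 labeled reference cells and
# 5 query cells that are near-copies of the first 5 references.
ref_emb = rng.normal(size=(100, 16))
ref_labels = np.array(["T cell", "B cell"])[rng.integers(0, 2, 100)]
query_emb = ref_emb[:5] + rng.normal(scale=0.01, size=(5, 16))

def normalize(x):
    # Unit-normalize rows so the dot product is cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine similarity, then transfer the label of the nearest reference.
sims = normalize(query_emb) @ normalize(ref_emb).T
nearest = sims.argmax(axis=1)
predicted = ref_labels[nearest]
```

In practice you would use more than one neighbor and a vote, but the point stands: once the cells are embedded, the annotation step is backbone-agnostic, which is what makes apples-to-apples comparison easy.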
5. Fine-tuning cost and complexity
A practical question for most teams is how expensive it is to adapt these models to a narrow downstream task.
- Geneformer. A frozen backbone plus a small linear or MLP head is often enough. Fine-tuning on a labeled dataset of a few hundred thousand cells can finish on a single GPU.
- scGPT. The generative objectives give you more to tune, which is both a feature and a cost. Expect more hyperparameter sensitivity and longer training runs.
- scFoundation. The regression-style head is straightforward to extend, but because the model is designed to track absolute levels, you may want more careful normalization upstream of the training loop.
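The "frozen backbone + small head" recipe from the Geneformer bullet is simple enough to sketch end to end: treat precomputed cell embeddings as fixed features and fit a linear softmax head on top. Everything below is synthetic (random embeddings, labels generated from a random linear model) purely to show the shape of the recipe.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 16, 3                      # cells, embedding dim, cell types

# Stand-ins for frozen-backbone embeddings and their cell-type labels.
emb = rng.normal(size=(n, d))
true_w = rng.normal(size=(d, k))
labels = (emb @ true_w).argmax(axis=1)    # synthetic, linearly generated

# Fit only the head: plain gradient descent on softmax cross-entropy.
w = np.zeros((d, k))
onehot = np.eye(k)[labels]
for _ in range(300):
    logits = emb @ w
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    w -= 0.1 * emb.T @ (probs - onehot) / n

accuracy = ((emb @ w).argmax(axis=1) == labels).mean()
```

Because only the head's parameters move, this fits comfortably on one GPU (or, as here, no GPU at all); the expensive part of the pipeline is computing the embeddings once.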
6. Benchmarks, honestly
There is no single accepted benchmark suite for single-cell foundation models yet. Each paper reports wins on its own favored tasks. A reasonable rule of thumb from published comparisons and community bake-offs:
- On basic cell-type classification with plenty of labels, all three models land in the same ballpark as strong supervised baselines.
- On zero-shot cell-type transfer across datasets, Geneformer tends to be the most robust out of the box.
- On in-silico perturbation benchmarks, Geneformer leads on qualitative ranking of gene importance, while scFoundation is stronger when absolute expression changes matter.
- On generative tasks and cross-modal integration, scGPT has the most mature story.
7. Which one should you pick?
Pick Geneformer if…
- You need strong zero-shot cell-type and gene-importance outputs.
- You plan to run in-silico perturbation workflows.
- You want the simplest fine-tuning recipe (frozen backbone + head).
Pick scGPT if…
- You want to generate synthetic cells, denoise, or impute dropouts explicitly.
- You care about batch integration across many donors, chemistries, and tissues.
- You expect to extend to ATAC-seq or other non-RNA modalities down the road.
Pick scFoundation if…
- You need absolute expression magnitudes in the embedding — for example, for drug-response modeling.
- You want a regression-style head on top of the pretrained model.
- Your downstream task is about quantitative changes rather than categorical labels.
Running the comparison through SciRouter
The easiest way to evaluate these models on your own data is through the Cell Atlas workspace. Upload your expression matrix once, then rerun the same annotation and embedding job with each backbone. The tool pages for Geneformer and scGPT document the request and response schemas if you prefer to call the API directly from a script.
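A driver script for that comparison can be as small as a loop over backbones. To be clear, the model ids, task name, and payload fields below are hypothetical placeholders, not the documented Cell Atlas schema; consult the tool pages for the real request format before wiring this up.

```python
import json

# Assumed backbone identifiers -- check the tool pages for the real ids.
BACKBONES = ["geneformer", "scgpt", "scfoundation"]

def build_job(matrix_path, backbone):
    # Assumed payload shape: same task and input, only the backbone varies,
    # which is what makes the three runs directly comparable.
    return {
        "task": "annotate",
        "backbone": backbone,
        "input": matrix_path,
    }

jobs = [build_job("my_experiment.h5ad", b) for b in BACKBONES]
payloads = [json.dumps(job) for job in jobs]   # ready to POST, one per model
```

The design choice worth copying even if the field names differ: hold the input and task fixed and vary only the backbone, so any difference in the results is attributable to the model.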
Bottom line
Geneformer, scGPT, and scFoundation are complementary models, not competitors. The right question is not “which one is best” but “which one fits this task?” Keep all three available behind a common API and switch between them based on what your current experiment needs.