TxGemma is Google's therapeutics-focused large language model and one of the most interesting open-weight releases in the drug-discovery AI space this year. It is built on the Gemma 2 base architecture and instruction-tuned on a broad mixture of chemistry, pharmacology, and preclinical datasets. The result is an LLM that reasons about molecules the way a trained medicinal chemist would in a design review, rather than the way a generic chat model talks around the problem.
This post walks through what TxGemma is, what its three variants are good for, how it was trained, where it sits next to PaLM and Med-PaLM, and how you can start calling it through SciRouter's TxGemma tool.
The three TxGemma variants
TxGemma is released in three sizes, matching the Gemma family it was built on. Each size trades off inference cost against reasoning quality.
TxGemma 2B
The 2B variant is the smallest and fastest. With 4-bit quantization it fits in 8 GB of GPU memory and runs comfortably on a single consumer card. It is intended for inline hints, autocomplete-style assistants, and high-throughput batch screens where you want to ask one short question per molecule. It is not the model you want for multi-step reasoning about unfamiliar targets, but it is fine for “does this SMILES have a PAINS liability” style lookups.
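A batch screen of this kind is just a loop that asks the same short question per molecule. Here is a minimal sketch; `ask_txgemma` is a placeholder for whatever client you use (SciRouter, a local server, or anything else), and the prompt template is illustrative rather than the model's official format:

```python
# One-short-question-per-molecule screening, the pattern the 2B variant suits.
# `ask_txgemma` is a hypothetical callable; swap in your real inference client.

def build_screen_prompt(smiles: str, question: str) -> str:
    """Wrap a single molecule and question into a short yes/no prompt."""
    return f"Molecule SMILES: {smiles}\nQuestion: {question}\nAnswer yes or no."

def batch_screen(smiles_list, question, ask_txgemma):
    """Ask the same short question for every molecule in a library."""
    return {smi: ask_txgemma(build_screen_prompt(smi, question))
            for smi in smiles_list}

# Example with a stubbed model call (replace the stub with a real client):
fake_model = lambda prompt: "no"
results = batch_screen(
    ["CCO", "c1ccccc1O"],
    "Does this compound contain a PAINS substructure?",
    fake_model,
)
print(results)  # {'CCO': 'no', 'c1ccccc1O': 'no'}
```

Because each prompt is independent, this loop parallelizes trivially, which is exactly where the 2B variant's low latency pays off.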
TxGemma 9B
The 9B variant is the practical sweet spot for interactive use. With 4-bit quantization it fits in a 24 GB card, and it is fast enough to drive a chat loop without painful waits. It handles chained questions, preserves chemistry context across a conversation, and is the default we recommend for most developer workflows. In SciRouter benchmarks it scores close to the 27B variant across 66 TDC-aligned tasks at a fraction of the cost.
TxGemma 27B
The 27B variant is the heavyweight. It is the variant that Google reports produces the best results on hard reasoning benchmarks, particularly around retrosynthesis, clinical-stage prediction, and multi-hop questions that require pulling together information from multiple datasets. It is also the variant we recommend for batched offline analysis where you want the best answer and can afford the GPU time. It is less suited to latency-sensitive agent loops.
What TxGemma was trained on
The clearest way to understand TxGemma is to look at its training mixture. Google built an instruction-tuning corpus from the Therapeutics Data Commons (TDC), which packages dozens of public drug-discovery datasets into a consistent format. TxGemma was trained on that corpus alongside natural-language reasoning traces that walk through the scientific logic of each task.
- ADMET endpoints. Absorption, distribution, metabolism, excretion, and toxicity datasets including solubility, Caco-2 permeability, CYP isoform inhibition, hERG, LD50, and BBB penetration.
- Binding and affinity. Target-ligand interaction datasets for protein-small molecule binding, including DAVIS, KIBA, and BindingDB subsets.
- Molecular properties. logP, QED, SAS, topological polar surface area, and many of the baseline descriptors that medicinal chemists reach for when profiling a molecule.
- Retrosynthesis. Single-step and multi-step retrosynthetic planning datasets that teach the model to propose reasonable disconnections.
- Clinical outcome prediction. Datasets that pair a drug with a trial outcome, which TxGemma uses to reason about developability and attrition risk.
The natural-language wrapping is important. Training a model on raw SMILES-to-number mappings gives you a property predictor. Training it on explanations of why a given property takes a given value gives you a reasoning model that can tell a chemist what to change and why.
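The difference is easy to see side by side. The sketch below wraps a raw labelled record into an instruction-style QA pair; the field names and template are illustrative only, not the actual TxGemma training format:

```python
# A raw TDC-style record: SMILES mapped to a label. Training only on this
# shape yields a property predictor, not a reasoning model.
raw_example = {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "herg_inhibitor": 0}

def to_instruction(example: dict) -> str:
    """Wrap a labelled record as a natural-language question-answer pair.
    (Illustrative template; the real training prompts differ.)"""
    answer = "No" if example["herg_inhibitor"] == 0 else "Yes"
    return (
        "Question: Is the compound with SMILES "
        f"{example['smiles']} likely to inhibit hERG?\n"
        f"Answer: {answer}."
    )

print(to_instruction(raw_example))
```

Pair each wrapped example with a reasoning trace explaining the label, and the model learns to produce the explanation as well as the answer.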
TxGemma versus PaLM 2 and Med-PaLM
It is tempting to lump all of Google's biomedical LLMs together. Do not. They target different stages of the pipeline.
- PaLM 2 is the general-purpose base model. It has broad knowledge and will answer chemistry questions, but it has no specific chemistry training and will happily hallucinate SMILES strings or misread a named reaction.
- Med-PaLM 2 is a clinical and medical-knowledge model. It answers patient-facing and physician-facing questions about disease, diagnosis, and treatment. It was tuned on medical question-answering datasets and clinical literature, and evaluated on USMLE-style exams. It is not a drug-discovery model.
- TxGemma lives at the preclinical and discovery end. It reasons about molecules, targets, ADMET, and therapeutic mechanisms. It does not replace Med-PaLM for clinical reasoning, and Med-PaLM does not replace TxGemma for chemistry.
A realistic therapeutic pipeline uses both kinds of models. Med-PaLM helps frame the clinical question. TxGemma helps design the molecule. Physical tools like Boltz-2 and DiffDock validate the hypotheses. A routing gateway like SciRouter makes it possible to chain all of them without rebuilding the plumbing every time.
Where TxGemma shines and where it does not
In day-to-day use the strengths and weaknesses of TxGemma become obvious fast.
Strengths
- Spotting structural liabilities in a proposed scaffold and explaining them.
- Recalling SAR for well-studied chemotypes and target classes.
- Reasoning about ADMET trade-offs when you swap a functional group.
- Proposing reasonable retrosynthetic disconnections at the level you'd get from a senior graduate student.
- Reading a cluster of numbers from a property panel and writing a short prose summary of what they mean together.
Weaknesses
- Quantitative binding free energy prediction. Use a physical or alchemical FEP tool instead.
- Answering questions about very new or proprietary targets that were not in the training data.
- Avoiding overconfidence. Like every LLM, it states wrong answers with the same tone as correct ones. Always cross-check with structure or assay data.
Calling TxGemma through SciRouter
SciRouter exposes TxGemma as a managed tool. You do not need to provision GPUs, quantize the weights, or manage chat history. You send a question and an optional molecular context and you get a structured answer.
A typical call looks like this: you send a SMILES string, a target name, and a natural-language question such as “is this compound likely to have hERG liability and why”. TxGemma returns its reasoning along with a short answer. If you want deeper quantitative validation, you can chain the same molecule into SciRouter's ADMET panel or Boltz-2 endpoint without leaving your script.
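In code, that call reduces to building a small JSON payload. The shape below is an assumption for illustration; the tool identifier, field names, endpoint URL, and auth header are all placeholders, so check SciRouter's API reference for the real schema:

```python
import json

# Hypothetical request shape for SciRouter's TxGemma tool.
# All identifiers here are assumptions, not the documented API.

def build_txgemma_request(smiles: str, target: str, question: str) -> dict:
    return {
        "tool": "txgemma",      # assumed tool identifier
        "variant": "9b",        # 2b | 9b | 27b
        "context": {"smiles": smiles, "target": target},
        "question": question,
    }

payload = build_txgemma_request(
    "CC(=O)Oc1ccccc1C(=O)O",
    "hERG",
    "Is this compound likely to have hERG liability and why?",
)
print(json.dumps(payload, indent=2))

# Sending it would look roughly like this (placeholder URL, untested):
# req = urllib.request.Request(
#     "https://api.scirouter.example/v1/tools/txgemma",
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <API_KEY>",
#              "Content-Type": "application/json"},
# )
# answer = json.loads(urllib.request.urlopen(req).read())
```

Chaining into the ADMET panel or Boltz-2 endpoint is then just a second payload that reuses the same `smiles` field.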
For agent use, TxGemma is also available through SciRouter's MCP server. Claude, GPT, and any other MCP-compatible agent can discover the tool, read its schema, and call it directly as part of a multi-step reasoning loop. See our tutorial on agentic drug discovery with TxGemma and MCP for a full walkthrough.
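Under MCP, that call is a standard JSON-RPC `tools/call` message. The envelope below follows the MCP specification; the tool name and argument fields are assumptions, since a real client would discover the actual schema via `tools/list`:

```python
import json

# An MCP tools/call request as an agent would send it to SciRouter's MCP
# server. JSON-RPC envelope per the MCP spec; tool name and argument
# fields are assumed for illustration.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "txgemma",  # assumed tool name; discover via tools/list
        "arguments": {
            "smiles": "CCOc1ccc2nc(S(N)(=O)=O)sc2c1",
            "question": "Summarize the ADMET risks of this scaffold.",
        },
    },
}
print(json.dumps(call, indent=2))
```

The agent never hard-codes this shape by hand; the MCP client library builds it after reading the tool schema, which is what makes the discover-then-call loop work across models.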
Bottom line
TxGemma is the first open-weight therapeutics LLM that feels genuinely useful inside a drug-discovery workflow. It is not a replacement for physical models, and it will not give you a clinical trial answer. What it will give you is a fast, chemistry-literate reasoning partner that you can layer on top of the structural and predictive tools you already use.