Therapeutics LLM

TxGemma Explained: Google's Therapeutics LLM for Drug Discovery

TxGemma is Google's therapeutics-specialized LLM with 2B, 9B, and 27B variants, instruction-tuned across 66 drug-discovery tasks.

SciRouter Team
April 11, 2026
12 min read

TxGemma is Google's therapeutics-focused large language model, and it is one of the most interesting open releases in the drug-discovery AI space this year. It is built on the Gemma base architecture, but instruction-tuned on a broad mixture of chemistry, pharmacology, and preclinical datasets. The result is an LLM that reasons about molecules the way a trained medicinal chemist would reason in a design review meeting, rather than the way a generic chat model might talk around the problem.

This post walks through what TxGemma is, what its three variants are good for, how it was trained, where it sits next to PaLM and Med-PaLM, and how you can start calling it through SciRouter's TxGemma tool.

Note
TxGemma is a research assistant, not an oracle. Use it to generate and triage hypotheses, and always confirm quantitative claims with a physical or experimental model. SciRouter exposes it alongside Boltz-2, DiffDock, and ADMET predictors so you can close the loop.

The three TxGemma variants

TxGemma is released in three sizes, matching the Gemma family it was built on. Each size trades off inference cost against reasoning quality.

TxGemma 2B

The 2B variant is the smallest and fastest. It fits in 8 GB of GPU memory in 4-bit quantization and runs comfortably on a single consumer card. It is intended for inline hints, autocomplete-style assistants, and high-throughput batch screens where you want to ask one short question per molecule. It is not the model you want for multi-step reasoning about unfamiliar targets, but it is fine for “does this SMILES have a PAINS liability” style lookups.
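The batch-screen pattern above amounts to formatting one short, single-fact question per molecule and sending them in bulk. A minimal sketch, where the prompt template is an illustrative assumption rather than a documented SciRouter format:

```python
# Sketch of a high-throughput batch screen aimed at a small model like
# TxGemma 2B. The prompt wording is illustrative, not an official template.

def build_prompt(smiles: str) -> str:
    """Format one short, single-fact question per molecule."""
    return (f"Does this molecule contain a PAINS substructure? "
            f"SMILES: {smiles}. Answer yes or no.")

def batch_prompts(smiles_list):
    """One prompt per molecule, ready to send in bulk."""
    return [build_prompt(s) for s in smiles_list]

screen = batch_prompts(["CCO", "c1ccccc1O"])
print(len(screen))   # one prompt per input molecule
print(screen[0])
```

Keeping each prompt to a single yes/no fact is what makes the 2B variant viable here: short context, short answer, high throughput.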

TxGemma 9B

The 9B variant is the practical sweet spot for interactive use. With 4-bit quantization it fits in a 24 GB card, and it is fast enough to drive a chat loop without painful waits. It handles chained questions, preserves chemistry context across a conversation, and is the default we recommend for most developer workflows. In SciRouter benchmarks it answers 66 TDC-aligned tasks with accuracy close to the 27B variant at a fraction of the cost.
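The "fits in a 24 GB card" claim is easy to sanity-check with back-of-the-envelope arithmetic: 4-bit weights take roughly 0.5 bytes per parameter. The 20% overhead pad below is our own rough allowance for KV cache and activations, not a published figure:

```python
# Rough memory estimate for running TxGemma variants in 4-bit quantization.
# 4-bit weights ~ 0.5 bytes/parameter; the 1.2x overhead factor is a crude
# assumption for KV cache and activations, not a spec.

def weight_memory_gb(n_params: float, bits: int = 4, overhead: float = 1.2) -> float:
    bytes_per_param = bits / 8
    return n_params * bytes_per_param * overhead / 1e9

for name, n in [("2B", 2e9), ("9B", 9e9), ("27B", 27e9)]:
    print(f"TxGemma {name}: ~{weight_memory_gb(n):.1f} GB at 4-bit")
```

By this estimate the 9B variant needs about 5.4 GB for weights plus overhead, leaving comfortable headroom on a 24 GB card for longer chat contexts.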

TxGemma 27B

The 27B variant is the heavyweight. It is the variant that Google reports produces the best results on hard reasoning benchmarks, particularly around retrosynthesis, clinical-stage prediction, and multi-hop questions that require pulling together information from multiple datasets. It is also the variant we recommend for batched offline analysis where you want the best answer and can afford the GPU time. It is less suited to latency-sensitive agent loops.

What TxGemma was trained on

The clearest way to understand TxGemma is to look at its training mixture. Google built an instruction-tuning corpus from the Therapeutics Data Commons (TDC), which packages dozens of public drug-discovery datasets into a consistent format. TxGemma was trained on that corpus alongside natural-language reasoning traces that walk through the scientific logic of each task.

  • ADMET endpoints. Absorption, distribution, metabolism, excretion, and toxicity datasets including solubility, Caco-2 permeability, CYP isoform inhibition, hERG, LD50, and BBB penetration.
  • Binding and affinity. Target-ligand interaction datasets for protein-small molecule binding, including DAVIS, KIBA, and BindingDB subsets.
  • Molecular properties. logP, QED, SAS, topological polar surface area, and many of the baseline descriptors that medicinal chemists reach for when profiling a molecule.
  • Retrosynthesis. Single-step and multi-step retrosynthetic planning datasets that teach the model to propose reasonable disconnections.
  • Clinical outcome prediction. Datasets that pair a drug with a trial outcome, which TxGemma uses to reason about developability and attrition risk.
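To make the molecular-property bucket concrete, here is a toy version of the kind of baseline profiling those datasets encode: a Lipinski rule-of-five check over a precomputed property panel. The panel values are made-up inputs for illustration, not model outputs:

```python
# Toy illustration of baseline property profiling: count classic Lipinski
# rule-of-five violations over a precomputed panel of descriptors.
# Thresholds are the standard Lipinski cutoffs; the panel is invented.

def lipinski_violations(panel: dict) -> int:
    """Rule-of-five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    rules = [
        panel["mol_weight"] > 500,
        panel["logp"] > 5,
        panel["hbd"] > 5,
        panel["hba"] > 10,
    ]
    return sum(rules)

panel = {"mol_weight": 430.5, "logp": 3.2, "hbd": 2, "hba": 6}
print(lipinski_violations(panel))  # 0 violations for this drug-like panel
```

TxGemma's value-add over a lookup like this is the natural-language wrapping discussed next: it can explain *which* rule a molecule breaks and what edit would fix it.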

The natural-language wrapping is important. Training a model on raw SMILES-to-number mappings gives you a property predictor. Training it on explanations of why a given property takes a given value gives you a reasoning model that can tell a chemist what to change and why.

TxGemma versus PaLM 2 and Med-PaLM

It is tempting to lump all of Google's biomedical LLMs together. Do not. They target different stages of the pipeline.

  • PaLM 2 is the general-purpose base model. It has broad knowledge and will answer chemistry questions, but it has no specific chemistry training and will happily hallucinate SMILES strings or misread a named reaction.
  • Med-PaLM 2 is a clinical and medical-knowledge model. It answers patient-facing and physician-facing questions about disease, diagnosis, and treatment. It was trained on USMLE questions and clinical literature. It is not a drug-discovery model.
  • TxGemma lives at the preclinical and discovery end. It reasons about molecules, targets, ADMET, and therapeutic mechanisms. It does not replace Med-PaLM for clinical reasoning, and Med-PaLM does not replace TxGemma for chemistry.

A realistic therapeutic pipeline uses both kinds of models. Med-PaLM helps frame the clinical question. TxGemma helps design the molecule. Physical tools like Boltz-2 and DiffDock validate the hypotheses. A routing gateway like SciRouter makes it possible to chain all of them without rebuilding the plumbing every time.

Where TxGemma shines and where it does not

In day-to-day use the strengths and weaknesses of TxGemma become obvious fast.

Strengths

  • Spotting structural liabilities in a proposed scaffold and explaining them.
  • Recalling SAR for well-studied chemotypes and target classes.
  • Reasoning about ADMET trade-offs when you swap a functional group.
  • Proposing reasonable retrosynthetic disconnections at the level you'd get from a senior graduate student.
  • Reading a cluster of numbers from a property panel and writing a short prose summary of what they mean together.

Weaknesses

  • Quantitative binding free energy prediction. Use a physical or alchemical FEP tool instead.
  • Answering questions about very new or proprietary targets that were not in the training data.
  • Avoiding overconfidence. Like every LLM, it states wrong answers with the same tone as correct ones. Always cross-check with structure or assay data.

Warning
TxGemma's training data has a cutoff. Anything published after that date is invisible to the base model. Pair it with retrieval or with up-to-date structure tools when you need recent information.

Calling TxGemma through SciRouter

SciRouter exposes TxGemma as a managed tool. You do not need to provision GPUs, quantize the weights, or manage chat history. You send a question and an optional molecular context and you get a structured answer.

A typical call looks like this: you send a SMILES string, a target name, and a natural-language question such as “is this compound likely to have hERG liability and why”. TxGemma returns its reasoning along with a short answer. If you want deeper quantitative validation, you can chain the same molecule into SciRouter's ADMET panel or Boltz-2 endpoint without leaving your script.
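As a hedged sketch of what that call might look like in code: the model name, payload fields, and response shape below are illustrative assumptions, not the documented SciRouter API, so check the API reference before relying on them.

```python
import json

# Illustrative sketch of a SciRouter TxGemma request/response round trip.
# Endpoint payload fields and response keys are assumptions, not the
# published schema.

def build_txgemma_request(smiles: str, target: str, question: str) -> str:
    payload = {
        "model": "txgemma-9b",   # hypothetical variant selector
        "smiles": smiles,
        "target": target,
        "question": question,
    }
    return json.dumps(payload)

def parse_answer(response_json: str) -> tuple:
    """Pull the short answer and the reasoning trace out of a response."""
    body = json.loads(response_json)
    return body["answer"], body["reasoning"]

req = build_txgemma_request(
    "CC(=O)Oc1ccccc1C(=O)O",
    "hERG",
    "Is this compound likely to have hERG liability and why?",
)
# A response in the assumed shape would be decoded like this:
answer, why = parse_answer('{"answer": "low risk", "reasoning": "no basic amine ..."}')
print(answer)
```

Because the molecule travels as a plain SMILES field, chaining the same structure into an ADMET panel or Boltz-2 call is just a second request with the same string.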

For agent use, TxGemma is also available through SciRouter's MCP server. Claude, GPT, and any other MCP-compatible agent can discover the tool, read its schema, and call it directly as part of a multi-step reasoning loop. See our tutorial on agentic drug discovery with TxGemma and MCP for a full walkthrough.
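For orientation, MCP-compatible clients are typically pointed at a server with a config entry in this general shape. The package name, command, and environment variable below are hypothetical placeholders, not SciRouter's published values; the tutorial linked above has the real configuration.

```json
{
  "mcpServers": {
    "scirouter": {
      "command": "npx",
      "args": ["-y", "@scirouter/mcp-server"],
      "env": { "SCIROUTER_API_KEY": "sk-..." }
    }
  }
}
```

Once registered, the agent discovers the TxGemma tool and its input schema automatically; no per-tool glue code is needed.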

Bottom line

TxGemma is the first open-weight therapeutics LLM that feels genuinely useful inside a drug-discovery workflow. It is not a replacement for physical models, and it will not give you a clinical trial answer. What it will give you is a fast, chemistry-literate reasoning partner that you can layer on top of the structural and predictive tools you already use.

Explore TxGemma on SciRouter →

Frequently Asked Questions

What is TxGemma?

TxGemma is an open-weight therapeutics language model released by Google as part of the Gemma family. It is fine-tuned from the general-purpose Gemma base models on a large mixture of drug-discovery datasets and reasoning traces, and it ships in 2B, 9B, and 27B parameter sizes. Unlike a general chat model, TxGemma is trained to answer structured questions about small molecules, proteins, ADMET properties, and therapeutic mechanisms.

How many drug discovery tasks does TxGemma cover?

Google reports that TxGemma was instruction-tuned across 66 distinct drug-discovery tasks drawn largely from the Therapeutics Data Commons (TDC). These span ADMET endpoints (absorption, permeability, metabolism, toxicity), target-ligand interaction classification, molecular property regression, retrosynthesis planning, and clinical trial outcome prediction. The 27B variant has the best reported coverage across those tasks.

How is TxGemma different from Med-PaLM?

Med-PaLM is a clinical and medical-knowledge model trained for patient-facing and physician-facing question answering. TxGemma is a therapeutics and drug-discovery model trained on chemistry, pharmacology, and preclinical endpoints. Med-PaLM reads medical literature and answers clinical questions. TxGemma reasons about molecules, binding, ADMET, and discovery pipelines. They address different stages of the overall drug-to-patient pipeline.

Can I use TxGemma commercially?

TxGemma is released under the Gemma Terms of Use, which allows commercial use with certain prohibited-use restrictions. You should review the official license before deploying it in a product. Note that the model is a research assistant, not an FDA-cleared medical device, and its outputs should be treated as hypothesis-generating rather than regulatory-grade.

Which TxGemma size should I start with?

For interactive experimentation and agentic workflows, the 9B variant is a solid default. It runs on a single 24 GB GPU in 4-bit and is fast enough for a chat loop. Use the 2B variant for latency-sensitive inline hints, and reserve the 27B variant for batch analysis and offline evaluation where you want the best reasoning quality.

How do I access TxGemma through SciRouter?

SciRouter exposes TxGemma as a managed tool in the gateway. You send a chemistry or therapeutics question to the TxGemma endpoint with your API key, and SciRouter routes it to the correct model variant on GPU and returns the answer. You can also reach the same tool through the MCP server so that Claude, GPT, and other agents can call it directly.

Is TxGemma good at chemistry or just language?

TxGemma is trained on a mixture of natural language and chemistry-specific representations such as SMILES strings and IUPAC names. It will not replace a full-physics docking engine, but it is meaningfully better than a generic LLM at spotting structural liabilities, recalling SAR for common scaffolds, and reasoning about ADMET trade-offs. Pair it with physical models like Boltz-2 and DiffDock when you need quantitative predictions.

Try this yourself

500 free credits. No credit card required.