
Neoantigen Prediction: Find Immunotherapy Targets from Tumor Mutations

Predict neoantigens from tumor mutations for personalized immunotherapy. TMB scoring, MHC binding prediction, and clinical trial context with hands-on API tutorial.

Ryan Bethencourt
April 8, 2026

What Are Neoantigens?

Every cancer starts with mutations. As tumor cells divide and accumulate somatic DNA changes – point mutations, insertions, deletions, gene fusions – some of those mutations fall within protein-coding regions and alter the amino acid sequence of the resulting protein. When these mutated proteins are processed by the proteasome, the resulting peptide fragments are loaded onto MHC (Major Histocompatibility Complex) molecules and displayed on the cell surface. These mutant peptides are called neoantigens.

Neoantigens are, in a real sense, the Achilles heel of cancer. They are absent from every normal cell in the patient's body, which means the immune system has not developed tolerance to them. When a T cell encounters a neoantigen-MHC complex, it recognizes it as foreign and can mount a cytotoxic response against the tumor cell. This is the biological basis of cancer immunotherapy – the immune system can distinguish tumor cells from normal cells because of the neoantigens they display.

The clinical evidence is compelling. Patients whose tumors have high neoantigen loads (typically correlated with high tumor mutation burden, or TMB) respond better to immune checkpoint inhibitors like pembrolizumab (Keytruda) and nivolumab (Opdivo). In the landmark CheckMate-227 trial, patients with non-small cell lung cancer whose tumors carried at least 10 mutations per megabase had longer progression-free survival on nivolumab plus ipilimumab than on chemotherapy (hazard ratio 0.58). The more neoantigens a tumor displays, the more targets the immune system has to attack.

But checkpoint inhibitors are blunt instruments – they release the brakes on the immune system broadly, causing autoimmune side effects in 15 to 40% of patients. The next frontier is precision immunotherapy: identifying the specific neoantigens in a patient's tumor and designing vaccines or T cell therapies that target those neoantigens directly. This requires computational prediction of which mutant peptides will bind MHC molecules and be recognized by T cells.

Tumor Mutation Burden as a Biomarker

Tumor mutation burden (TMB) is the total count of somatic mutations per megabase of coding DNA. It varies enormously across cancer types. Melanoma and non-small cell lung cancer (NSCLC) have median TMBs of 10 to 15 mutations per megabase, driven by UV radiation and tobacco carcinogen exposure respectively. Microsatellite-instable (MSI-high) colorectal cancers have TMBs exceeding 40 mutations per megabase due to defective DNA mismatch repair. At the other end of the spectrum, pediatric cancers, prostate cancer, and pancreatic cancer typically have TMBs below 3 mutations per megabase.

TMB matters for neoantigen prediction because more mutations mean more potential neoantigens. A tumor with 300 non-synonymous mutations (roughly 10 mutations per megabase in a typical exome) might generate 10 to 30 high-confidence predicted neoantigens after filtering through MHC binding prediction and immunogenicity scoring. A tumor with 30 non-synonymous mutations might yield only 1 to 3 candidates – still potentially enough for a vaccine, but with less margin for error if some predictions do not validate experimentally.

The FDA approved TMB as a pan-cancer biomarker for pembrolizumab in 2020 (KEYNOTE-158 trial), using a cutoff of 10 or more mutations per megabase. This was a landmark decision that validated the link between mutational load, neoantigen abundance, and immunotherapy response. However, TMB alone is an imperfect biomarker – not all mutations produce immunogenic neoantigens, and some low-TMB tumors respond to immunotherapy through mechanisms unrelated to neoantigens (such as viral antigens in HPV-positive cancers).
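
The arithmetic behind these numbers is easy to sketch. The snippet below is a minimal illustration, assuming a sequenced coding footprint of about 30 Mb (the figure implied by the 300-mutation example above). The function names and the default footprint are illustrative choices, not part of any assay or API.

```python
def tumor_mutation_burden(n_mutations: int, covered_megabases: float = 30.0) -> float:
    """Return somatic mutations per megabase of sequenced coding DNA."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return n_mutations / covered_megabases


def is_tmb_high(tmb: float, cutoff: float = 10.0) -> bool:
    """Apply the 10-or-more mutations per megabase cutoff described above."""
    return tmb >= cutoff


for count in (300, 30):
    tmb = tumor_mutation_burden(count)
    print(f"{count} mutations -> {tmb:.1f} mut/Mb, TMB-high: {is_tmb_high(tmb)}")
```

Real assays differ in which variants they count and how much territory they cover, so a TMB value is only comparable to a cutoff derived with the same method.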

Computational neoantigen prediction goes beyond simple TMB counting by asking which specific mutations actually produce peptides that bind MHC molecules, are presented on the cell surface, and can be recognized by the patient's T cell repertoire. This is the difference between counting bullets and counting the ones that hit the target.

The Neoantigen Prediction Pipeline

A complete neoantigen prediction pipeline transforms raw sequencing data into a ranked list of candidate neoantigens for vaccine or therapy design. The pipeline has five major steps, each with specific computational tools and biological considerations.

Step 1: Somatic Mutation Calling

The starting point is whole-exome or whole-genome sequencing of both the tumor and a matched normal sample (typically blood). Variant callers like Mutect2, Strelka2, or VarScan compare the two samples to identify somatic mutations – changes present in the tumor but absent from the germline. Non-synonymous mutations (those that change the amino acid sequence) are retained for neoantigen prediction. Typical output: 50 to 500 non-synonymous mutations depending on TMB.
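
As a rough sketch of the filtering at the end of this step, the snippet below keeps only protein-altering calls from a list of annotated variants. It assumes each variant has already been annotated with a Sequence Ontology consequence term (as annotators such as VEP emit). The record layout and the example variants are hypothetical.

```python
# Sequence Ontology consequence terms treated as protein-altering in this sketch.
PROTEIN_ALTERING = {
    "missense_variant",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
    "stop_lost",
}


def protein_altering(variants):
    """Keep somatic variants whose annotated consequence changes the protein."""
    return [v for v in variants if v["consequence"] in PROTEIN_ALTERING]


# Hypothetical annotated calls; the field names are assumptions for illustration.
variants = [
    {"gene": "BRAF", "hgvsp": "p.Val600Glu", "consequence": "missense_variant"},
    {"gene": "TTN", "hgvsp": "p.Leu100=", "consequence": "synonymous_variant"},
    {"gene": "NRAS", "hgvsp": "p.Gly12Asp", "consequence": "missense_variant"},
]

kept = protein_altering(variants)
print([v["gene"] for v in kept])
```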

Step 2: HLA Typing

MHC molecules are encoded by the HLA (Human Leukocyte Antigen) gene complex. Every person has a unique combination of HLA alleles (up to 6 HLA-I alleles: 2 each of HLA-A, HLA-B, and HLA-C; and multiple HLA-II alleles). Different HLA alleles bind different peptide motifs. HLA typing determines which alleles the patient carries, which determines which mutant peptides can be presented on their tumor cells. Tools like OptiType, HISAT-genotype, or HLA-HD perform HLA typing from the same sequencing data used for mutation calling.
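
Before passing HLA calls to a binding predictor, it helps to sanity-check them. The following is a minimal sketch, assuming alleles are written in the two-field form used later in this article (for example `HLA-A*02:01`). The validator itself is a hypothetical helper, not part of any typing tool.

```python
import re
from collections import Counter

# Matches two-field class I names such as "HLA-A*02:01".
HLA_I_PATTERN = re.compile(r"^HLA-([ABC])\*(\d{2,3}):(\d{2,3})$")


def validate_hla_class_i(alleles):
    """Check allele naming and the limit of two alleles per HLA-A/B/C locus."""
    per_locus = Counter()
    for allele in alleles:
        match = HLA_I_PATTERN.match(allele)
        if match is None:
            raise ValueError(f"Unrecognized HLA class I allele: {allele!r}")
        per_locus[match.group(1)] += 1
    for locus, count in per_locus.items():
        if count > 2:
            raise ValueError(f"More than two alleles given for HLA-{locus}")
    return dict(per_locus)


patient_alleles = ["HLA-A*02:01", "HLA-A*24:02", "HLA-B*07:02",
                   "HLA-B*44:02", "HLA-C*05:01", "HLA-C*07:02"]
print(validate_hla_class_i(patient_alleles))
```

A patient who is homozygous at a locus will simply list one allele there, which this check accepts.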

Step 3: Peptide Generation

For each non-synonymous mutation, generate all possible peptide windows containing the mutated residue. For MHC-I prediction, generate 8-mer through 11-mer peptides (the size range that MHC-I can accommodate). For MHC-II, generate 13-mer through 25-mer peptides. A single point mutation typically produces 30 to 40 candidate peptides across all lengths and positions.
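
The window enumeration can be sketched in a few lines. This is a minimal illustration for a single substituted residue, assuming you already have the mutant protein context and the 0-based index of the changed residue. The function name is a hypothetical helper.

```python
def mutant_peptides(mutant_sequence, mutation_index, lengths=(8, 9, 10, 11)):
    """Return every window of the given lengths that covers the mutated residue.

    mutation_index is the 0-based position of the altered residue.
    """
    peptides = []
    for length in lengths:
        # Earliest and latest window starts that still include the mutation
        # and stay inside the available sequence context.
        first_start = max(0, mutation_index - length + 1)
        last_start = min(mutation_index, len(mutant_sequence) - length)
        for start in range(first_start, last_start + 1):
            peptides.append(mutant_sequence[start:start + length])
    return peptides


# NRAS G12D context; the aspartate sits at 0-based index 11.
nras_g12d = "MTEYKLVVVGADGVGKSALT"
windows = mutant_peptides(nras_g12d, 11)
print(len(windows), windows[:3])
```

With at least 10 flanking residues on each side of the mutation, the 8-mer to 11-mer windows total 8 + 9 + 10 + 11 = 38 peptides, which is where the 30 to 40 estimate comes from. The NRAS context above has only 8 residues after the mutation, so it yields 35.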

Step 4: MHC Binding Prediction

This is the core computational step. For each candidate peptide and each of the patient's HLA alleles, predict the binding affinity of the peptide-MHC complex. Tools like NetMHCpan 4.1, MHCflurry 2.0, and MixMHCpred use models trained on experimental binding and eluted-ligand data to predict binding strength, often reported as an IC50 value (the peptide concentration that displaces 50% of a reference peptide from the MHC molecule; lower values mean tighter binding). A common convention treats peptides with an IC50 below 500 nM as binders and those below 50 nM as strong binders.
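
As a minimal sketch of how the 500 nM and 50 nM cutoffs are applied, the helper below labels a predicted IC50 and picks the best-presenting allele for one peptide. The IC50 values are made up for illustration. In practice they would come from one of the predictors named above.

```python
def classify_binder(ic50_nm: float, weak_cutoff: float = 500.0,
                    strong_cutoff: float = 50.0) -> str:
    """Label a predicted IC50 (nM) using the common 50 nM / 500 nM cutoffs."""
    if ic50_nm < strong_cutoff:
        return "strong"
    if ic50_nm < weak_cutoff:
        return "weak"
    return "non-binder"


# Hypothetical predictions for one peptide across a patient's alleles (nM).
predicted_ic50 = {"HLA-A*02:01": 32.0, "HLA-B*07:02": 410.0, "HLA-C*07:02": 8200.0}

for allele, ic50 in predicted_ic50.items():
    print(f"{allele}: {ic50:.0f} nM -> {classify_binder(ic50)}")

# The allele with the lowest IC50 is the most likely presenter of this peptide.
best_allele = min(predicted_ic50, key=predicted_ic50.get)
print(f"Best allele: {best_allele}")
```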

Step 5: Immunogenicity Scoring

MHC binding is necessary but not sufficient for immunogenicity. A peptide must also be processed by the proteasome, transported by TAP (Transporter associated with Antigen Processing), and recognized by T cells. Immunogenicity scoring models like PRIME, BigMHC, or DeepImmuno attempt to predict the full pipeline from peptide to T cell recognition. Additional features that improve prediction include the difference in MHC binding between the mutant and wild-type peptide (larger differences suggest stronger immune recognition), the mutation position within the peptide, and hydrophobicity at TCR-facing residues.

Note
The gold standard for neoantigen validation is experimental: stimulate patient T cells with predicted neoantigen peptides and measure IFN-gamma production or cytotoxic activity. Computational prediction typically achieves a positive predictive value of 10 to 30% – meaning only about 1 in 10 to 1 in 3 predicted neoantigens actually elicits a T cell response. Improving this ratio is an active area of research.
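
To make the positive predictive value concrete, here is a small back-of-the-envelope calculation. It assumes a candidate list of 20 peptides and treats the PPV range quoted in the note as given. The function name is illustrative.

```python
def expected_validated(n_predicted: int, ppv: float) -> float:
    """Expected number of predictions that elicit a T cell response."""
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be between 0 and 1")
    return n_predicted * ppv


n_candidates = 20
low = expected_validated(n_candidates, 0.10)
high = expected_validated(n_candidates, 0.30)
print(f"Of {n_candidates} predicted neoantigens, expect roughly "
      f"{low:.0f} to {high:.0f} to validate")
```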

MHC-I vs. MHC-II Neoantigens

The immune system uses two distinct antigen presentation pathways, and understanding the difference is critical for neoantigen vaccine design.

MHC class I molecules are expressed on virtually all nucleated cells. They present short peptides (8 to 11 amino acids) derived from intracellular proteins to CD8+ cytotoxic T lymphocytes (CTLs). When a CTL recognizes a neoantigen on MHC-I, it directly kills the tumor cell through perforin and granzyme secretion. MHC-I neoantigens are the primary targets of anti-tumor immunity and are the focus of most neoantigen vaccine efforts. HLA-A*02:01 is the most common HLA-I allele in Caucasian populations (frequency approximately 25 to 30%) and binds peptides with leucine or methionine at position 2 and valine or leucine at the C-terminus.
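
The HLA-A*02:01 anchor preference described above can be written as a simple check. This is only a crude motif screen for intuition, assuming the anchor residues exactly as stated in the text. It is not a substitute for a trained binding predictor, and the helper name is hypothetical.

```python
P2_ANCHORS = {"L", "M"}        # preferred residues at peptide position 2
C_TERM_ANCHORS = {"V", "L"}    # preferred residues at the C-terminus


def matches_a0201_motif(peptide: str) -> bool:
    """True if a peptide carries the HLA-A*02:01 anchor residues."""
    if not 8 <= len(peptide) <= 11:
        return False
    return peptide[1] in P2_ANCHORS and peptide[-1] in C_TERM_ANCHORS


# Two overlapping windows around the G12D substitution in the RAS N-terminus.
for peptide in ("KLVVVGADGV", "LVVVGADGV"):
    print(peptide, matches_a0201_motif(peptide))
```

The 10-mer has leucine at position 2 and valine at the C-terminus, so it passes. The 9-mer shifted by one residue loses the position 2 anchor, which shows why every window around a mutation is scored separately.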

MHC class II molecules are expressed primarily on professional antigen-presenting cells (dendritic cells, macrophages, B cells). They present longer peptides (13 to 25 amino acids) to CD4+ helper T cells. While CD4+ T cells do not directly kill tumor cells in most cases, they provide essential support for CD8+ responses: they activate dendritic cells, promote CD8+ T cell priming, maintain CD8+ T cell memory, and recruit and activate macrophages. Clinical data from neoantigen vaccine trials increasingly shows that including MHC-II neoantigens improves the durability and breadth of the anti-tumor response.

The computational challenge differs between the two classes. MHC-I binding prediction is relatively mature, with models trained on extensive experimental binding data (over 500,000 binding measurements across common HLA-I alleles). MHC-II prediction is harder because the binding groove is open-ended (accommodating variable peptide lengths), the training data is smaller, and the set of HLA-II alleles is more diverse. Current MHC-II predictors (NetMHCIIpan 4.3, MixMHC2pred) achieve AUC values of 0.80 to 0.90, compared to 0.90 to 0.95 for MHC-I predictors.

Best practice for neoantigen vaccine design is to include both MHC-I and MHC-II epitopes. A vaccine with 10 to 20 predicted neoantigens typically includes 7 to 12 MHC-I epitopes (for direct CTL responses) and 3 to 8 MHC-II epitopes (for CD4+ helper support). This dual approach is used by BioNTech in their autogene cevumeran (BNT122) individualized neoantigen vaccine; in a Phase 1 trial in resected pancreatic cancer (published in Nature, 2023), patients who mounted vaccine-induced T cell responses had significantly delayed recurrence.

Hands-On: Neoantigen Prediction with the SciRouter API

SciRouter provides API endpoints for the core computational steps of neoantigen prediction: peptide generation from mutant sequences, MHC binding prediction, and immunogenicity scoring. The following example walks through predicting neoantigens from a set of somatic mutations.

Predict neoantigens from tumor mutations
import os

import requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Example: somatic mutations of the kind found in a melanoma exome
# Each mutation: gene, wild-type peptide context, mutant peptide context, protein change
mutations = [
    {
        "gene": "BRAF",
        "wild_type_sequence": "LATVKSRWSGSHQFEQLS",
        "mutant_sequence":    "LATEKSRWSGSHQFEQLS",
        "mutation": "V600E",
    },
    {
        "gene": "NRAS",
        "wild_type_sequence": "MTEYKLVVVGAGGVGKSALT",
        "mutant_sequence":    "MTEYKLVVVGADGVGKSALT",
        "mutation": "G12D",
    },
    {
        "gene": "TP53",
        "wild_type_sequence": "SSCMGGMNRRPILTIITLE",
        "mutant_sequence":    "SSCMGGMNQRPILTIITLE",
        "mutation": "R248Q",
    },
]

# Patient HLA alleles (determined from sequencing)
hla_alleles = ["HLA-A*02:01", "HLA-A*24:02", "HLA-B*07:02",
               "HLA-B*44:02", "HLA-C*05:01", "HLA-C*07:02"]

# Predict MHC-I binding for all mutations
response = requests.post(
    f"{BASE}/immunology/neoantigen-predict",
    headers=HEADERS,
    json={
        "mutations": mutations,
        "hla_alleles": hla_alleles,
        "peptide_lengths": [8, 9, 10, 11],
        "binding_threshold": 500,   # IC50 in nM
        "include_wild_type": True,  # compare mutant vs WT binding
    },
    timeout=120,
)
response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
results = response.json()

print(f"Total candidate peptides evaluated: {results['total_peptides']}")
print(f"Predicted binders (IC50 < 500 nM): {results['num_binders']}")
print(f"Strong binders (IC50 < 50 nM): {results['num_strong_binders']}")

print("\n=== Top Predicted Neoantigens ===")
for neo in results["neoantigens"][:10]:
    print(f"\nGene: {neo['gene']} ({neo['mutation']})")
    print(f"  Peptide: {neo['peptide']}")
    print(f"  HLA: {neo['hla_allele']}")
    print(f"  Mutant IC50: {neo['mutant_ic50']:.1f} nM")
    print(f"  Wild-type IC50: {neo['wildtype_ic50']:.1f} nM")
    print(f"  DAI: {neo['differential_agretopicity']:.2f}")
    print(f"  Immunogenicity score: {neo['immunogenicity_score']:.3f}")

The output includes the Differential Agretopicity Index (DAI), which compares MHC binding of the mutant peptide with that of its wild-type counterpart, expressed here as a fold difference in predicted IC50. A high DAI indicates that the mutation creates a peptide that binds MHC much more strongly than the wild-type version – a strong signal for immunogenicity, because the wild-type counterpart is poorly presented and the immune system is therefore unlikely to have been tolerized to it.
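
A minimal sketch of the calculation follows, assuming DAI is taken as wild-type IC50 divided by mutant IC50 (lower IC50 means tighter binding, so values above 1 favor the mutant). Some publications define DAI as a difference of scores instead, and the API's exact definition is not shown here, so treat this as one plausible convention. The example values are made up.

```python
def differential_agretopicity(wildtype_ic50_nm: float, mutant_ic50_nm: float) -> float:
    """Fold improvement in predicted binding: wild-type IC50 / mutant IC50."""
    if wildtype_ic50_nm <= 0 or mutant_ic50_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return wildtype_ic50_nm / mutant_ic50_nm


# Hypothetical pair: the wild-type peptide barely binds, the mutant binds strongly.
dai = differential_agretopicity(wildtype_ic50_nm=2400.0, mutant_ic50_nm=40.0)
print(f"DAI: {dai:.0f}-fold")
```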

For a complete vaccine design workflow, filter the predicted neoantigens by binding strength (IC50 below 100 nM), DAI (above 5-fold), and immunogenicity score (above a model-specific threshold). Then select 10 to 20 diverse peptides covering multiple HLA alleles and multiple mutations for inclusion in a personalized vaccine.
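
The filtering step can be sketched as follows. The field names match the response fields printed in the example above. The 0.5 immunogenicity cutoff is an arbitrary placeholder (the right value is model-specific), and the three records are hypothetical.

```python
def filter_candidates(neoantigens, max_ic50=100.0, min_dai=5.0, min_score=0.5):
    """Keep candidates passing binding, DAI, and immunogenicity thresholds."""
    return [
        neo for neo in neoantigens
        if neo["mutant_ic50"] < max_ic50
        and neo["differential_agretopicity"] > min_dai
        and neo["immunogenicity_score"] > min_score
    ]


# Hypothetical records shaped like the entries of results["neoantigens"].
example = [
    {"peptide": "KLVVVGADGV", "mutant_ic50": 45.0,
     "differential_agretopicity": 12.0, "immunogenicity_score": 0.81},
    {"peptide": "LVVVGADGV", "mutant_ic50": 380.0,
     "differential_agretopicity": 9.0, "immunogenicity_score": 0.77},
    {"peptide": "VVGADGVGK", "mutant_ic50": 60.0,
     "differential_agretopicity": 1.4, "immunogenicity_score": 0.66},
]

shortlist = filter_candidates(example)
print([neo["peptide"] for neo in shortlist])
```

Only the first record passes all three filters: the second binds too weakly and the third binds about as well as its wild-type counterpart.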

Clinical Context: Neoantigen Vaccines in the Clinic

As of early 2026, over 100 clinical trials testing neoantigen-based immunotherapies are registered on ClinicalTrials.gov. The approaches span mRNA vaccines, synthetic long peptide vaccines, dendritic cell vaccines loaded with neoantigen peptides, and adoptive T cell therapies targeting neoantigen-reactive T cells.

One of the most advanced programs is BioNTech's autogene cevumeran (BNT122), a personalized mRNA vaccine encoding up to 20 patient-specific neoantigens. In a Phase 1 trial for resected pancreatic ductal adenocarcinoma (published in Nature, May 2023), patients who mounted T cell responses to the vaccine had significantly delayed recurrence (median recurrence-free survival not reached vs. 13.4 months in non-responders). BioNTech and Genentech are now conducting a randomized Phase 2 trial (IMCODE-003) in resected pancreatic ductal adenocarcinoma, combining the personalized vaccine with atezolizumab (an anti-PD-L1 checkpoint inhibitor) and chemotherapy.

Moderna's mRNA-4157 (V940) personalized cancer vaccine showed a 44% reduction in recurrence or death risk when combined with pembrolizumab in resected high-risk melanoma (KEYNOTE-942, Phase 2b). Based on these results, Moderna and Merck are advancing to a Phase 3 trial – the first randomized Phase 3 trial of a personalized neoantigen vaccine.

These trials demonstrate that neoantigen prediction pipelines like the one described in this article are no longer purely academic. They are driving clinical decisions about which peptides to include in patient-specific vaccines. The computational accuracy of neoantigen prediction directly affects vaccine efficacy – better predictions mean more immunogenic peptides in the vaccine, stronger T cell responses, and better clinical outcomes.

Combining Neoantigen Prediction with DarkScan

Standard neoantigen prediction focuses on the approximately 2% of the genome that encodes known proteins. But tumors also express antigens from the dark genome – cancer-testis antigens, endogenous retroviral proteins, and non-coding RNA-derived peptides that are normally silenced in adult tissues. SciRouter's DarkScan Studio extends neoantigen prediction to include these dark genome targets, dramatically expanding the pool of candidate vaccine antigens.

This is particularly valuable for low-TMB tumors where conventional neoantigens are scarce. A pancreatic cancer with only 30 non-synonymous mutations might yield 1 to 3 conventional neoantigens after MHC binding and immunogenicity filtering. But the same tumor may also express MAGE-A4, NY-ESO-1, and HERV-K-derived peptides that are strong MHC binders and highly immunogenic. By combining conventional neoantigen prediction with dark genome scanning, you can design a vaccine with 15 to 20 antigens even for low-TMB tumors.

Combine neoantigen and dark genome antigen prediction
# Reuses BASE, HEADERS, mutations, and hla_alleles from the previous example.
# Predict conventional neoantigens from somatic mutations
neo_response = requests.post(f"{BASE}/immunology/neoantigen-predict",
    headers=HEADERS, timeout=120, json={
        "mutations": mutations,
        "hla_alleles": hla_alleles,
        "peptide_lengths": [8, 9, 10, 11],
        "binding_threshold": 500,
    })
neo_response.raise_for_status()  # stop on HTTP errors before parsing
neo_results = neo_response.json()

# Scan for dark genome antigens (cancer-testis, HERV, non-coding)
dark_response = requests.post(f"{BASE}/immunology/darkscan",
    headers=HEADERS, timeout=120, json={
        "cancer_type": "melanoma",
        "hla_alleles": hla_alleles,
        "include_cta": True,         # cancer-testis antigens
        "include_herv": True,        # endogenous retroviral antigens
        "include_noncoding": True,   # non-coding RNA-derived peptides
        "binding_threshold": 500,
    })
dark_response.raise_for_status()
dark_results = dark_response.json()

# Combine and rank all candidates
all_antigens = []

for neo in neo_results["neoantigens"]:
    all_antigens.append({
        "source": "somatic_mutation",
        "gene": neo["gene"],
        "peptide": neo["peptide"],
        "hla": neo["hla_allele"],
        "ic50": neo["mutant_ic50"],
        "score": neo["immunogenicity_score"],
    })

for dark in dark_results["antigens"]:
    all_antigens.append({
        "source": dark["antigen_class"],  # CTA, HERV, or noncoding
        "gene": dark["gene"],
        "peptide": dark["peptide"],
        "hla": dark["hla_allele"],
        "ic50": dark["ic50"],
        "score": dark["immunogenicity_score"],
    })

# Sort by immunogenicity score
all_antigens.sort(key=lambda x: x["score"], reverse=True)

print(f"Conventional neoantigens: {len(neo_results['neoantigens'])}")
print(f"Dark genome antigens: {len(dark_results['antigens'])}")
print(f"Total vaccine candidates: {len(all_antigens)}")

print("\n=== Top 20 Vaccine Candidates ===")
for rank, ag in enumerate(all_antigens[:20], start=1):
    print(f"{rank}. [{ag['source']}] {ag['gene']} - {ag['peptide']}")
    print(f"   HLA: {ag['hla']}, IC50: {ag['ic50']:.0f} nM, "
          f"Score: {ag['score']:.3f}")

The combined pipeline produces a ranked list of vaccine candidates from both conventional and dark genome sources. For vaccine design, select the top 15 to 20 candidates ensuring coverage across multiple HLA alleles and a mix of CD8+ (MHC-I) and CD4+ (MHC-II) epitopes. The example requests only MHC-I peptide lengths and HLA-I alleles, so CD4+ epitopes would need a separate MHC-II prediction against the patient's HLA-II alleles. This multi-source approach maximizes the breadth of the anti-tumor immune response and reduces the risk that any single antigen is lost through immunoediting.

From Predicted Neoantigens to Vaccine Design

Selecting the final set of neoantigens for a vaccine requires balancing several factors beyond raw immunogenicity score. First, include antigens presented by different HLA alleles to maximize the probability that at least some will be presented. Second, include both MHC-I and MHC-II epitopes for coordinated CD8+ and CD4+ T cell responses. Third, prioritize antigens from driver mutations (like BRAF V600E or KRAS G12D) that are essential for tumor survival, because the tumor cannot easily escape immune pressure by losing these mutations.
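
One way to sketch the first of these criteria is a round-robin over HLA alleles with a cap on how many peptides any one gene contributes. This is an illustrative heuristic, not the selection method of any particular vaccine program. The candidate records use the `gene`, `peptide`, `hla`, and `score` keys from the combined list built earlier, and the example values are hypothetical. Driver-mutation prioritization and MHC class balance are left out for brevity.

```python
from collections import Counter, defaultdict


def select_diverse(candidates, n_select=20, max_per_gene=2):
    """Round-robin over HLA alleles, taking the best remaining candidate each turn."""
    by_allele = defaultdict(list)
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        by_allele[cand["hla"]].append(cand)

    selected, per_gene = [], Counter()
    queues = list(by_allele.values())
    while len(selected) < n_select and any(queues):
        for queue in queues:
            # Drop candidates whose source gene is already well represented.
            while queue and per_gene[queue[0]["gene"]] >= max_per_gene:
                queue.pop(0)
            if queue and len(selected) < n_select:
                cand = queue.pop(0)
                selected.append(cand)
                per_gene[cand["gene"]] += 1
    return selected


# Hypothetical candidates; peptide names are placeholders.
pool = [
    {"peptide": "pepA", "gene": "BRAF", "hla": "HLA-A*02:01", "score": 0.9},
    {"peptide": "pepB", "gene": "BRAF", "hla": "HLA-A*02:01", "score": 0.8},
    {"peptide": "pepC", "gene": "BRAF", "hla": "HLA-B*07:02", "score": 0.7},
    {"peptide": "pepD", "gene": "NRAS", "hla": "HLA-B*07:02", "score": 0.6},
    {"peptide": "pepE", "gene": "TP53", "hla": "HLA-A*02:01", "score": 0.5},
]
picked = select_diverse(pool, n_select=3)
print([c["peptide"] for c in picked])
```

A pure score ranking would take three BRAF peptides here. The round-robin instead spreads the picks across both alleles and stops BRAF at two, so a lower-scoring TP53 peptide makes the cut.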

The vaccine modality affects antigen selection. mRNA vaccines (BioNTech/Moderna approach) encode the full mutant protein sequence or a concatenated string of neoantigen-containing peptide sequences. The patient's own cells then translate the mRNA, process the protein, and present the neoantigens on MHC molecules. This approach naturally generates both MHC-I and MHC-II epitopes from a single construct. Synthetic long peptide (SLP) vaccines deliver the peptides directly, requiring separate peptide synthesis for each neoantigen.

SciRouter's vaccine design API can take a ranked list of neoantigen peptides and generate the optimal mRNA construct, including codon optimization, UTR design, and sequence ordering to maximize expression of all included neoantigens. This bridges the gap between computational neoantigen prediction and physical vaccine manufacturing.

Next Steps

Neoantigen prediction is one component of a broader immunotherapy design toolkit. Use MHC Binding Prediction for peptide-MHC affinity calculations, Vaccine Design for mRNA construct optimization, and Neoantigen Pipeline for the complete end-to-end workflow from mutations to vaccine candidates.

For a deeper look at immunotherapy targets beyond conventional neoantigens, read our guide on scanning the dark genome for cancer targets. For mRNA vaccine design principles, see the mRNA vaccine design guide.

Sign up for a free SciRouter API key and start predicting neoantigens today. With 500 free credits per month and no bioinformatics infrastructure to manage, SciRouter is the fastest path from tumor sequencing data to personalized immunotherapy candidates.

Frequently Asked Questions

What is a neoantigen?

A neoantigen is a peptide derived from a somatic mutation in a tumor cell that is presented on the cell surface by MHC (Major Histocompatibility Complex) molecules. Because the mutation is unique to the tumor, the resulting peptide is foreign to the immune system and can be recognized by T cells. Neoantigens are the basis of personalized cancer immunotherapy, including neoantigen vaccines and adoptive T cell therapy. Unlike shared tumor antigens, neoantigens are patient-specific because every tumor has a unique mutational profile.

What is tumor mutation burden (TMB) and why does it matter?

Tumor mutation burden is the total number of somatic mutations per megabase of coding DNA in a tumor. High-TMB tumors (more than 10 mutations per megabase) include melanoma, non-small cell lung cancer, and microsatellite-instable colorectal cancer. These tumors generate more neoantigens and are more likely to respond to checkpoint immunotherapy. Low-TMB tumors like pancreatic and prostate cancer produce fewer neoantigens and typically have lower response rates to immune checkpoint inhibitors. TMB is increasingly used as a biomarker for immunotherapy patient selection.

What is the difference between MHC-I and MHC-II neoantigens?

MHC class I molecules present short peptides (8 to 11 amino acids) to CD8+ cytotoxic T cells, which directly kill tumor cells. MHC class II molecules present longer peptides (13 to 25 amino acids) to CD4+ helper T cells, which orchestrate the immune response by activating CD8+ T cells, B cells, and macrophages. Most neoantigen prediction pipelines focus on MHC-I because CD8+ T cells are the primary effectors of tumor killing. However, MHC-II neoantigens are increasingly recognized as important for durable anti-tumor immunity because CD4+ T cell help is required for sustained CD8+ responses.

How accurate is computational neoantigen prediction?

Current neoantigen prediction pipelines have high sensitivity (they identify most true neoantigens) but moderate specificity (they also predict many peptides that do not actually elicit immune responses). MHC binding prediction alone has an AUC of approximately 0.85 to 0.95 for predicting peptide-MHC binding. However, MHC binding is necessary but not sufficient for immunogenicity. Additional factors like peptide processing, transport, T cell receptor recognition, and the tumor microenvironment all affect whether a predicted neoantigen actually triggers an immune response. Experimental validation with T cell assays remains essential.

Can neoantigen vaccines treat any type of cancer?

Neoantigen vaccines are most promising for high-TMB cancers that generate abundant neoantigens, including melanoma, non-small cell lung cancer, bladder cancer, and microsatellite-instable cancers. For low-TMB cancers, fewer neoantigens are available, making vaccine design more challenging but not impossible. As of 2026, there are over 100 registered clinical trials testing neoantigen vaccines across cancer types. Moderna has reported positive Phase 2b results for a personalized mRNA neoantigen vaccine in melanoma, and BioNTech has reported encouraging Phase 1 results in pancreatic cancer.

How long does it take to design a personalized neoantigen vaccine?

The computational pipeline (mutation calling, HLA typing, peptide generation, MHC binding prediction, and immunogenicity scoring) can be completed in hours using the SciRouter API. The manufacturing bottleneck is the physical vaccine production, which takes 4 to 8 weeks for mRNA-based vaccines and 6 to 12 weeks for peptide-based vaccines. BioNTech has demonstrated a 6-week turnaround for personalized mRNA neoantigen vaccines in their clinical trials, and the goal is to reduce this to under 4 weeks.
