How to Build a Personal Genomics App with Python

Parse 23andMe raw data and annotate SNPs with the SciRouter API. Complete Python tutorial with working code examples.

Ryan Bethencourt
April 9, 2026

Why Build a Personal Genomics App

Millions of people have raw DNA data files from 23andMe or AncestryDNA sitting on their hard drives. These files contain hundreds of thousands of genotype calls, but without interpretation they are just rows of rsIDs and letter pairs. Building a personal genomics app lets you turn that raw data into actionable trait reports, drug interaction alerts, and ancestry insights.

This tutorial walks through the full pipeline: parsing a 23andMe raw data file with Python, querying the SciRouter SNP annotation API, matching user genotypes against known associations, and assembling a trait report. By the end you will have a working script that produces a structured JSON report from any 23andMe file.

Step 1: Parse the 23andMe Raw Data File

A 23andMe raw data file is a tab-separated text file. Lines starting with # are comments. Each data line has four columns: rsID, chromosome, position, and genotype. Here is a Python function that parses it into a dictionary keyed by rsID. It keeps only identifiers that start with rs, so 23andMe's internal i-prefixed markers are skipped:

parse_23andme.py
def parse_23andme(filepath: str) -> dict[str, dict]:
    """Parse a 23andMe raw data file into a dict keyed by rsID."""
    genotypes = {}
    with open(filepath) as f:
        for line in f:
            if line.startswith("#") or line.strip() == "":
                continue
            parts = line.strip().split("\t")
            if len(parts) < 4:
                continue
            rsid, chrom, pos, geno = parts[0], parts[1], parts[2], parts[3]
            if rsid.startswith("rs"):
                genotypes[rsid] = {
                    "chromosome": chrom,
                    "position": int(pos),
                    "genotype": geno,
                }
    return genotypes

# Usage
user_snps = parse_23andme("genome_data.txt")
print(f"Parsed {len(user_snps)} SNPs")

A typical 23andMe v5 file contains around 640,000 SNPs. Because the parser makes a single pass over the file, it typically finishes in about a second on modern hardware. All processing happens locally, so the raw file never leaves your machine.
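Before moving on, it can help to sanity-check what was parsed. The sketch below summarizes a genotype dictionary in the shape returned by `parse_23andme`. The helper name `summarize_calls` is hypothetical, the sample rsIDs and positions are made up for illustration, and it assumes no-calls are written with `-` characters (for example `--`).

```python
from collections import Counter


def summarize_calls(genotypes: dict) -> dict:
    """Summarize a parsed genotype dict: totals, no-calls, call rate, per-chromosome counts."""
    total = len(genotypes)
    # Assumption: a no-call genotype contains "-" (e.g. "--").
    no_calls = sum(1 for g in genotypes.values() if "-" in g["genotype"])
    per_chromosome = Counter(g["chromosome"] for g in genotypes.values())
    call_rate = (total - no_calls) / total if total else 0.0
    return {
        "total": total,
        "no_calls": no_calls,
        "call_rate": call_rate,
        "per_chromosome": dict(per_chromosome),
    }


# Made-up sample in the same shape parse_23andme returns (not real data)
sample = {
    "rs1": {"chromosome": "1", "position": 100, "genotype": "AA"},
    "rs2": {"chromosome": "1", "position": 200, "genotype": "--"},
    "rs3": {"chromosome": "2", "position": 300, "genotype": "CT"},
    "rs4": {"chromosome": "X", "position": 400, "genotype": "G"},
}
summary = summarize_calls(sample)
print(summary["total"], summary["no_calls"], summary["call_rate"])
```

On a real file you would call `summarize_calls(user_snps)`; a call rate far below what you expect is usually a sign that the file is truncated or in a different format.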

Step 2: Fetch the SNP Annotation Catalog

SciRouter provides a free, unauthenticated endpoint that returns the full catalog of curated SNP annotations. Each entry includes the rsID, trait or condition, risk allele, genotype interpretations, category, and data source references.

Fetch annotations from SciRouter API
import requests

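response = requests.get(
    "https://api.scirouter.ai/v1/personal-genomics/annotations",
    timeout=30,  # avoid hanging forever if the server does not respond
)
response.raise_for_status()  # fail loudly on HTTP 4xx/5xx responses
annotations = response.json()["annotations"]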
print(f"Fetched {len(annotations)} annotated SNPs")

# Each annotation looks like:
# {
#   "rsid": "rs4988235",
#   "gene": "MCM6/LCT",
#   "trait": "Lactose Tolerance",
#   "category": "Nutrition",
#   "risk_allele": "T",
#   "genotype_effects": {
#     "TT": "Lactase persistent (tolerant)",
#     "CT": "Likely tolerant (carrier)",
#     "CC": "Lactose intolerant (ancestral)"
#   },
#   "source": "GWAS Catalog"
# }
Note
The annotation endpoint is completely free and requires no API key. It returns over 400 curated SNPs spanning traits, pharmacogenomics, health markers, ancestry, and Neanderthal variants.
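Since the catalog is a single JSON document, one option is to cache it on disk rather than refetching it on every run. The following is a minimal sketch, not part of the SciRouter API: `load_annotations` is a hypothetical helper that takes any zero-argument fetch function, and the demo uses a stand-in `fake_fetch` so it runs without network access.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_annotations(cache_path: str, fetch: Callable[[], list]) -> list:
    """Return cached annotations if the file exists; otherwise call fetch() and cache the result."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    annotations = fetch()
    path.write_text(json.dumps(annotations), encoding="utf-8")
    return annotations


# Demo with a stand-in fetch function (hypothetical, no network involved)
fetch_calls = []


def fake_fetch() -> list:
    fetch_calls.append(1)
    return [{"rsid": "rs4988235", "trait": "Lactose Tolerance"}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = str(Path(tmp) / "annotations.json")
    first = load_annotations(cache_file, fake_fetch)   # cache miss: calls fake_fetch
    second = load_annotations(cache_file, fake_fetch)  # cache hit: reads the file

print(len(fetch_calls), first == second)
```

In the tutorial's pipeline, the fetch argument would wrap the `requests.get` call shown above. The sketch never expires the cache, so delete the file when you want a fresh copy of the catalog.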

Step 3: Match User Genotypes Against Annotations

With the parsed genotypes and the annotation catalog in hand, matching is straightforward. For each annotated SNP, check whether the user has a genotype call, then look up the interpretation:

Match genotypes to annotations
def match_genotypes(user_snps: dict, annotations: list) -> list:
    """Match user genotypes against annotated SNPs."""
    results = []
    for ann in annotations:
        rsid = ann["rsid"]
        if rsid not in user_snps:
            continue
        user_geno = user_snps[rsid]["genotype"]
        # Normalize allele order on both sides so "TC" and "CT" match
        sorted_geno = "".join(sorted(user_geno))
        effects = {
            "".join(sorted(key)): text
            for key, text in ann.get("genotype_effects", {}).items()
        }
        interpretation = effects.get(sorted_geno, "No interpretation available")
        results.append({
            "rsid": rsid,
            "gene": ann.get("gene", ""),
            "trait": ann["trait"],
            "category": ann["category"],
            "genotype": user_geno,
            "interpretation": interpretation,
            "risk_allele": ann.get("risk_allele", ""),
        })
    return results

matches = match_genotypes(user_snps, annotations)
print(f"Matched {len(matches)} SNPs with annotations")
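Each match also carries the annotation's `risk_allele`, which the lookup above does not use. As a rough sketch, you could count how many copies of that allele the user carries. The helper names below are hypothetical, and the sketch assumes the genotype and the risk allele are reported on the same strand and that no-calls contain `-`.

```python
from typing import Optional


def count_risk_alleles(genotype: str, risk_allele: str) -> Optional[int]:
    """Count copies of a single-letter risk allele in a genotype string such as "CT"."""
    # Return None when the count is not meaningful: missing or multi-letter
    # risk allele, or a no-call genotype (assumed to contain "-").
    if len(risk_allele) != 1 or "-" in genotype:
        return None
    return genotype.count(risk_allele)


def add_risk_counts(matches: list) -> list:
    """Return copies of the match dicts with an added "risk_allele_count" key."""
    return [
        {**m, "risk_allele_count": count_risk_alleles(m["genotype"], m.get("risk_allele", ""))}
        for m in matches
    ]


# Example using the annotation shown in Step 2 (risk allele "T")
example = add_risk_counts([
    {"rsid": "rs4988235", "genotype": "CT", "risk_allele": "T"},
])
print(example[0]["risk_allele_count"])  # 1
```

A count of 0, 1, or 2 is easier to sort and filter on than free-text interpretations, but it is only as reliable as the strand assumption above.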

Step 4: Build the Trait Report

Group the matched results by category to produce a structured report. Each distinct category value in the annotations becomes its own section, for example traits, pharmacogenomics, health markers, and ancestry:

Generate a categorized trait report
from collections import defaultdict
import json

def build_report(matches: list) -> dict:
    """Group matches into a categorized report."""
    report = defaultdict(list)
    for m in matches:
        report[m["category"]].append({
            "rsid": m["rsid"],
            "gene": m["gene"],
            "trait": m["trait"],
            "genotype": m["genotype"],
            "interpretation": m["interpretation"],
        })
    return dict(report)

report = build_report(matches)
print(json.dumps(report, indent=2))

# Example output structure (keys are the category values found in the catalog):
# {
#   "Traits": [...],
#   "Pharmacogenomics": [...],
#   "Health": [...],
#   "Ancestry": [...],
#   "Neanderthal": [...]
# }

The report typically contains 200-300 matched SNPs across all categories, depending on the user's chip version and call rate. You can render this as a web dashboard, export to PDF, or feed it into downstream analysis tools.
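As one simple rendering option, the categorized report can be turned into plain text before you invest in a dashboard or PDF export. This is a minimal sketch with a hypothetical helper name, `render_report_text`; it assumes the report has the shape produced by `build_report` and uses the lactose tolerance entry from Step 2 as sample data.

```python
def render_report_text(report: dict) -> str:
    """Render the categorized report as plain text, one section per category."""
    lines = []
    for category in sorted(report):
        entries = report[category]
        lines.append(f"{category} ({len(entries)})")
        # Sort entries by trait name so the output is stable between runs
        for e in sorted(entries, key=lambda entry: entry["trait"]):
            gene = f" [{e['gene']}]" if e["gene"] else ""
            lines.append(
                f"  {e['trait']}{gene}: {e['genotype']} ({e['rsid']}) - {e['interpretation']}"
            )
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


sample_report = {
    "Nutrition": [
        {
            "rsid": "rs4988235",
            "gene": "MCM6/LCT",
            "trait": "Lactose Tolerance",
            "genotype": "CT",
            "interpretation": "Likely tolerant (carrier)",
        }
    ]
}
text = render_report_text(sample_report)
print(text)
```

The same loop structure carries over to HTML or Markdown output: iterate over categories, then over entries within each category.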

Frequently Asked Questions

Do I need a paid SciRouter account to use the personal genomics API?

No. The SNP annotation endpoint is free and requires no authentication. You can query the full catalog of 400+ curated SNPs without an API key. Uploading parsed results for storage and advanced analysis requires a free account with an API key.

What file formats does the parser support?

The Python parsing approach shown here works with 23andMe raw data files (v3, v4, and v5 chip versions). These are tab-separated text files with columns for rsID, chromosome, position, and genotype. AncestryDNA files use a similar tab-separated layout but split the genotype into two allele columns, so they can be parsed with minor adjustments to the column mapping.
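The adjustment for AncestryDNA might look like the sketch below. It rests on assumptions you should check against your own file: five tab-separated columns (rsid, chromosome, position, allele1, allele2), an uncommented header row, and `0` as the no-call marker. The function name `parse_ancestrydna_lines` is hypothetical, and the sample rows are made up.

```python
from typing import Iterable


def parse_ancestrydna_lines(lines: Iterable[str]) -> dict:
    """Parse AncestryDNA-style lines into the same dict shape as parse_23andme."""
    genotypes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        parts = line.strip().split("\t")
        # Assumed layout: rsid, chromosome, position, allele1, allele2
        if len(parts) < 5 or not parts[0].startswith("rs"):
            continue
        rsid, chrom, pos, allele1, allele2 = parts[:5]
        # The header row starts with "rsid", so also require a numeric position
        if not pos.isdigit():
            continue
        genotypes[rsid] = {
            "chromosome": chrom,
            "position": int(pos),
            # Assumption: "0" marks a no-call; map it to "-" to mirror 23andMe
            "genotype": (allele1 + allele2).replace("0", "-"),
        }
    return genotypes


# Made-up sample rows (not real data)
sample_lines = [
    "#AncestryDNA raw data download\n",
    "rsid\tchromosome\tposition\tallele1\tallele2\n",
    "rs1\t1\t100\tA\tG\n",
    "rs2\t1\t200\t0\t0\n",
]
parsed = parse_ancestrydna_lines(sample_lines)
print(parsed["rs1"]["genotype"], parsed["rs2"]["genotype"])
```

Because an open file is itself an iterable of lines, `parse_ancestrydna_lines(open(path))` would work for a real file, and the result can be passed to `match_genotypes` unchanged.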

Is my genomic data sent to SciRouter servers?

Only if you choose to upload parsed results. The parsing step runs entirely in your Python script on your own machine. The annotation catalog is fetched via API, but your raw genotype file never leaves your computer unless you explicitly post results to the upload endpoint.

How many SNPs are in the annotation database?

The SciRouter annotation catalog contains over 400 curated SNP entries covering traits, pharmacogenomics, health risk markers, ancestry-informative markers, and Neanderthal introgressed variants. Sources include ClinVar, PharmGKB, and the GWAS Catalog. The catalog is updated regularly.

Can I use this to build a commercial genomics product?

Yes. The SciRouter API is designed for developer integration. The free tier includes 5,000 API calls per month. For production use, the Pro and Agentic tiers offer higher limits. Note that any genomic analysis product should include appropriate disclaimers that results are for informational and research purposes, not clinical diagnosis.