SNP Analysis: A Beginner's Guide to Reading Your DNA

What are SNPs? How do you read genotype data? A complete beginner's guide to single nucleotide polymorphisms and personal genomics.

Ryan Bethencourt
April 9, 2026

What Is a SNP?

A single nucleotide polymorphism (SNP, pronounced “snip”) is a position in the human genome where a single DNA base letter varies between individuals. The human genome is about 3.2 billion bases long, and a typical person differs from the reference genome at roughly 4–5 million of those positions. SNPs are the most common form of genetic variation and the basis of most consumer genetic tests.

Each SNP is identified by an rsID — a reference number from the dbSNP database maintained by NCBI. For example, rs12913832 is a SNP in the HERC2/OCA2 region that strongly influences eye color. When you download your 23andMe or AncestryDNA raw data, it is essentially a long list of rsIDs paired with your genotype at each position.
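
As a rough sketch of what that list looks like in practice, the snippet below parses a 23andMe-style export into a dictionary keyed by rsID. It assumes a tab-separated layout with `#` comment lines and four columns (rsid, chromosome, position, genotype). Other providers lay out their files differently, so the column handling, the `parse_raw_data` name, and the sample rows are illustrative assumptions rather than a specification.

```python
import io

def parse_raw_data(handle):
    """Return {rsid: genotype} from a 23andMe-style tab-separated export (assumed layout)."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and the commented header block.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # malformed row: ignore it rather than guess
        rsid, _chromosome, _position, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes

# Illustrative rows only, not real genotype data. "rs0000001" is a
# hypothetical rsID, and "--" stands for a no-call in this assumed format.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs12913832\t15\t28365618\tAG\n"
    "rs0000001\t1\t12345\t--\n"
)
calls = parse_raw_data(sample)
print(calls["rs12913832"])
```

For a real file, the same function could be called on `open(path)` instead of the in-memory sample.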

How Genotyping Works

Consumer genetic tests like 23andMe and AncestryDNA use microarray chips to read your DNA. Unlike whole genome sequencing (which reads every base), microarrays measure a pre-selected set of SNP positions — typically 600,000 to 700,000 sites chosen for their relevance to health, traits, ancestry, and research.

The process works by hybridization: your fragmented DNA is washed over a chip containing millions of short DNA probes. Each probe is designed to bind specifically to one allele at a SNP position. Fluorescent signals indicate which allele (or alleles) are present. The raw data file you download contains the results of this process for each measured position.
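
To make the idea of turning fluorescent signals into a genotype concrete, here is a deliberately simplified toy. It assumes two intensity channels, one per allele, and calls the genotype from the share of signal in each channel. The function name and every threshold are hypothetical. Production pipelines typically cluster the intensities of many samples per SNP rather than apply fixed cutoffs.

```python
def call_genotype(intensity_a, intensity_b, allele_a="A", allele_b="G",
                  min_total=0.2, low=0.2, high=0.8):
    """Toy genotype call from two fluorescence channel intensities.

    All thresholds are hypothetical values chosen for illustration.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        return "--"  # signal too weak to trust: report a no-call
    frac_b = intensity_b / total  # share of signal from the second allele's channel
    if frac_b < low:
        return allele_a * 2       # almost all signal from allele A: homozygous
    if frac_b > high:
        return allele_b * 2       # almost all signal from allele B: homozygous
    return allele_a + allele_b    # comparable signal in both channels: heterozygous

print(call_genotype(1.0, 0.05))   # strong A channel only
print(call_genotype(0.5, 0.55))   # both channels
print(call_genotype(0.03, 0.9))   # strong B channel only
```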

Reading Genotype Data

Your genotype at each SNP is reported as two letters, one from each copy of the chromosome (humans are diploid, so most chromosomes come in pairs). There are three possible genotypes at a biallelic SNP:

  • Homozygous reference — Both copies match the reference allele (e.g., AA). You carry zero copies of the variant.
  • Heterozygous — One copy of each allele (e.g., AG). You carry one copy of the variant.
  • Homozygous variant — Both copies are the alternative allele (e.g., GG). You carry two copies of the variant.

The order of letters (AG vs GA) does not matter — they represent the same heterozygous genotype. You cannot tell which allele came from which parent using microarray data alone; that requires phasing analysis or parental genotyping.
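
The three categories and the order-insensitivity described above can be sketched in a few lines. This is a minimal illustration for biallelic SNPs only. The `classify_genotype` helper is a hypothetical name, and it returns `None` for anything it does not recognize (such as a no-call) rather than guessing.

```python
def classify_genotype(genotype, ref, alt):
    """Classify a two-letter genotype at a biallelic SNP and count variant copies."""
    alleles = sorted(genotype.upper())  # sorting makes "GA" and "AG" equivalent
    if len(alleles) != 2 or any(a not in (ref, alt) for a in alleles):
        return None, None  # no-call, indel code, or an unexpected allele
    alt_copies = alleles.count(alt)
    labels = {
        0: "homozygous reference",
        1: "heterozygous",
        2: "homozygous variant",
    }
    return labels[alt_copies], alt_copies

for g in ("AA", "AG", "GA", "GG", "--"):
    print(g, classify_genotype(g, ref="A", alt="G"))
```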

Reference vs Alternative Alleles

Every SNP has a reference allele (the base in the GRCh37 or GRCh38 human reference genome) and one or more alternative alleles. It is important to understand that the reference allele is not necessarily the “normal” or most common variant. The reference genome was assembled from a small number of individuals and may not represent the global majority at every position.

For example, at some SNP positions, the alternative allele may be more common globally than the reference allele. When reading research papers or annotations, pay attention to which allele is reported as the effect allele — the one associated with a trait or outcome — rather than assuming the reference allele is the baseline.
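
A short sketch shows why the distinction matters: counting copies of the effect allele can give a different number than counting copies of the alternative allele. The alleles below are hypothetical, and the helper assumes the genotype and the effect allele are reported on the same DNA strand, which is worth verifying for real data.

```python
def effect_allele_count(genotype, effect_allele):
    """Count copies (0, 1, or 2) of the effect allele in a two-letter genotype.

    Assumes the genotype and the effect allele are on the same strand.
    """
    return genotype.upper().count(effect_allele.upper())

# Hypothetical SNP where a study reports the REFERENCE allele as the effect allele.
ref, alt, effect = "A", "G", "A"
genotype = "AA"

alt_copies = genotype.count(alt)                         # copies of the alternative allele
effect_copies = effect_allele_count(genotype, effect)    # copies of the effect allele
print(alt_copies, effect_copies)
```

Here the person carries zero copies of the alternative allele but two copies of the effect allele, so treating "reference" as the baseline would read the result backwards.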

Population Frequencies

Allele frequencies vary significantly across global populations. A variant that is common in one population may be rare in another. Population frequency data (from projects like gnomAD and the 1000 Genomes Project) is typically reported for major continental groups:

  • EUR — European ancestry populations
  • AFR — African ancestry populations
  • EAS — East Asian ancestry populations
  • SAS — South Asian ancestry populations
  • AMR — Admixed American populations

Understanding population frequency is critical for interpreting whether your genotype is common or rare in your ancestral background, and for contextualizing risk associations that may have been studied primarily in one population.
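
One way to turn an allele frequency into an expected genotype frequency is the Hardy-Weinberg approximation, which is an assumption introduced here rather than something the frequency databases guarantee. With reference allele frequency p and alternative allele frequency q = 1 - p, the expected genotype shares are p², 2pq, and q². The frequencies in the sketch are made-up placeholders, not values from gnomAD or the 1000 Genomes Project.

```python
def expected_genotype_frequencies(alt_freq):
    """Expected genotype shares under Hardy-Weinberg equilibrium (an assumption)."""
    q = alt_freq        # alternative allele frequency
    p = 1.0 - alt_freq  # reference allele frequency
    return {"hom_ref": p * p, "het": 2 * p * q, "hom_alt": q * q}

# Hypothetical alternative-allele frequencies for one SNP, for illustration only.
hypothetical_alt_freq = {"EUR": 0.40, "AFR": 0.05, "EAS": 0.10}

for population, q in hypothetical_alt_freq.items():
    freqs = expected_genotype_frequencies(q)
    print(f"{population}: het={freqs['het']:.3f}, hom_alt={freqs['hom_alt']:.4f}")
```

With these placeholder numbers, a genotype with two alternative alleles would be expected in about 16% of the first group but only about 0.25% of the second, which is the kind of gap that makes ancestry context matter.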

Confidence Levels in Genomic Research

Not all SNP-trait associations are equally well-supported. When evaluating SNP annotations, consider:

  • Sample size — Large GWAS studies with tens of thousands of participants provide stronger evidence than small candidate gene studies.
  • Replication — Findings replicated across multiple independent studies and populations are more reliable.
  • Effect size — Most common SNPs have small individual effects. Claims of a single SNP “determining” a complex trait should be treated with skepticism.
  • Study design — GWAS provide statistical associations, not proof of causation. Functional studies that demonstrate a biological mechanism provide stronger evidence.

Note: Most complex traits (height, disease risk, intelligence, personality) are influenced by hundreds or thousands of SNPs, each contributing a tiny effect. A single SNP rarely determines an outcome. Be cautious of sources that overstate the predictive power of individual variants.
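
The idea of many small effects adding up is commonly expressed as an additive polygenic score: for each SNP, multiply the number of effect alleles carried by a per-SNP weight, then sum. The sketch below assumes that simple additive model. The rsIDs, effect alleles, and weights are hypothetical placeholders, not results from any study.

```python
def polygenic_score(genotypes, weights):
    """Sum (effect-allele count x weight) over SNPs with a usable two-letter call.

    Returns the score and how many SNPs contributed to it.
    """
    score = 0.0
    used = 0
    for rsid, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(rsid)
        # Skip SNPs that are missing from the file or reported as no-calls.
        if genotype is None or len(genotype) != 2 or "-" in genotype:
            continue
        score += genotype.count(effect_allele) * weight
        used += 1
    return score, used

# Hypothetical weights: rsID -> (effect allele, per-allele effect size).
weights = {
    "rs0000001": ("A", 0.02),
    "rs0000002": ("T", -0.01),
    "rs0000003": ("G", 0.03),
}
# Hypothetical genotype calls; the third SNP is a no-call and is skipped.
genotypes = {"rs0000001": "AG", "rs0000002": "CT", "rs0000003": "--"}

score, used = polygenic_score(genotypes, weights)
print(f"score={score:.3f} from {used} of {len(weights)} SNPs")
```

Each individual term is small, so the score only becomes informative when many SNPs contribute, and it is only as good as the study the weights came from.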

Start Exploring Your SNPs

Ready to look up individual SNPs from your raw data? The free SNP Lookup tool lets you search any rsID and see trait associations, population frequencies, effect alleles, and research confidence levels. For a broader analysis, upload your full raw data file through the Genomics Dashboard.

Developers can access the full SNP annotation catalog via the SciRouter Genomics API. Sign up for a free API key to get started.

Frequently Asked Questions

What is a SNP?

A SNP (single nucleotide polymorphism, pronounced 'snip') is a position in the genome where a single DNA base differs between individuals. For example, at a given position, most people might have a C, but some have a T. SNPs are the most common type of genetic variation, with roughly 4-5 million found in each individual when compared to the reference genome.

What does a genotype like AG mean?

A genotype like AG means you have one copy of the A allele (from one parent) and one copy of the G allele (from the other parent) at that SNP position. Since humans have two copies of each chromosome, every SNP has a two-letter genotype. AA or GG means both copies are the same (homozygous); AG means they differ (heterozygous).

What is a reference allele vs alternative allele?

The reference allele is the base found in the human reference genome (GRCh37/38). The alternative allele is the variant base observed in the population. Neither is inherently 'normal' or 'abnormal' — the reference genome is an arbitrary standard. The reference allele may actually be the less common one at some positions.

What is an effect allele?

The effect allele is the version of a SNP that is associated with a particular trait or outcome in a research study. For example, if a study finds that the T allele at a certain SNP is associated with higher caffeine sensitivity, T is the effect allele. The effect allele is not always the alternative allele — it depends on the specific study.

How accurate is 23andMe genotyping?

23andMe uses microarray genotyping chips, which are highly accurate (over 99% concordance) for the specific SNP positions they measure. However, microarrays only test a fixed set of positions (typically 600,000 to 700,000 SNPs) and cannot detect all types of variants. Whole genome sequencing provides complete coverage but at higher cost.
