Climate Foundation Models

Aurora Explained: Microsoft's Atmospheric Foundation Model

Aurora is Microsoft's atmospheric foundation model that outperforms GraphCast on 94% of weather variables. Here's how it works and why it matters.

SciRouter Team
April 11, 2026
12 min read

Aurora is Microsoft Research's atmospheric foundation model, and it represents a quiet shift in how AI is being applied to weather and climate. For decades, weather forecasting has meant one thing: solving the Navier-Stokes equations on a global grid using a supercomputer. Aurora does not replace that work — but it shows that a single pretrained neural network, fine-tuned into multiple heads, can match or beat specialized NWP systems on several tasks at a fraction of the compute cost.

This post is a clear walkthrough of what Aurora is, how it was built, and how you can start using it through SciRouter's climate lab without provisioning a single GPU on your side.

Note
Aurora is a foundation model. The idea is the same as in natural-language processing: pretrain once on a huge corpus, fine-tune many times for downstream tasks. In NLP the corpus is the web. For Aurora the corpus is the atmosphere.

Why a foundation model for the atmosphere?

Foundation models caught on in NLP because the same backbone transferred well to dozens of tasks. Sentiment analysis, question answering, translation, summarization — once you have a good base model, task-specific fine-tuning is cheap. The Microsoft team asked whether the same idea applied to atmospheric science. The answer, based on the Aurora results, is that it does.

The atmosphere is a high-dimensional dynamical system. Before foundation models, every atmospheric task — medium-range forecasting, air quality, storm tracking — was a separate research project with a custom model. Aurora replaces that with a shared backbone and lightweight per-task heads. The upfront cost of pretraining is large. The marginal cost of adding a new application is small.

The architecture

Aurora is built around a 3D Swin Transformer. The key ideas:

  • 3D tokenization. The atmospheric state is split into patches across latitude, longitude, and pressure level. Each patch becomes a token, the same way a patch of pixels becomes a token in a Vision Transformer.
  • Windowed attention. Full self-attention over every token would be too expensive at global resolution. Swin uses shifted-window attention so each layer attends locally, and the receptive field grows with depth.
  • Next-state prediction. The self-supervised pretraining objective is to predict the atmospheric state a short time ahead, given the current state. This is the atmosphere's version of causal language modeling.
  • Fine-tuning heads. After pretraining, each downstream task (weather forecast, air quality, cyclones) adds a small task-specific head that takes the shared representation and maps it to the task output.

The result is a model that sees the atmosphere as a whole system. Temperature, wind, humidity, and pressure interact because they share a representation. When you ask Aurora to predict air quality, it is drawing on the same features that power weather prediction.
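To make the 3D tokenization step concrete, here is a minimal numpy sketch. The grid sizes, patch shape, and variable count are invented for illustration — Aurora's actual resolution and patch configuration differ.

```python
import numpy as np

# Toy atmospheric state: (variables, pressure levels, lat, lon).
# Sizes are illustrative, far smaller than a real global grid.
V, C, H, W = 4, 8, 32, 64  # e.g. temperature, wind u/v, humidity
state = np.random.randn(V, C, H, W).astype(np.float32)

def tokenize_3d(x, patch=(2, 4, 4)):
    """Split a (V, C, H, W) field into 3D patches; flatten each patch into a token."""
    pc, ph, pw = patch
    V, C, H, W = x.shape
    assert C % pc == 0 and H % ph == 0 and W % pw == 0
    # Group each (pc, ph, pw) block so one block of all variables = one token.
    x = x.reshape(V, C // pc, pc, H // ph, ph, W // pw, pw)
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)      # (Cp, Hp, Wp, V, pc, ph, pw)
    return x.reshape(-1, V * pc * ph * pw)    # one flat vector per patch

tokens = tokenize_3d(state)
print(tokens.shape)  # (512, 128): 4*8*16 patches, each a 4*2*4*4-dim token
```

In the real model each flattened patch is then linearly projected into the transformer's embedding dimension, exactly as in a 2D Vision Transformer, just with the extra vertical axis.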

Pretraining on ERA5

The pretraining corpus is the center of gravity for any foundation model. For Aurora, it is ERA5 — the reanalysis produced by the European Centre for Medium-Range Weather Forecasts. ERA5 is an hourly, global, multi-variable dataset that goes back to 1940. It is the closest thing to a ground-truth long-term record of the atmosphere.

Training on ERA5 gives Aurora two advantages:

  • Scale. Decades of hourly global data is a lot of samples. The model sees every kind of weather situation that has happened in living memory.
  • Consistency. Reanalysis data is produced with a single model and assimilation pipeline, so the variables are internally consistent. The model learns real physical relationships, not artifacts of instrument changes.

In addition to ERA5, the Aurora team trained on supplementary atmospheric datasets to broaden coverage of variables and phenomena that are under-represented in reanalysis. Each fine-tuning head then adds its own task-specific data.
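The next-state pretraining objective is easy to sketch: slide over the reanalysis archive and pair each state with the state a fixed lead time ahead. The array below is a stand-in for ERA5 (a real pipeline would load gridded multi-variable fields, e.g. via xarray), but the pairing logic is the same.

```python
import numpy as np

# Toy "reanalysis" archive: hourly snapshots of a flattened atmospheric state.
# Real ERA5 is a multi-variable global grid; here each state is one vector.
T, D = 1000, 16  # 1000 hourly steps, 16-dim state (both illustrative)
rng = np.random.default_rng(0)
archive = rng.standard_normal((T, D)).astype(np.float32)

def next_state_pairs(data, lead=6):
    """Pair each state with the state `lead` steps ahead — the training target."""
    inputs = data[:-lead]
    targets = data[lead:]
    return inputs, targets

x, y = next_state_pairs(archive, lead=6)
print(x.shape, y.shape)  # (994, 16) (994, 16): input state, 6-hour-ahead target
```

Every hour of archive yields a training pair, which is why decades of hourly global data translate into an enormous self-supervised corpus.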

The fine-tuning heads

Three of the original fine-tuning tasks illustrate the range of what Aurora can do.

Medium-range weather forecasting

This is the most obvious task. Given an atmospheric state, predict it forward a few days. Aurora's forecasting head matches or beats specialized data-driven weather models on several standard benchmarks, with fewer FLOPs at inference time because the backbone was already trained for next-state prediction.
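Multi-day forecasts come from rolling a next-state model forward on its own output. Here is a minimal autoregressive rollout sketch; the linear "model" is a stand-in for the trained network, not Aurora.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # illustrative state dimension
# Stand-in for the trained model: near-identity linear map.
W = np.eye(D) + 0.01 * rng.standard_normal((D, D))

def step(state):
    """One model call: predict the state one fixed lead time ahead."""
    return W @ state

def rollout(state, n_steps):
    """Autoregressive forecast: feed each prediction back in as the next input."""
    trajectory = []
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return np.stack(trajectory)

forecast = rollout(rng.standard_normal(D), n_steps=10)
print(forecast.shape)  # (10, 16): ten lead times, one state vector each
```

Each step reuses the same cheap forward pass, which is where the inference-time advantage over re-running a physics solver comes from — and also why errors compound at long horizons.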

Air quality prediction

Global air quality prediction is historically a hard task. It requires tracking multiple pollutants across atmospheric layers and accounting for sources, sinks, and transport. Aurora's air quality head takes the shared atmospheric representation and maps it to pollutant concentrations.

Tropical cyclone tracking

Cyclone tracks are a high-leverage forecasting problem — a better forecast means better evacuation planning. Aurora's cyclone head has shown competitive performance against physics-based models while running orders of magnitude faster, which opens the door to ensemble forecasts that were previously too expensive.
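The "opens the door to ensembles" point can be made concrete: when one forecast is cheap, you can afford many perturbed forecasts and read uncertainty off the spread. A toy sketch, with a stand-in step function rather than Aurora:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16  # illustrative state dimension

def model_step(state):
    """Stand-in forecast step (a real run would call the forecasting head)."""
    return 0.99 * state + 0.01 * np.roll(state, 1)

def ensemble_forecast(state0, n_members=20, n_steps=8, noise=0.05):
    """Perturb the initial condition, roll every member forward, keep all tracks."""
    members = state0 + noise * rng.standard_normal((n_members, D))
    tracks = []
    for _ in range(n_steps):
        members = np.array([model_step(m) for m in members])
        tracks.append(members)
    return np.stack(tracks)  # (n_steps, n_members, D)

tracks = ensemble_forecast(rng.standard_normal(D))
spread = tracks.std(axis=1)  # member spread as a crude uncertainty signal
print(tracks.shape, spread.shape)  # (8, 20, 16) (8, 16)
```

Running twenty members through a physics-based model means twenty supercomputer runs; through an emulator it is twenty cheap forward passes.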

Where Aurora fits in the broader AI weather landscape

Aurora is not alone. GraphCast from Google DeepMind, Pangu-Weather from Huawei, and FourCastNet from NVIDIA are all data-driven weather models released in the last few years. Each has its own architecture and its own strengths. What makes Aurora distinctive is the explicit foundation-model framing — one backbone, many fine-tuning heads — and the broader coverage of tasks beyond just forecasting.

For a head-to-head comparison, see our Aurora vs GraphCast vs Pangu-Weather benchmark post.

Limits of the model

Aurora is a learned emulator. That means a few things worth being honest about.

  • It does not know physics that was not in the training data. If a situation is genuinely outside the distribution of ERA5, the model's predictions are less trustworthy.
  • It does not produce the same kind of uncertainty estimates that a full ensemble NWP system produces, though ensemble techniques are being developed for data-driven models.
  • It cannot be directly used to attribute weather extremes to underlying physical mechanisms — that is a job for physics-based models and diagnostic tools.

Warning
Aurora is extremely useful for fast forecasting and research. It is not a substitute for operational NWP, and life-safety decisions should still rely on the full suite of physics-based models and expert forecasters.

Using Aurora through SciRouter

SciRouter hosts Aurora inference in the climate lab. You send a starting atmospheric state — current conditions plus the variables Aurora expects — along with a forecast horizon, and the gateway returns the predicted state. You do not need to manage GPU allocations, model weights, or ERA5 ingestion on your side. See our browser-based AI weather forecasting guide for a hands-on walkthrough.
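A request might look like the sketch below. The endpoint path, field names, variable codes, and units are illustrative assumptions, not SciRouter's documented API — check the climate lab docs for the real contract.

```python
import json

# Hypothetical request payload for hosted Aurora inference.
# All field names and values here are illustrative, not the documented API.
payload = {
    "model": "aurora",
    "horizon_hours": 48,  # how far ahead to forecast
    "initial_state": {
        "time": "2026-04-11T00:00:00Z",
        "variables": ["t2m", "u10", "v10", "msl"],  # ERA5-style short names
        "grid": {"lat_step": 0.25, "lon_step": 0.25},
    },
}

body = json.dumps(payload)
# A real call would POST `body` to the climate lab endpoint, e.g.:
# requests.post("https://<scirouter-host>/v1/climate/aurora", data=body, ...)
print(json.loads(body)["horizon_hours"])  # 48
```

The gateway's response would carry the predicted state at the requested horizon; parsing it is symmetric to building the request.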

Bottom line

Aurora is what an atmospheric foundation model looks like in practice: a shared backbone trained on reanalysis data, fine-tuned into specialized heads, and fast enough at inference time to open new applications that operational NWP cannot reach. It is not a replacement for traditional forecasting — it is a complement that lets you ask weather questions interactively in a way that was previously impossible.

Try Aurora in the SciRouter Climate Lab →

Frequently Asked Questions

What is Aurora?

Aurora is an atmospheric foundation model released by Microsoft Research. It is a vision-transformer-based model pretrained on a massive corpus of atmospheric reanalysis data — primarily ERA5 — and then fine-tuned into specialized heads for tasks like medium-range weather forecasting, tropical cyclone tracking, and air quality prediction. The key claim is that a single pretrained backbone can be adapted to a wide range of atmospheric tasks, the same way a language foundation model can be adapted to many text tasks.

How is Aurora different from a traditional numerical weather model?

Traditional numerical weather prediction (NWP) solves the governing fluid-dynamics equations on a grid. Aurora learns the mapping from atmospheric state to next-state directly from data. It is dramatically faster at inference time — seconds instead of hours on a supercomputer — at the cost of being a data-driven emulator rather than a first-principles simulator. It is not a replacement for NWP; it is a complement that extends what is possible in real-time.

What data was Aurora pretrained on?

The primary pretraining corpus is ERA5, the European Centre for Medium-Range Weather Forecasts reanalysis dataset. ERA5 provides hourly atmospheric fields going back to 1940, including temperature, wind, humidity, pressure, and many other variables, on a global grid. Aurora was also exposed to additional atmospheric datasets during the self-supervised pretraining stage, and each fine-tuning head adds task-specific data on top.

What is the architecture?

Aurora uses a 3D Swin Transformer backbone. The atmosphere is tokenized into patches across latitude, longitude, and pressure level. The transformer learns spatial and vertical relationships in the same attention mechanism. The model is trained with a next-state prediction objective at pretraining, and then fine-tuned with task-specific loss functions — mean squared error for forecasting, classification loss for pollutant categories, and so on.

Is Aurora open source?

Microsoft released the pretrained weights under a research license, and the paper describes the architecture in enough detail that it can be reimplemented. The exact availability of fine-tuned heads depends on the variant. SciRouter exposes hosted Aurora inference through the climate lab so you do not need to manage the weights or GPU infrastructure yourself.

What can I actually do with Aurora?

Short-term and medium-range weather forecasting (wind, temperature, pressure), air quality prediction for major pollutants, tropical cyclone track forecasting, and several regional and task-specific applications. The model was designed as a backbone, so the list of applications continues to grow as new fine-tuning heads are released.

How do I try Aurora on SciRouter?

Aurora is available inside the SciRouter climate lab. You send a starting atmospheric state and a forecast horizon, and the gateway returns the predicted state. You can also call it as a tool through the MCP server so agents can plug weather reasoning into broader climate and sustainability workflows.

Try this yourself

500 free credits. No credit card required.