Foundation models are now unambiguously a thing in climate science. In the space of a few years, the field has gone from “data-driven weather models are interesting” to “several independent groups have released foundation models covering forecasting, air quality, geospatial imagery, and more.” This post is a survey of the 2026 landscape and where it looks like things are heading.
We will cover the models that show up in most serious conversations — Aurora, GraphCast, Pangu-Weather, ClimaX, Prithvi, and FourCastNet — and look at what each one is good for, where they overlap, and what the open research problems still are. We will close with how you can actually use these models today through SciRouter's climate lab.
The atmospheric forecasters
This is the most mature category: models that take an atmospheric state and forecast it forward. They are trained on reanalysis data, primarily ERA5, and they beat or match operational numerical weather prediction at medium range while running orders of magnitude faster.
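All of these forecasters are applied autoregressively: the model predicts one step ahead, and its output is fed back in as the next input. A minimal sketch of that loop, with a toy damping function standing in for a trained model:

```python
import numpy as np

def rollout(step_fn, state, n_steps):
    """Autoregressively apply a one-step forecast model.

    step_fn maps the state at time t to the state at t + dt;
    the returned array stacks the full trajectory, initial state included.
    """
    trajectory = [state]
    for _ in range(n_steps):
        state = step_fn(state)
        trajectory.append(state)
    return np.stack(trajectory)

# Toy step function standing in for a trained network: exponential damping.
traj = rollout(lambda s: 0.5 * s, np.ones((4, 4)), n_steps=3)
```

The same loop structure applies whether `step_fn` is a lambda or a multi-hundred-million-parameter transformer; only the cost of each call changes.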
Aurora (Microsoft)
The most explicitly foundation-model framed of the group. A 3D Swin Transformer backbone is pretrained on ERA5 and then fine-tuned into heads for forecasting, air quality, and cyclones. Aurora's defining feature is coverage — one backbone, many downstream tasks. Available through SciRouter.
GraphCast (Google DeepMind)
A graph neural network over an icosahedral grid. GraphCast is known for its strong deterministic medium-range forecasting accuracy and for the careful ablations in the original paper. It is less explicitly framed as a foundation model but acts as one in practice — the architecture generalizes to many forecasting variables.
Pangu-Weather (Huawei)
A 3D Earth-specific Transformer with hierarchical lead-time stacking. Pangu was among the first data-driven models to clearly beat operational NWP on standard benchmarks. It remains a strong choice for forecasting at specific short and medium lead times.
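The idea behind hierarchical lead-time stacking is to train separate models at several fixed lead times and compose them greedily, so a long forecast takes few model applications and accumulates less error. An illustrative sketch of that composition (the lead times match those reported for Pangu; the function itself is ours):

```python
def plan_lead_time(target_hours, step_models=(24, 6, 3, 1)):
    """Greedily compose fixed-lead-time models to reach a target lead time.

    Using the largest available step first minimizes the number of model
    applications, and therefore the accumulated one-step error.
    Returns the sequence of lead times to apply, in order.
    """
    plan = []
    remaining = target_hours
    for dt in step_models:
        while remaining >= dt:
            plan.append(dt)
            remaining -= dt
    return plan

plan = plan_lead_time(31)  # three model calls instead of thirty-one 1-hour steps
```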
FourCastNet (NVIDIA)
One of the earliest global data-driven forecasting models. FourCastNet uses a Fourier neural operator architecture and demonstrated early on that a trained neural network could run a global forecast on a single GPU in seconds.
The multi-task climate models
ClimaX
ClimaX, from a collaboration between Microsoft Research and academic groups, is explicitly designed as a climate foundation model. It pretrains on a mixture of reanalysis and simulation data and fine-tunes to several tasks, including forecasting, climate downscaling, and climate projection. It is an earlier example of the foundation-model pattern applied to climate specifically, as opposed to weather.
Prithvi (NASA and IBM)
Prithvi is a geospatial foundation model trained primarily on satellite imagery. It is useful for tasks like land-use classification, flood mapping, burn-scar detection, and crop monitoring. It is not a weather forecaster — its input is imagery, not atmospheric state — but it is part of the same foundation-model wave in climate and earth observation.
The axes that matter
When you look at this list, three axes show up repeatedly.
- Input modality. Atmospheric state (reanalysis) versus imagery (satellite) versus mixed. The models cluster cleanly by this axis.
- Task coverage. Single-task (forecasting only) versus multi-head (forecasting plus other atmospheric tasks). Aurora and ClimaX are on the multi-head side.
- Compute cost. Pretraining is expensive; inference is dramatically cheaper than traditional NWP. This is the axis that enables the application explosion.
Open problems
Physical consistency
Data-driven models can sometimes produce output that is statistically good but physically inconsistent — conservation laws slightly violated, energy balance slightly off, hydrostatic balance not quite right. For applications that propagate forecasts many steps forward, these small inconsistencies can accumulate. Research on physics-informed training losses is active.
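One active line of work adds a soft physics penalty to the standard regression loss. The sketch below is a deliberately crude illustration of the pattern, not any particular paper's method: it penalizes drift in the global mean of a field that should be conserved between forecast steps, alongside an ordinary MSE term.

```python
import numpy as np

def mse(pred, target):
    """Standard mean-squared-error data term."""
    return float(np.mean((pred - target) ** 2))

def conservation_penalty(pred):
    """Penalize drift in the global mean of a conserved quantity
    between consecutive forecast steps (a crude stand-in for a
    real conservation law). pred has shape (time, lat, lon)."""
    step_means = pred.mean(axis=(1, 2))           # global mean per step
    return float(np.mean(np.diff(step_means) ** 2))

def physics_informed_loss(pred, target, lam=0.1):
    """Data term plus weighted physics penalty."""
    return mse(pred, target) + lam * conservation_penalty(pred)

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8, 8))    # (time, lat, lon)
target = rng.normal(size=(4, 8, 8))
loss = physics_informed_loss(pred, target)
```

Real implementations penalize quantities like total mass, energy balance, or divergence on the model grid, but the additive structure — data term plus weighted constraint term — is the common pattern.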
Ensemble calibration
Single deterministic forecasts are not enough for risk assessment — you need probabilistic output. Data-driven models are catching up to NWP ensembles in this area, but calibration is a live problem.
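The standard yardstick here is the continuous ranked probability score (CRPS), which rewards ensembles that are both sharp and calibrated. For a finite ensemble it has a simple kernel form, shown below (the function name is ours; the formula is the standard one):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of a finite ensemble against a scalar observation,
    via the kernel form  E|X - y| - 0.5 * E|X - X'|.
    Lower is better; 0 means a perfectly sharp, perfectly placed ensemble."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return float(term1 - term2)
```

For a single-member "ensemble" CRPS reduces to absolute error, which is one reason it is a convenient common currency for comparing deterministic and probabilistic forecasts.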
Out-of-distribution generalization
The climate is changing in ways that are, by definition, out of distribution relative to historical reanalysis. How well data-driven models extrapolate to a warmer world is a real open question.
Integration with physics-based models
The most likely future is hybrid. Physics-based models provide consistency, long time scales, and mechanism. Data-driven models provide speed, flexibility, and inference on new kinds of inputs. Building systems that combine both is an ongoing research agenda.
How to actually use them today
The easiest path in 2026 is a hosted endpoint. SciRouter exposes Aurora through the climate lab, so you can call forecasts, air quality prediction, and cyclone tracking from ordinary HTTP clients without provisioning a single GPU. For other models, the access picture is more varied — some are open-weight with inference scripts, some are research releases that require reimplementation, and some are only reachable through specific partners.
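Calling a hosted endpoint really is just ordinary HTTP. The sketch below is illustrative only — the URL, payload shape, and field names are assumptions, not SciRouter's actual API, so consult their documentation before building against it:

```python
import json
import urllib.request

# Hypothetical endpoint -- the real SciRouter URL and schema may differ.
SCIROUTER_URL = "https://api.scirouter.example/v1/climate/aurora/forecast"

def build_forecast_request(lat, lon, lead_hours, variables):
    """Assemble a JSON payload for a point forecast (illustrative schema)."""
    return {
        "model": "aurora",
        "location": {"lat": lat, "lon": lon},
        "lead_hours": lead_hours,
        "variables": variables,
    }

def fetch_forecast(payload):
    """POST the payload with a plain stdlib HTTP client (not executed here)."""
    req = urllib.request.Request(
        SCIROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_forecast_request(47.6, -122.3, lead_hours=48,
                                 variables=["t2m", "u10", "v10"])
```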
If you are building an application, start with whatever is most accessible. You can swap in a better model later if the interface is well defined. The interface is more valuable than the specific weights.
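One way to make the interface concrete is a small protocol that every backend must satisfy, so swapping weights later is a one-line change. A minimal sketch, with a trivial persistence baseline standing in for a real model (all names here are ours, not any library's API):

```python
from typing import Protocol
import numpy as np

class ForecastModel(Protocol):
    """Model-agnostic forecasting interface (illustrative, not a real API)."""
    def forecast(self, state: np.ndarray, lead_hours: int) -> np.ndarray: ...

class PersistenceModel:
    """Trivial baseline: the forecast is the current state, unchanged.
    Start here, then swap in Aurora, GraphCast, etc. behind the same call."""
    def forecast(self, state: np.ndarray, lead_hours: int) -> np.ndarray:
        return state.copy()

def run(model: ForecastModel, state: np.ndarray, lead_hours: int = 24):
    """Application code depends only on the interface, never the backend."""
    return model.forecast(state, lead_hours)

state = np.zeros((2, 4, 4))          # (variables, lat, lon)
out = run(PersistenceModel(), state)
```

Because `run` only knows about `ForecastModel`, replacing the persistence baseline with a hosted endpoint or open-weight model touches one class, not the application.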
Where the field is heading
- More fine-tuning heads. Expect regional forecasting, precipitation, nowcasting, and specialized variables to show up as heads on existing backbones.
- Better ensembles. Data-driven models will catch up to NWP on probabilistic forecasting, closing one of the main remaining gaps.
- Hybrid architectures. Models that embed physics-based constraints into the training loss or architecture will become more common.
- Agent-native workflows. As MCP and similar protocols mature, agents will compose climate foundation models into larger reasoning loops.
Bottom line
The climate-science foundation-model landscape in 2026 is more crowded and more useful than it was even a year ago. Aurora, GraphCast, Pangu-Weather, ClimaX, Prithvi, and FourCastNet are all doing interesting things, and the patterns they share — pretraining on reanalysis or imagery, fine-tuning to task-specific heads, inference in seconds instead of hours — suggest that climate AI is following the same trajectory that natural language AI did a few years earlier.