Weather and climate modeling used to be the exclusive domain of national meteorological centers. Running a global forecast required thousands of CPU cores, tuned numerical schemes, and decades of accumulated expertise. That barrier collapsed in 2022, when the first machine learning weather models (Pangu-Weather, GraphCast, FourCastNet) showed they could match or beat operational physics-based forecasts in accuracy while running on a single GPU.
The next generation is here. Google's NeuralGCM fuses neural networks with a classical dynamical core to match ECMWF skill at thousands of times the speed. NASA and IBM's Prithvi brings foundation-model pretraining to climate data. Ai2's ACE emulator simulates 1600 climate years per GPU day. SciRouter's climate API exposes all of them through a single endpoint.
The ML weather revolution in three years
The first wave of ML weather models (2022-2023) proved that transformer and graph-neural-network architectures could match physics-based forecasting on the standard WeatherBench metrics. GraphCast from DeepMind hit 10-day forecast skill comparable to ECMWF's Integrated Forecasting System while running in about a minute on a single TPU, a reduction of several orders of magnitude in compute per forecast, delivered in a single year.
The second wave (2024-2025) moved beyond single deterministic forecasts. Three trends define it:
- Hybrid physics-ML systems like NeuralGCM that keep a real dynamical core and learn the parameterizations, getting the best of both worlds.
- Foundation models like Prithvi that pretrain on massive atmospheric datasets and fine-tune for specific downstream tasks.
- Climate emulators like ACE that run fast enough to simulate centuries of climate for ensemble or sensitivity studies.
NeuralGCM: physics plus neural networks
NeuralGCM is the first ML-based model to match operational skill on both medium-range weather and climate timescales. The key insight is that you do not have to replace the dynamical core: the primitive equations governing large-scale flow, advection, and transport are already well understood. The hard part is everything that happens below grid scale, which classical models handle with hand-tuned parameterizations. NeuralGCM replaces those parameterizations with a learned neural network.
The result is a model that is fully differentiable end to end, so you can train against real observations rather than tuning each parameterization in isolation. Published in Nature in July 2024, it matched ECMWF on deterministic 5-day forecasts, beat it on some variables, and ran stable multi-decade climate simulations with realistic atmospheric variability.
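The hybrid structure is easy to sketch. The toy below is plain NumPy, not NeuralGCM's actual JAX code: it steps a state forward with a known "dynamical core" update plus a small learned subgrid correction. In the real model every operation is differentiable, so gradients flow from forecast error all the way back into the network weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamical_core(state, dt):
    # stand-in for the resolved physics: a simple linear damping update
    return state + dt * (-0.1 * state)

def learned_parameterization(state, w1, b1, w2, b2):
    # tiny MLP standing in for the learned subgrid-tendency network
    h = np.tanh(state @ w1 + b1)
    return h @ w2 + b2

def hybrid_step(state, weights, dt=0.1):
    # resolved physics from the core plus a learned subgrid correction
    w1, b1, w2, b2 = weights
    return dynamical_core(state, dt) + dt * learned_parameterization(state, w1, b1, w2, b2)

# a "column" of 8 prognostic variables stepped forward 10 times
weights = (rng.normal(0, 0.1, (8, 16)), np.zeros(16),
           rng.normal(0, 0.1, (16, 8)), np.zeros(8))
state = rng.normal(size=8)
for _ in range(10):
    state = hybrid_step(state, weights)
print(state.shape)  # (8,)
```

The composition `core(state) + dt * nn(state)` is the whole trick: the network only has to learn the residual the physics gets wrong, not the physics itself.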
Prithvi: a climate foundation model
Prithvi is a family of foundation models from a collaboration between NASA and IBM Research. It comes in two forms:
- Prithvi-EO for Earth observation. A vision transformer pretrained on global Landsat and Sentinel imagery and fine-tunable for land-use classification, crop mapping, flood detection, and wildfire monitoring.
- Prithvi-WxC for weather and climate. Trained on MERRA-2 reanalysis, it is a 2.3 billion parameter transformer that can be fine-tuned for downscaling, bias correction, gap-filling, and short-range forecasting.
The foundation-model framing matters because most climate applications do not need to retrain from scratch. You take Prithvi-WxC weights, fine-tune on a few hundred examples of whatever task you care about (say, downscaling global ERA5 to a regional kilometer-scale grid), and get a model that works. The API exposes both the pretrained base and several already-fine-tuned variants.
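The fine-tuning recipe itself is generic and worth seeing in miniature. This sketch uses synthetic data and a stand-in backbone (nothing here is Prithvi's real code): freeze the "pretrained" feature extractor and fit only a small task head on a few hundred examples.

```python
import numpy as np

rng = np.random.default_rng(1)

def frozen_backbone(x):
    # stand-in for pretrained Prithvi-WxC features: a fixed projection we never update
    W = np.sin(np.arange(64 * 32).reshape(64, 32))  # deterministic "pretrained" weights
    return np.tanh(x @ W)

# a few hundred (coarse-field, fine-field) training pairs for a downscaling-style task
X = rng.normal(size=(300, 64))   # coarse-resolution input features
y = rng.normal(size=(300, 4))    # fine-resolution targets

feats = frozen_backbone(X)
# least-squares fit of the task head: the only weights we "train"
head, *_ = np.linalg.lstsq(feats, y, rcond=None)
pred = frozen_backbone(X) @ head
print(pred.shape)  # (300, 4)
```

The point of the pattern: with the backbone frozen, the trainable surface is tiny, which is why a few hundred task examples suffice where training from scratch would need millions.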
Ai2 ACE: climate timescale emulation
ACE is built for a different job than NeuralGCM. Where NeuralGCM targets operational forecasting skill, ACE targets throughput. It can simulate roughly 1600 years of climate per day of GPU wall time, which is fast enough to run ensembles of decadal variability, explore sensitivity to model parameters, and generate training data for downstream climate impact models.
In practical terms, ACE lets climate scientists ask questions that were simply too expensive before: what is the distribution of once-in-500-year events under different scenarios? How does a model respond to small changes in forcing? These are ensemble-heavy experiments that need thousands of realizations.
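The return-period arithmetic behind that first question is simple, sketched here with synthetic annual maxima in place of real emulator output:

```python
import numpy as np

rng = np.random.default_rng(2)

# With ~1600 simulated years per GPU-day, a few days of emulation yields
# thousands of annual maxima, enough to read the 500-year level straight
# off the empirical tail. Synthetic Gumbel-distributed maxima stand in
# for emulator output here.
annual_max_temp = rng.gumbel(loc=40.0, scale=2.0, size=8000)  # annual maxima, degrees C

# empirical 1-in-500-year level: the quantile exceeded on average once per 500 years
level_500yr = np.quantile(annual_max_temp, 1 - 1 / 500)
exceed_rate = (annual_max_temp > level_500yr).mean()
print(round(level_500yr, 1), round(exceed_rate * 500, 2))  # rate * 500 is ~1 by construction
```

With a physics-based model, generating those 8000 years would be a supercomputer campaign; with an emulator it is a week of GPU time.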
A forecast call
Here is what a NeuralGCM forecast call looks like. You specify the initialization time, the forecast horizon, and the region or global grid you want back.
```python
import httpx

API_KEY = "sk-sci-..."
BASE = "https://scirouter.ai/v1"

response = httpx.post(
    f"{BASE}/climate/forecast",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "neuralgcm",
        "init_time": "2026-04-10T00:00:00Z",
        "horizon_hours": 240,
        "lead_step_hours": 6,
        "variables": ["t2m", "u10", "v10", "msl", "tp"],
        "region": {
            "lat_min": 25.0,
            "lat_max": 50.0,
            "lon_min": -125.0,
            "lon_max": -65.0,
        },
        "ensemble_members": 25,
    },
    timeout=600,
)
response.raise_for_status()
result = response.json()
print(f"Forecast produced {result['n_timesteps']} timesteps")
print(f"Ensemble spread at day 7: {result['spread_day7']:.2f}")

# Download the NetCDF output for plotting in xarray
import xarray as xr

ds = xr.open_dataset(result["netcdf_url"])
ds.t2m.isel(time=28).plot()
```

The call returns in a few minutes for a 10-day, 25-member regional ensemble. The output is standard NetCDF, compatible with xarray, cartopy, and every climate analysis library. You never touch the underlying model weights or GPU infrastructure.
Precipitation is the hard variable
Of all the fields a weather model produces, precipitation is the hardest. It is sparse (most gridcells are dry at any given hour), discontinuous (rain either falls or it does not), and driven by subgrid-scale processes that classical models parameterize. Early ML models struggled here because the MSE loss they were trained on penalizes confident predictions of rare extremes, so they learned to blur heavy rainfall toward the mean.
NeuralGCM does notably better on precipitation because its neural component can learn data-driven corrections to the convection and cloud microphysics that classical models get wrong. The API returns both deterministic amounts and probability-of-exceedance thresholds so you can reason about rainfall uncertainty in a statistically meaningful way.
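Computing probability-of-exceedance from ensemble members is simple enough to show directly; the member data here is synthetic, with gamma-distributed rain and a dry majority of gridcells:

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic 25-member precipitation ensemble, mm per 6 h, shape (member, lat, lon)
members = rng.gamma(shape=0.3, scale=8.0, size=(25, 40, 60))
members[rng.random(members.shape) < 0.6] = 0.0  # most gridcells are dry at any hour

# probability of exceedance = fraction of members above each threshold
thresholds = [1.0, 10.0, 25.0]  # mm
poe = {t: (members > t).mean(axis=0) for t in thresholds}
print(poe[10.0].shape)  # (40, 60), one probability map per threshold
```

The maps are nested by construction: the probability of exceeding 25 mm can never be higher than the probability of exceeding 10 mm, which is exactly the ordering a downstream flood or hydrology model needs.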
Downscaling and regional refinement
Global models run at roughly 25 to 100 km resolution, which is too coarse for most regional applications. Downscaling is the process of taking a global forecast and refining it to a finer grid, usually using topography, land cover, and local observations to add detail.
Prithvi-WxC is well suited to this job because it was pretrained on global data and can be fine-tuned for regional downscaling in a few hours. The API wraps several pretrained regional variants (Europe, North America, East Asia) so you can get kilometer-scale fields for those regions without training anything yourself.
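For intuition about what downscaling adds, here is a deliberately crude statistical version: interpolate a coarse temperature field to a fine grid, then apply a lapse-rate correction from high-resolution topography. Prithvi-WxC learns far richer corrections than this, but the structure of the problem is the same.

```python
import numpy as np

coarse_t = np.array([[15.0, 14.0],
                     [13.0, 12.0]])  # 2x2 coarse temperature grid, degrees C
fine_elev = np.random.default_rng(4).uniform(0, 2000, (8, 8))  # fine-grid elevation, m

# nearest-neighbour "interpolation" to the fine grid (4x refinement per axis)
fine_t = np.kron(coarse_t, np.ones((4, 4)))
# standard atmospheric lapse rate: roughly 6.5 degrees C cooling per km of elevation
fine_t -= 6.5e-3 * fine_elev
print(fine_t.shape)  # (8, 8)
```

Everything a learned downscaler improves on lives in that second step: coastlines, valley inversions, urban heat, and land-cover effects that a single lapse rate cannot capture.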
Ensemble forecasts and uncertainty
A single deterministic forecast is always a bad bet beyond a few days because the atmosphere is chaotic. Small uncertainties in the initial state grow exponentially, and by day 5 to 10 a single trajectory tells you very little. The right answer is an ensemble: run many forecasts from slightly perturbed initial conditions and summarize them probabilistically.
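You can watch this error growth directly in the Lorenz-63 system, the standard toy model of atmospheric chaos: two trajectories that start a millionth apart end up on completely different parts of the attractor.

```python
import numpy as np

def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8 / 3):
    # one forward-Euler step of the Lorenz-63 equations
    x, y, z = s
    return s + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-6, 0.0, 0.0])  # perturbed initial condition, one part in a million
for _ in range(3000):                # integrate both to t = 30
    a, b = lorenz_step(a), lorenz_step(b)

sep = np.linalg.norm(a - b)
print(round(sep, 3))  # the 1e-6 perturbation has grown by many orders of magnitude
```

The atmosphere behaves the same way, just in a billion dimensions, which is why a single trajectory past day 5 is a sample, not a prediction.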
The API runs 25 or 50 ensemble members by default and returns per-gridpoint mean, spread, and exceedance probabilities. For applications like energy forecasting, flood warning, and agricultural planning, the probabilistic output is what actually matters.
Use cases beyond weather
The climate API is used for more than daily forecasting. A partial list of projects built on top of it:
- Renewable energy firms forecasting wind and solar resource at project sites.
- Insurers pricing parametric flood and hurricane products.
- Agricultural researchers studying yield response to climate variability.
- Climate attribution studies running large ensembles with and without anthropogenic forcing.
- Urban planners downscaling regional climate projections to city scale.
Getting started
The interactive lab is the fastest way to explore. Climate Lab lets you pick a model, a region, and a date, and see both deterministic and ensemble forecasts on an interactive map. Once you find a configuration you like, the lab shows the exact API call for reproducibility.
For production usage, the Python SDK handles the NetCDF download, retries on long-running ensembles, and caching so repeated calls to the same init time are instant.