Universal neural network potentials are the biggest change in computational materials science this decade. Four names dominate the 2026 landscape: MACE, NequIP, Allegro, and Orb-v3. Each is open source, each claims accuracy on par with density functional theory (DFT), and each runs orders of magnitude faster than DFT itself. This post compares them head to head so you can pick the right tool for your workflow.
The contenders
MACE
MACE is an equivariant graph neural network from a Cambridge-led team. It uses higher-body-order messages inspired by the Atomic Cluster Expansion, which means it captures three-body and four-body correlations inside a single layer. Its foundation checkpoint, MACE-MP-0, is the most widely deployed universal potential in the field. For a deeper intro, see our MACE-MP-0 explainer.
NequIP
NequIP is the original equivariant neural network potential from the Kozinsky group at Harvard. It set the standard for what “E(3)-equivariant” means in this space and inspired much of the follow-up work including MACE and Allegro. NequIP tends to be highly accurate with relatively modest training data.
Allegro
Allegro is also from the Kozinsky group and is the strictly local cousin of NequIP. It removes global message passing in favor of purely local equivariant features, which unlocks very large-scale parallelism. On GPU clusters it scales to billion-atom simulations better than almost any other potential, and per-call inference is fast.
Orb-v3
Orb-v3 is from Orbital Materials and is the newest of the four. It is a foundation-model style universal potential with aggressive engineering for inference speed and broad element coverage. Early benchmarks show it at or near the top of the accuracy tables across the Materials Project.
Accuracy
On standard benchmarks (formation energies from Materials Project, MD17 organic molecules, standard ab-initio MD datasets), the gap between these four models is smaller than the gap between any of them and older fingerprint-based potentials. You are choosing between excellent and excellent, not between good and bad.
That said, a few observations hold up across benchmarks:
- Orb-v3 and MACE-MP-0 are typically at the top of the accuracy tables for formation energies and forces on bulk materials.
- NequIP is extremely data-efficient, meaning it reaches high accuracy with less training data, which matters if you are fine-tuning on a small custom dataset.
- Allegro can be slightly behind on per-system benchmarks but catches up at scale and wins on throughput.
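Accuracy tables like the ones above usually report mean absolute error per atom, in meV/atom. As a concrete illustration, here is a small pure-Python sketch of that metric; the function name and all numbers are our own synthetic illustration, not taken from any real benchmark or library.

```python
# Sketch of the standard accuracy metric for universal potentials:
# mean absolute error of predicted total energies versus DFT references,
# normalized per atom and reported in meV/atom. Numbers are synthetic.

def mae_mev_per_atom(predicted_ev, reference_ev, n_atoms):
    """MAE between predicted and DFT reference total energies (eV),
    normalized by atom count and converted to meV/atom."""
    errors = [
        abs(p - r) / n * 1000.0  # eV -> meV, then per atom
        for p, r, n in zip(predicted_ev, reference_ev, n_atoms)
    ]
    return sum(errors) / len(errors)

# Synthetic example: three structures with 4, 8, and 2 atoms.
reference = [-18.40, -36.10, -9.20]   # DFT total energies (eV)
predicted = [-18.38, -36.18, -9.21]   # model predictions (eV)
atoms = [4, 8, 2]

print(f"{mae_mev_per_atom(predicted, reference, atoms):.1f} meV/atom")
```

For reference, state-of-the-art universal potentials sit in the tens of meV/atom on Materials Project formation energies, so a per-atom normalization is what makes numbers comparable across differently sized structures.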
Speed
For inference speed on a single GPU with a few hundred atoms:
- Allegro is typically the fastest because its local formulation is trivially parallel.
- Orb-v3 is competitive thanks to engineering optimizations and fused kernels.
- MACE is slightly slower per call but extremely CPU-friendly, which is valuable for laptop work.
- NequIP tends to be the slowest of the four per forward pass, though this depends heavily on the specific configuration.
All four are orders of magnitude faster than DFT. You are choosing between “fast” and “very fast,” not between fast and slow.
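Per-call speed rankings like the one above are easy to reproduce for your own system. Here is a minimal, library-agnostic timing harness; the "models" below are trivial stand-in callables, not the real potentials, which in practice would wrap each library's calculator on the same structure.

```python
import time

def time_per_call(model, structure, n_calls=10, warmup=2):
    """Median wall-clock time per forward call, in milliseconds.
    `model` is any callable taking a structure and returning a result."""
    for _ in range(warmup):          # warm-up calls (JIT, caches, etc.)
        model(structure)
    samples = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        model(structure)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to outliers

# Stand-in "models" for illustration only.
def fake_fast(structure):
    return sum(structure)

def fake_slow(structure):
    return sum(x * x for x in structure for _ in range(100))

structure = list(range(256))  # pretend: a 256-atom configuration
for name, model in [("fast", fake_fast), ("slow", fake_slow)]:
    print(f"{name}: {time_per_call(model, structure):.3f} ms/call")
```

Warm-up calls matter more than you might expect: several of these libraries compile kernels or build neighbor lists lazily, so the first call is not representative of steady-state throughput.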
Element coverage
Element coverage matters if you want to use a pretrained foundation checkpoint without fine-tuning:
- MACE-MP-0: 89 elements, broad coverage from hydrogen through bismuth including lanthanides.
- Orb-v3: claims broad coverage across Materials Project elements.
- NequIP foundation checkpoints: recent releases cover most main-group and transition metal chemistry.
- Allegro: historically trained per-system rather than shipped as a single universal checkpoint, though recent universal Allegro work is catching up.
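Before trusting any pretrained checkpoint, it is worth a quick sanity check that your composition actually falls inside its supported elements. A generic sketch follows; the supported-element set here is a tiny illustrative subset we made up, not any model's real coverage list, which you should pull from its documentation.

```python
def unsupported_elements(composition, supported):
    """Return the elements in `composition` that are missing from a
    checkpoint's supported-element set. `composition` maps symbol -> count."""
    return sorted(set(composition) - set(supported))

# Illustrative subset only -- check each model's docs for the real list.
supported = {"H", "C", "N", "O", "Si", "Fe", "Ni", "Bi"}

print(unsupported_elements({"Fe": 2, "O": 3}, supported))  # prints []
print(unsupported_elements({"U": 1, "O": 2}, supported))   # prints ['U']
```

A model will often happily produce numbers for elements outside its training coverage; it just will not produce trustworthy ones, which is why this check belongs at the start of a workflow rather than the end.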
License and openness
All four models are open source with permissive licenses that allow commercial use. The training code, weights, and documentation are public for all of them. This is unusual in the broader ML foundation model landscape, and it is one of the things that makes materials science a comparatively healthy subfield.
When to pick which
Pick MACE-MP-0 when
- You want the friendliest documentation and largest community.
- You need CPU inference on a laptop.
- You need broad element coverage out of the box.
Pick NequIP when
- You have limited training data for a custom system.
- You need maximum accuracy for a narrow chemistry.
Pick Allegro when
- You need to run simulations at massive scale.
- Inference throughput matters more than foundation coverage.
Pick Orb-v3 when
- You are benchmarking against the latest universal checkpoints.
- You want competitive accuracy and engineering-grade speed.
The meta lesson: benchmark before committing
The most valuable thing you can do before building a production workflow around any of these models is to run a small DFT benchmark set on your own system and compare all four models against it. The winner for your chemistry may not be the winner on a generic benchmark, and a few hundred DFT reference points go a long way toward answering that question.
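Concretely, that workflow boils down to: collect your DFT reference energies, run every candidate model on the same structures, and rank by error. A minimal sketch with made-up numbers and hypothetical model names follows; in practice each prediction list would come from the corresponding library's calculator.

```python
def rank_models(predictions, reference_ev, n_atoms):
    """Rank candidate models by MAE (meV/atom) against DFT references.
    `predictions` maps model name -> list of predicted total energies (eV).
    Returns (name, mae) pairs sorted best-first."""
    scores = {}
    for name, pred in predictions.items():
        errs = [abs(p - r) / n * 1000.0
                for p, r, n in zip(pred, reference_ev, n_atoms)]
        scores[name] = sum(errs) / len(errs)
    return sorted(scores.items(), key=lambda kv: kv[1])

# Made-up benchmark: 3 structures, DFT references, 2 hypothetical models.
reference = [-12.0, -24.5, -8.1]      # DFT total energies (eV)
n_atoms = [2, 4, 2]
predictions = {
    "model_a": [-12.02, -24.46, -8.12],
    "model_b": [-11.90, -24.70, -8.00],
}

for name, mae in rank_models(predictions, reference, n_atoms):
    print(f"{name}: {mae:.1f} meV/atom")
```

Swapping in real data means replacing the `predictions` dict with one entry per candidate checkpoint, evaluated on the exact structures in your DFT reference set; everything else stays the same.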
Bottom line
MACE-MP-0 is the friendliest starting point. Orb-v3 is the fastest-improving. NequIP is the most data-efficient. Allegro is the most scalable. All four are production-ready for serious materials work, and all four are orders of magnitude faster than DFT. Pick one, benchmark it on your system, and get started.