Universal neural network potentials are the biggest change in computational materials science this decade. Four names dominate the 2026 landscape: MACE, NequIP, Allegro, and Orb-v3. Each is open source, each claims accuracy on par with density functional theory (DFT), and each runs orders of magnitude faster than DFT itself. This post compares them head to head so you can pick the right tool for your workflow.
The contenders
MACE
MACE is an equivariant graph neural network from a Cambridge-led team. It uses higher-body-order messages inspired by the Atomic Cluster Expansion, which means it captures three-body and four-body correlations inside a single layer. Its foundation checkpoint, MACE-MP-0, is the most widely deployed universal potential in the field. For a deeper intro, see our MACE-MP-0 explainer.
NequIP
NequIP is the original equivariant neural network potential from the Kozinsky group at Harvard. It set the standard for what “E(3)-equivariant” means in this space and inspired much of the follow-up work including MACE and Allegro. NequIP tends to be highly accurate with relatively modest training data.
Allegro
Allegro is also from the Kozinsky group and is the strictly local cousin of NequIP. It removes global message passing in favor of purely local equivariant features, which unlocks very large-scale parallelism. On GPU clusters it scales to billion-atom simulations better than almost any other potential, and per-call inference is fast.
Orb-v3
Orb-v3 is from Orbital Materials and is the newest of the four. It is a foundation-model style universal potential with aggressive engineering for inference speed and broad element coverage. Early benchmarks show it at or near the top of the accuracy tables across the Materials Project.
Accuracy
On standard benchmarks (formation energies from Materials Project, MD17 organic molecules, standard ab-initio MD datasets), the gap between these four models is smaller than the gap between any of them and older fingerprint-based potentials. You are choosing between excellent and excellent, not between good and bad.
That said, a few observations hold up across benchmarks:
- Orb-v3 and MACE-MP-0 are typically at the top of the accuracy tables for formation energies and forces on bulk materials.
- NequIP is extremely data-efficient, meaning it reaches high accuracy with less training data, which matters if you are fine-tuning on a small custom dataset.
- Allegro can be slightly behind on per-system benchmarks but catches up at scale and wins on throughput.
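Accuracy tables like the ones above usually report mean absolute error per atom, in meV/atom. As a concrete illustration, here is a small pure-Python sketch of that metric; the function name and all numbers are our own synthetic illustration, not taken from any real benchmark or library.

```python
# Sketch of the standard accuracy metric for universal potentials:
# mean absolute error of predicted total energies versus DFT references,
# normalized per atom and reported in meV/atom. Numbers are synthetic.

def mae_mev_per_atom(predicted_ev, reference_ev, n_atoms):
    """MAE between predicted and DFT reference total energies (eV),
    normalized by atom count and converted to meV/atom."""
    errors = [
        abs(p - r) / n * 1000.0  # eV -> meV, then per atom
        for p, r, n in zip(predicted_ev, reference_ev, n_atoms)
    ]
    return sum(errors) / len(errors)

# Synthetic example: three structures with 4, 8, and 2 atoms.
reference = [-18.40, -36.10, -9.20]   # DFT total energies (eV)
predicted = [-18.38, -36.18, -9.21]   # model predictions (eV)
atoms = [4, 8, 2]

print(f"{mae_mev_per_atom(predicted, reference, atoms):.1f} meV/atom")
```

For reference, state-of-the-art universal potentials sit in the tens of meV/atom on Materials Project formation energies, so a per-atom normalization is what makes numbers comparable across differently sized structures.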
Speed
For inference speed on a single GPU with a few hundred atoms:
- Allegro is typically the fastest because its local formulation is trivially parallel.
- Orb-v3 is competitive thanks to engineering optimizations and fused kernels.
- MACE is slightly slower per call but extremely CPU-friendly, which is valuable for laptop work.
- NequIP tends to be the slowest of the four per forward pass, though this depends heavily on the specific configuration.
All four are orders of magnitude faster than DFT. You are choosing between “fast” and “very fast,” not between fast and slow.
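Per-call speed rankings like the one above are easy to reproduce for your own system. Here is a minimal, library-agnostic timing harness; the "models" below are trivial stand-in callables, not the real potentials, which in practice would wrap each library's calculator on the same structure.

```python
import time

def time_per_call(model, structure, n_calls=10, warmup=2):
    """Median wall-clock time per forward call, in milliseconds.
    `model` is any callable taking a structure and returning a result."""
    for _ in range(warmup):          # warm-up calls (JIT, caches, etc.)
        model(structure)
    samples = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        model(structure)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to outliers

# Stand-in "models" for illustration only.
def fake_fast(structure):
    return sum(structure)

def fake_slow(structure):
    return sum(x * x for x in structure for _ in range(100))

structure = list(range(256))  # pretend: a 256-atom configuration
for name, model in [("fast", fake_fast), ("slow", fake_slow)]:
    print(f"{name}: {time_per_call(model, structure):.3f} ms/call")
```

Warm-up calls matter more than you might expect: several of these libraries compile kernels or build neighbor lists lazily, so the first call is not representative of steady-state throughput.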
Element coverage
Element coverage matters if you want to use a pretrained foundation checkpoint without fine-tuning:
- MACE-MP-0: 89 elements, broad coverage from hydrogen through bismuth including lanthanides.
- Orb-v3: claims broad coverage across Materials Project elements.
- NequIP foundation checkpoints: recent releases cover most main-group and transition metal chemistry.
- Allegro: historically trained per-system rather than shipped as a single universal checkpoint, though recent universal Allegro work is catching up.
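Before trusting any pretrained checkpoint, it is worth a quick sanity check that your composition actually falls inside its supported elements. A generic sketch follows; the supported-element set here is a tiny illustrative subset we made up, not any model's real coverage list, which you should pull from its documentation.

```python
def unsupported_elements(composition, supported):
    """Return the elements in `composition` that are missing from a
    checkpoint's supported-element set. `composition` maps symbol -> count."""
    return sorted(set(composition) - set(supported))

# Illustrative subset only -- check each model's docs for the real list.
supported = {"H", "C", "N", "O", "Si", "Fe", "Ni", "Bi"}

print(unsupported_elements({"Fe": 2, "O": 3}, supported))  # prints []
print(unsupported_elements({"U": 1, "O": 2}, supported))   # prints ['U']
```

A model will often happily produce numbers for elements outside its training coverage; it just will not produce trustworthy ones, which is why this check belongs at the start of a workflow rather than the end.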
License and openness
All four models are open source with permissive licenses that allow commercial use. The training code, weights, and documentation are public for all of them. This is unusual in the broader ML foundation model landscape, and it is one of the things that makes materials science a comparatively healthy subfield.
When to pick which
Pick MACE-MP-0 when
- You want the friendliest documentation and largest community.
- You need CPU inference on a laptop.
- You need broad element coverage out of the box.
Pick NequIP when
- You have limited training data for a custom system.
- You need maximum accuracy for a narrow chemistry.
Pick Allegro when
- You need to run simulations at massive scale.
- Inference throughput matters more than foundation coverage.
Pick Orb-v3 when
- You are benchmarking against the latest universal checkpoints.
- You want competitive accuracy and engineering-grade speed.
The meta lesson: benchmark before committing
The most valuable thing you can do before building a production workflow around any of these models is to run a small DFT benchmark set on your own system and compare all four models against it. The winner for your chemistry may not be the winner on a generic benchmark, and a few hundred DFT reference points go a long way toward answering that question.
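Concretely, that workflow boils down to: collect your DFT reference energies, run every candidate model on the same structures, and rank by error. A minimal sketch with made-up numbers and hypothetical model names follows; in practice each prediction list would come from the corresponding library's calculator.

```python
def rank_models(predictions, reference_ev, n_atoms):
    """Rank candidate models by MAE (meV/atom) against DFT references.
    `predictions` maps model name -> list of predicted total energies (eV).
    Returns (name, mae) pairs sorted best-first."""
    scores = {}
    for name, pred in predictions.items():
        errs = [abs(p - r) / n * 1000.0
                for p, r, n in zip(pred, reference_ev, n_atoms)]
        scores[name] = sum(errs) / len(errs)
    return sorted(scores.items(), key=lambda kv: kv[1])

# Made-up benchmark: 3 structures, DFT references, 2 hypothetical models.
reference = [-12.0, -24.5, -8.1]      # DFT total energies (eV)
n_atoms = [2, 4, 2]
predictions = {
    "model_a": [-12.02, -24.46, -8.12],
    "model_b": [-11.90, -24.70, -8.00],
}

for name, mae in rank_models(predictions, reference, n_atoms):
    print(f"{name}: {mae:.1f} meV/atom")
```

Swapping in real data means replacing the `predictions` dict with one entry per candidate checkpoint, evaluated on the exact structures in your DFT reference set; everything else stays the same.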
Bottom line
MACE-MP-0 is the friendliest starting point. Orb-v3 is the fastest-improving. NequIP is the most data-efficient. Allegro is the most scalable. All four are production-ready for serious materials work, and all four are orders of magnitude faster than DFT. Pick one, benchmark it on your system, and get started.