
Crystal Structure Prediction: From Composition to 3D

What crystal structure prediction is, why it's one of science's hardest problems, and how modern methods from DFT to ML are solving it.

Ryan Bethencourt
April 14, 2026

The Structure Prediction Problem

Crystal structure prediction (CSP) sits at the heart of materials science. Every property of a crystalline material – its electronic band gap, mechanical strength, ionic conductivity, magnetic behavior – is determined by how its atoms are arranged in three-dimensional space. If you know the structure, you can predict the properties. If you can predict the structure from composition alone, you can design materials without ever stepping into a lab.

The challenge is staggering. For a crystal unit cell containing just 20 atoms, the potential energy surface has about 60 dimensions: 3 coordinates per atom, plus 6 lattice parameters, minus 3 rigid translations that leave the energy unchanged, for 63 in total. This surface contains an astronomical number of local minima, each representing a different metastable arrangement. The global minimum, the thermodynamically stable structure, is the needle in a hyperdimensional haystack. Finding it reliably and efficiently is what makes CSP one of the great unsolved problems in computational science.

Note
The Cambridge Crystallographic Data Centre (CCDC) has organized periodic blind tests of crystal structure prediction for organic molecules since 1999. These competitions have driven significant progress, but even the best methods still struggle with flexible molecules and complex unit cells containing more than about 50 atoms.

Traditional Approaches to CSP

Density Functional Theory (DFT)

DFT is the workhorse of computational materials science. It approximately solves the quantum mechanical equations for the electrons in a crystal (the Kohn-Sham equations) to compute the total energy of a given atomic arrangement. By comparing energies across many candidate structures, you can identify the most stable one. DFT is accurate, typically reproducing experimental lattice parameters to within 1–3%, but expensive. A single energy calculation for a 50-atom unit cell takes minutes to hours on a modern compute cluster, and relaxing a structure requires many such calculations. Searching thousands of candidates at DFT accuracy can take weeks of wall-clock time.

Evolutionary and Swarm Algorithms (USPEX, CALYPSO)

Evolutionary algorithms borrow from biological evolution. They maintain a population of candidate structures, evaluate their fitness (energy), and breed new candidates through crossover and mutation operations. The key insight is that good crystal structures share local motifs – coordination polyhedra, bond lengths, symmetry elements – that can be recombined to explore structure space efficiently.

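USPEX (Universal Structure Predictor: Evolutionary Xtallography) and CALYPSO (Crystal structure AnaLYsis by Particle Swarm Optimization) are two of the most widely used population-based CSP codes. USPEX is evolutionary; CALYPSO updates its population with particle swarm optimization rather than crossover, but the overall loop of generating, relaxing, and ranking candidates is similar. Both typically use DFT for energy evaluation and have successfully predicted structures for hundreds of materials, including several that were later confirmed experimentally. Their main limitation is computational cost: a typical USPEX run for a ternary system requires thousands of DFT calculations.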
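The select, recombine, and mutate loop can be sketched in a few lines. This is a toy illustration, not the operators USPEX or CALYPSO actually use: the fitness is a Lennard-Jones energy for a small finite cluster (no periodic cell, no DFT, no local relaxation), and every function name and parameter below is an assumption made for the example.

```python
import math
import random

def lj_energy(coords):
    """Toy fitness: Lennard-Jones energy of a finite cluster (reduced units)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Clamp very short distances so the repulsive wall stays finite
            r = max(math.dist(coords[i], coords[j]), 0.5)
            energy += 4.0 * (r ** -12 - r ** -6)
    return energy

def random_structure(n_atoms, box, rng):
    return [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_atoms)]

def crossover(parent_a, parent_b, rng):
    """Splice: the first k atoms from one parent, the rest from the other."""
    k = rng.randint(1, len(parent_a) - 1)
    return [atom[:] for atom in parent_a[:k]] + [atom[:] for atom in parent_b[k:]]

def mutate(coords, sigma, rng):
    """Displace every coordinate by Gaussian noise of width sigma."""
    return [[x + rng.gauss(0.0, sigma) for x in atom] for atom in coords]

def evolve(n_atoms=4, pop_size=20, generations=40, box=2.5, seed=0):
    rng = random.Random(seed)
    population = [random_structure(n_atoms, box, rng) for _ in range(pop_size)]
    history = []  # best energy found in each generation
    for _ in range(generations):
        population.sort(key=lj_energy)
        history.append(lj_energy(population[0]))
        survivors = population[: pop_size // 2]  # elitist selection: best half survives
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), 0.1, rng))
        population = survivors + children
    population.sort(key=lj_energy)
    history.append(lj_energy(population[0]))
    return population[0], history

best, history = evolve()
print(f"best energy: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the best half of each generation is carried over unchanged, the best energy can never get worse from one generation to the next. Production codes add a local relaxation of every child before ranking, which this sketch omits.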

Random Structure Search (AIRSS)

Ab Initio Random Structure Searching (AIRSS) takes a surprisingly simple approach: generate random crystal structures with sensible constraints (minimum interatomic distances, reasonable volumes), relax them to their nearest local minimum using DFT, and collect the lowest-energy results. The method works because crystallographic constraints dramatically reduce the effective search space, and the energy landscape has broad basins of attraction around stable structures. AIRSS has been particularly successful for high-pressure phases and hydrogen-rich materials.
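The generate, relax, and collect recipe can be sketched as follows. This is a minimal toy under stated assumptions, not AIRSS itself: a Lennard-Jones cluster stands in for DFT, the "cell" is a non-periodic box, and the relaxation is a plain steepest descent with an adaptive step. All names and parameter values are made up for the example.

```python
import math
import random

def lj_energy_and_grad(coords):
    """Lennard-Jones energy and gradient for a finite cluster (reduced units)."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r2 = sum(x * x for x in d)
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
            # (dE/dr) / r for E = 4 (r^-12 - r^-6)
            coef = -24.0 * (2.0 * inv6 * inv6 - inv6) / r2
            for k in range(3):
                grad[i][k] += coef * d[k]
                grad[j][k] -= coef * d[k]
    return energy, grad

def random_structure(n_atoms, box, min_dist, rng):
    """Random positions, rejecting any atom closer than min_dist to another."""
    coords = []
    while len(coords) < n_atoms:
        trial = [rng.uniform(0.0, box) for _ in range(3)]
        if all(math.dist(trial, c) >= min_dist for c in coords):
            coords.append(trial)
    return coords

def relax(coords, max_steps=500, tol=1e-4):
    """Steepest descent; a step is accepted only if it lowers the energy."""
    coords = [c[:] for c in coords]
    energy, grad = lj_energy_and_grad(coords)
    step = 0.01
    for _ in range(max_steps):
        gnorm = math.sqrt(sum(g * g for row in grad for g in row))
        if gnorm < tol:
            break
        trial = [[c[k] - step * g[k] for k in range(3)] for c, g in zip(coords, grad)]
        trial_energy, trial_grad = lj_energy_and_grad(trial)
        if trial_energy < energy:   # accept and grow the step
            coords, energy, grad = trial, trial_energy, trial_grad
            step *= 1.2
        else:                       # reject and shrink the step
            step *= 0.5
    return coords, energy

def random_search(n_atoms=4, n_trials=30, box=2.5, min_dist=0.9, seed=1):
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        start = random_structure(n_atoms, box, min_dist, rng)
        relaxed, energy = relax(start)
        results.append((energy, relaxed))
    results.sort(key=lambda item: item[0])  # lowest energy first
    return results

results = random_search()
print(f"lowest energy found: {results[0][0]:.3f}")
```

The minimum-distance rejection plays the role of the "sensible constraints" described above: it keeps the search away from the steep repulsive wall where relaxations are slow and unphysical.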

Machine Learning Approaches

ML Interatomic Potentials

The key bottleneck in traditional CSP is the cost of DFT energy evaluation. Machine learning interatomic potentials (MLIPs) address this directly by training neural networks to approximate DFT energies and forces. Models like MACE, NequIP, and M3GNet learn from a dataset of DFT calculations and can then evaluate energies thousands of times faster than DFT with near-DFT accuracy.

This speedup transforms CSP. Where evolutionary algorithms with DFT might explore 10,000 structures in a week, the same algorithm with an MLIP can explore 10 million structures in a day. The tradeoff is that MLIPs can have systematic errors for compositions far from their training data, so the best workflows use MLIPs for broad screening and DFT for final validation of top candidates.
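The screen-then-validate pattern is simple to express in code. In the sketch below, the two energy functions are toy stand-ins (assumptions for illustration, not real models): `toy_surrogate_energy` plays the MLIP, with a small error relative to `toy_reference_energy`, which plays DFT. Candidates are just integer labels.

```python
import math

def screen_then_validate(candidates, cheap_energy, accurate_energy, top_k):
    """Rank everything with the cheap model, then re-rank the top_k with the accurate one."""
    shortlist = sorted(candidates, key=cheap_energy)[:top_k]
    return sorted((accurate_energy(c), c) for c in shortlist)

calls = {"accurate": 0}  # count how often the expensive function is evaluated

def toy_reference_energy(x):
    """Stand-in for an expensive DFT calculation."""
    calls["accurate"] += 1
    return (x - 300) ** 2 / 1.0e4

def toy_surrogate_energy(x):
    """Stand-in for an MLIP: the reference energy plus a small error."""
    return (x - 300) ** 2 / 1.0e4 + 0.05 * math.sin(x)

ranked = screen_then_validate(
    range(1000), toy_surrogate_energy, toy_reference_energy, top_k=10
)
best_energy, best_candidate = ranked[0]
print(best_candidate, best_energy, calls["accurate"])
```

Only 10 of the 1,000 candidates ever reach the expensive function. The choice of `top_k` is the safety margin: it has to be large enough that surrogate errors do not push the true best candidate off the shortlist.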

Graph Neural Networks for Stability Prediction

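Instead of relaxing every candidate with an interatomic potential, some models take a structure and predict its energy in a single pass, from which stability is derived. GNoME (Graph Networks for Materials Exploration) from Google DeepMind used this approach to identify roughly 2.2 million crystal structures predicted to be stable, about 380,000 of which lie on the final convex hull. The model takes a crystal graph as input (atoms as nodes, with edges between atoms that lie within a cutoff distance of each other) and predicts the energy, from which the formation energy and the energy above the convex hull follow. Materials with an energy above hull close to zero are predicted to be thermodynamically stable.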
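To make the crystal graph concrete, here is a minimal sketch of how one might be built: every atom is a node, and an edge connects two atoms (including periodic images) that sit within a cutoff distance. This is an illustration under simple assumptions, not GNoME's actual featurization; the function name and cutoff are chosen for the example.

```python
import itertools
import math

def build_crystal_graph(lattice, frac_coords, cutoff):
    """Return directed edges (i, j, image, distance) for atom pairs within cutoff.

    lattice: three lattice vectors (rows), in angstroms
    frac_coords: fractional coordinates of each atom, in [0, 1)
    """
    def to_cartesian(frac):
        return [sum(frac[k] * lattice[k][d] for k in range(3)) for d in range(3)]

    cart = [to_cartesian(f) for f in frac_coords]
    edges = []
    # Searching only the 27 nearest periodic images assumes the cutoff is
    # smaller than the shortest perpendicular width of the cell.
    for image in itertools.product((-1, 0, 1), repeat=3):
        shift = to_cartesian(image)
        for i, ri in enumerate(cart):
            for j, rj in enumerate(cart):
                if i == j and image == (0, 0, 0):
                    continue  # skip an atom paired with itself
                dist = math.dist(ri, [rj[d] + shift[d] for d in range(3)])
                if dist <= cutoff:
                    edges.append((i, j, image, dist))
    return edges

# Rock-salt NaCl conventional cell (a is approximately 5.64 angstroms)
a = 5.64
lattice = [[a, 0, 0], [0, a, 0], [0, 0, a]]
species = ["Na"] * 4 + ["Cl"] * 4
frac_coords = [
    (0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5),  # Na
    (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5), (0.5, 0.5, 0.5),  # Cl
]
edges = build_crystal_graph(lattice, frac_coords, cutoff=3.0)
degree = [sum(1 for i, *_ in edges if i == atom) for atom in range(len(species))]
print(degree)  # prints [6, 6, 6, 6, 6, 6, 6, 6]
```

With a 3.0 Å cutoff, each ion is connected to its six nearest neighbors of the opposite species, which is the octahedral coordination expected for rock salt. A graph network would then attach element features to the nodes and distance features to the edges.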

Generative Crystal Models

The most recent advance is generative models that directly output crystal structures. Diffusion models like CDVAE (Crystal Diffusion Variational Autoencoder) and DiffCSP learn the distribution of stable crystal structures and can sample new ones. Given a target composition or target property, they generate plausible structures without any search algorithm. This is conceptually similar to how image diffusion models generate images – but in crystallographic space instead of pixel space.

XRD Simulation: Validating Predictions

Once you have a predicted crystal structure, the next question is: does it match reality? X-ray diffraction (XRD) simulation bridges the gap between computational prediction and experimental validation. Given a crystal structure (lattice parameters, atom positions, space group), XRD simulation calculates the powder diffraction pattern that the structure would produce.
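Peak positions are the part of a powder pattern that is easiest to reproduce by hand: they follow from the lattice parameters and Bragg's law alone. The sketch below handles only the orthorhombic case and only positions; intensities require structure factors from the atom positions, which it does not attempt. The lattice parameters are approximate values for olivine LiFePO4, and the function names are assumptions for the example.

```python
import math

CU_K_ALPHA1 = 1.5406  # Cu K-alpha1 wavelength in angstroms

def d_spacing_orthorhombic(h, k, l, a, b, c):
    """Interplanar spacing for an orthorhombic cell: 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

def two_theta(d, wavelength=CU_K_ALPHA1):
    """Diffraction angle in degrees from Bragg's law, wavelength = 2 d sin(theta)."""
    s = wavelength / (2.0 * d)
    if s > 1.0:
        return None  # reflection cannot be reached at this wavelength
    return 2.0 * math.degrees(math.asin(s))

# Approximate olivine LiFePO4 cell (Pnma setting), in angstroms
a, b, c = 10.334, 6.008, 4.693

for hkl in [(1, 0, 1), (1, 1, 1), (0, 2, 0), (3, 1, 1)]:
    d = d_spacing_orthorhombic(*hkl, a, b, c)
    print(f"{hkl}: d = {d:.3f} A, 2-theta = {two_theta(d):.2f} deg")
```

A quick calculation like this is a useful sanity check on any simulated pattern: if a reported reflection index does not reproduce its reported angle from the lattice parameters, something in the pipeline is mislabeled.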

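Comparing a simulated XRD pattern from a predicted structure against an experimental measurement is one of the most powerful validation tools in CSP. If the peaks align in position and relative intensity, the predicted structure is likely correct. If they do not, the predicted structure is probably not the phase that was measured, although impurity phases, preferred orientation, and sample displacement can also cause mismatches and should be ruled out before discarding it.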
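A first-pass comparison of peak positions can be automated with a simple tolerance match. The sketch below is deliberately crude: it ignores intensities and any zero-shift correction, and the "experimental" values are hypothetical numbers invented for the example. Real workflows usually rely on whole-pattern fitting such as Rietveld refinement.

```python
def match_peaks(simulated, experimental, tol=0.1):
    """Pair each experimental 2-theta with the nearest unused simulated peak within tol degrees."""
    unused = sorted(simulated)
    matches = []
    for exp in sorted(experimental):
        if not unused:
            break
        nearest = min(unused, key=lambda sim: abs(sim - exp))
        if abs(nearest - exp) <= tol:
            matches.append((exp, nearest))
            unused.remove(nearest)  # each simulated peak can be used once
    return matches

simulated = [20.77, 25.56, 29.72, 35.59]            # 2-theta in degrees
experimental = [20.80, 25.60, 29.75, 32.10, 35.62]  # hypothetical measurement

matches = match_peaks(simulated, experimental)
matched_exp = {exp for exp, _ in matches}
unexplained = [p for p in experimental if p not in matched_exp]

print(f"matched {len(matches)} of {len(experimental)} experimental peaks")
print(f"unexplained peaks: {unexplained}")
```

Here every simulated peak finds a partner, while the experimental peak at 32.10° has no counterpart. That is the kind of leftover that would prompt a check for a secondary phase before concluding the predicted structure is wrong.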

Simulate XRD pattern for a predicted crystal
import requests

API_KEY = "sk-sci-your-api-key"  # placeholder: replace with your own key
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Predict structure and simulate XRD for LiFePO4
response = requests.post(
    f"{BASE}/materials/xrd",
    headers=HEADERS,
    json={
        "composition": "LiFePO4",
        "radiation": "CuKa",          # Cu K-alpha, most common
        "two_theta_range": [10, 80],  # degrees
        "step_size": 0.02,
    },
    timeout=60,
)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

result = response.json()
print(f"Space Group:     {result['space_group']}")
print(f"Lattice Params:  a={result['a']:.3f}, b={result['b']:.3f}, c={result['c']:.3f} A")
print(f"Number of Peaks: {len(result['peaks'])}")
print("\nStrongest peaks (2-theta, intensity, hkl):")
# Sort by relative intensity and show the five strongest reflections
for peak in sorted(result['peaks'], key=lambda p: p['intensity'], reverse=True)[:5]:
    print(f"  {peak['two_theta']:>7.2f}°  {peak['intensity']:>6.1f}%  ({peak['hkl']})")
Output
Space Group:     Pnma
Lattice Params:  a=10.334, b=6.008, c=4.693 A
Number of Peaks: 47

Strongest peaks (2-theta, intensity, hkl):
    25.58°   100.0%  (1 1 1)
    29.72°    78.3%  (0 2 0)
    35.59°    65.1%  (3 1 1)
    20.79°    52.8%  (1 0 1)
    36.54°    47.2%  (1 2 1)

Making CSP Accessible via API

Traditionally, running CSP required expertise in quantum chemistry codes (VASP, Quantum ESPRESSO), structure-search software (USPEX, CALYPSO), and significant compute resources. API access lowers these barriers. You provide a composition and get back predicted structures, properties, and simulated diffraction patterns, with no software installation, no compute management, and no need to learn specialized codes.

Predict crystal structure from composition
import requests

API_KEY = "sk-sci-your-api-key"  # placeholder: replace with your own key
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Predict stable structure(s) for a composition
response = requests.post(
    f"{BASE}/materials/predict-structure",
    headers=HEADERS,
    json={
        "composition": "BaTiO3",
        "max_structures": 5,  # return top-5 lowest energy
    },
    timeout=60,
)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

structures = response.json()["structures"]
# Print a short summary of each returned structure
for i, s in enumerate(structures, start=1):
    print(f"Structure {i}:")
    print(f"  Space Group:      {s['space_group']}")
    print(f"  Formation Energy: {s['formation_energy']:.3f} eV/atom")
    print(f"  Energy Above Hull: {s['energy_above_hull']:.3f} eV/atom")
    print(f"  Volume:           {s['volume']:.1f} A³")
    print()

This democratizes CSP. A battery researcher who needs to check whether a novel cathode composition is likely to be stable can get an answer in seconds instead of setting up a week-long DFT campaign. A materials science student can explore crystal chemistry interactively. An AI agent can autonomously screen thousands of compositions as part of a larger discovery workflow.

Tip
API-based CSP is best suited for screening and prioritization. For publication-quality results or novel compositions far from known materials, validate the API predictions with full DFT calculations. The API is your fast filter; DFT is your precision instrument.

Practical Workflow: From Composition to Validated Structure

A typical CSP workflow using the API involves four steps:

  • Screen compositions: Query formation energies and stability for a list of candidate compositions. Filter for energy above hull below 0.05 eV/atom.
  • Predict structures: For stable candidates, retrieve the predicted crystal structures including lattice parameters, atom positions, and space groups.
  • Simulate XRD: Generate powder diffraction patterns for predicted structures. Compare against experimental data if available.
  • Predict properties: Calculate band gap, elastic modulus, ionic conductivity, or other application-specific properties for the validated structures.

Each step feeds into the next, creating a funnel that starts broad (thousands of compositions) and narrows to a handful of promising candidates worth synthesizing.
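The stability filter in the first step rests on the energy above the convex hull. For the simplest case, a binary A-B system, it can be computed by hand as in the sketch below. The phase names and formation energies are hypothetical numbers chosen for illustration, and real screening uses multi-component hulls (for example via a library such as pymatgen) rather than this two-element version.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points, using a monotone-chain scan."""
    best = {}
    for x, e in points:  # keep only the lowest energy at each composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        # Drop the last vertex while it lies on or above the line hull[-2] -> p
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull

def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("composition outside the hull range")

def energies_above_hull(phases):
    """phases maps name -> (fraction of B, formation energy in eV/atom)."""
    hull = lower_hull(phases.values())
    return {name: e - hull_energy(hull, x) for name, (x, e) in phases.items()}

# Hypothetical binary system; elemental references sit at zero formation energy
phases = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.17),
    "A3B2": (0.4, -0.10),
    "AB": (0.5, -0.40),
    "AB3": (0.75, -0.30),
    "B": (1.0, 0.0),
}

e_hull = energies_above_hull(phases)
survivors = [name for name, e in e_hull.items() if e < 0.05]  # 0.05 eV/atom cutoff

for name, e in e_hull.items():
    print(f"{name:5s} {e:.3f} eV/atom")
print("pass the filter:", survivors)
```

In this example A3B sits 0.03 eV/atom above the hull and passes the 0.05 eV/atom cutoff as a plausible metastable phase, while A3B2 sits 0.22 eV/atom above it and is filtered out.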

Next Steps

Ready to predict structures for your own compositions? Open the Crystal Explorer Studio or get a free API key to start building with the materials API.

Frequently Asked Questions

What is crystal structure prediction (CSP)?

Crystal structure prediction is the computational determination of the three-dimensional arrangement of atoms in a crystalline solid given only its chemical composition. It answers the question: if I combine these elements in this ratio, how will the atoms arrange themselves? CSP is considered one of the grand challenges in materials science because the number of possible arrangements grows exponentially with system size.

Why is crystal structure prediction so difficult?

The difficulty comes from the vastness of the energy landscape. For a crystal with N atoms in the unit cell, the potential energy surface has on the order of 3N dimensions. The number of local minima on this surface grows roughly exponentially with N, and each minimum corresponds to a different atomic arrangement. Finding the global minimum (the most stable structure) requires either exhaustive search, which is computationally prohibitive, or clever heuristics that can miss important structures.

What is the difference between CSP and structure determination from XRD?

Structure determination from X-ray diffraction (XRD) works backward from experimental data: you synthesize the material, collect diffraction patterns, and solve the structure from the measured reflections. CSP works forward from composition alone: you have not made the material yet and want to predict what its structure would be. XRD tells you what exists; CSP tells you what could exist.

How accurate is DFT for crystal structure prediction?

DFT with standard functionals (PBE, PBEsol) typically reproduces experimental lattice parameters within 1–3% and correctly identifies the ground-state structure for most simple systems. However, DFT can fail for strongly correlated systems (such as many transition metal oxides), van der Waals-dominated crystals (molecular solids) when no dispersion correction is applied, and systems where the energy differences between competing phases are very small (less than 10 meV/atom).

Can ML models replace DFT for crystal structure prediction?

Not entirely, but they dramatically accelerate the process. ML models trained on DFT data can predict formation energies and relative stabilities at a fraction of the computational cost, enabling the screening of millions of candidates. However, the final candidates still need DFT validation because ML models can have systematic errors, especially for compositions far from the training distribution. The best workflows use ML for broad screening and DFT for precise ranking.

What is XRD simulation and why does it matter for CSP?

XRD simulation calculates the diffraction pattern that a predicted crystal structure would produce if measured experimentally. This is critical for validating CSP results: if your predicted structure produces a simulated XRD pattern that matches an experimental measurement, it strongly supports the prediction. XRD simulation is also used for phase identification, helping researchers match unknown experimental patterns to known or predicted structures.
