Materials Science

AI for Materials Discovery: Finding New Materials with ML

How machine learning is accelerating materials discovery — from batteries to semiconductors. Google DeepMind's GNoME, crystal structure prediction, and the future of materials informatics.

Ryan Bethencourt
April 13, 2026
10 min read

The Materials Discovery Bottleneck

For most of human history, discovering new materials was a slow, serendipitous process. A researcher would hypothesize a composition, synthesize it in a lab, characterize its properties, and decide whether to iterate or move on. Each cycle took weeks to months. The periodic table contains roughly 100 usable elements, and the number of possible multi-element combinations grows combinatorially – by some estimates there are on the order of 10^100 possible inorganic crystal structures. We have synthesized fewer than 50,000 of them.

This gap between what is possible and what we have explored is the materials discovery bottleneck. It affects every technology that depends on advanced materials: batteries that store more energy, semiconductors that switch faster, catalysts that convert CO2 more efficiently, and superconductors that work at higher temperatures. The bottleneck is not physics – it is throughput.

Note
The average time from initial materials discovery to commercial deployment is 15–20 years. AI-driven approaches aim to compress the discovery phase from years to weeks, though the development and manufacturing phases still require significant time.

How AI Changes the Game

Machine learning attacks the bottleneck by replacing expensive computations and slow experiments with fast predictions. Instead of running a density functional theory (DFT) calculation that takes hours per structure, a trained neural network predicts the same property in milliseconds. Instead of synthesizing 1,000 candidates and testing them, you screen 10 million computationally and synthesize only the top 50.

The shift mirrors what happened in drug discovery a decade ago. Pharmaceutical companies moved from high-throughput physical screening to virtual screening, dramatically reducing costs and timelines. Materials science is now undergoing the same transformation, enabled by three converging factors:

  • Large datasets: The Materials Project, AFLOW, and NOMAD databases now contain millions of computed material properties, providing the training data that ML models need.
  • Better architectures: Graph neural networks (GNNs) that operate directly on crystal structures have proven remarkably effective at learning structure-property relationships.
  • Compute accessibility: Cloud GPUs and API services make it possible for any researcher to run inference on state-of-the-art models without building their own infrastructure.

Key Methods in AI Materials Discovery

Crystal Structure Prediction

Given a chemical composition (like Li2MnO3), predict the most stable 3D arrangement of atoms. This is one of the hardest problems in materials science because the energy landscape has millions of local minima. Traditional approaches use evolutionary algorithms or random structure search with DFT energy evaluation. ML approaches train surrogate models to approximate DFT energies, enabling orders-of-magnitude faster screening.

For a deeper dive into the methods and challenges of CSP, see our dedicated guide on crystal structure prediction.
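To make the surrogate idea concrete, here is a minimal sketch of random structure search with a toy energy function standing in for a trained model. The landscape, parameter ranges, and sample count are invented for illustration; a real surrogate would be a neural network trained on DFT data, but the workflow – sample candidates cheaply, rank by predicted energy – is the same:

```python
import math
import random

random.seed(0)

def surrogate_energy(a, c_over_a):
    """Toy stand-in for a trained ML energy model (milliseconds per call).
    A real surrogate would be a neural network fitted to DFT energies."""
    return (a - 4.0) ** 2 + 0.5 * (c_over_a - 1.6) ** 2 + 0.1 * math.sin(5 * a)

# Random structure search: sample lattice parameters, rank by predicted energy
samples = [(random.uniform(3.0, 6.0), random.uniform(1.0, 2.2))
           for _ in range(10_000)]
ranked = sorted(samples, key=lambda s: surrogate_energy(*s))

best_a, best_ca = ranked[0]
print(f"lowest-energy sample: a = {best_a:.2f} Å, c/a = {best_ca:.2f}")
```

Because each evaluation is cheap, screening ten thousand candidates takes milliseconds; only the top handful would ever be passed on to full DFT.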

Property Prediction

Given a crystal structure, predict its properties: formation energy, band gap, elastic modulus, ionic conductivity, thermal conductivity, and more. Graph neural networks like CGCNN (Crystal Graph Convolutional Neural Network) and MEGNet encode the crystal as a graph where nodes are atoms and edges are bonds, then learn to map this graph to scalar properties. These models achieve DFT-level accuracy for many properties at a fraction of the computational cost.
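The crystal-as-graph encoding can be sketched in a few lines. This toy example (invented coordinates, fixed averaging in place of learned message functions) shows the three steps these models share: build edges from a distance cutoff, pass messages between neighbouring atoms, and pool node features into a crystal-level descriptor:

```python
import math

# Toy crystal graph: nodes are atoms (feature = atomic number), edges connect
# atoms within a cutoff distance. A real CGCNN/MEGNet learns the update and
# readout functions; here they are fixed illustrative operations.
atoms = {0: 3, 1: 25, 2: 8, 3: 8, 4: 8}          # Li, Mn, O, O, O
positions = {0: (0.0, 0.0, 0.0), 1: (2.0, 0.0, 0.0),
             2: (1.0, 1.0, 0.0), 3: (1.0, -1.0, 0.0), 4: (3.0, 1.0, 0.0)}
CUTOFF = 2.5  # Å

# Build the edge list from the distance cutoff
edges = [(i, j) for i in atoms for j in atoms
         if i < j and math.dist(positions[i], positions[j]) <= CUTOFF]

# One round of message passing: each node mixes in its neighbours' features
features = {i: float(z) for i, z in atoms.items()}
messages = {i: [] for i in atoms}
for i, j in edges:
    messages[i].append(features[j])
    messages[j].append(features[i])
updated = {i: 0.5 * features[i] + 0.5 * (sum(m) / len(m) if m else 0.0)
           for i, m in messages.items()}

# Readout: pool node features into a single crystal-level descriptor
descriptor = sum(updated.values()) / len(updated)
print(f"{len(edges)} edges, pooled descriptor = {descriptor:.2f}")
```

In a trained model the descriptor would feed a final regression head that outputs formation energy, band gap, or whichever property the network was trained on.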

Generative Models

Rather than screening existing candidates, generative models create entirely new crystal structures. Variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models have all been applied to crystal generation. The most promising recent approach uses diffusion models that operate in both composition and structure space simultaneously, generating novel stable crystals that satisfy target property constraints.

Active Learning

Active learning combines ML prediction with strategic experimentation. The model predicts properties for a large candidate pool, identifies the most uncertain or promising candidates, and recommends them for DFT calculation or experimental synthesis. The new data points are added to the training set, the model is retrained, and the cycle repeats. This closed-loop approach maximizes information gain per experiment.
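A minimal closed loop can be sketched with a one-dimensional toy landscape standing in for DFT and distance-to-nearest-labeled-point as the uncertainty signal. Both are placeholders – a real pipeline would use a surrogate model with calibrated uncertainty (e.g. an ensemble) – but the acquire-label-retrain cycle is the same:

```python
import random

random.seed(1)

# Stand-in for an expensive DFT calculation (hypothetical toy landscape
# with its minimum at x = 0.7)
def dft_energy(x):
    return (x - 0.7) ** 2

pool = [i / 100 for i in range(101)]             # candidate compositions
labeled = {x: dft_energy(x) for x in random.sample(pool, 3)}  # seed data

def uncertainty(x):
    """Distance to the nearest labeled point: a simple uncertainty proxy."""
    return min(abs(k - x) for k in labeled)

# Active-learning loop: label the most uncertain candidate, retrain, repeat
for _ in range(10):
    query = max((x for x in pool if x not in labeled), key=uncertainty)
    labeled[query] = dft_energy(query)           # "run DFT" on the candidate

best = min(labeled, key=labeled.get)
print(f"best composition found: x = {best:.2f}, E = {labeled[best]:.4f}")
```

With only 13 evaluations the loop has covered the search space well enough to land near the true minimum, which is the whole point: maximum information per expensive experiment.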

Real-World Impact: GNoME and Beyond

In November 2023, Google DeepMind published GNoME (Graph Networks for Materials Exploration), which predicted 2.2 million new crystal structures – an order of magnitude more than all previously known inorganic crystals combined. Of these, approximately 380,000 were predicted to be thermodynamically stable, and 736 had already been independently synthesized in laboratories around the world; Lawrence Berkeley National Laboratory's autonomous A-Lab has since synthesized dozens more of the predicted compounds.

GNoME used a two-stage pipeline: a structural pipeline that modified known crystals to find new stable compositions, and a compositional pipeline that used chemical similarity to propose entirely new formulas. Both stages used graph neural networks to predict formation energies, filtering for thermodynamic stability against known decomposition pathways.
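GNoME's pipelines aren't reproduced here, but the structural idea – perturb known materials via chemically similar substitutions, then filter with a learned energy model – can be sketched as follows. The similarity table and formula are hand-picked illustrations; the real pipeline learns substitution probabilities from data:

```python
# Sketch of substitution-based candidate generation, in the spirit of
# GNoME's structural pipeline. The similarity table is illustrative only.
similar = {"Li": ["Na", "K"], "Mn": ["Fe", "Co", "Ni"], "O": ["S", "Se"]}

def fmt(formula):
    """Render [('Li', 2), ('Mn', 1), ('O', 3)] as 'Li2MnO3'."""
    return "".join(el + (str(n) if n > 1 else "") for el, n in formula)

def substitutions(formula):
    """Yield new formulas with one element swapped for a similar one."""
    for i, (el, n) in enumerate(formula):
        for sub in similar.get(el, []):
            candidate = list(formula)
            candidate[i] = (sub, n)
            yield candidate

li2mno3 = [("Li", 2), ("Mn", 1), ("O", 3)]
candidates = [fmt(c) for c in substitutions(li2mno3)]
print(candidates)
```

Each generated candidate would next be scored by a GNN energy model and kept only if predicted stable against known decomposition products – the filtering step described above.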

The implications are profound. Among the newly discovered stable crystals are potential next-generation battery cathodes with higher energy density, novel semiconductor compositions for more efficient solar cells, and superconductor candidates that may work at higher temperatures. Each of these could take years to develop commercially, but the discovery phase that previously would have taken decades was compressed into months.

Tip
GNoME's dataset is publicly available and has been integrated into the Materials Project database. You can query these structures programmatically through materials science APIs, including SciRouter's planned materials endpoints.

Accessing Materials Discovery via API

SciRouter's materials endpoints bring AI-powered property prediction to any developer or researcher through a simple REST API. Here is how to query material properties for a given composition:

Predict material properties from composition
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Predict properties for a lithium manganese oxide composition
response = requests.post(f"{BASE}/materials/properties",
    headers=HEADERS,
    json={
        "composition": "Li2MnO3",
        "properties": ["formation_energy", "band_gap",
                        "energy_above_hull", "density"]
    })

result = response.json()
print(f"Composition:       {result['composition']}")
print(f"Formation Energy:  {result['formation_energy']:.3f} eV/atom")
print(f"Band Gap:          {result['band_gap']:.2f} eV")
print(f"Energy Above Hull: {result['energy_above_hull']:.3f} eV/atom")
print(f"Density:           {result['density']:.2f} g/cm³")
print(f"Predicted Stable:  {result['energy_above_hull'] < 0.05}")
Output
Composition:       Li2MnO3
Formation Energy:  -1.847 eV/atom
Band Gap:          2.36 eV
Energy Above Hull: 0.000 eV/atom
Density:           3.89 g/cm³
Predicted Stable:  True

The energy_above_hull value is key: it measures how far the composition sits above the thermodynamic convex hull. A value of zero means the material is on the hull and predicted to be stable. Values below 0.05 eV/atom are generally considered potentially synthesizable.
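The hull construction itself is straightforward to sketch for a binary A–B system. The phases and formation energies below are invented for illustration; the point is that stability is relative – a phase is only stable if no combination of competing phases has lower energy at the same composition:

```python
# Toy energy-above-hull calculation for a binary A-B system. Each entry is
# (fraction of B, formation energy in eV/atom); the lower convex hull of
# these points defines the set of stable phases.
phases = {
    "A":    (0.00,  0.00),
    "A3B":  (0.25, -0.40),
    "AB":   (0.50, -0.65),
    "AB3":  (0.75, -0.30),
    "B":    (1.00,  0.00),
    "A2B3": (0.60, -0.45),   # candidate to test
}

def lower_hull(points):
    """Lower convex hull via the monotone-chain algorithm."""
    hull = []
    for x, y in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Pop the last point if it lies on or above the new chord
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

def energy_above_hull(x, e, hull):
    """Vertical distance from (x, e) to the hull segment containing x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e - (y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    raise ValueError("x outside hull range")

hull = lower_hull(phases.values())
for name, (x, e) in phases.items():
    print(f"{name:5s} E_hull = {energy_above_hull(x, e, hull):+.3f} eV/atom")
```

In this toy system the candidate A2B3 sits 0.070 eV/atom above the hull (likely not synthesizable by the 0.05 rule of thumb), while AB3 sits only 0.025 eV/atom above it – metastable but potentially makeable.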

Screening a Candidate Library

The real power of API access is batch screening. Here is how to evaluate a list of candidate battery cathode materials:

Batch screen battery cathode candidates
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Candidate cathode compositions
candidates = [
    "LiFePO4",      # Known: lithium iron phosphate
    "LiCoO2",       # Known: lithium cobalt oxide
    "LiNi0.8Mn0.1Co0.1O2",  # NMC 811
    "Li2FeSiO4",    # Silicate cathode
    "NaFePO4",      # Sodium-ion alternative
    "LiVPO4F",      # Fluorophosphate
]

response = requests.post(f"{BASE}/materials/properties",
    headers=HEADERS,
    json={
        "compositions": candidates,
        "properties": ["formation_energy", "band_gap",
                        "energy_above_hull", "density"]
    })

results = response.json()["results"]

print(f"{'Composition':<25} {'E_form':>8} {'E_hull':>8} {'Band Gap':>9} {'Stable':>7}")
print("-" * 61)
for comp, props in zip(candidates, results):
    stable = "YES" if props["energy_above_hull"] < 0.05 else "NO"
    print(f"{comp:<25} {props['formation_energy']:>8.3f} "
          f"{props['energy_above_hull']:>8.3f} "
          f"{props['band_gap']:>9.2f} {stable:>7}")
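Once the response is parsed, a small helper can turn it into a ranked shortlist. The dictionary shapes and numbers below are mocked placeholders standing in for the API response (they are not real predictions for these materials, and SciRouter's actual response format may differ):

```python
# Post-processing sketch: keep candidates under a hull-energy cutoff and
# rank the survivors by stability.
def rank_candidates(candidates, results, hull_cutoff=0.05):
    rows = [
        {"composition": comp, **props}
        for comp, props in zip(candidates, results)
        if props["energy_above_hull"] < hull_cutoff
    ]
    return sorted(rows, key=lambda r: r["energy_above_hull"])

# Mocked results standing in for the parsed API response (illustrative only)
mock = [
    {"formation_energy": -2.59, "energy_above_hull": 0.000, "band_gap": 3.6},
    {"formation_energy": -1.71, "energy_above_hull": 0.012, "band_gap": 2.1},
    {"formation_energy": -1.95, "energy_above_hull": 0.081, "band_gap": 1.8},
]
ranked = rank_candidates(["LiFePO4", "LiCoO2", "Li2FeSiO4"], mock)
for r in ranked:
    print(f"{r['composition']:<12} E_hull = {r['energy_above_hull']:.3f} eV/atom")
```

The cutoff mirrors the 0.05 eV/atom synthesizability heuristic described earlier; in practice you would tune it to how much metastability your synthesis route can tolerate.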

The Future of AI Materials Discovery

Several trends are shaping the next phase of AI-driven materials science:

  • Foundation models for materials: Large pre-trained models that understand crystal chemistry across all material classes, fine-tunable for specific applications like battery design or catalysis.
  • Autonomous labs: Closed-loop systems where AI models propose candidates, robotic synthesizers make them, automated characterization measures their properties, and the data feeds back into the model. Berkeley Lab's A-Lab has already demonstrated this workflow.
  • Multi-fidelity learning: Models that combine cheap, approximate calculations (semi-empirical) with expensive, accurate ones (hybrid DFT) to get the best of both worlds.
  • Inverse design: Instead of predicting properties from structure, specify desired properties and generate the structure. This flips the discovery paradigm from search to design.

The convergence of large materials databases, powerful graph neural networks, and accessible compute infrastructure means that the pace of materials discovery will only accelerate. What took decades of trial-and-error experimentation can now be accomplished in weeks of computational screening followed by targeted synthesis.

Next Steps


Ready to screen your own materials candidates? Open the Crystal Explorer Studio or get a free API key to start querying materials properties programmatically.

Frequently Asked Questions

What is AI materials discovery?

AI materials discovery uses machine learning models to predict the properties of hypothetical materials before they are synthesized in a lab. Instead of testing thousands of compositions experimentally, researchers train models on existing materials databases and use them to screen millions of candidates computationally. This accelerates the discovery pipeline from decades to months.

How did Google DeepMind's GNoME find 2.2 million new crystals?

GNoME (Graph Networks for Materials Exploration) used graph neural networks trained on the Materials Project database. It generated candidate crystal structures, predicted their stability using formation energy calculations, and filtered for thermodynamically stable phases. The 2.2 million figure represents newly predicted structures; of these, about 380,000 were predicted to be thermodynamically stable, and 736 had already been independently synthesized in external labs.

What types of materials can AI discover?

AI materials discovery covers inorganic crystals (battery cathodes, semiconductors, superconductors), polymers, metal-organic frameworks (MOFs), catalysts, and alloys. The approach works best for materials where large training datasets exist and where properties can be computed from structure. Organic molecular materials and biological materials are typically handled by separate specialized models.

Do I need a GPU to run materials discovery models?

Training large materials discovery models like GNoME requires significant GPU resources. However, inference (using a pre-trained model to predict properties for new compositions) is much cheaper and can often run on CPUs. API services like SciRouter handle the infrastructure so you can query materials properties without managing any hardware.

How accurate are ML predictions for materials properties?

Accuracy depends on the property and the model. For formation energy, state-of-the-art models achieve mean absolute errors of around 20-30 meV/atom compared to DFT calculations. For band gaps, errors are typically 0.3-0.5 eV. These are accurate enough for screening and prioritization but not for final design decisions, which still require DFT validation or experimental confirmation.

What is the difference between materials discovery and materials design?

Materials discovery searches for new stable compositions and structures within chemical space. Materials design starts with a target property (like a specific band gap or ionic conductivity) and works backward to find compositions that achieve it. Discovery asks 'what exists?' while design asks 'what should I build?' Modern AI approaches increasingly blur this boundary through generative models that can do both.

Try this yourself

500 free credits. No credit card required.