For decades, if you wanted reliable pKa predictions in drug discovery, you had two bad options. You could pay for an empirical predictor like ChemAxon's Marvin or ACD/Labs, which worked well on chemistry resembling its training data and badly on novel scaffolds. Or you could run DFT, which gave you high-quality answers but took hours per compound. Neither scaled to modern screening workflows. Egret-1 changes that.
What Egret-1 is
Egret-1 is a neural network potential specialized for small-molecule organic chemistry. It is trained on a large dataset of DFT calculations across drug-like molecules, and it learns to produce energies, forces, and derived properties at DFT-quality accuracy with sub-second inference time.
You can think of it as the small-molecule cousin of universal materials potentials like MACE-MP-0. Both are equivariant neural networks, both are trained on broad DFT data, and both aim to replace DFT as the default for production workflows. Egret-1 focuses on organic chemistry and the properties medicinal chemists actually need.
What it can compute
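Because Egret-1 is an energy model, everything downstream derives from the same predicted energies and forces. Some properties need more than one model evaluation (a geometry optimization takes many steps, and pKa or bond dissociation energies compare the energies of more than one species), but each evaluation is cheap, so a single request can give you: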
- pKa: the negative base-10 logarithm of the acid dissociation constant, critical for understanding which protonation state a molecule adopts at physiological pH.
- logP: the logarithm of the octanol-water partition coefficient, a core descriptor of lipophilicity and a driver of membrane permeability and clearance.
- Bond dissociation energies: useful for reasoning about metabolic stability, radical reactions, and oxidative degradation pathways.
- Fukui functions: local reactivity indices that tell you where electrophiles and nucleophiles want to attack.
- Optimized geometries: relaxed 3D structures that look like what DFT would give you, at a small fraction of DFT's cost.
All of these can be retrieved through a single call to the quantum chemistry properties endpoint on SciRouter.
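To make the "energy model" point concrete: a homolytic bond dissociation energy is a difference of energies, so any model that returns energies can supply one. The sketch below is illustrative only. The function name is hypothetical, the energies are made-up placeholders rather than Egret-1 output, and it uses a plain energy difference, whereas a careful BDE would also include zero-point and thermal corrections.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # standard unit conversion


def bond_dissociation_energy(e_parent, e_fragment_a, e_fragment_b):
    """Energy (kcal/mol) to split R-X into R. and X., from energies in hartree.

    Hypothetical helper: approximates the BDE as the energy of the two
    radical fragments minus the energy of the intact parent molecule.
    """
    return (e_fragment_a + e_fragment_b - e_parent) * HARTREE_TO_KCAL_PER_MOL


# Placeholder energies in hartree, chosen only for illustration.
bde = bond_dissociation_energy(
    e_parent=-40.500,
    e_fragment_a=-39.830,
    e_fragment_b=-0.500,
)
print(f"BDE = {bde:.1f} kcal/mol")
```

The same pattern, a difference between energies of related species, underlies pKa as well, which is why one energy model can serve several properties.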
The speed story
A single DFT pKa calculation for a drug-like molecule typically involves relaxing both the protonated and deprotonated forms, each taking a few tens of minutes on a single CPU. Add solvation corrections and basis set extrapolation and you are at an hour or more. For a library of 1000 molecules, that is 1000 hours (about six weeks) on a single node, or several days on a cluster.
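The reason the DFT route needs so much care is the textbook relation between the aqueous deprotonation free energy and pKa, pKa = ΔG / (RT ln 10). The sketch below only illustrates that relation; it is not a description of how Egret-1 or any particular DFT workflow computes pKa (many practical workflows add an empirical linear correction), and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def pka_from_free_energy(delta_g_kcal, temperature=298.15):
    """Convert a deprotonation free energy (kcal/mol) into a pKa.

    Uses pKa = dG / (RT ln 10). Hypothetical helper for illustration.
    """
    return delta_g_kcal / (R_KCAL_PER_MOL_K * temperature * math.log(10))


# Free-energy change that corresponds to exactly one pKa unit at 298.15 K.
kcal_per_pka_unit = R_KCAL_PER_MOL_K * 298.15 * math.log(10)
print(f"1 pKa unit = {kcal_per_pka_unit:.2f} kcal/mol")
```

At room temperature one pKa unit corresponds to roughly 1.4 kcal/mol, so even small errors in the computed free energy translate into visible pKa errors.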
Egret-1 takes under a second per molecule. The same 1000 molecules run in under 20 minutes. This is not a small change. It is the difference between computing quantum chemistry on one or two interesting molecules versus computing it on every molecule in your virtual screen.
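The arithmetic behind that comparison is easy to check. The sketch below uses the round figures quoted above (one hour per molecule for DFT, one second per molecule as an upper bound for Egret-1); these are the article's ballpark numbers, not measured benchmarks.

```python
n_molecules = 1000
dft_hours_per_molecule = 1.0    # "an hour or more" per molecule
nnp_seconds_per_molecule = 1.0  # upper bound for "under a second"

# Total wall time for the whole library, run serially on one node.
dft_total_hours = n_molecules * dft_hours_per_molecule
nnp_total_minutes = n_molecules * nnp_seconds_per_molecule / 60

# Per-molecule speedup implied by the two figures.
speedup = dft_hours_per_molecule * 3600 / nnp_seconds_per_molecule

print(f"DFT: {dft_total_hours:.0f} hours")
print(f"Egret-1: {nnp_total_minutes:.1f} minutes")
print(f"Speedup: {speedup:.0f}x")
```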
Accuracy: how good is “DFT-quality”?
The phrase “DFT-quality” means slightly different things in different benchmarks. For Egret-1 on organic small molecules:
- pKa: mean absolute error around 0.4 to 0.6 pKa units on standard benchmarks, comparable to or better than common DFT functionals and similar to well-tuned empirical predictors on in-distribution molecules.
- logP: mean absolute error of around 0.4 log units, competitive with the best empirical tools.
- BDE: mean absolute error of a few kilocalories per mole on common organic bonds.
For comparison, experimental pKa values themselves have scatter of 0.3 to 0.5 units when measured by different techniques. Egret-1's pKa error is therefore close to experimental noise, which is about the best you can ask of any model.
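For readers who want to run the same check on their own compounds, mean absolute error is simply the average size of the prediction error. A minimal sketch follows; the pKa values in it are invented for illustration and are not Egret-1 results or real measurements.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("need at least one pair of values")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Invented example values, not real data.
predicted_pka = [4.5, 9.8, 7.1, 3.2]
experimental_pka = [4.8, 9.2, 7.0, 3.9]

mae = mean_absolute_error(predicted_pka, experimental_pka)
print(f"MAE = {mae:.3f} pKa units")
```

Comparing that number against the 0.3 to 0.5 unit experimental scatter tells you whether a model's error is distinguishable from measurement noise on your own set.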
Where Egret-1 wins over empirical predictors
Empirical pKa and logP predictors have served medicinal chemistry for decades. They are fast, they are cheap, and they are usually good. But they have two structural limitations:
- Novel scaffolds are hard. Empirical tools work by matching substructure fragments to training data. When you encounter a scaffold that is underrepresented in the training set, the prediction becomes unreliable, and the tool gives no warning that it has.
- Environment matters. pKa depends on conformation, tautomer, and the specific microstate. Pure fragment-based methods struggle to capture this.
Egret-1, because it is trained to reproduce the underlying quantum chemistry, is less sensitive to scaffold novelty. It sees the molecule as a full 3D structure whose energy it has learned from DFT, not as a bag of substructures.
Where Egret-1 does not help
Egret-1 is specialized for small organic molecules. It is not the right tool for:
- Transition metal complexes.
- Very large molecules (beyond about 100 heavy atoms).
- Reactive chemistry that is far from the training data.
- Properties that depend on explicit solvent, electric fields, or external perturbations.
For these, DFT or more specialized tools are still the right answer.
A practical example
Suppose you have a medicinal chemistry lead and you want to know what protonation state it adopts at blood pH, whether it is likely to be orally bioavailable based on lipophilicity, and which of its bonds are most vulnerable to metabolic oxidation. With Egret-1 on SciRouter, one API call gives you:
- The pKa of every ionizable site.
- The dominant microspecies at pH 7.4.
- The logP of that microspecies.
- The weakest C-H bond dissociation energy.
- A Fukui function plot showing electrophilic attack sites.
All in under a second. Doing this with DFT would take several hours. For more detail on the API itself, see our developer guide to the quantum chemistry endpoint.
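The step from pKa to protonation state at pH 7.4 follows the Henderson-Hasselbalch relation. The sketch below handles a single ionizable site only; the function name is hypothetical, the pKa values are illustrative, and molecules with several coupled sites need a full microstate treatment that this does not attempt.

```python
def fraction_deprotonated(pka, ph=7.4):
    """Fraction of one ionizable site in its deprotonated form at a given pH.

    From Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa).
    """
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Illustrative pKa values, not predictions for any real compound.
acid_site = fraction_deprotonated(4.4)  # e.g. a carboxylic-acid-like site
base_site = fraction_deprotonated(9.4)  # e.g. the conjugate acid of an amine

print(f"pKa 4.4 site: {acid_site:.1%} deprotonated at pH 7.4")
print(f"pKa 9.4 site: {1 - base_site:.1%} protonated at pH 7.4")
```

A site whose pKa sits well below 7.4 is almost fully deprotonated in blood, and one well above 7.4 is almost fully protonated; sites within about one unit of 7.4 exist as a meaningful mixture of both forms.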
Bottom line
Egret-1 turns quantum-grade small-molecule property prediction from a rare, careful activity into a routine screen. If you are doing medicinal chemistry and still waiting hours per molecule for DFT, this is the easiest upgrade you can make this year.