
NVIDIA Proteina-Complexa: AI Protein Binder Design with 63% Hit Rates

Deep dive into NVIDIA's Proteina-Complexa — the open-source protein binder design model achieving 63.5% hit rates with picomolar affinities. How it works, ICLR 2026 results, and comparison to RFdiffusion and ProteinMPNN.

Ryan Bethencourt
March 25, 2026

What Is Proteina-Complexa?

Proteina-Complexa is a generative AI model from NVIDIA designed for one of the hardest problems in protein engineering: creating new proteins that bind tightly and specifically to a chosen target. Released in March 2026 and accepted as an oral presentation at ICLR 2026, it represents a significant step forward in computational protein design.

The model uses a flow-matching framework to generate fully atomistic protein binders — not just backbone traces, but complete structures with side chains, hydrogen bond networks, and binding interfaces. In experimental validation, Proteina-Complexa achieved 63.5% hit rates with affinities reaching into the picomolar range. Perhaps most remarkably, it produced the first-ever de novo designed carbohydrate binders, a target class that has historically resisted computational design.

Note
Proteina-Complexa is fully open source under the NVIDIA Open Model License (Apache-style, commercial use permitted). Code is on GitHub at NVIDIA-Digital-Bio/proteina-complexa, and trained weights are on HuggingFace.

Why Protein Binder Design Matters

Protein binders are the workhorses of modern therapeutics and biotechnology. Antibodies, nanobodies, designed ankyrin repeat proteins (DARPins), and de novo binder scaffolds all work by binding to specific molecular targets with high affinity. The ability to computationally design these binders from scratch — rather than screening billions of candidates in the lab — could compress drug discovery timelines from years to weeks.

The applications extend well beyond therapeutics:

  • Therapeutic proteins: De novo binders as drug candidates targeting previously undruggable surfaces, including carbohydrate epitopes on pathogens
  • Enzyme engineering: Designing allosteric regulators and enzyme inhibitors with programmable specificity
  • Biosensors: Creating protein switches that change conformation upon binding, enabling real-time molecular detection
  • Targeted delivery: Engineering proteins that bind cell-surface markers for precision drug delivery or CAR-T cell targeting

How Proteina-Complexa Works

Proteina-Complexa is built on a flow-matching generative framework, a class of generative models closely related to diffusion models but with straighter sampling trajectories that improve generation speed and quality. The key architectural decisions that distinguish it from prior work:

All-Atom Generation

Unlike backbone-only methods such as RFdiffusion, Proteina-Complexa generates full atomic coordinates for the binder protein, including all side-chain atoms. This means the model directly optimizes for side-chain packing at the binding interface, hydrogen bonding networks, and van der Waals complementarity. There is no separate rotamer packing step — the model produces a complete structure in one pass.

Multi-Target Support

A single Proteina-Complexa model handles three target types: protein surfaces, small molecules, and carbohydrates. This unified approach is notable because carbohydrate binding has been an open problem — the shallow, polar surfaces of sugar molecules make them extremely difficult targets for traditional design methods. The model learns representations across all three modalities during training, enabling transfer between target types.

Flow-Matching vs. Diffusion

Flow-matching defines a continuous path from noise to data using optimal transport, resulting in straighter trajectories compared to the curved paths of standard diffusion. In practice, this means Proteina-Complexa requires fewer denoising steps to produce high-quality structures, improving inference speed without sacrificing accuracy. The model conditions generation on the target structure and binding hotspot specification.
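The "straighter trajectories" point can be made concrete with a toy example. The sketch below is not Proteina-Complexa's sampler: it assumes the simplest straight-line path x_t = (1 - t) * x0 + t * x1 between a noise sample x0 and a single known data point x1, and integrates the corresponding velocity field with fixed-step Euler. In a real model the target x1 is unknown and a neural network predicts the velocity; the function names and coordinates here are hypothetical.

```python
import random


def straight_path_velocity(x, t, x1):
    """Velocity of the straight path x_t = (1 - t) * x0 + t * x1, evaluated at x_t."""
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0, x1, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always below 1, so the division above is safe
        v = straight_path_velocity(x, t, x1)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
x1 = [1.5, -0.3, 2.0]                       # toy "data" point (made-up coordinates)
x0 = [random.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise sample

one_step = euler_sample(x0, x1, num_steps=1)
ten_steps = euler_sample(x0, x1, num_steps=10)
max_err_one = max(abs(a - b) for a, b in zip(one_step, x1))
max_err_ten = max(abs(a - b) for a, b in zip(ten_steps, x1))
print(max_err_one, max_err_ten)
```

Because the path is a straight line, a single Euler step already lands on the target up to floating-point error. Learned velocity fields are only approximately straight, so real samplers still take multiple steps, but the same intuition explains why fewer steps can suffice than on a curved diffusion path.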

Experimental Validation

The headline numbers from the Proteina-Complexa paper are striking. On a diverse set of protein targets, the model achieved a 63.5% experimental hit rate, meaning nearly two-thirds of the computationally designed binders that were tested showed measurable binding in lab assays. Measured dissociation constants (Kd) ranged from low nanomolar down to picomolar for the best binders, which is competitive with affinity-matured antibodies.
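To put nanomolar and picomolar affinities on a common scale, a dissociation constant can be converted to a standard binding free energy with the textbook relation ΔG° = RT ln(Kd / 1 M). The sketch below does this at an assumed temperature of 25 °C. The function name and the two example Kd values are illustrative only and are not measurements from the paper.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15      # assumed temperature: 25 degrees C


def binding_free_energy(kd_molar, temperature=T_KELVIN):
    """Standard binding free energy in kcal/mol from Kd in mol/L (1 M reference state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


dg_nanomolar = binding_free_energy(1e-9)    # Kd = 1 nM
dg_picomolar = binding_free_energy(1e-12)   # Kd = 1 pM
print(f"1 nM: {dg_nanomolar:.2f} kcal/mol")
print(f"1 pM: {dg_picomolar:.2f} kcal/mol")
print(f"gain: {dg_nanomolar - dg_picomolar:.2f} kcal/mol")
```

Under these assumptions, every 1000-fold tightening of Kd corresponds to roughly 4 kcal/mol of additional binding free energy, which is why moving from nanomolar to picomolar binding is a substantial step.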

The carbohydrate binder results are particularly significant. De novo computational design of proteins that bind carbohydrates had not been demonstrated before. Proteina-Complexa generated binders for several carbohydrate targets, and multiple designs showed experimentally confirmed binding — a first in the field.

The model has been validated by external groups including Novo Nordisk and Manifold Bio, lending independent credibility to the published benchmarks.

Proteina-Complexa vs. ProteinMPNN vs. RFdiffusion

These three tools occupy different but complementary niches in the protein design ecosystem. Understanding the distinctions is important for choosing the right approach:

  • Proteina-Complexa: Generates complete binder structures (backbone + side chains) from scratch using flow-matching. Handles protein, small molecule, and carbohydrate targets. Best for de novo binder design when you need a complete structure ready for experimental testing.
  • ProteinMPNN: Inverse folding model — given a fixed backbone, it designs amino acid sequences that will fold into that shape. Does not generate new structures, but excels at optimizing sequences for existing backbones. Typically used after a backbone generator to produce designable sequences.
  • RFdiffusion: Generates protein backbones using denoising diffusion, but produces backbone-only structures that still need sequence design and side-chain packing (usually via ProteinMPNN + Rosetta). Proven track record for protein-protein binder design but does not natively handle small molecules or carbohydrates.

In practice, many design campaigns use these tools together. RFdiffusion or Proteina-Complexa generates candidate binder structures, ProteinMPNN designs or refines sequences for them, and ESMFold checks that the designed sequences are predicted to fold into the intended structures. Because Proteina-Complexa outputs all atoms, it may reduce the need for a separate side-chain packing step, streamlining the pipeline.

Open Source and Accessibility

NVIDIA released Proteina-Complexa under the NVIDIA Open Model License, an Apache-style license that permits commercial use, modification, and redistribution. This is a deliberate choice to maximize adoption in both academic and industry settings. The full release includes:

  • Code: GitHub repository at NVIDIA-Digital-Bio/proteina-complexa with training and inference scripts
  • Weights: Pre-trained model weights on HuggingFace, ready for inference
  • GPU requirements: Inference runs on a single A100 GPU, making it accessible to academic labs with standard compute allocations
  • Documentation: Detailed tutorials for binder design workflows, including target preparation and output interpretation

The open-source release follows NVIDIA's broader strategy with the Proteina model family (which also includes Proteina for unconditional protein generation) of building open infrastructure for computational biology.

Practical Considerations

If you are considering using Proteina-Complexa in a design campaign, a few practical points:

  • Target preparation: You need a 3D structure of the target (protein, ligand, or carbohydrate) and a specification of the desired binding site or hotspot residues
  • Generation diversity: As with any generative model, running multiple independent generations and filtering the candidates improves success rates. Plan to generate 100+ designs and filter computationally before experimental testing.
  • Validation pipeline: Combine with structure prediction (ESMFold or AlphaFold) to check that designed sequences fold correctly, and with binding energy estimation to rank candidates
  • Experimental follow-up: Even with 63.5% hit rates, experimental validation remains essential. High-throughput binding assays (SPR, BLI, or yeast display) are the standard next step
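The generate-then-filter step described above can be sketched as a small ranking function. Everything in this example is an assumption for illustration: the `Candidate` fields, the thresholds (pLDDT of at least 80, refolding RMSD of at most 2.0 Å), and the placeholder numbers are hypothetical, and none of it is part of the Proteina-Complexa codebase or results. In a real campaign the metrics would come from your structure predictor and binding energy estimator of choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    plddt: float            # predicted-structure confidence, 0 to 100
    rmsd_to_design: float   # angstroms between refolded and designed structure
    binding_score: float    # hypothetical energy-like score; lower is better


def filter_and_rank(candidates, min_plddt=80.0, max_rmsd=2.0, top_k=10):
    """Keep candidates that refold confidently, then rank them by binding score."""
    passing = [
        c for c in candidates
        if c.plddt >= min_plddt and c.rmsd_to_design <= max_rmsd
    ]
    return sorted(passing, key=lambda c: c.binding_score)[:top_k]


# Placeholder values for illustration only, not real design results.
candidates = [
    Candidate("design_001", plddt=91.2, rmsd_to_design=0.8, binding_score=-42.0),
    Candidate("design_002", plddt=74.5, rmsd_to_design=1.1, binding_score=-55.0),  # fails pLDDT
    Candidate("design_003", plddt=88.0, rmsd_to_design=3.4, binding_score=-48.0),  # fails RMSD
    Candidate("design_004", plddt=85.3, rmsd_to_design=1.6, binding_score=-47.5),
]

shortlist = filter_and_rank(candidates)
print([c.name for c in shortlist])
```

Filtering on fold confidence before ranking by binding score keeps a design with an attractive score but an unreliable fold (like `design_002` here) out of the experimental shortlist.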

What This Means for the Field

Proteina-Complexa joins a rapidly growing ecosystem of open generative models for protein design. The trajectory is clear: just as language models moved from closed to open, protein design models are following the same path. Open models from NVIDIA, the Baker lab (RFdiffusion, ProteinMPNN), and others are making computational protein design accessible to any lab with basic GPU infrastructure.

The carbohydrate binder breakthrough is particularly exciting because it opens a new target class. Carbohydrates on pathogen surfaces, tumor-associated glycans, and glycosylation sites are biologically important but have been largely out of reach for computational design. If Proteina-Complexa's results replicate broadly, this could unlock an entirely new category of designed therapeutics.

SciRouter Integration Outlook

We are actively evaluating Proteina-Complexa for integration into the SciRouter platform. The model's open license and single-GPU inference requirements make it a strong candidate for cloud API deployment. Our goal is to let you submit a target structure and receive designed binder candidates through the same API you already use for ProteinMPNN sequence design and ESMFold structure prediction.

In the meantime, you can run Proteina-Complexa locally using the code and weights from GitHub and HuggingFace. For an end-to-end binder design workflow today, combine SciRouter's existing tools: use ProteinMPNN for sequence design, ESMFold for structure validation, and the SciRouter dashboard to orchestrate multi-step design campaigns.

Frequently Asked Questions

What is Proteina-Complexa?

Proteina-Complexa is a generative AI model from NVIDIA for designing protein binders. It uses flow-matching to generate fully atomistic protein structures that bind to a given target protein, small molecule, or carbohydrate. Released in March 2026, it achieved 63.5% experimental hit rates and produced the first-ever de novo carbohydrate binders. It is open source under the NVIDIA Open Model License.

How does NVIDIA's protein design model work?

Proteina-Complexa uses a flow-matching generative framework that operates on full atomic coordinates rather than just backbone traces. Given a target structure and a binding site specification, the model generates binder proteins by iteratively denoising random coordinates into physically plausible protein structures optimized for target affinity. This all-atom approach captures side-chain packing and hydrogen bonding networks that backbone-only methods miss.

What is AI protein binder design?

AI protein binder design is the computational generation of new proteins that bind tightly and specifically to a chosen target. Applications include therapeutic antibodies, enzyme inhibitors, biosensors, and targeted drug delivery. Modern approaches like Proteina-Complexa, RFdiffusion, and ProteinMPNN use deep learning to design binders that can be synthesized and tested in the lab, drastically reducing the time from concept to candidate.

How does Proteina-Complexa compare to RFdiffusion?

Both are generative models for protein design, but they differ in architecture and scope. RFdiffusion uses a denoising diffusion framework operating on backbone frames, while Proteina-Complexa uses flow-matching on full atomic coordinates. Proteina-Complexa natively handles protein, small-molecule, and carbohydrate targets in a single model, whereas RFdiffusion focuses on protein-protein interactions. Proteina-Complexa reports 63.5% experimental hit rates on tested targets, which is competitive with, or exceeds, published RFdiffusion benchmarks.

Can I use Proteina-Complexa commercially?

Yes. Proteina-Complexa is released under the NVIDIA Open Model License, which is an Apache-style license that permits commercial use, modification, and redistribution. The code is available on GitHub (NVIDIA-Digital-Bio/proteina-complexa) and the trained weights are hosted on HuggingFace. You can run it locally on an A100 GPU or wait for cloud API integrations.
