What Is Proteina-Complexa?
Proteina-Complexa is a generative AI model from NVIDIA designed for one of the hardest problems in protein engineering: creating new proteins that bind tightly and specifically to a chosen target. Released in March 2026 and accepted as an oral presentation at ICLR 2026, it represents a significant step forward in computational protein design.
The model uses a flow-matching framework to generate fully atomistic protein binders: not just backbone traces, but complete structures with side chains, hydrogen bond networks, and binding interfaces. In experimental validation, Proteina-Complexa achieved a 63.5% hit rate, with affinities reaching into the picomolar range. Perhaps most remarkably, it produced the first de novo designed carbohydrate binders, a target class that has historically resisted computational design.
Why Protein Binder Design Matters
Protein binders are the workhorses of modern therapeutics and biotechnology. Antibodies, nanobodies, designed ankyrin repeat proteins (DARPins), and de novo binder scaffolds all work by binding to specific molecular targets with high affinity. The ability to computationally design these binders from scratch — rather than screening billions of candidates in the lab — could compress drug discovery timelines from years to weeks.
The applications extend well beyond therapeutics:
- Therapeutic proteins: De novo binders as drug candidates targeting previously undruggable surfaces, including carbohydrate epitopes on pathogens
- Enzyme engineering: Designing allosteric regulators and enzyme inhibitors with programmable specificity
- Biosensors: Creating protein switches that change conformation upon binding, enabling real-time molecular detection
- Targeted delivery: Engineering proteins that bind cell-surface markers for precision drug delivery or CAR-T cell targeting
How Proteina-Complexa Works
Proteina-Complexa is built on flow matching, a class of generative models closely related to diffusion models but with straighter sampling trajectories, which can improve generation speed and quality. Three design decisions distinguish it from prior work:
All-Atom Generation
Unlike backbone-only methods such as RFdiffusion, Proteina-Complexa generates full atomic coordinates for the binder protein, including all side-chain atoms. The model therefore learns side-chain packing at the binding interface, hydrogen-bonding networks, and van der Waals complementarity directly, rather than leaving them to a downstream tool. There is no separate rotamer packing step: the model produces a complete structure in one pass.
Multi-Target Support
A single Proteina-Complexa model handles three target types: protein surfaces, small molecules, and carbohydrates. This unified approach is notable because carbohydrate binding has been an open problem — the shallow, polar surfaces of sugar molecules make them extremely difficult targets for traditional design methods. The model learns representations across all three modalities during training, enabling transfer between target types.
Flow-Matching vs. Diffusion
Flow matching learns a velocity field that carries noise samples to data samples along a continuous path. When that path follows an optimal-transport-style straight line, the trajectories are straighter than the curved paths of standard diffusion. In practice, this means Proteina-Complexa needs fewer integration steps to produce high-quality structures, improving inference speed without sacrificing accuracy. Generation is conditioned on the target structure and a binding hotspot specification.
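The straight-path idea can be illustrated with a toy example. This is a minimal sketch, not Proteina-Complexa's code: the function names are illustrative, and `toy_velocity` is a closed-form stand-in for the trained, target-conditioned network. It is exact only because this toy "dataset" is a single point.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target a velocity network would be trained on for this path."""
    return x1 - x0

def toy_velocity(x, t, target):
    """Exact velocity field when every data sample equals `target`.
    Assumption: stands in for a learned network; valid for t < 1."""
    return (target - x) / (1.0 - t)

def euler_sample(x0, target, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt                      # largest t used is (n_steps - 1) / n_steps
        x = x + dt * toy_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal((5, 3))     # five "atoms" with 3D coordinates
target = np.array([1.0, -2.0, 0.5])     # hypothetical data point

few = euler_sample(noise, target, n_steps=2)
many = euler_sample(noise, target, n_steps=50)
print(np.abs(few - many).max())         # tiny: step count barely matters here
```

Because the toy trajectories are perfectly straight, two Euler steps land on the same point as fifty. A learned velocity field is only approximately straight, so real samplers still need more than a couple of steps, but the same reasoning explains why straighter paths tolerate coarser integration.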
Experimental Validation
The headline numbers from the Proteina-Complexa paper are striking. On a diverse set of protein targets, the model achieved a 63.5% experimental hit rate, meaning nearly two-thirds of computationally designed binders showed measurable binding in lab assays. The strongest binders had dissociation constants (Kd) in the low nanomolar to picomolar range, which is competitive with affinity-matured antibodies.
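To put those affinity ranges in perspective, Kd can be converted to a standard binding free energy with the textbook relation ΔG = RT ln(Kd), taking 1 M as the reference concentration. The sketch below uses illustrative Kd values, not measurements from the paper, and the function name is our own.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd in molar, 1 M reference."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# Illustrative Kd values only
for label, kd in [("1 uM", 1e-6), ("1 nM", 1e-9), ("1 pM", 1e-12)]:
    print(f"{label}: {binding_free_energy(kd):.1f} kcal/mol")
```

At room temperature each thousandfold improvement in Kd corresponds to roughly 4.1 kcal/mol of additional binding free energy, so moving from nanomolar to picomolar binding is a substantial energetic gain.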
The carbohydrate binder results are particularly significant. De novo computational design of proteins that bind carbohydrates had not been demonstrated before. Proteina-Complexa generated binders for several carbohydrate targets, and multiple designs showed experimentally confirmed binding.
The model has been validated by external groups including Novo Nordisk and Manifold Bio, lending independent credibility to the published benchmarks.
Proteina-Complexa vs. ProteinMPNN vs. RFdiffusion
These three tools occupy different but complementary niches in the protein design ecosystem. Understanding the distinctions is important for choosing the right approach:
- Proteina-Complexa: Generates complete binder structures (backbone + side chains) from scratch using flow-matching. Handles protein, small molecule, and carbohydrate targets. Best for de novo binder design when you need a complete structure ready for experimental testing.
- ProteinMPNN: Inverse folding model — given a fixed backbone, it designs amino acid sequences that will fold into that shape. Does not generate new structures, but excels at optimizing sequences for existing backbones. Typically used after a backbone generator to produce designable sequences.
- RFdiffusion: Generates protein backbones using denoising diffusion. Its backbone-only outputs still need sequence design and side-chain packing (usually via ProteinMPNN and Rosetta). Proven track record for protein-protein binder design but does not natively handle small molecules or carbohydrates.
In practice, many design campaigns use these tools together. RFdiffusion or Proteina-Complexa generates candidate structures, ProteinMPNN optimizes sequences, and ESMFold validates that designed sequences fold into the intended structures. Proteina-Complexa's all-atom approach may reduce the need for the separate side-chain packing step, streamlining the pipeline.
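The fold-validation step usually comes down to comparing the designed structure with the structure predicted from its sequence. One standard measure is RMSD after optimal superposition, computed with the Kabsch algorithm. The sketch below is a generic NumPy implementation run on synthetic coordinates. In a real pipeline the two arrays would be matched atoms (for example C-alpha positions) extracted from the designed and predicted structure files, a step not shown here.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)             # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                       # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip one axis if needed so the result is a rotation, not a reflection
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Synthetic check: a rotated and shifted copy should superimpose almost exactly
rng = np.random.default_rng(1)
designed = rng.standard_normal((50, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
predicted = designed @ rot.T + np.array([5.0, -3.0, 2.0])
noisy = predicted + 0.5 * rng.standard_normal(predicted.shape)

print(kabsch_rmsd(designed, predicted))  # close to zero
print(kabsch_rmsd(designed, noisy))      # clearly non-zero
```

A low RMSD between design and prediction suggests the sequence encodes the intended fold. The cutoff that counts as "low" is a campaign-specific choice.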
Open Source and Accessibility
NVIDIA released Proteina-Complexa under the NVIDIA Open Model License, an Apache-style license that permits commercial use, modification, and redistribution. This is a deliberate choice to maximize adoption in both academic and industry settings. The full release includes:
- Code: GitHub repository at NVIDIA-Digital-Bio/proteina-complexa with training and inference scripts
- Weights: Pre-trained model weights on HuggingFace, ready for inference
- GPU requirements: Inference runs on a single A100 GPU, making it accessible to academic labs with standard compute allocations
- Documentation: Detailed tutorials for binder design workflows, including target preparation and output interpretation
The release follows NVIDIA's broader strategy of building open infrastructure for computational biology through the Proteina model family, which also includes Proteina for unconditional protein generation.
Practical Considerations
If you are considering using Proteina-Complexa in a design campaign, a few practical points:
- Target preparation: You need a 3D structure of the target (protein, ligand, or carbohydrate) and a specification of the desired binding site or hotspot residues
- Generation diversity: As with any generative model, running multiple independent generations and filtering the candidates improves success rates. Plan to generate 100+ designs and filter computationally before experimental testing.
- Validation pipeline: Combine with structure prediction (ESMFold or AlphaFold) to check that designed sequences fold correctly, and with binding energy estimation to rank candidates
- Experimental follow-up: Even with a 63.5% hit rate, experimental validation remains essential. High-throughput binding assays (SPR, BLI, or yeast display) are the standard next step
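The generate-then-filter step in the list above can be sketched as a simple filter and rank over per-design metrics. Everything here is hypothetical: the `Candidate` fields, the threshold values, and the example numbers are placeholders rather than settings recommended by the Proteina-Complexa authors, and the metrics would come from whichever structure-prediction and scoring tools you use.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    plddt: float            # predicted-structure confidence, 0-100
    rmsd_to_design: float   # angstroms, designed vs. predicted structure
    interface_score: float  # hypothetical energy-like score, lower is better

def select_candidates(candidates, min_plddt=80.0, max_rmsd=2.0, top_k=10):
    """Keep designs that pass both fold checks, then rank by interface score."""
    passing = [
        c for c in candidates
        if c.plddt >= min_plddt and c.rmsd_to_design <= max_rmsd
    ]
    return sorted(passing, key=lambda c: c.interface_score)[:top_k]

# Made-up metrics for four designs
designs = [
    Candidate("design_001", plddt=91.2, rmsd_to_design=0.8, interface_score=-42.0),
    Candidate("design_002", plddt=65.0, rmsd_to_design=1.1, interface_score=-55.0),
    Candidate("design_003", plddt=88.5, rmsd_to_design=3.4, interface_score=-48.0),
    Candidate("design_004", plddt=84.0, rmsd_to_design=1.5, interface_score=-47.5),
]

shortlist = select_candidates(designs)
print([c.name for c in shortlist])  # ['design_004', 'design_001']
```

Filtering on fold confidence before ranking keeps a design with an attractive interface score but an unreliable fold (like design_002 above) from reaching the bench.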
What This Means for the Field
Proteina-Complexa joins a rapidly growing ecosystem of open generative models for protein design. The trajectory is clear: just as language models moved from closed to open, protein design models are following the same path. Open models from NVIDIA, the Baker lab (RFdiffusion, ProteinMPNN), and others are making computational protein design accessible to any lab with basic GPU infrastructure.
The carbohydrate binder breakthrough is particularly exciting because it opens a new target class. Carbohydrates on pathogen surfaces, tumor-associated glycans, and glycosylation sites are biologically important but have been largely out of reach for computational design. If Proteina-Complexa's results replicate broadly, this could unlock an entirely new category of designed therapeutics.
SciRouter Integration Outlook
We are actively evaluating Proteina-Complexa for integration into the SciRouter platform. The model's open license and single-GPU inference requirements make it a strong candidate for cloud API deployment. Our goal is to let you submit a target structure and receive designed binder candidates through the same API you already use for ProteinMPNN sequence design and ESMFold structure prediction.
In the meantime, you can run Proteina-Complexa locally using the code and weights from GitHub and HuggingFace. For an end-to-end binder design workflow today, combine SciRouter's existing tools: use ProteinMPNN for sequence design, ESMFold for structure validation, and the SciRouter dashboard to orchestrate multi-step design campaigns.