What Are Complementarity Determining Regions?
Every antibody has a Y-shaped structure with two functional domains: the constant region that mediates immune effector functions, and the variable region that binds to a specific antigen. Within the variable region, six short loops – three on the heavy chain and three on the light chain – form the actual antigen-binding site. These loops are called Complementarity Determining Regions, or CDRs.
CDR-H1, CDR-H2, and CDR-H3 sit on the heavy chain variable domain (VH). CDR-L1, CDR-L2, and CDR-L3 sit on the light chain variable domain (VL). Of these, CDR-H3 is typically the longest and most diverse, and it usually contributes the most to determining what the antibody binds. It is also the hardest to model computationally because its conformations are less constrained by canonical structural rules than those of the other five CDRs.
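To make the location of a CDR concrete, here is a rough sketch that pulls out CDR-H3 and CDR-L3 with a motif heuristic. Under IMGT definitions, CDR3 lies between the conserved cysteine that precedes it and the W-G-x-G (heavy chain) or F-G-x-G (light chain) motif that follows. The helper name `find_cdr3` and the regular expression are illustrative assumptions, not part of AntiFold or SciRouter; a real pipeline would use a numbering tool such as ANARCI.

```python
import re

# Trastuzumab variable-region sequences (the same ones used later in this guide)
HEAVY = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKG"
    "RFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS"
)
LIGHT = (
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSR"
    "SGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK"
)


def find_cdr3(sequence, chain):
    """Heuristic CDR3 finder (hypothetical helper, not a numbering tool).

    Finds a cysteine followed by a cysteine-free stretch that ends at the
    W-G-x-G (heavy) or F-G-x-G (light) motif, and returns that stretch.
    Assumes there is no cysteine inside the CDR3 loop itself.
    """
    anchor = "W" if chain == "H" else "F"
    match = re.search(rf"C([^C]+?){anchor}G.G", sequence)
    return match.group(1) if match else None


print("CDR-H3 (IMGT-style boundaries):", find_cdr3(HEAVY, "H"))
print("CDR-L3 (IMGT-style boundaries):", find_cdr3(LIGHT, "L"))
```

IMGT-style boundaries start directly after the conserved cysteine, so the heavy-chain result is two residues longer than the Kabat definition of the same loop.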
When you engineer an antibody – whether for better binding affinity, altered specificity, or improved developability – you are almost always modifying CDR sequences. The framework regions between CDRs provide structural scaffolding and are typically left unchanged. This is what makes CDR design the central task of antibody engineering.
Why Structure-Aware CDR Design Matters
Traditional antibody optimization relies on random mutagenesis and screening: you create thousands of CDR variants in the lab, express them, and test which ones bind. This works, but it is slow and expensive. Computational approaches can narrow the search space by predicting which CDR sequences are likely to fold into the desired structure and maintain binding.
The key insight behind structure-aware design is that CDR sequences and CDR structures are tightly coupled. A CDR loop must fold into a specific 3D conformation to present the right residues at the right positions for antigen contact. If you change the sequence in a way that disrupts the loop conformation, binding will be lost regardless of what residues you introduce.
This is where AntiFold excels. Unlike sequence-only models that treat CDR design as a text generation problem, AntiFold is an inverse folding model: it takes a 3D antibody structure as input and predicts amino acid sequences that are likely to fold into that structure. This structural grounding means AntiFold's designs respect the geometric constraints of the binding site, producing sequences that are far more likely to fold correctly and maintain function.
How AntiFold Works
AntiFold is built on the inverse folding paradigm – given backbone coordinates of a protein structure, predict the amino acid sequence that would fold into those coordinates. The model architecture uses a graph neural network that encodes the 3D structure of the antibody as a graph, where nodes represent residues and edges encode spatial relationships.
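As an illustration of the graph idea, the sketch below links each residue to its nearest neighbours using C-alpha coordinates. A k-nearest-neighbour graph is a common construction in inverse folding models, but it is an assumption here: AntiFold's actual featurization may differ, and `build_knn_graph` is a hypothetical helper.

```python
import math


def build_knn_graph(ca_coords, k=3):
    """Return directed edges (i, j, distance) from each residue to its k nearest neighbours."""
    edges = []
    for i, ci in enumerate(ca_coords):
        # Distances from residue i to every other residue, closest first
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(ca_coords) if j != i
        )
        edges.extend((i, j, d) for d, j in dists[:k])
    return edges


# Toy coordinates (in angstroms) for four residues on a line; not real structure data
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
for i, j, d in build_knn_graph(toy_coords, k=1):
    print(f"residue {i} -> residue {j}: {d:.1f} A")
```

In a real model each edge would also carry richer geometric features (for example relative orientations), not just a distance.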
Training on Antibody-Specific Data
AntiFold is fine-tuned on antibody structures, including experimentally determined structures from the Structural Antibody Database (SAbDab), which contains thousands of antibody structures. This antibody-specific training gives AntiFold several advantages over general-purpose inverse folding models like ProteinMPNN:
- Canonical CDR classes: AntiFold learns the discrete structural classes that CDR-H1, CDR-H2, CDR-L1, CDR-L2, and CDR-L3 adopt. It generates sequences compatible with the canonical form of each loop.
- CDR-H3 diversity: CDR-H3 does not follow canonical rules, so AntiFold learns the broader distribution of H3 conformations from thousands of examples.
- VH/VL interface: The model encodes how heavy and light chain variable domains pack together, ensuring designs maintain proper chain pairing.
- Framework compatibility: Designed CDR sequences are conditioned on the surrounding framework residues, maintaining structural compatibility at the CDR-framework boundaries.
The Design Process
When you call AntiFold through SciRouter, the following happens:
- The input PDB structure is parsed and the antibody chains are identified
- CDR regions are located using antibody numbering (IMGT or Chothia scheme)
- The 3D graph representation is built from backbone atom coordinates
- The model autoregressively generates amino acid probabilities at each CDR position, conditioned on the structure and any fixed framework residues
- Multiple sequences are sampled at the specified temperature
- Each design is scored by its log-likelihood under the model
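The last three steps can be sketched with a toy example: sample one residue per position from temperature-scaled probabilities, then score the result by summing the log-probabilities the model assigned to the sampled residues. This is a simplified, position-independent sketch with made-up logits; it is not AntiFold's code, and a real autoregressive model would condition each position on the residues already generated.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(per_position_logits, temperature, rng):
    """Sample one residue per position and return (sequence, log_likelihood)."""
    residues = []
    log_likelihood = 0.0
    for logits in per_position_logits:
        sampling_probs = softmax(logits, temperature)
        idx = rng.choices(range(len(logits)), weights=sampling_probs, k=1)[0]
        # Score with the untempered distribution (temperature 1.0)
        log_likelihood += math.log(softmax(logits, 1.0)[idx])
        residues.append(AMINO_ACIDS[idx])
    return "".join(residues), log_likelihood


# Hypothetical logits for a 3-residue loop: each position mildly prefers one residue
toy_logits = [
    [2.0 if aa == preferred else 0.0 for aa in AMINO_ACIDS]
    for preferred in "WGY"
]
rng = random.Random(0)
for _ in range(3):
    seq, ll = sample_sequence(toy_logits, temperature=0.2, rng=rng)
    print(f"{seq}  log-likelihood={ll:.2f}")
```

Scoring with the untempered distribution is a design choice in this sketch, so that scores from runs at different temperatures stay on the same scale.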
Getting Started: Prerequisites
You need Python 3.8+ and a SciRouter API key. Sign up at scirouter.ai/register for 500 free credits per month – enough for dozens of design runs.
pip install scirouter
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"

Hands-On: CDR Design with SciRouter
Step 1: Fold the Starting Antibody
AntiFold requires a 3D structure as input. If you already have a crystal structure PDB file, you can use it directly. If you only have sequences, fold them first with ImmuneBuilder. Here we start with trastuzumab (Herceptin), a well-characterized anti-HER2 antibody:
from scirouter import SciRouter
client = SciRouter()
# Trastuzumab variable region sequences
heavy_chain = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKG"
    "RFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS"
)
light_chain = (
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSR"
    "SGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK"
)
# Predict the 3D structure
structure = client.antibodies.fold(
    heavy_chain=heavy_chain,
    light_chain=light_chain,
)
print(f"Structure predicted. Mean pLDDT: {structure.mean_plddt:.1f}")
print(f"PDB size: {len(structure.pdb)} bytes")

Step 2: Design CDR-H3 Variants
Start by designing the most impactful region – CDR-H3. This loop contributes the most to antigen binding specificity and is where mutations have the highest chance of changing binding behavior:
# Design new CDR-H3 sequences
designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=10,
    regions=["CDR-H3"],
    temperature=0.2,
)
print(f"Generated {len(designs.sequences)} CDR-H3 variants:\n")
for i, seq in enumerate(designs.sequences):
    print(f"Variant {i+1}:")
    print(f" CDR-H3: {seq.cdr_h3}")
    print(f" Recovery: {seq.sequence_recovery:.1%}")
    print(f" Log-LL: {seq.log_likelihood:.2f}")
    print()

Step 3: Expand to Multiple CDR Regions
Once you have validated single-region designs, you can design multiple CDRs simultaneously for broader optimization of the binding interface:
# Design all three heavy chain CDRs
multi_designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=20,
    regions=["CDR-H1", "CDR-H2", "CDR-H3"],
    temperature=0.15,  # lower temperature for multi-region stability
)
print(f"Generated {len(multi_designs.sequences)} multi-CDR variants:\n")
for i, seq in enumerate(multi_designs.sequences[:5]):
    print(f"Variant {i+1}:")
    print(f" H1: {seq.cdr_h1}")
    print(f" H2: {seq.cdr_h2}")
    print(f" H3: {seq.cdr_h3}")
    print(f" Log-LL: {seq.log_likelihood:.2f}")
    print()

Interpreting AntiFold Results
AntiFold returns several metrics for each designed sequence. Understanding these metrics is critical for selecting candidates worth testing experimentally.
Log-Likelihood
The log-likelihood score reflects the model's confidence that the designed sequence will fold into the target structure. Higher (less negative) values indicate better structural compatibility. Compare log-likelihoods across designs to rank candidates, but note that the absolute values depend on the scaffold and are not directly comparable across different antibodies.
Sequence Recovery
Sequence recovery is the fraction of designed positions that match the original (wild-type) sequence. A recovery of 0.8 means 80% of CDR residues are unchanged. High recovery (above 0.7) indicates conservative designs that maintain the original binding mode. Low recovery (below 0.4) suggests more radical redesigns that may adopt different binding mechanisms.
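As a minimal sketch of the calculation, recovery is just the fraction of aligned positions where the designed residue equals the wild-type residue. The function below is an illustration, not SciRouter's implementation, and the variant shown is made up for the example.

```python
def sequence_recovery(wild_type, design):
    """Fraction of positions where the design matches the wild-type sequence."""
    if not wild_type or len(wild_type) != len(design):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(wild_type, design))
    return matches / len(wild_type)


wild_type_h3 = "WGGDGFYAMDY"  # trastuzumab CDR-H3 under the Kabat definition
hypothetical = "WGGDGFYAFDY"  # made-up variant with one substitution (M -> F)
# 10 of 11 positions match
print(f"Recovery: {sequence_recovery(wild_type_h3, hypothetical):.1%}")
```

This simple version assumes the design has the same length as the wild-type loop, which holds for fixed-backbone design because the number of positions is set by the input structure.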
Per-Position Probabilities
AntiFold also provides per-position amino acid probability distributions. Positions with high entropy (many amino acids with similar probabilities) are tolerant of mutation, while positions with low entropy (one dominant amino acid) are structurally constrained and should be kept unchanged:
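For reference, the entropy in question is the Shannon entropy of the 20-way amino acid distribution at a position. The sketch below uses made-up probabilities to show the two extremes. Whether the API reports entropy in nats or bits is not stated here, so treat the unit, and therefore any fixed threshold, as an assumption to check against the actual output.

```python
import math


def shannon_entropy(probs, base=math.e):
    """Shannon entropy of a probability distribution (nats by default)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Hypothetical distributions over the 20 amino acids
uniform = [1 / 20] * 20            # every residue equally likely: fully tolerant
peaked = [0.905] + [0.005] * 19    # one dominant residue: strongly constrained

print(f"uniform: {shannon_entropy(uniform):.2f} nats, {shannon_entropy(uniform, 2):.2f} bits")
print(f"peaked:  {shannon_entropy(peaked):.2f} nats, {shannon_entropy(peaked, 2):.2f} bits")
```

The maximum possible value is ln(20) in nats or log2(20) in bits, which gives a useful scale for judging how "high" a reported entropy really is.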
# Examine position-level details for the top design
top_design = max(designs.sequences, key=lambda s: s.log_likelihood)
print(f"Top design CDR-H3: {top_design.cdr_h3}")
print(f"Log-likelihood: {top_design.log_likelihood:.2f}")
print(f"Sequence recovery: {top_design.sequence_recovery:.1%}")
# Identify mutable positions (high entropy)
if hasattr(top_design, "position_entropies"):
    for pos, entropy in enumerate(top_design.position_entropies):
        marker = "<-- mutable" if entropy > 1.5 else ""
        print(f" Position {pos}: entropy={entropy:.2f} {marker}")

Iterating on Designs: The Fold-Design-Validate Loop
The most powerful workflow with AntiFold is an iterative loop: design CDRs, fold the new sequences to validate they form good structures, then optionally feed the best structures back into AntiFold for further refinement. This mimics computational directed evolution.
import json
results = []
# Round 1: Design CDR-H3 variants
print("=== Round 1: Initial design ===")
designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=10,
    regions=["CDR-H3"],
    temperature=0.2,
)
# Validate the top five designs (ranked by log-likelihood) by re-folding
ranked = sorted(designs.sequences, key=lambda s: s.log_likelihood, reverse=True)
best_validation = None  # re-folded structure of the highest-ranked passing design
for i, design in enumerate(ranked[:5]):
    validation = client.antibodies.fold(
        heavy_chain=design.full_heavy_chain,
        light_chain=light_chain,
    )
    passed = validation.mean_plddt >= 75
    results.append({
        "round": 1,
        "variant": i + 1,
        "cdr_h3": design.cdr_h3,
        "log_likelihood": design.log_likelihood,
        "plddt": validation.mean_plddt,
        "passed": passed,
    })
    status = "PASS" if passed else "FAIL"
    print(f" Variant {i+1}: pLDDT={validation.mean_plddt:.1f} [{status}]")
    if passed and best_validation is None:
        best_validation = validation
# Round 2: Refine the best passing design (skipped if nothing passed)
if best_validation is not None:
    print("\n=== Round 2: Refine best variant ===")
    refined = client.antibodies.design(
        pdb=best_validation.pdb,
        num_sequences=10,
        regions=["CDR-H3"],
        temperature=0.15,  # tighter sampling for refinement
    )
    for j, ref_design in enumerate(refined.sequences[:3]):
        ref_val = client.antibodies.fold(
            heavy_chain=ref_design.full_heavy_chain,
            light_chain=light_chain,
        )
        ref_passed = ref_val.mean_plddt >= 75
        results.append({
            "round": 2,
            "variant": j + 1,
            "cdr_h3": ref_design.cdr_h3,
            "log_likelihood": ref_design.log_likelihood,
            "plddt": ref_val.mean_plddt,
            "passed": ref_passed,
        })
        status = "PASS" if ref_passed else "FAIL"
        print(f" Refined {j+1}: pLDDT={ref_val.mean_plddt:.1f} [{status}]")
# Save results
with open("cdr_design_results.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"\nTotal candidates: {len(results)}, Passed: {sum(1 for r in results if r['passed'])}")

Controlling Design Diversity with Temperature
The temperature parameter is your primary control over how different the designed CDRs are from the original sequence. Choosing the right temperature depends on your goals:
- Temperature 0.1: Very conservative. Designs differ by 1 to 2 mutations from the wild-type. Best for fine-tuning an already-good binder.
- Temperature 0.2: Moderate. Designs differ by 2 to 5 mutations. Good default for affinity maturation.
- Temperature 0.3: Exploratory. Designs may have 30 to 50% new residues. Useful for generating diverse libraries.
- Temperature 0.5: Aggressive. Significant sequence divergence. Use this when you want to explore entirely new binding modes.
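To see why lower temperatures give more conservative designs, the toy calculation below divides the logits by the temperature before the softmax and estimates how many positions would mutate on average. The logit gap and loop length are made-up numbers chosen only to show the trend, and `expected_mutations` is a hypothetical helper; the actual mutation counts depend on the model's real predictions for your structure.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: lower temperature concentrates probability on the top logit."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_mutations(wild_type_gap, loop_length, temperature):
    """Expected number of non-wild-type residues in a loop.

    Toy model: at every position the wild-type residue has logit `wild_type_gap`
    and each of the 19 alternatives has logit 0.
    """
    logits = [wild_type_gap] + [0.0] * 19
    p_wild_type = softmax_with_temperature(logits, temperature)[0]
    return loop_length * (1.0 - p_wild_type)


for temp in [0.1, 0.2, 0.3, 0.5]:
    n_mut = expected_mutations(wild_type_gap=1.0, loop_length=11, temperature=temp)
    print(f"Temperature {temp}: about {n_mut:.1f} expected mutations in an 11-residue loop")
```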
# Generate designs at different temperatures
for temp in [0.1, 0.2, 0.3, 0.5]:
    designs = client.antibodies.design(
        pdb=structure.pdb,
        num_sequences=5,
        regions=["CDR-H3"],
        temperature=temp,
    )
    avg_recovery = sum(s.sequence_recovery for s in designs.sequences) / len(designs.sequences)
    avg_ll = sum(s.log_likelihood for s in designs.sequences) / len(designs.sequences)
    print(f"Temperature {temp}: avg recovery={avg_recovery:.1%}, avg log-LL={avg_ll:.2f}")

Targeting Specific CDR Positions
Sometimes you know which positions in a CDR are critical for binding (from alanine scanning or structural analysis) and want to keep them fixed while redesigning the rest. AntiFold supports this through position masking:
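# Design CDR-H3 but keep positions 100 and 100a fixed.
# Labels such as 100 and 100a follow Kabat/Chothia-style numbering; IMGT numbers
# CDR-H3 as positions 105-117, so use the labels of the scheme your structure uses.
# These are the key contact residues from crystal structure analysis
designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=10,
    regions=["CDR-H3"],
    fixed_positions=["H100", "H100a"],  # Kabat/Chothia-style labels
    temperature=0.25,
)
for i, seq in enumerate(designs.sequences[:5]):
    print(f"Variant {i+1}: CDR-H3={seq.cdr_h3} (LL={seq.log_likelihood:.2f})")

Combining AntiFold with Other SciRouter Tools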
print(f"Variant {i+1}: CDR-H3={seq.cdr_h3} (LL={seq.log_likelihood:.2f})")Combining AntiFold with Other SciRouter Tools
CDR design is most powerful when combined with other tools in a multi-step pipeline. Here are three common workflows:
Design + Structure Validation
Use AntiFold for CDR design, then ImmuneBuilder to validate that the designed sequences fold correctly. This is the workflow shown in the examples above.
Design + Docking
After designing CDR variants and validating their structures, dock them against the target antigen using DiffDock or Boltz-2 to predict binding affinity. This adds a binding-quality filter to your structural designs.
End-to-End Antibody Discovery
SciRouter's Antibody Design Studio chains all of these steps into a single pipeline: fold the scaffold, design CDR variants, validate structures, and rank candidates by predicted binding quality.
Best Practices for CDR Design
- Start with a good scaffold: Use an experimental crystal structure when available. Predicted structures work but introduce additional uncertainty in the backbone coordinates.
- Design CDR-H3 first: It contributes the most to specificity. Once you have good H3 variants, optionally expand to other CDRs.
- Always validate by re-folding: A high log-likelihood from AntiFold does not guarantee the sequence will fold well. Re-fold every candidate and check pLDDT scores.
- Generate more candidates than you need: Expect 30 to 50% of designs to fail structural validation. Generate 20 or more candidates to get 5 to 10 good ones.
- Check for liabilities: After selecting structurally valid designs, screen for sequence liabilities like N-glycosylation motifs (N-X-S/T), deamidation hotspots (NG, NS), and unpaired cysteines.
- Use multiple temperatures: Generate a diverse pool by sampling at different temperatures, then merge and rank by structural quality.
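The liability check in the list above is easy to automate. The sketch below scans a sequence for the motifs named there; it is a simple illustration rather than a complete developability screen, and `find_liabilities` is a hypothetical helper. It excludes proline at the X position of N-X-S/T, the standard definition of the N-glycosylation sequon, and uses an odd cysteine count as a crude proxy for an unpaired cysteine.

```python
import re


def find_liabilities(sequence):
    """Return a list of (liability, position, motif) tuples; positions are 0-based."""
    findings = []
    # N-glycosylation sequon: N, then any residue except P, then S or T
    for m in re.finditer(r"(?=(N[^P][ST]))", sequence):
        findings.append(("N-glycosylation", m.start(), m.group(1)))
    # Deamidation hotspots: NG and NS
    for m in re.finditer(r"(?=(N[GS]))", sequence):
        findings.append(("deamidation", m.start(), m.group(1)))
    # An odd number of cysteines means at least one cannot be in a disulfide pair
    n_cys = sequence.count("C")
    if n_cys % 2 == 1:
        findings.append(("unpaired cysteine", None, f"{n_cys} Cys"))
    return findings


# Trastuzumab heavy chain variable region, as used earlier in this guide
heavy_chain = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKG"
    "RFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS"
)
for liability, position, motif in find_liabilities(heavy_chain):
    print(f"{liability}: {motif} at index {position}")
```

A motif hit is only a flag for closer inspection: whether a site is actually modified depends on its structural context and solvent exposure.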
What Running AntiFold Locally Requires
For context, here is what you would need to run AntiFold on your own machine:
- PyTorch with CUDA support (NVIDIA GPU required)
- PyTorch Geometric for graph neural network operations
- ESM library for antibody language model features
- ANARCI for antibody numbering (requires HMMER installation)
- Custom trained model weights (~500 MB)
- Careful version pinning across all dependencies
- Setup time: 1 to 3 hours for an experienced engineer
SciRouter eliminates all of this. AntiFold runs on pre-deployed GPU instances and is accessible through two lines of Python.
Next Steps
You now have the tools to design antibody CDRs computationally. Use AntiFold for structure-aware CDR design and ImmuneBuilder for structure validation. For a fully automated pipeline from antigen to ranked antibody candidates, try the Antibody Design Studio.
To evaluate binding to a specific antigen, dock your designed antibodies with DiffDock or predict complex structures with Boltz-2. For nanobody-specific design, see our guide on nanobody engineering with AI.
Sign up at scirouter.ai/register for 500 free credits and start designing antibody CDRs today.