Binding Affinity Prediction ($K_d$, $IC_{50}$) is the regression task of estimating how strongly a drug molecule binds its protein target. Affinity is usually reported as the dissociation constant $K_d$ (the free drug concentration at which half the target binding sites are occupied at equilibrium) or as the inhibitory concentration $IC_{50}$ (the drug concentration needed to inhibit 50% of target activity, which depends on assay conditions). These values largely determine whether a candidate drug is potent enough for therapeutic use.
What Is Binding Affinity Prediction?
- Definition: Binding affinity quantifies the equilibrium between the bound drug-target complex $[DT]$ and the free components $[D] + [T]$: $K_d = \frac{[D][T]}{[DT]}$. Lower $K_d$ means tighter binding: nanomolar (nM) affinity is typical for drug candidates, picomolar (pM) for exceptional binders. The Gibbs free energy of binding follows from $K_d$: $\Delta G = RT \ln K_d$, with $K_d$ expressed relative to the 1 M standard concentration, so tighter binding corresponds to a more negative $\Delta G$ (thermodynamically more favorable).
- Prediction Approaches: (1) Physics-based scoring: AutoDock Vina, Glide, and GOLD use empirical or force-field-based scoring functions to estimate $\Delta G$ from the 3D complex. Fast (roughly seconds per molecule) but only weakly correlated with experiment (typical $R^2 \approx 0.3$). (2) ML scoring functions: OnionNet, PIGNet, and PotentialNet train on experimental affinity data to predict $K_d$ from protein-ligand complex features. More accurate ($R^2 \approx 0.5$–$0.7$) but they require 3D complex structures. (3) Sequence-based: DeepDTA predicts affinity from drug SMILES + protein sequence without 3D structures. Least accurate of the ML approaches but most scalable.
- PDBbind Benchmark: The standard dataset for binding affinity prediction — ~20,000 protein-ligand complexes with experimentally measured $K_d$ or $K_i$ values, curated from the Protein Data Bank. The refined set (~5,000 high-quality complexes) and core set (~300 diverse complexes) provide standardized train/test splits for benchmarking affinity prediction methods.
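The relation $\Delta G = RT \ln K_d$ and the half-occupancy definition of $K_d$ can be checked numerically. The following is a minimal sketch, assuming simple 1:1 binding, a temperature of 298.15 K, and $K_d$ given in mol/L relative to a 1 M standard state. The function names and the `pkd` helper (the negative base-10 logarithm of $K_d$, a common regression target) are illustrative and do not come from any particular library.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from K_d in mol/L (1 M standard state assumed)."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def pkd(kd_molar: float) -> float:
    """Negative base-10 logarithm of K_d; larger values mean tighter binding."""
    return -math.log10(kd_molar)


def fraction_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """Fraction of target sites occupied for simple 1:1 binding at equilibrium."""
    return free_ligand_molar / (kd_molar + free_ligand_molar)


if __name__ == "__main__":
    for label, kd in [("1 uM", 1e-6), ("1 nM", 1e-9), ("1 pM", 1e-12)]:
        print(f"{label}: dG = {delta_g_from_kd(kd):.2f} kcal/mol, pKd = {pkd(kd):.1f}")
    # When the free ligand concentration equals K_d, half of the sites are occupied.
    print(fraction_bound(1e-9, 1e-9))
```

Under these assumptions a 1 nM binder corresponds to roughly -12.3 kcal/mol, and each 1000-fold drop in $K_d$ adds about -4.1 kcal/mol.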
Why Binding Affinity Prediction Matters
- Drug Potency Determination: A drug candidate must bind its target with sufficient affinity to be therapeutically effective at safe doses. If $K_d$ is too high (weak binding), the drug requires dangerously high concentrations to achieve a therapeutic effect. If $K_d$ is too low (extremely tight binding), the drug often dissociates from its target very slowly, so its effects, including side effects, may persist and be hard to reverse. Predicting $K_d$ accurately enables the selection of candidates in the optimal affinity window.
- Lead Optimization: Medicinal chemistry iteratively modifies a lead compound to improve binding affinity, and each structural modification has a predicted $\Delta\Delta G$ contribution. Accurate affinity prediction enables computational triage of proposed modifications, focusing synthetic chemistry effort on the modifications most likely to improve potency rather than testing all possibilities experimentally.
- Selectivity Prediction: A drug must bind its intended target strongly while avoiding off-targets. Selectivity is the ratio of binding affinities: $\text{Selectivity} = K_d^{\text{off-target}} / K_d^{\text{on-target}}$. Accurate multi-target affinity prediction enables the design of highly selective drugs that minimize side effects.
- Free Energy Perturbation (FEP): The gold standard for affinity prediction is alchemical free energy perturbation, a rigorous thermodynamic calculation that "morphs" one ligand into another to compute $\Delta\Delta G$ differences. While highly accurate ($< 1$ kcal/mol error), FEP requires days of GPU computation per compound. ML models aim to match FEP accuracy at 1000× lower cost.
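The selectivity ratio and the $\Delta\Delta G$ between two ligands both follow directly from $K_d$ values. The sketch below assumes 298.15 K and the relation $\Delta\Delta G = RT \ln(K_d^{\text{modified}} / K_d^{\text{reference}})$, which is the difference of two $\Delta G = RT \ln K_d$ terms. All function names are hypothetical, and `kd_fold_error` is added only to show what a given free energy error means in terms of $K_d$.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def selectivity(kd_off_target: float, kd_on_target: float) -> float:
    """Ratio of off-target to on-target K_d; larger means more selective."""
    return kd_off_target / kd_on_target


def ddg_between(kd_reference: float, kd_modified: float,
                temperature_k: float = 298.15) -> float:
    """ddG = dG(modified) - dG(reference) in kcal/mol; negative means tighter binding."""
    return R_KCAL * temperature_k * math.log(kd_modified / kd_reference)


def kd_fold_error(dg_error_kcal: float, temperature_k: float = 298.15) -> float:
    """Multiplicative uncertainty in K_d implied by an error in the free energy."""
    return math.exp(abs(dg_error_kcal) / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical compound: 5 nM on-target, 5 uM off-target.
    print(f"selectivity: {selectivity(5e-6, 5e-9):.0f}-fold")
    # A modification that improves K_d from 10 nM to 1 nM.
    print(f"ddG: {ddg_between(1e-8, 1e-9):.2f} kcal/mol")
    # What a 1 kcal/mol error means for the predicted K_d.
    print(f"1 kcal/mol error: about {kd_fold_error(1.0):.1f}-fold in K_d")
```

At this temperature a tenfold affinity gain is worth about -1.36 kcal/mol, and a 1 kcal/mol error corresponds to roughly a 5-fold uncertainty in $K_d$, which is why sub-kcal/mol accuracy matters when ranking close analogs.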
Binding Affinity Prediction Methods
| Method | Input | Approximate accuracy ($R^2$) | Speed (per molecule) |
|--------|-------|-----------------|-------|
| AutoDock Vina | 3D complex | ~0.3 | Seconds |
| RF-Score | 3D interaction fingerprint | ~0.5 | Milliseconds |
| OnionNet-2 | 3D complex + rotation augmentation | ~0.6 | Milliseconds |
| DeepDTA | SMILES + sequence (no 3D) | ~0.4 | Microseconds |
| FEP+ | MD simulation | ~0.8 | Days |
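The accuracy column can be made concrete with a short sketch of how such scores are computed from predicted and experimental affinities. It assumes that $R^2$ means the coefficient of determination; some benchmarks report the Pearson correlation instead, and the two can differ substantially. The input values are made-up placeholders chosen to show that difference, not real measurements.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(y_true, y_pred):
    """Root-mean-square error in the same units as the inputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Placeholder affinities on a -log10(K_d) scale, not real data.
    experimental = [5.0, 6.0, 7.0, 8.0, 9.0]
    # A model that ranks perfectly but overestimates every value by one log unit.
    predicted = [6.0, 7.0, 8.0, 9.0, 10.0]
    print(f"R^2     = {r_squared(experimental, predicted):.2f}")  # 0.50
    print(f"Pearson = {pearson_r(experimental, predicted):.2f}")  # 1.00
    print(f"RMSE    = {rmse(experimental, predicted):.2f}")       # 1.00
```

In this toy case the ranking is perfect (Pearson correlation of 1) while $R^2$ is only 0.5, so it is worth checking which metric a reported number refers to before comparing methods.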
Binding Affinity Prediction measures the molecular grip: it quantifies how tightly a drug molecule clings to its protein target, one of the most critical numbers in deciding whether a candidate molecule has the potency required for therapeutic efficacy.