Why ΔΔ beats absolute docking scores
Docking scores are poor predictors of binding affinity but much better at ranking. Here is why the difference between the wild-type and mutant scores is the number you should trust.
A docking score looks like an affinity. It comes back as a number in kcal/mol, it has a minus sign, and a more negative one feels like a tighter binder. That intuition is the single most common way to misread a docking run. Scoring functions are mediocre at predicting absolute binding free energy and much better at ranking related things against each other. The practical consequence: the number worth trusting is rarely a single score; it is the difference between two scores computed the same way.
Why absolute scores are unreliable
A docking score is a fast approximation of binding free energy, and the approximations are aggressive. Most scoring functions treat the receptor as rigid or nearly so, model solvent only implicitly or not at all, and ignore or crudely estimate the entropic cost of freezing a flexible ligand into one pose. Each of those shortcuts introduces error, and the errors do not cancel cleanly across chemically different molecules.
The field has measured this directly. Warren et al. (2006) ran ten docking programs and 37 scoring functions across eight protein targets and found that while docking could often place a ligand in roughly the right pose, no scoring function reliably predicted binding affinity — the correlation between score and measured potency was weak and target-dependent. Twenty years of subsequent work has improved the details without overturning the headline: scoring functions are not affinity meters. Reading an absolute docking score as a predicted Kd is the mistake the literature has been warning about since before most of today’s tools existed.
What cancels when you take a difference
Here is the useful part. Many of those systematic errors are shared between two closely related calculations, so they subtract out when you compare. If you dock the same ligand against a wild-type receptor and a point-mutant of that receptor, the parts of the score that come from the ligand’s own internal energy, its desolvation, and the bulk of the unchanged pocket are nearly identical in both runs. What is left in the difference is dominated by the one thing that actually changed: the mutated residue and its local contacts.
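The cancellation can be written out schematically. This is an illustrative bookkeeping device, not how any particular scoring function is implemented: the symbols below are assumptions introduced here, with E_shared standing for the ligand's internal energy, its desolvation, and the unchanged pocket, E_site for the contacts at the mutated residue, and ε for the scoring error. The difference is taken as mutant minus wild type, so a positive value means the mutant scores worse.

```latex
\begin{aligned}
S_{\mathrm{WT}}  &= E_{\mathrm{shared}} + E_{\mathrm{site}}^{\mathrm{WT}}  + \varepsilon_{\mathrm{shared}} + \varepsilon_{\mathrm{WT}} \\
S_{\mathrm{mut}} &= E_{\mathrm{shared}} + E_{\mathrm{site}}^{\mathrm{mut}} + \varepsilon_{\mathrm{shared}} + \varepsilon_{\mathrm{mut}} \\
S_{\mathrm{mut}} - S_{\mathrm{WT}} &= \left(E_{\mathrm{site}}^{\mathrm{mut}} - E_{\mathrm{site}}^{\mathrm{WT}}\right) + \left(\varepsilon_{\mathrm{mut}} - \varepsilon_{\mathrm{WT}}\right)
\end{aligned}
```

The shared energy and the shared error drop out exactly in this idealized picture. In a real run they drop out only approximately, and only to the extent that the pose and setup are the same in both calculations.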
That difference is a ΔΔ: a change in a change. Taken as mutant score minus wild-type score, it approximates how much the mutation shifts the binding free energy of that ligand, and it is far more robust than either absolute number, because the noise floor partly cancels. This is the same logic that makes rigorous relative binding free energy (RBFE) methods the gold standard for lead optimization: Wang et al. (2015) showed that modern free-energy perturbation, which computes the ΔΔG between two ligands rather than two absolute values, predicted relative potency to within about 1 kcal/mol across roughly 200 ligands on eight targets, an accuracy that no absolute docking score approaches. Docking ΔΔ is the cheap, fast cousin of that idea: less rigorous, but built on the same cancellation principle.
The two comparisons worth making
- Wild-type vs mutant, one ligand. Does this drug lose grip on the resistance mutant? A large positive ΔΔ (the mutant scores worse) is the structural signature of resistance — it is what you see when you dock a first-generation inhibitor against a gatekeeper or solvent-front mutation.
- Two ligands, same receptor. Which of my two analogs binds the mutant better? Rank them by their scores against the same structure; trust the ordering far more than the magnitudes.
In both cases you are asking a relative question and answering it with a relative quantity. The moment you start quoting a single score as if it were a measured affinity, you have left the regime where the method is trustworthy.
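The two comparisons above reduce to a subtraction and a sort. Here is a minimal Python sketch, assuming scores in kcal/mol where more negative is better. The function names, ligand names, and numbers are hypothetical and do not come from any docking program or real system.

```python
from typing import Dict, List, Tuple


def delta_delta(score_wt: float, score_mut: float) -> float:
    """Return score(mutant) - score(wild type), in the score's own units.

    With more-negative-is-better scores, a positive result means the
    mutant scores worse than the wild type.
    """
    return score_mut - score_wt


def rank_ligands(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order ligands from best (most negative) to worst on one receptor."""
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up numbers for illustration only; not results for any real system.
ddg = delta_delta(score_wt=-9.5, score_mut=-7.0)
ranking = rank_ligands({"analog_A": -8.0, "analog_B": -9.0})

print(f"delta-delta = {ddg:+.1f} kcal/mol")
print("ranking on the mutant:", [name for name, _ in ranking])
```

The ranking helper deliberately returns the order along with the raw scores rather than any derived "affinity", since the ordering is the part worth trusting.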
Caveats that keep ΔΔ honest
Cancellation is not magic. It works best when the two calculations are as similar as possible: same protonation states, same docking protocol, same box, poses that occupy the same sub-pocket. If a mutation triggers a real backbone rearrangement rather than a simple side-chain swap, a rigid-receptor ΔΔ will miss it, because the assumption that “only the mutated residue changed” breaks. And a ΔΔ near zero is genuinely ambiguous: it can mean no effect, or it can mean a real effect masked by errors that failed to cancel. Treat small differences as noise and large, reproducible ones as signal.
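One way to make "large and reproducible" concrete is to repeat both dockings several times and look at the spread of the paired differences. The sketch below assumes you have replicate scores for each receptor from otherwise identical runs. The 1.0 kcal/mol threshold and the spread test are illustrative assumptions, not validated cutoffs, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def paired_delta_deltas(wt_scores: Sequence[float],
                        mut_scores: Sequence[float]) -> List[float]:
    """Per-replicate score(mutant) - score(wild type).

    Replicate i of each list should come from runs with matching settings.
    """
    if len(wt_scores) != len(mut_scores):
        raise ValueError("need the same number of wild-type and mutant replicates")
    return [mut - wt for wt, mut in zip(wt_scores, mut_scores)]


def classify(ddgs: Sequence[float], threshold: float = 1.0) -> str:
    """Label replicate differences as 'signal' or 'noise'.

    'signal' requires the mean to reach the threshold in magnitude AND the
    replicate spread to be smaller than that mean. Both criteria are
    illustrative assumptions chosen for this sketch.
    """
    if len(ddgs) < 2:
        raise ValueError("need at least two replicates to estimate spread")
    avg = mean(ddgs)
    spread = stdev(ddgs)
    if abs(avg) >= threshold and spread < abs(avg):
        return "signal"
    return "noise"


# Made-up replicate scores in kcal/mol, for illustration only.
strong = paired_delta_deltas([-9.4, -9.6, -9.5], [-7.1, -6.9, -7.0])
weak = paired_delta_deltas([-8.0, -8.3, -7.9], [-8.1, -7.9, -8.2])

print("strong case:", classify(strong))
print("weak case:", classify(weak))
```

A check like this cannot detect a backbone rearrangement that a rigid receptor never sampled; it only guards against reading run-to-run scatter as a mutation effect.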
Try the comparison yourself
This is exactly what Liganx is built around. Open Studio and pick any target from the catalog along with a clinically important mutation, for example EGFR with T790M, BCR-ABL with T315I, or ALK with G1202R. Dock the same ligand against the wild-type and mutant receptors in one run, and read the ΔΔ Liganx reports between them rather than fixating on either absolute score. A drug that holds its number against the mutant is one the mutation does not appear to defeat; a drug that reproducibly loses a kcal/mol or two is pointing at the resistance mechanism.
Liganx is molecular docking online: a free, browser-based platform that runs the wild-type and mutant side by side so the ΔΔ is the first thing you see. Using molecular docking this way — as a difference engine rather than an affinity meter — is how you get reliable answers out of an unreliable score.
Primary sources
- Warren GL, et al. A Critical Assessment of Docking Programs and Scoring Functions. J Med Chem 49, 5912–5931 (2006). doi:10.1021/jm050362n
- Wang L, et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J Am Chem Soc 137, 2695–2703 (2015). doi:10.1021/ja512751q
- Cournia Z, Allen B, Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J Chem Inf Model 57, 2911–2937 (2017). doi:10.1021/acs.jcim.7b00564