Relative binding free energy (FEP): when docking isn't enough
What free energy perturbation actually computes, why it reaches ~1 kcal/mol on congeneric series, and where it fits between fast docking triage and wet-lab assays.
Docking ranks compounds quickly, but a docking score is an empirical approximation, not a binding free energy. When you are deep in lead optimization and trying to decide whether adding a methyl group will buy you a tenfold gain in potency or cost you one, you need a method that estimates free energy differences with real statistical mechanics behind it. That method is free energy perturbation (FEP), and over the past decade it has gone from a specialist’s tool to something that routinely guides which analog gets made next.
What FEP actually computes
FEP does not try to compute the absolute binding free energy of one ligand from scratch. Instead it computes the difference in binding free energy between two closely related ligands, say ligand A and ligand B that differ by a single substituent. The trick is a thermodynamic cycle built on the perturbation theory Zwanzig introduced in 1954: rather than physically pulling each ligand out of the pocket (expensive and poorly converged), you computationally morph ligand A into ligand B in two environments, once free in solvent and once bound in the protein, and take the difference.
Because free energy is a state function, the difference between those two “alchemical” legs equals the difference between the two physical binding free energies: ΔΔGbind = ΔGbind(B) − ΔGbind(A) = ΔG(A→B, bound) − ΔG(A→B, solvent). You never have to simulate the physical binding event. You only have to sample the alchemical path well enough, usually through a series of intermediate lambda windows bridging A and B, with molecular dynamics doing the sampling and explicit water in the box. The output is a number in kcal/mol with an error bar, not a unitless score.
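To make the cycle concrete, here is a minimal sketch of the arithmetic, assuming you already have per-window energy differences from a simulation. It uses Zwanzig's exponential-averaging formula for each lambda window and then subtracts the solvent leg from the bound leg. The function names and the input layout (a list of windows, each a list of energy differences in kcal/mol) are illustrative assumptions, not any package's API, and production protocols typically use more robust estimators such as BAR or MBAR.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def zwanzig_delta_g(delta_u, temperature=298.15):
    """Exponential-averaging estimate of the free energy change for one window.

    delta_u: energy differences U(next state) - U(current state), in kcal/mol,
    evaluated on configurations sampled in the current state.
    """
    kt = KB_KCAL * temperature
    exponents = [-du / kt for du in delta_u]
    # Log-sum-exp trick keeps the average numerically stable.
    largest = max(exponents)
    mean_exp = sum(math.exp(x - largest) for x in exponents) / len(exponents)
    return -kt * (largest + math.log(mean_exp))


def leg_delta_g(windows, temperature=298.15):
    """Sum the per-window estimates along one alchemical leg (A -> B)."""
    return sum(zwanzig_delta_g(w, temperature) for w in windows)


def relative_binding_free_energy(bound_windows, solvent_windows, temperature=298.15):
    """ddG_bind = dG(A->B, bound) - dG(A->B, solvent), in kcal/mol."""
    return leg_delta_g(bound_windows, temperature) - leg_delta_g(solvent_windows, temperature)
```

A negative result means B is predicted to bind more tightly than A. Note that the exponential average is dominated by rare low-energy samples, which is one reason the transformation is split into many small lambda steps.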
How accurate is it, really?
The landmark benchmark was Wang et al. (JACS 2015), which ran a modern FEP protocol retrospectively across eight drug-discovery targets and ~200 ligands, reported root-mean-square errors near 1 kcal/mol on congeneric series, and paired those results with prospective applications in drug discovery projects. One kcal/mol is roughly a factor of five in binding affinity at body temperature, accurate enough to triage which analog to synthesize. The dataset behind that paper (the “JACS set”) became a standard for validating new methods.
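The conversion between a free energy difference and a fold change in affinity follows from ΔΔG = RT ln(ratio). A small sketch, with hypothetical helper names and body temperature (310.15 K) as the assumed default:

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def fold_change(ddg_kcal, temperature=310.15):
    """Affinity ratio corresponding to a free energy difference in kcal/mol."""
    return math.exp(abs(ddg_kcal) / (R_KCAL * temperature))


def ddg_for_fold(fold, temperature=310.15):
    """Free energy difference (kcal/mol) corresponding to an affinity ratio."""
    return R_KCAL * temperature * math.log(fold)
```

With these defaults, `fold_change(1.0)` comes out at roughly five, and `ddg_for_fold(10)` shows that a tenfold potency change corresponds to about 1.4 kcal/mol, so a method with ~1 kcal/mol error is working close to the scale of the decisions it informs.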
That accuracy is not free or universal. It depends on a good starting pose, a well-behaved congeneric series (FEP is far more reliable for small R-group changes than for scaffold hops), adequate sampling of slow protein motions, and force-field quality. Independent assessments (e.g. Communications Chemistry 2023) put the practical ceiling of current rigorous methods in the ~1 kcal/mol range — meaningfully better than docking for ranking close analogs, but not a substitute for the assay.
Where it sits relative to docking
Think of it as a funnel, not a competition:
- Docking — milliseconds to seconds per pose. Screens thousands to millions of compounds, generates poses, and ranks them coarsely. The right tool for “which 50 of these 50,000 are worth a closer look,” and for getting the bound pose that FEP needs as a starting point.
- FEP — hours of GPU time per ligand pair. Applied to the handful of analogs that survived triage, to rank them by predicted potency before anyone runs a synthesis. The right tool for “of these eight analogs, which three should the chemist make first.”
- The assay — the ground truth FEP is calibrated against, and the only thing that closes the loop.
The two methods share a deep dependency: FEP’s answer is only as good as the pose it starts from. A docking workflow that produces a physically sensible, well-validated pose is the foundation a downstream free-energy calculation is built on, which is why pose validation and interaction-fingerprint sanity checks matter even when the eventual goal is a kcal/mol number.
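The hand-off from docking to FEP in the funnel above can be sketched in a few lines. This is an illustration only: it assumes docking scores where more negative is better, a hypothetical `triage_for_fep` helper, and a simple "star" layout in which every surviving analog is perturbed from one reference ligand. Real perturbation maps often add redundant edges so that cycle closure can be checked.

```python
def triage_for_fep(docking_scores, reference, keep=8):
    """Pick the best-scoring analogs and pair each with a reference ligand.

    docking_scores: dict mapping ligand name -> docking score
                    (assumed: more negative means better).
    reference:      name of the ligand with a validated pose and known affinity.
    keep:           how many analogs to carry forward to FEP.
    Returns a list of (reference, analog) pairs, one per FEP calculation.
    """
    ranked = sorted(docking_scores, key=docking_scores.get)
    survivors = [lig for lig in ranked if lig != reference][:keep]
    return [(reference, lig) for lig in survivors]


# Example with made-up scores for a small series.
scores = {"ref": -8.0, "a": -9.1, "b": -7.2, "c": -8.5, "d": -6.0}
pairs = triage_for_fep(scores, reference="ref", keep=2)
```

Each returned pair is one ligand-to-ligand transformation, so `keep` directly sets the GPU budget for the free-energy step.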
The practical takeaway for a docking program
If you are comparing wildly different chemotypes, lean on docking and good judgment; FEP between non-congeneric ligands is fragile. If you are optimizing within a series — adding a halogen, swapping a ring, extending into a subpocket — that is exactly where relative free energy methods earn their keep, and where a 1 kcal/mol prediction can save a synthesis cycle. The discipline is to use each method where its error bars are smallest.
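One way to apply that discipline is to treat the ~1 kcal/mol accuracy as a decision margin rather than reading predictions to the decimal place. The sketch below is a heuristic under that assumption; the function name, the labels, and the default margin are illustrative choices, not an established protocol.

```python
def classify_predictions(predicted_ddg, margin=1.0):
    """Label each analog by whether its predicted ddG clears the error margin.

    predicted_ddg: dict mapping ligand name -> predicted ddG versus the
                   reference, in kcal/mol (negative = tighter binding).
    margin:        assumed method accuracy in kcal/mol.
    """
    labels = {}
    for ligand, ddg in predicted_ddg.items():
        if ddg <= -margin:
            labels[ligand] = "likely better"
        elif ddg >= margin:
            labels[ligand] = "likely worse"
        else:
            labels[ligand] = "too close to call"
    return labels


# Example with made-up predictions.
labels = classify_predictions({"a": -1.6, "b": 0.3, "c": 1.2})
```

Analogs in the middle band are the ones where chemistry judgment, synthetic cost, or the assay itself should decide.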
Try the docking yourself
Open Studio and dock a small congeneric series against your target to generate the starting poses. Use the docking ranking to triage, then reserve the expensive free-energy step for the analogs that survive. Getting a clean, validated pose first is the part that makes everything downstream trustworthy.
Liganx is molecular docking online: free, browser-based, no install. Molecular docking is the fast front end of the funnel — it produces the poses and the coarse ranking that a relative binding free energy calculation then refines into kcal/mol.
Primary sources
- Wang L, et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695-2703 (2015). doi:10.1021/ja512751q
- Zwanzig RW. High-temperature equation of state by a perturbation method. I. Nonpolar gases. J. Chem. Phys. 22, 1420-1426 (1954). doi:10.1063/1.1740409
- The maximal and current accuracy of rigorous protein-ligand binding free energy calculations. Communications Chemistry 6, 222 (2023). doi:10.1038/s42004-023-01019-9