Sizing the docking search box: the parameter people get wrong
How the grid box (search space) size and placement quietly determine docking accuracy, why too big and too small both fail, and a defensible rule of thumb.
Many docking failures that get blamed on the scoring function actually start one step earlier, with the search box. The grid box defines the region of space the docking engine is allowed to explore, and it is one of the few parameters a user sets by hand. Get it wrong and you can turn a perfectly good scoring function into a random number generator, or quietly bias every result toward the same wrong pose. It is worth understanding what the box actually controls.
What the box is
In AutoDock Vina and most descendants, the search space is a rectangular box defined by a center (x, y, z) and three side lengths. The docking engine samples ligand poses only inside that box. Anything outside it is invisible to the search. The box is not a constraint on where atoms can clash; it is a constraint on where the conformational search is allowed to look. That distinction is the source of most of the trouble.
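To make the definition concrete, here is a minimal sketch of a box as a center plus three side lengths, with a test for whether a point falls inside it. The `SearchBox` class and its method names are illustrative and not part of Vina; the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchBox:
    """Axis-aligned search box: a center and three side lengths, in angstroms."""
    center: tuple
    size: tuple

    def bounds(self):
        """Return the (low, high) corner coordinates of the box."""
        low = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        high = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return low, high

    def contains(self, point):
        """True if the point lies inside the box or exactly on a wall."""
        low, high = self.bounds()
        return all(lo <= p <= hi for lo, p, hi in zip(low, point, high))

    def volume(self):
        """Box volume in cubic angstroms."""
        sx, sy, sz = self.size
        return sx * sy * sz


# Hypothetical box: 20 angstroms per side, centered on a made-up site.
box = SearchBox(center=(10.0, 12.0, -4.0), size=(20.0, 20.0, 20.0))
print(box.contains((15.0, 12.0, -4.0)))  # 5 angstroms from center along x
print(box.contains((25.0, 12.0, -4.0)))  # 15 angstroms from center; wall is at 10
```

The second point is a perfectly valid position in the receptor, but the search would never place anything there because it lies past the wall.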
Why too big hurts
The natural instinct is to make the box large so you "do not miss anything." This backfires for two reasons. First, the search problem grows with the volume: Vina's sampling effort is set by the exhaustiveness parameter and does not grow with the box, so a larger box means fewer effective samples per unit volume and a higher chance the search never finds the true pose even when it is physically reachable. Second, a large box invites the engine to place the ligand in irrelevant surface pockets that score deceptively well, polluting your ranking. In virtual screening this is especially damaging, because the false positives are not random; they are systematically the compounds that happen to fit some off-target groove.
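The volume argument is easy to underestimate because side lengths look modest while volumes do not. The sketch below is back-of-envelope arithmetic only: it assumes the same search budget is spread evenly over the box, which is a simplification of a stochastic search, and the function names are illustrative.

```python
def box_volume(side_lengths):
    """Volume of a rectangular box in cubic angstroms."""
    sx, sy, sz = side_lengths
    return sx * sy * sz


def relative_sampling_density(reference_size, candidate_size):
    """Sampling density of the candidate box relative to the reference box,
    assuming an identical, evenly spread search budget in both (a simplification)."""
    return box_volume(reference_size) / box_volume(candidate_size)


focused = (20.0, 20.0, 20.0)
oversized = (40.0, 40.0, 40.0)

print(box_volume(focused))                             # 8000.0
print(box_volume(oversized))                           # 64000.0
print(relative_sampling_density(focused, oversized))   # 0.125
```

Doubling each side multiplies the volume by eight, so under this crude model the larger box gets one eighth of the sampling per unit volume unless the search effort is raised to compensate.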
Why too small hurts
Make the box too tight and you clip the accessible pose space. A box drawn snugly around the co-crystallized ligand presupposes the answer: any candidate that needs to sit slightly differently, extend into an adjacent subpocket, or adopt an induced-fit geometry gets truncated at the wall. You will still get poses and scores, but they are conditioned on a binding mode you assumed rather than discovered. For scaffold hopping or fragment growing, an over-tight box is a quiet way to miss the very chemistry you were screening for.
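One quick sanity check for an over-tight box is to measure how much room the reference ligand has before it hits a wall. The sketch below is illustrative: `min_wall_clearance` is a hypothetical helper, the coordinates are invented, and how much clearance counts as enough is a judgment call that depends on the chemistry you expect.

```python
def min_wall_clearance(coords, center, size):
    """Smallest distance (angstroms) from any atom to its nearest box wall.
    A negative value means at least one atom lies outside the box."""
    clearance = float("inf")
    for atom in coords:
        for value, c, s in zip(atom, center, size):
            half_side = s / 2
            clearance = min(clearance, half_side - abs(value - c))
    return clearance


# Hypothetical reference ligand, roughly 8 angstroms long along x.
ligand = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (-4.0, 1.0, 0.0)]

snug = min_wall_clearance(ligand, center=(0.0, 0.0, 0.0), size=(9.0, 9.0, 9.0))
roomy = min_wall_clearance(ligand, center=(0.0, 0.0, 0.0), size=(16.0, 16.0, 16.0))
print(snug)   # 0.5
print(roomy)  # 4.0
```

With half an angstrom of clearance, a candidate that is one ring longer than the reference, or that needs to shift toward a neighboring subpocket, has nowhere to go.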
A defensible rule of thumb
Feinstein and Brylinski studied this systematically across thousands of protein-ligand complexes and proposed scaling the box to the ligand rather than picking a fixed size. Their result: pose-prediction accuracy peaks when the box side length is roughly 2.9 times the radius of gyration of the docking compound. The practical takeaway is that the right box size depends on the size of the molecule you are docking, not just the pocket, and that a box comfortably larger than the ligand but tightly centered on the known site beats both extremes.
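The rule is simple enough to compute by hand. The sketch below assumes an unweighted radius of gyration (root-mean-square distance of the atoms from their centroid) and a cubic box; whether to mass-weight or to include hydrogens is a choice this example does not settle, and the toy coordinates are invented. The function names are illustrative.

```python
import math


def centroid(coords):
    """Unweighted mean position of the atoms."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of atoms from their centroid."""
    cx, cy, cz = centroid(coords)
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / len(coords)
    return math.sqrt(mean_sq)


def suggested_box_side(coords, scale=2.9):
    """Box side length as a multiple of the ligand's radius of gyration.
    The default of 2.9 is the ratio reported by Feinstein and Brylinski."""
    return scale * radius_of_gyration(coords)


# Toy ligand: six atoms, each 3 angstroms from the origin along an axis.
ligand = [
    (-3.0, 0.0, 0.0), (3.0, 0.0, 0.0),
    (0.0, -3.0, 0.0), (0.0, 3.0, 0.0),
    (0.0, 0.0, -3.0), (0.0, 0.0, 3.0),
]

rg = radius_of_gyration(ligand)
side = suggested_box_side(ligand)
print(round(rg, 2), round(side, 2))
```

In a real workflow the coordinates would come from the ligand structure file, and in a screen you would evaluate the rule on the largest compound in the set.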
- Center on the binding site you care about, using the co-crystallized ligand centroid or the pocket residues, not the protein center of mass.
- Size to comfortably contain the largest ligand in your set plus room to reorient, scaling with ligand size rather than reusing one default box for every library.
- Hold it constant across a screen so scores are comparable; changing the box between compounds makes the ranking meaningless.
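The checklist above can be sketched as a small helper that centers the box on a reference ligand and writes the search-space lines once, so the same box is reused for every compound in the screen. The key names (`center_x`, `size_x`, `exhaustiveness`) are the standard Vina config options; the helper functions, the coordinates, and the 22 angstrom side are placeholders chosen for illustration.

```python
def ligand_centroid(coords):
    """Unweighted mean position of the reference ligand's atoms."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def vina_box_config(center, size, exhaustiveness=8):
    """Render the search-space lines of a Vina-style config file."""
    cx, cy, cz = center
    sx, sy, sz = size
    return "\n".join([
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {sx:.3f}",
        f"size_y = {sy:.3f}",
        f"size_z = {sz:.3f}",
        f"exhaustiveness = {exhaustiveness}",
    ])


# Hypothetical heavy-atom coordinates of a co-crystallized ligand (angstroms).
reference_ligand = [(10.0, 4.0, -2.0), (12.0, 6.0, -1.0), (14.0, 5.0, 0.0)]

center = ligand_centroid(reference_ligand)
# Placeholder size: in practice, size it to the largest ligand in the set.
config = vina_box_config(center, size=(22.0, 22.0, 22.0))
print(config)
```

Writing the box to a file once and pointing every docking job at it is a simple way to guarantee the box does not drift between compounds.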
When you do not know the pocket
If there is no known site, blind docking over the whole protein is tempting but weak, for exactly the too-big reasons above. The better move is a cavity-detection step first (geometry-based or energy-based) to nominate candidate pockets, then a properly sized box on each. That converts an unfocused search into several focused ones, which is both more accurate and more interpretable. Treat blind docking as a hypothesis generator for where to look, not as a final answer.
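As a rough sketch of the second half of that workflow, suppose a cavity-detection tool has already returned a set of points (for example pocket-lining atoms or probe spheres) for each candidate pocket. The input format, the pocket names, and the 4 angstrom padding are all assumptions made for illustration; a ligand-based sizing rule could replace the padding.

```python
def box_for_pocket(points, padding=4.0):
    """Center and side lengths of a box enclosing the pocket points,
    extended by `padding` angstroms on every side."""
    lows = [min(p[i] for p in points) for i in range(3)]
    highs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return center, size


# Hypothetical output of a cavity-detection step: points per candidate pocket.
pockets = {
    "pocket_1": [(0.0, 0.0, 0.0), (6.0, 2.0, 4.0), (3.0, 8.0, 1.0)],
    "pocket_2": [(30.0, 30.0, 30.0), (34.0, 36.0, 32.0)],
}

# One focused box per candidate pocket instead of one box over the whole protein.
boxes = {name: box_for_pocket(points) for name, points in pockets.items()}
for name, (center, size) in boxes.items():
    print(name, center, size)
```

Each box then gets its own docking run, and the results can be compared pocket by pocket instead of being mixed into one ranking.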
Try the docking yourself
On Liganx the search box is handled for you: each catalog target ships with a curated, validated box centered on the relevant binding site, so you do not have to guess coordinates or rediscover the pocket. Open Studio and pick a target such as EGFR or KRAS, then dock your ligand and inspect the pose; the box is already tuned to the orthosteric site so your scores are comparable across runs and across the mutant and wild-type receptors.
Liganx is molecular docking online: free, browser-based, and set up so the parameters that quietly sink most docking runs are taken care of. If you want to try molecular docking without hand-tuning a grid box, that is the fastest path to a sensible first result.
Primary sources
- Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31, 455–461 (2010). doi:10.1002/jcc.21334
- Feinstein WP, Brylinski M. Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets. J Cheminform 7, 18 (2015). doi:10.1186/s13321-015-0067-5
- Eberhardt J, Santos-Martins D, Tillack AF, Forli S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J Chem Inf Model 61, 3891–3898 (2021). doi:10.1021/acs.jcim.1c00203