🤖 AI Summary
RevenueCat ran simulations to test the common claim that Bayesian A/B tests are immune to “peeking” (stopping as soon as the posterior crosses a threshold). They simulated two arms with identical Bernoulli conversion rates (r ∈ {0.1%, 1%, 10%}), started from an uninformative Beta(1,1) prior, and checked the posterior every N observations (N ∈ {10^2, 10^3, 10^4, 10^5, 10^6}). With a stopping rule that declares a winner as soon as P(B>A) > 0.95, false positive rates rose sharply as peeking became more frequent: checking every 100 observations produced ~80% false positives even though the arms were identical.
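Below is a minimal Python sketch of this kind of peeking simulation, written from the description above rather than from RevenueCat's actual code. The horizon (20,000 observations per arm), the number of repeated experiments, and the Monte Carlo draw count are illustrative assumptions, so the exact numbers will differ from the post's, but the inflation of the false positive rate is the same effect.

```python
import numpy as np

# Sketch of the peeking simulation described above (parameters are assumptions,
# not taken from RevenueCat's post).
rng = np.random.default_rng(0)

def prob_b_beats_a(succ_a, fail_a, succ_b, fail_b, draws=4_000):
    """Monte Carlo estimate of P(B > A) under independent Beta(1,1) priors."""
    post_a = rng.beta(1 + succ_a, 1 + fail_a, draws)
    post_b = rng.beta(1 + succ_b, 1 + fail_b, draws)
    return np.mean(post_b > post_a)

def peeking_experiment(rate=0.01, peek_every=100, max_n=20_000, threshold=0.95):
    """Both arms share the same true rate; return True if B is (falsely) declared the winner."""
    succ_a = fail_a = succ_b = fail_b = 0
    for _ in range(max_n // peek_every):
        batch_a = rng.random(peek_every) < rate   # identical Bernoulli(rate) arms
        batch_b = rng.random(peek_every) < rate
        succ_a += batch_a.sum(); fail_a += peek_every - batch_a.sum()
        succ_b += batch_b.sum(); fail_b += peek_every - batch_b.sum()
        if prob_b_beats_a(succ_a, fail_a, succ_b, fail_b) > threshold:
            return True  # stop as soon as the posterior crosses the threshold
    return False

runs = 200
false_positives = sum(peeking_experiment() for _ in range(runs))
print(f"False positive rate when peeking every 100 observations: {false_positives / runs:.0%}")
```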
The takeaway for the AI/ML community: Bayesian posteriors remain interpretable at any sample size (you can read P(B>A) meaningfully whenever you look), but treating a fixed posterior threshold as a frequentist test and stopping on success inflates Type I error just like optional stopping in classical tests. If you need to control frequentist error under continuous monitoring, use explicit sequential methods or pre-specified stopping rules (group-sequential designs, adjusted thresholds, Bayes factors or decision-theoretic criteria, or other calibration) rather than relying on unconstrained peeking with a 0.95 posterior cutoff.
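As a purely illustrative follow-up (not a method from the post), one crude form of calibration is to fix the monitoring schedule in advance and then simulate identical arms to find a posterior cutoff whose realized false positive rate is acceptable. Every parameter in the sketch below (rate, schedule, run and draw counts) is an assumption.

```python
import numpy as np

# Hypothetical calibration sketch: pre-specify the peeking schedule, then raise
# the posterior cutoff until the simulated false positive rate under identical
# arms drops to a tolerable level.
rng = np.random.default_rng(1)

def false_positive_rate(threshold, rate=0.01, peek_every=1_000, max_n=20_000,
                        runs=300, draws=4_000):
    hits = 0
    for _ in range(runs):
        succ_a = fail_a = succ_b = fail_b = 0
        for _ in range(max_n // peek_every):
            batch_a = rng.random(peek_every) < rate
            batch_b = rng.random(peek_every) < rate
            succ_a += batch_a.sum(); fail_a += peek_every - batch_a.sum()
            succ_b += batch_b.sum(); fail_b += peek_every - batch_b.sum()
            post_a = rng.beta(1 + succ_a, 1 + fail_a, draws)
            post_b = rng.beta(1 + succ_b, 1 + fail_b, draws)
            if np.mean(post_b > post_a) > threshold:
                hits += 1
                break  # a winner is declared at the first crossing
    return hits / runs

# Sweep a few candidate cutoffs; pick the smallest one whose error rate is acceptable.
for cutoff in (0.95, 0.99, 0.999):
    print(f"cutoff {cutoff}: simulated false positive rate ≈ {false_positive_rate(cutoff):.1%}")
```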