🤖 AI Summary
A recent blog post revisits the classic one-pass “naive” variance formula Var(X) = E[X^2] − (E[X])^2, explains why it is attractive for streaming and single-pass workflows, and highlights a common numerical pitfall: catastrophic cancellation. In floating-point arithmetic, E[X^2] and (E[X])^2 can both be very large when the mean is large, so subtracting them to obtain a possibly tiny variance amplifies round-off error, sometimes yielding wildly inaccurate or even negative results when the true variance is near zero. This is especially relevant in AI/ML pipelines that compute statistics online (FP32 or mixed-precision training, streaming feature normalization, telemetry aggregation), where incorrect variances can break downstream algorithms.
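To make the failure mode concrete, here is a minimal sketch (not taken from the post) that evaluates the naive formula in float32 on data with a large mean (around 1e6) and a tiny spread; the sample values and the use of NumPy are illustrative assumptions, and the double-precision result serves only as a reference.

```python
import numpy as np

# Illustrative data: large mean (~1e6), tiny spread. In float32, E[X^2] and
# (E[X])^2 are both ~1e12, so their difference retains almost no correct digits.
x = np.float32(1e6) + np.arange(1000, dtype=np.float32) / np.float32(1000)

mean = np.mean(x, dtype=np.float32)          # E[X] accumulated in float32
mean_sq = np.mean(x * x, dtype=np.float32)   # E[X^2] accumulated in float32
naive_var = mean_sq - mean * mean            # catastrophic cancellation here

ref_var = np.var(x.astype(np.float64))       # reference: same data in float64
print("naive float32 variance:", naive_var)
print("float64 reference:     ", ref_var)
```

The naive result can differ from the reference by orders of magnitude (or come out negative), even though both expectations were computed from exactly the same samples.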
The post proposes a simple, low-cost fix: shift the data by a constant c (e.g., the first sample) and compute E[(X−c)^2] − (E[X−c])^2 in one pass. Because variance is invariant to additive shifts, this yields the same result while keeping the accumulated values small, which greatly lowers the cancellation risk and preserves single-pass performance. The method is easy to implement in streaming code and harmless when the variance is large; for workloads with extreme precision requirements, the post’s analysis implies that more robust alternatives (Welford’s algorithm or pairwise aggregation) remain advisable.
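A minimal streaming sketch of the shifted approach, assuming plain Python floats and a single pass over an iterable of samples (the function names and the choice of c as the first sample are illustrative, following the post's suggestion); a standard textbook version of Welford's algorithm is included for comparison, since the post points to it as the more robust alternative:

```python
def shifted_variance(samples):
    """One-pass variance via E[(X-c)^2] - (E[X-c])^2 with c = first sample."""
    it = iter(samples)
    try:
        c = next(it)          # shift constant: the first sample
    except StopIteration:
        raise ValueError("need at least one sample")
    n = 1
    s = 0.0                   # running sum of (x - c); first term is 0
    s2 = 0.0                  # running sum of (x - c)^2
    for x in it:
        d = x - c
        n += 1
        s += d
        s2 += d * d
    mean_d = s / n
    return s2 / n - mean_d * mean_d   # population variance; use n-1 for sample


def welford_variance(samples):
    """Welford's online algorithm: updates the mean and the sum of squared
    deviations incrementally, avoiding large intermediate sums."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("need at least one sample")
    return m2 / n             # population variance; use m2 / (n - 1) for sample
```

Because (x − c) stays small whenever c is close to the data, the accumulated sums remain modest even when the raw values are huge, which is what removes the cancellation while keeping the single-pass structure.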