🤖 AI Summary
A practical hierarchical Bayesian recipe is presented for estimating cohort-level ARPU when individual user revenue is heavy-tailed (e.g., LogNormal): assume users within a cohort are i.i.d., revenue has finite mean and variance, and the within-cohort standard deviation scales roughly with the mean (σ ≈ k·μ). The model places a global prior on the log-mean (log_mu) and a cohort-level dispersion parameter k, fits observed cohort aggregates (installs, sample ARPU, and observed revenue SD), and uses the Central Limit Theorem to treat each cohort's sample mean as approximately Normal with standard error σ/√n. In PyMC this is implemented by modeling the observed revenue SD around k·μ and the observed cohort ARPU as Normal(μ, k·μ/√n), enabling hierarchical shrinkage and posterior uncertainty for each cohort's true ARPU.
Why this matters: it gives a principled way to borrow strength across cohorts (reducing variance for small samples), directly handles heavy-tailed individual revenue via the CLT on cohort means, and lets teams work with aggregated cohort statistics instead of raw per-user data. Key caveats: the CLT-based Normal approximation requires cohort sizes typically ≳50–100 and finite moments; the method assumes i.i.d. users within cohorts and proportionality of SD to mean. When those assumptions hold, the model yields robust ARPU estimates and calibrated uncertainty where naive sample means fail.