PKBoost: Gradient boosting that adjusts to concept drift in imbalanced data (github.com)

🤖 AI Summary
PKBoost is a new gradient‑boosting library, written in Rust, designed for extreme class imbalance and drifting data distributions. The author demonstrates strong results on credit‑card fraud detection (0.2% positive rate): PKBoost reaches 87.8% PR‑AUC and remains stable under covariate shift, degrading only ~1.8%, whereas XGBoost and LightGBM drop 31.8% and 42.5% respectively. Key practical features include automatic class weighting, PR‑AUC early stopping, auto‑tuning (auto_tune_principled), histogram‑based trees with quantile binning and median imputation, Rayon‑based parallelism, and built‑in metrics, all focused on binary classification for streaming fraud, medical monitoring, and anomaly detection.

Technically, PKBoost fuses information theory with Newton boosting: splits maximize Gain = GradientGain + λ · InformationGain, where λ adapts to the severity of the imbalance, and Shannon entropy plus mutual‑information regularization help prioritize rare positives (see the sketch below). It also includes an experimental AdversarialLivingBooster that tracks “vulnerability” scores and can trigger an adaptive “metamorphosis” (feature pruning and retraining).

Tradeoffs: training is slower than default XGBoost (≈45s vs 12s on 170k samples) but reduces or eliminates expensive hyperparameter tuning; limitations include binary‑only targets, no native categorical encoding, and high variance on small datasets (<1k samples). Code and full benchmarks are available on the project’s GitHub.
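To make the split criterion concrete, here is a minimal Rust sketch of the Gain = GradientGain + λ · InformationGain objective described above. It is illustrative only, not PKBoost’s actual code: the L2 regularizer `reg`, the example node statistics, and the fixed `lambda` (which PKBoost would instead adapt to imbalance severity) are all assumptions.

```rust
// Minimal sketch of the combined split objective; not PKBoost's implementation.

/// Shannon entropy (bits) of a binary label distribution.
fn entropy(pos: f64, total: f64) -> f64 {
    if total <= 0.0 || pos <= 0.0 || pos >= total {
        return 0.0; // empty or pure node carries no entropy
    }
    let p = pos / total;
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

/// Newton-style gradient gain from gradient/hessian sums of the left (l)
/// and right (r) children; `reg` is an assumed L2 penalty on leaf weights.
fn gradient_gain(gl: f64, hl: f64, gr: f64, hr: f64, reg: f64) -> f64 {
    let score = |g: f64, h: f64| g * g / (h + reg);
    0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))
}

/// Information gain: parent entropy minus the size-weighted entropy of the
/// children, computed from binary positive counts per side.
fn information_gain(pos_l: f64, n_l: f64, pos_r: f64, n_r: f64) -> f64 {
    let n = n_l + n_r;
    entropy(pos_l + pos_r, n)
        - (n_l / n) * entropy(pos_l, n_l)
        - (n_r / n) * entropy(pos_r, n_r)
}

fn main() {
    // Hypothetical split on an imbalanced node: 1,000 samples, 5 positives,
    // with the candidate split isolating 4 of them on the right side.
    let (gl, hl, gr, hr) = (-0.8, 240.0, 3.2, 10.0);
    let (pos_l, n_l, pos_r, n_r) = (1.0, 950.0, 4.0, 50.0);

    // PKBoost adapts lambda to imbalance severity; here it is simply set
    // higher because positives are rare (0.5% prevalence in this node).
    let lambda = 2.0;

    let gain = gradient_gain(gl, hl, gr, hr, 1.0)
        + lambda * information_gain(pos_l, n_l, pos_r, n_r);
    println!("combined split gain = {gain:.4}");
}
```

The entropy term rewards splits that isolate rare positives even when their gradient mass is small, which is why weighting it more heavily (a larger λ) helps under severe imbalance.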