🤖 AI Summary
A practitioner implemented LightGBM from scratch and wrote a technical explainer to demystify why the library is so effective on tabular data. Starting from standard gradient-boosted decision trees (GBDT), which fit each new tree to the negative gradients (residuals) of a loss, the post walks through the costly part of tree learning: evaluating split points. It contrasts the presorted split method (O(#data × #features)) with LightGBM's histogram-based approach, which quantile-bins continuous features into histograms (histogram build O(#data × #features), split search O(#bins × #features)), so that #bins ≪ #data and histogram construction dominates the runtime. The writeup also makes the GBDT split "gain" explicit: the squared sum of gradients in each child, normalized by the child's size and summed over the left and right children, which penalizes mixed-sign residuals and favors pure bins.
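To make the histogram trick concrete, here is a minimal NumPy sketch, not the post's code: the function name `histogram_split` and the choice of 32 bins are illustrative assumptions. The feature is quantile-binned once, a single O(#data) pass accumulates per-bin gradient sums and counts, and the gain above is then evaluated at only #bins − 1 candidate thresholds.

```python
import numpy as np

def histogram_split(feature, gradients, n_bins=32):
    """Sketch of histogram-based split finding for a single feature.

    Returns (best_gain, best_bin); the split sends rows with bin <= best_bin left.
    """
    # 1) Quantile-bin the continuous feature into n_bins buckets.
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, feature)              # bin index in [0, n_bins - 1]

    # 2) Build the histogram: one O(#data) pass accumulating gradient sums and counts.
    grad_hist = np.bincount(bins, weights=gradients, minlength=n_bins)
    cnt_hist = np.bincount(bins, minlength=n_bins).astype(float)

    # 3) Scan candidate thresholds: O(#bins), not O(#data).
    total_grad, total_cnt = grad_hist.sum(), cnt_hist.sum()
    best_gain, best_bin = -np.inf, None
    g_left = n_left = 0.0
    for b in range(n_bins - 1):
        g_left += grad_hist[b]
        n_left += cnt_hist[b]
        n_right = total_cnt - n_left
        if n_left == 0 or n_right == 0:
            continue
        g_right = total_grad - g_left
        # Squared gradient sum per child, normalized by child size: mixed-sign
        # residuals cancel inside g_left / g_right, so purer children score higher.
        gain = g_left**2 / n_left + g_right**2 / n_right
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin
```

The presorted method would instead evaluate every distinct sorted feature value, which is why an O(#bins × #features) scan plus an O(#data × #features) histogram build comes out cheaper once #bins ≪ #data.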
Crucially, the post explains LightGBM's two scalability tricks. Gradient-based One-Side Sampling (GOSS) reduces #data by keeping the top a% of instances with the largest gradients and randomly sampling b% of the rest, then multiplying the small-gradient samples by (1 − a)/b when computing gain to correct the resulting distributional bias. Exclusive Feature Bundling (EFB), which the post introduces more briefly, tackles high-dimensional sparsity by bundling mutually exclusive sparse features to reduce #features. Together these techniques preserve accuracy while drastically cutting computation and memory, which explains LightGBM's dominance on large, sparse real-world/tabular tasks and Kaggle leaderboards.
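To illustrate GOSS as described above, the following is a small sketch under assumptions: the name `goss_sample`, the defaults a = 0.2 and b = 0.1, and the dense NumPy input are all illustrative choices, not LightGBM's API.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Sketch of Gradient-based One-Side Sampling (GOSS).

    Keeps the top a-fraction of rows by |gradient|, uniformly samples a
    b-fraction of the remainder, and returns (indices, weights), where the
    small-gradient rows are up-weighted by (1 - a) / b so that gradient sums
    computed on the subset stay approximately unbiased.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(gradients)
    n_top = int(a * n)
    n_rest = int(b * n)

    order = np.argsort(np.abs(gradients))[::-1]       # largest |gradient| first
    top_idx = order[:n_top]                           # always kept
    rest_idx = rng.choice(order[n_top:], size=n_rest, replace=False)

    idx = np.concatenate([top_idx, rest_idx])
    weights = np.ones(n_top + n_rest)
    weights[n_top:] = (1.0 - a) / b                   # correct the sampling bias
    return idx, weights
```

The returned weights multiply the per-row gradients inside the split-gain computation, so the left/right gradient sums estimated on the subset approximate those of the full dataset. EFB can be sketched in the same spirit, under heavier simplifications (dense input, zero-conflict greedy grouping instead of LightGBM's graph-coloring-style heuristic); the point is only that sparse columns that are never nonzero on the same row can be merged into one column by offsetting their values, shrinking #features.

```python
import numpy as np

def bundle_exclusive_features(X):
    """Toy sketch of Exclusive Feature Bundling (EFB) on a dense array X."""
    n_rows, n_features = X.shape
    nonzero = X != 0
    bundles = []                                      # each bundle: list of feature indices
    for j in range(n_features):
        for bundle in bundles:
            # Feature j joins a bundle only if its nonzeros never overlap it.
            if not np.any(nonzero[:, j] & nonzero[:, bundle].any(axis=1)):
                bundle.append(j)
                break
        else:
            bundles.append([j])

    # Merge each bundle into a single column, offsetting values so the
    # original features remain distinguishable after bundling.
    merged = np.zeros((n_rows, len(bundles)))
    for k, bundle in enumerate(bundles):
        offset = 0.0
        for j in bundle:
            mask = nonzero[:, j]
            merged[mask, k] = X[mask, j] + offset
            offset += X[:, j].max() + 1.0
    return merged, bundles
```

On sparse, one-hot-style data this can shrink #features dramatically, while the tree learner treats each bundled column as an ordinary (binned) feature.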