Data Science Weekly – Issue 622 (datascienceweekly.substack.com)

🤖 AI Summary
Data Science Weekly Issue #622 (Oct 23, 2025) is a compact, practitioner-focused roundup of practical guides, tooling updates, and methodological deep dives across ML, data engineering, and statistics. The editor's picks are a playful Bayesian exercise predicting a song's release date from names, a beginner-to-intermediate causal-inference book (covering potential outcomes, causal graphs, and later CATE/personalization), and a piece arguing that pivot tables remain relevant as a low-code REPL for business analytics. The newsletter also touches on reproducibility and pedagogy (the perils of blindly using random.seed(42)) and includes an interview on embedding research and LLM science. Technically useful items include a primer on instrumental-variable regression and the assumptions it needs to support causal claims; a production RAG post summarizing lessons from processing millions of documents (scaling, retrieval, and vector-store trade-offs); a comparison of Elasticsearch aggregation strategies (Sampler, Composite, Terms) for scalable analytics; a comparison of spatial ML workflows across R frameworks (caret, tidymodels, mlr3); and a note on CRPS for evaluating probabilistic forecasts. Tooling news: Xeus-Octave brings GNU Octave to JupyterLite via WebAssembly, enabling MATLAB-compatible computation in the browser. Overall, the issue is a practical snapshot for teams balancing research, production robustness, and “doable” analytics techniques.
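
For readers unfamiliar with CRPS (continuous ranked probability score), here is a minimal sketch of the standard sample-based estimator, CRPS ≈ E|X − y| − ½·E|X − X′|, where X and X′ are independent draws from the forecast and y is the observed value. It is not taken from the linked note; the function name `crps_ensemble` and the toy numbers are assumptions chosen for illustration.

```python
from itertools import product


def crps_ensemble(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast sample")
    # Accuracy term: mean absolute error between forecast samples and the observation.
    accuracy = sum(abs(x - observed) for x in samples) / n
    # Spread term: mean absolute difference over all ordered pairs of samples.
    spread = sum(abs(a - b) for a, b in product(samples, repeat=2)) / (n * n)
    return accuracy - 0.5 * spread


# Hypothetical toy forecasts for an observed value of 1.0.
point_forecast = crps_ensemble([3.0], 1.0)        # single sample: reduces to absolute error
two_member = crps_ensemble([0.0, 2.0], 1.0)       # ensemble straddling the observation
```

With a single sample the spread term vanishes and the score equals the absolute error, which is why CRPS is often described as a generalization of MAE to probabilistic forecasts. The pairwise loop is quadratic in the number of samples, so this form only suits small ensembles.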