Streaming SQL for Real-Time AI and Machine Learning (www.timeplus.com)

🤖 AI Summary
Timeplus announced Python user-defined functions (UDFs) for its streaming SQL engine, letting developers embed Python code, and with it standard ML libraries such as scikit-learn, XGBoost, and PyCaret, directly inside SQL queries. This combines SQL's high-performance, declarative data retrieval with Python's flexibility for complex transformations, model inference, and feature engineering, so workflows such as batch model training plus real-time stream inference can run on one platform. Timeplus already offered JavaScript and remote UDFs, but Python UDFs stand out by supporting community ML libraries: JavaScript UDFs cannot load external libraries, and remote UDFs are limited by network overhead and are not ideal for heavy aggregation. Technically, Python UDFs operate on columnar batches. Scalar UDFs receive arrays and apply a per-row operation (the example is add_five, which iterates over an input column array). Aggregate UDFs are stateful classes with lifecycle hooks (process, finalize, serialize/deserialize, and merge), so they work in windowed or grouped queries and in distributed execution. The aggregation interface supports tumble and other windowed GROUP BY queries, persists state across restarts via pickling, and merges parallel states. Together, these make it straightforward to implement real-time feature pipelines, streaming aggregations, and in-query model inference with familiar Python ML tooling while preserving SQL performance and scale.
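The two UDF shapes described in the summary can be sketched in plain Python. This is only an illustration of the batch-oriented pattern, not Timeplus's actual registration syntax or signatures: the hook names (process, finalize, serialize, deserialize, merge) and add_five come from the summary, while the RunningMean class, the argument shapes, and the choice of pickled tuples as state are assumptions made for the example.

```python
import pickle


def add_five(values):
    """Scalar UDF sketch: takes a whole column batch, returns one result per row."""
    return [v + 5 for v in values]


class RunningMean:
    """Aggregate UDF sketch (hypothetical class): state plus lifecycle hooks."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def process(self, values):
        # Fold one columnar batch into the running state.
        for v in values:
            self.total += v
            self.count += 1

    def finalize(self):
        # Emit the result for the current window or group.
        return self.total / self.count if self.count else None

    def serialize(self):
        # Persist state so it can survive a restart (assumed format: pickled tuple).
        return pickle.dumps((self.total, self.count))

    def deserialize(self, data):
        # Restore state previously produced by serialize().
        self.total, self.count = pickle.loads(data)

    def merge(self, data):
        # Combine with the serialized state of a parallel instance.
        other_total, other_count = pickle.loads(data)
        self.total += other_total
        self.count += other_count
```

In this sketch, two instances that each processed part of a stream can be combined by passing one instance's serialize() output to the other's merge(), which mirrors the summary's point about merging parallel states in distributed execution.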