🤖 AI Summary
Researchers conducted semi-structured ethnographic interviews with 18 machine learning engineers (MLEs) across domains like chatbots, autonomous vehicles, and finance to probe how models are actually operationalized. They find MLEs run a practical, iterative workflow of (i) data preparation, (ii) experimentation, (iii) evaluation across multi-staged deployments, and (iv) continual monitoring and response — with real system behavior often only becoming visible once models run in production and ingest fresh data. Operational work requires hybrid data-science and engineering skills, extensive cross-functional collaboration with data scientists and product stakeholders, and a mix of ad hoc and institutional communication tools (from Slack to org-wide ticketing and reporting).
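To make the monitoring-and-response stage described above concrete, here is a minimal sketch of a production drift check that compares fresh prediction scores against a training-time reference. The drift metric (population stability index), bin count, and alert threshold are illustrative assumptions, not anything prescribed by the study.

```python
import numpy as np

def psi(reference: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a reference and a production score distribution."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_frac = np.histogram(production, bins=edges)[0] / len(production)
    # Floor empty bins at a small constant to avoid log(0).
    ref_frac = np.clip(ref_frac, 1e-6, None)
    prod_frac = np.clip(prod_frac, 1e-6, None)
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference_scores = rng.beta(2, 5, size=10_000)   # scores observed offline
    production_scores = rng.beta(2, 3, size=2_000)   # fresher, shifted traffic
    score = psi(reference_scores, production_scores)
    if score > 0.2:  # common rule-of-thumb threshold (assumed, not from the paper)
        print(f"ALERT: prediction drift detected (PSI={score:.3f})")
    else:
        print(f"OK: distributions stable (PSI={score:.3f})")
```

A check like this is the kind of automated response hook the interviewees describe wiring into monitoring dashboards, so that shifts that only appear on fresh production data trigger investigation rather than silent degradation.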
The paper frames MLOps around three core virtues: velocity (rapid iteration), visibility (observability of models and data), and versioning (tracking models, data, and pipelines), which engineers continually balance as ML programs mature. For the AI/ML community, these findings underscore that research and tooling should prioritize robust monitoring, reproducible versioning of data and models, and improved visibility across deployment stages. The study surfaces concrete design opportunities: better observability platforms, integrated version control for datasets and models, and tooling that reduces handoffs and contextual loss between teams, all aimed at taming the unpredictability that only production reveals.
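As one way to picture the versioning virtue and the integrated data-and-model version control the summary calls for, the sketch below records a dataset fingerprint, a model-artifact fingerprint, and pipeline metadata in a single run manifest. The file layout, field names, and paths are assumptions for illustration only.

```python
import hashlib
import json
import time
from pathlib import Path

def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Content hash of an artifact file (dataset snapshot or model weights)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(dataset: Path, model: Path, pipeline_commit: str, out: Path) -> dict:
    """Tie a dataset version, a model version, and the training-code commit together."""
    manifest = {
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "dataset": {"path": str(dataset), "sha256": fingerprint(dataset)},
        "model": {"path": str(model), "sha256": fingerprint(model)},
        "pipeline_commit": pipeline_commit,  # e.g. the git SHA of the training code
    }
    out.write_text(json.dumps(manifest, indent=2))
    return manifest

# Usage (assumed paths):
#   write_manifest(Path("data/train.parquet"), Path("models/model.pt"),
#                  pipeline_commit="abc1234", out=Path("runs/run_0001.json"))
```

Keeping one manifest per deployment stage is a simple way to preserve the traceability that the interviewed engineers report losing during handoffs between teams.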