“We Have No Idea How Models Will Behave in Production Until Production”: ML Ops Study [2024] (arxiv.org)

🤖 AI Summary
A semi‑structured interview study of 18 ML engineers (MLEs) across domains such as chatbots, autonomous vehicles, and finance maps how models are actually operationalized in production. The researchers distill a four‑stage, team‑centric MLOps workflow (data preparation, iterative experimentation, staged evaluation/deployment, and continual monitoring/response) and identify the “3Vs” engineers prioritize: velocity (fast iteration), validation (testing changes early and often), and versioning (of models, datasets, and code).

Rather than a mostly automated handoff, MLEs reported heavy manual work: driving feature selection and labeling, choosing retraining cadences, gatekeeping multi‑stage rollouts, maintaining multiple evaluation datasets (including subpopulation checks), and coping with alert fatigue and “pipeline jungles.” Technically, the study underscores that production ML remains human‑in‑the‑loop: automated ingestion coexists with manual validation, experimentation resists blind AutoML because the search spaces are vast, and deployments rely on fractional rollouts plus stakeholder sign‑offs.

The practical implications: teams need tools that improve cross‑team visibility, provide robust dataset and model versioning, offer smarter alerting and root‑cause tracing, and make continual evaluation ergonomic. The paper reframes MLOps challenges as socio‑technical, requiring tooling that supports collaboration and the 3Vs, not just algorithmic automation.
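The subpopulation checks and gated fractional rollouts described above lend themselves to a short illustration. Below is a minimal Python sketch of an evaluation gate that scores a candidate model on several named evaluation slices and approves promotion to the next rollout stage only if every slice clears a threshold. All names here (`rollout_gate`, `eval_sets`, `min_accuracy`) are hypothetical, not APIs from the paper or any specific tool.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One labeled example: (feature vector, true label). Purely illustrative.
Example = Tuple[Sequence[float], int]


def slice_accuracy(model: Callable[[Sequence[float]], int],
                   examples: List[Example]) -> float:
    """Fraction of a slice the candidate model labels correctly."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def rollout_gate(model: Callable[[Sequence[float]], int],
                 eval_sets: Dict[str, List[Example]],
                 min_accuracy: float = 0.90) -> bool:
    """Approve promotion to the next rollout stage only if *every*
    evaluation slice (overall plus subpopulations) clears the bar."""
    results = {name: slice_accuracy(model, xs) for name, xs in eval_sets.items()}
    for name, acc in results.items():
        status = "PASS" if acc >= min_accuracy else "FAIL"
        print(f"{name:>12}: accuracy={acc:.3f} [{status}]")
    return all(acc >= min_accuracy for acc in results.values())


if __name__ == "__main__":
    # Toy model and toy slices standing in for real evaluation datasets.
    def toy_model(features: Sequence[float]) -> int:
        return int(features[0] > 0.5)

    eval_sets = {
        "overall":     [([0.9], 1), ([0.1], 0), ([0.7], 1), ([0.2], 0)],
        "new_users":   [([0.8], 1), ([0.3], 0)],
        "rare_locale": [([0.6], 1), ([0.4], 1)],  # model misses this slice
    }
    promote = rollout_gate(toy_model, eval_sets)
    print("promote to next traffic fraction" if promote else "hold rollout")
```

In the staged deployments the interviewees describe, a gate like this would run before each traffic increase (e.g., 1% → 10% → 100%), with a stakeholder sign‑off layered on top of the automated check rather than replaced by it.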