🤖 AI Summary
At Airflow Summit 2024, ASAPP described how it uses Apache Airflow as the backbone for operating large-scale generative-AI contact-center services: transcription, summarization, conversational voice/chat agents, and continuous model retraining. The talk highlights concrete gains: faster developer iteration (deploy times cut from ~1 hour to minutes), massive scale (over a million Airflow tasks daily across ~5,000 DAGs), and large speedups for parallel workloads (some Spark jobs improved by an order of magnitude; certain processing pipelines went from months to under a week). Airflow orchestrates ingestion, retention enforcement, sampling, training/fine-tuning, and metric generation feeding Grafana dashboards, enabling repeatable, auditable ML lifecycles for production models.
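As an illustration of that lifecycle, the sketch below wires those stages into a single daily DAG using Airflow's TaskFlow API. The task bodies, bucket paths, and DAG name are hypothetical placeholders, not ASAPP's actual pipeline:

```python
# A minimal sketch of a retraining lifecycle as an Airflow DAG.
# All paths and stage logic are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    tags=["ml-lifecycle"],
)
def model_retraining():
    @task
    def ingest_conversations() -> str:
        # Pull new transcripts/chat logs into a staging location.
        return "s3://example-bucket/staging/"  # hypothetical path

    @task
    def enforce_retention(staging_path: str) -> str:
        # Drop records past their retention window before any sampling.
        return staging_path

    @task
    def sample_training_set(clean_path: str) -> str:
        # Select a representative sample for fine-tuning.
        return "s3://example-bucket/samples/"  # hypothetical path

    @task
    def fine_tune(sample_path: str) -> str:
        # Kick off training; in practice this would launch a GPU pod,
        # not run inline in the worker.
        return "s3://example-bucket/models/latest/"  # hypothetical path

    @task
    def publish_metrics(model_path: str) -> None:
        # Emit evaluation metrics to the store backing the Grafana dashboards.
        pass

    publish_metrics(
        fine_tune(sample_training_set(enforce_retention(ingest_conversations())))
    )


model_retraining()
```

Structuring the lifecycle as explicit tasks is what makes each run repeatable and auditable: every stage's inputs, outputs, retries, and timing are recorded by the scheduler.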
Technically, ASAPP decoupled DAG code from cluster management, adopted git-sync deployments and the KubernetesExecutor, and relied on KubernetesPodOperator patterns to give each task an isolated, containerized environment. They launch on-demand Spark clusters per Airflow task for embarrassingly parallel audio/text inference, schedule GPU task pods that run PyTorch/PyTorch Lightning or LLM servers (TGI, vLLM) as sidecars, and integrate diverse storage/compute backends (S3, Spark, Flink, Trino, Snowflake, Redshift, Cassandra, Athena). The result is a resilient, extensible orchestration layer that scales heterogeneous ML workloads, lowers operational overhead, and positions Airflow as a practical tool for future RLHF and preference-based learning pipelines.
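A minimal sketch of that pod-per-task pattern, assuming the cncf.kubernetes provider is installed; the namespace, image, and entrypoint are made-up placeholders, not ASAPP's configuration:

```python
# A hedged sketch of an isolated, containerized GPU task via KubernetesPodOperator.
# Namespace, image, and command are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG(
    dag_id="gpu_batch_inference",
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
):
    score_transcripts = KubernetesPodOperator(
        task_id="score_transcripts",
        name="score-transcripts",
        namespace="ml-workloads",                        # hypothetical namespace
        image="registry.example.com/inference:latest",   # hypothetical image
        cmds=["python", "-m", "scoring.run"],            # hypothetical entrypoint
        # Request a GPU on the pod spec so the task lands on a GPU node;
        # each task gets its own pod, so dependencies stay isolated from
        # the scheduler and from other tasks.
        container_resources=k8s.V1ResourceRequirements(
            requests={"nvidia.com/gpu": "1"},
            limits={"nvidia.com/gpu": "1"},
        ),
        get_logs=True,
    )
```

Because the pod is created per task run and torn down afterward, dependency conflicts between heterogeneous workloads disappear; an LLM server such as vLLM or TGI could be attached as an additional container in the same pod by supplying a full pod spec.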