To escalate, or not to escalate, that is the question (fin.ai)

🤖 AI Summary
The Fin team replaced an LLM-based escalation router with a custom multi-task encoder that decides, in real time, whether to escalate a conversational support session (escalate now, offer escalation, or let the bot continue), predicts the escalation reason (8 classes), and cites which business-configured guideline(s) triggered the decision. This matters because escalation errors either overload human teams or leave customers stranded; the new system reduces latency, gives finer control over decision thresholds, and supplies explicit, auditable guideline citations.

Technically, the model uses a single encoder backbone with three heads: softmax heads for the two classification tasks and a sigmoid multi-label head for guideline citation. Each guideline span is represented by mean-pooling its contextual token embeddings and scored with a linear layer and sigmoid. Trained end-to-end on ~4M examples with a combined cross-entropy and binary cross-entropy loss, the model reaches ~97.4% escalation accuracy, ~97% reason accuracy, and 98.7% citation AUC in evaluation. In production it handles ~90% of traffic at >98% accuracy; the remaining ~10% (long or complex cases) fall back to an LLM. A/B tests show significantly higher resolution rates (p<0.01), 0.5s lower detection latency, and ~3% lower cost per resolution.

Key takeaway: multi-task encoders can outperform their LLM teachers for high-throughput routing, but robust fallbacks and production-grade validation are essential.
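A minimal sketch of how such a three-head encoder could be wired up, assuming a PyTorch/HuggingFace-style backbone. All names, dimensions, the pooling for the classification heads, and the span-offset interface are illustrative assumptions, not Fin's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskEscalationModel(nn.Module):
    """Hypothetical three-head routing model over a shared encoder backbone."""

    def __init__(self, encoder: nn.Module, hidden: int = 768):
        super().__init__()
        self.encoder = encoder                     # e.g. a pretrained transformer encoder
        self.action_head = nn.Linear(hidden, 3)   # escalate now / offer / continue (softmax)
        self.reason_head = nn.Linear(hidden, 8)   # 8 escalation-reason classes (softmax)
        self.cite_head = nn.Linear(hidden, 1)     # per-guideline score (sigmoid, multi-label)

    def forward(self, input_ids, attention_mask, guideline_spans):
        # Contextual token embeddings from the shared backbone.
        tokens = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state

        # Pooled conversation representation for the two classification heads
        # (mean over non-padding tokens; the post does not specify this pooling).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (tokens * mask).sum(1) / mask.sum(1).clamp(min=1)
        action_logits = self.action_head(pooled)
        reason_logits = self.reason_head(pooled)

        # Guideline citation: mean-pool each guideline's token span, then score
        # it with a shared linear layer (sigmoid applied inside the BCE loss).
        cite_logits = []
        for start, end in guideline_spans:          # illustrative token offsets per guideline
            span_repr = tokens[:, start:end].mean(dim=1)
            cite_logits.append(self.cite_head(span_repr))
        cite_logits = torch.cat(cite_logits, dim=-1)  # (batch, n_guidelines)

        return action_logits, reason_logits, cite_logits


def multitask_loss(action_logits, reason_logits, cite_logits,
                   action_y, reason_y, cite_y):
    """Combined objective: cross-entropy for the two classification tasks,
    binary cross-entropy for the multi-label citation head."""
    return (F.cross_entropy(action_logits, action_y)
            + F.cross_entropy(reason_logits, reason_y)
            + F.binary_cross_entropy_with_logits(cite_logits, cite_y))
```

Sharing one backbone across all three heads is what keeps per-request latency low: a single encoder pass yields the routing decision, the reason, and the citation scores together, rather than three separate model calls.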