🤖 AI Summary
A recent live discussion with Pranav (co-founder of Chatwoot) exposed real-world pain with LLM observability in production: Chatwoot’s multi-channel AI agent “Captain” sometimes replied incorrectly (occasionally even in Spanish) with no clear way to trace why. The core needs are straightforward but critical: know which documents were retrieved for RAG, which tool calls ran, and the exact inputs and outputs at each step. Yet current tooling fragments that visibility. Several vendors were evaluated: OpenAI’s tracing is rich but tied to its agent framework, and its traces are atomic; New Relic supports OpenTelemetry but makes debugging in the UI slow; Phoenix (OpenInference) has expressive AI-specific spans but lacks a Ruby SDK and doesn’t fully honor OpenTelemetry conventions, rendering OTel-formatted spans “unknown.”
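To make the visibility gap concrete, here is a minimal sketch (not Chatwoot’s actual code) of what the discussion asks for: wrapping the retrieval step and each tool call in plain OpenTelemetry spans so retrieved document IDs and exact inputs/outputs land on the trace. The span and attribute names, and the `retrieve_docs`/`run_tool` helpers, are illustrative stand-ins, not an established convention.

```python
from opentelemetry import trace

tracer = trace.get_tracer("captain.agent")

def retrieve_docs(query: str) -> list[str]:
    # Stand-in for a real vector-store lookup.
    return ["doc-42", "doc-77"]

def run_tool(name: str, arg: str) -> str:
    # Stand-in for a real tool execution.
    return f"{name}({arg}) -> ok"

def answer(query: str) -> str:
    with tracer.start_as_current_span("rag.retrieve") as span:
        doc_ids = retrieve_docs(query)
        # Record the query and which documents were retrieved for RAG.
        span.set_attribute("rag.query", query)
        span.set_attribute("rag.document_ids", doc_ids)

    with tracer.start_as_current_span("tool.call") as span:
        # Record which tool ran and its exact input/output.
        span.set_attribute("tool.name", "order_lookup")
        span.set_attribute("tool.input", query)
        output = run_tool("order_lookup", query)
        span.set_attribute("tool.output", output)

    return output

print(answer("Where is my order #1234?"))
```

With only the OTel API installed these spans are no-ops; pointing the SDK at any OTel backend makes each step queryable after the fact.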
The broader implication: two competing standards are emerging. OpenTelemetry (OTel) is ubiquitous, production-ready, and has SDKs across languages but only basic span kinds (internal, server, client, producer, consumer). OpenInference provides AI-native span types (LLM, tool, chain, embedding, agent) that make querying and filtering easy but is newer, less widely supported, and not truly OTel-compatible in practice. The pragmatic advice: pick one telemetry backbone (prefer OTel if it’s already your stack), augment spans with richer attributes until OTel GenAI semantics mature, and contribute use cases to the OTel GenAI working group. SigNoz is investing in OTel-native LLM observability—dashboards, LangChain/LlamaIndex guidance, and semantic-convention-aligned defaults—to avoid fragmenting monitoring and keep LLMs visible alongside the rest of your system.
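As a hedged sketch of the “augment spans with richer attributes” advice: an ordinary OTel span can carry LLM-specific detail using attribute names from the incubating GenAI semantic conventions (`gen_ai.*`). Those names are still evolving, and the model and token figures below are invented for illustration.

```python
from opentelemetry import trace

tracer = trace.get_tracer("captain.llm")

# Span name follows the GenAI convention "{operation} {model}".
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")

    # ... the actual model call would go here ...

    # Response-side attributes recorded once the call returns.
    span.set_attribute("gen_ai.response.model", "gpt-4o-2024-08-06")
    span.set_attribute("gen_ai.usage.input_tokens", 512)
    span.set_attribute("gen_ai.usage.output_tokens", 128)
```

Keeping these attributes on standard OTel spans means the LLM traffic shows up in the same traces and dashboards as the rest of the system, which is the fragmentation-avoidance point the advice is making.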