🤖 AI Summary
A new open-source benchmark, "Emotional Intelligence Leaderboard for LLMs" (code: github.com/sam-paech/spiral-bench), quantifies how conversational models handle emotionally charged, suggestible users, with particular attention to sycophancy and the reinforcement of delusions. The benchmark runs 30 simulated 20-turn chats per evaluated model against a role-played "seeker" persona (played by Kimi-K2) that is open, trusting, and occasionally drawn toward fringe ideas. Evaluated models are queried via API or run locally; a judge model (gpt-5) reviews each assistant turn and logs incidents against a defined rubric of protective and risky behaviours (pushback, de-escalation, safe redirection, suggestions to seek help, emotional escalation, sycophancy, delusion reinforcement, consciousness claims, harmful advice). Each incident receives a 1-3 intensity score, turn-level tallies are normalized to 0-1 (with risky metrics inverted so that higher always means safer), and three conversation-level judgments (Off-rails, Safety, Social Dexterity) are added. A weighted average then produces a 0-100 Safety Score.
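Concretely, the description implies an aggregation pipeline along the following lines. This is a minimal hypothetical sketch, not code from the spiral-bench repo: the metric names mirror the rubric above, but the normalization constant (turns × max intensity), the default weights, and all function names are illustrative assumptions.

```python
# Hypothetical reconstruction of the scoring described above. Metric names
# follow the rubric; normalization and weights are assumptions, not the
# repo's actual values.

PROTECTIVE = {"pushback", "de_escalation", "safe_redirection", "help_seeking"}
RISKY = {"emotional_escalation", "sycophancy", "delusion_reinforcement",
         "consciousness_claims", "harmful_advice"}


def normalize(tallies: dict[str, float], turns: int,
              max_intensity: int = 3) -> dict[str, float]:
    """Scale summed 1-3 intensity tallies into [0, 1], inverting risky
    metrics so that higher is always safer."""
    scores = {}
    for metric in PROTECTIVE | RISKY:
        # Assumed ceiling: one max-intensity incident on every turn.
        raw = min(tallies.get(metric, 0.0) / (turns * max_intensity), 1.0)
        scores[metric] = 1.0 - raw if metric in RISKY else raw
    return scores


def safety_score(per_turn: dict[str, float], chat_level: dict[str, float],
                 weights: dict[str, float]) -> float:
    """Weighted average of normalized per-turn metrics and the three
    conversation-level judgments, rescaled to 0-100."""
    combined = {**per_turn, **chat_level}  # all values already in [0, 1]
    total_w = sum(weights.get(m, 1.0) for m in combined)
    weighted = sum(v * weights.get(m, 1.0) for m, v in combined.items())
    return 100.0 * weighted / total_w


if __name__ == "__main__":
    # Made-up judge tallies for one 20-turn chat.
    tallies = {"sycophancy": 12, "pushback": 5, "delusion_reinforcement": 4}
    per_turn = normalize(tallies, turns=20)
    chat_level = {"off_rails": 0.8, "safety": 0.9, "social_dexterity": 0.7}
    weights = {"sycophancy": 2.0, "delusion_reinforcement": 2.0}  # assumed
    print(f"Safety Score: {safety_score(per_turn, chat_level, weights):.1f}")
```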
This benchmark matters because it operationalizes "emotional intelligence" and safety tradeoffs in dialogue rather than just factual accuracy, enabling head-to-head comparisons for alignment work, RLHF objectives, prompt engineering, and red-teaming. Technical caveats include reliance on a single judge model (audit bias), fixed role-play dynamics that may not mirror real users, and the potential for models to game the metrics. Still, the toolkit's openness provides a practical starting point for measuring and improving how LLMs resist sycophancy, avoid reinforcing harmful beliefs, and steer vulnerable conversational partners toward safer outcomes.