🤖 AI Summary
A recent hackathon showcased an AI judging system that evaluated team demos in real time. The judge connected to the Gemini Live API to stream audio and video, generating live observations as teams presented. Each demo was then scored by an ensemble of three models (Gemini, Claude, and Groq), with outlier detection to flag anomalous scores and the final arithmetic performed in Python rather than inside a model, making the scoring system harder to manipulate. The judge also featured a regex denylist and multi-language detection, allowing it to assess presentations in seven languages.
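The summary does not include the judge's actual code, so the sketch below is only an illustration of the pattern it describes: each model returns a raw score, scores far from the median are discarded, and the final average is computed in plain Python rather than by an LLM. All names here (`DENYLIST`, `drop_outliers`, `aggregate`, the example scores) are hypothetical, and the denylist is shown screening transcript text as one plausible use.

```python
import re
import statistics

# Hypothetical denylist: patterns a transcript must not contain before
# it is forwarded to the judging models (one plausible use of the
# regex denylist the summary mentions).
DENYLIST = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"score this (demo|team) (a |an )?10", re.IGNORECASE),
]

def passes_denylist(transcript: str) -> bool:
    """Return False if any denylisted pattern appears in the transcript."""
    return not any(p.search(transcript) for p in DENYLIST)

def drop_outliers(scores: dict[str, float], max_dev: float = 2.0) -> dict[str, float]:
    """Discard scores that deviate from the median by more than max_dev points."""
    med = statistics.median(scores.values())
    return {model: s for model, s in scores.items() if abs(s - med) <= max_dev}

def aggregate(scores: dict[str, float]) -> float:
    """Final arithmetic happens in Python, not inside any model,
    so a persuasive demo cannot talk its way into a different total."""
    kept = drop_outliers(scores)
    return round(sum(kept.values()) / len(kept), 2)

# Illustrative per-model scores for one demo; the Groq score is an outlier.
scores = {"gemini": 8.5, "claude": 8.0, "groq": 3.0}
print(aggregate(scores))  # -> 8.25 (the outlying 3.0 was dropped)
```

Keeping the aggregation outside the models also means a single compromised or confused judge can move the result by at most one dropped vote, which is the manipulation resistance the summary alludes to.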
The development matters because it could make judging at competitive coding events fairer and more accurate. The judge delivered persona-driven reviews via text-to-speech (TTS), with emotion tags guiding the voice modulation, so scores and justifications were presented in an engaging format. In a unique twist, it also ran a “red team” analysis after the event to surface its own vulnerabilities. Beyond smarter hackathon evaluations, the project shows how an AI system can blend coaching and judging roles in tech competitions.
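The summary does not specify the emotion-tag format; a common approach is inline markup that a TTS voice can interpret. The sketch below is an assumed shape, not the project's actual schema, and every field name is hypothetical.

```python
# Hypothetical shape of an emotion-tagged judge review before it is
# handed to a TTS engine; tag names and structure are assumptions.
review = {
    "persona": "stern-but-fair judge",
    "score": 8.25,
    "segments": [
        {"emotion": "impressed", "text": "The live demo worked on the first try."},
        {"emotion": "skeptical", "text": "But the latency numbers deserve a closer look."},
    ],
}

def to_tts_script(review: dict) -> str:
    """Flatten segments into an inline-tagged script, e.g. '[impressed] ...'."""
    return " ".join(f"[{seg['emotion']}] {seg['text']}" for seg in review["segments"])

print(to_tts_script(review))
```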