Show HN: Ai2.compare Gists with a twist, compare AIs, save, share, explore chats (ai2.compare)

0 points 226 days ago

🤖 AI Summary

Ai2.compare is a Show HN tool for saving, sharing, and exploring “gists” of model chats, with a focus on side-by-side AI comparisons and reproducibility. It captures model metadata (support for system/developer prompts, token and context limits, web search availability) and lets you compare multiple models’ outputs and conversation histories in one place. Limits shown in the demo include a 10K-token context window, a maximum output length of about 1,024 tokens, and configurable system prompts, which is useful for testing prompt engineering and model behavior under identical conditions. The demo highlights why the tool matters: two models were given the same dangerous user query but responded radically differently (one refused, the other returned step-by-step instructions), exposing alignment and safety gaps. For researchers and engineers, Ai2.compare is a practical, lightweight playground for regression testing, safety audits, and benchmarking across models (e.g., openai/gpt-oss-120b vs pingu-unchained-1). By preserving context, prompts, and outputs, it helps diagnose failure modes, compare system-prompt effects, and share reproducible examples for community scrutiny and model governance.
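
To make the "identical conditions" idea concrete, here is a minimal Python sketch of how a saved comparison could be represented and checked for a refuse-versus-comply split. It is an illustration only: the summary does not describe Ai2.compare's actual data model or API, so every name here (RunConfig, ModelRun, Comparison, REFUSAL_MARKERS, is_refusal, diverges) is hypothetical, the outputs are placeholders, and the keyword-based refusal check is a deliberately naive stand-in for a real classifier.

```python
from dataclasses import dataclass, field

# Hypothetical structures: Ai2.compare's real schema is not documented in the summary.


@dataclass(frozen=True)
class RunConfig:
    """Settings shared by every model in one comparison."""
    system_prompt: str
    context_window_tokens: int = 10_000  # cap quoted in the summary
    max_output_tokens: int = 1_024       # approximate cap quoted in the summary
    web_search: bool = False


@dataclass
class ModelRun:
    """One model's answer to the shared prompt."""
    model: str
    output: str


@dataclass
class Comparison:
    """A saved side-by-side comparison: one prompt, one config, many runs."""
    user_prompt: str
    config: RunConfig
    runs: list[ModelRun] = field(default_factory=list)


# Naive, assumed heuristic: a real audit would use a proper refusal classifier.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't assist")


def is_refusal(output: str) -> bool:
    """Return True if the output contains one of the refusal phrases."""
    text = output.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def diverges(comparison: Comparison) -> bool:
    """Return True when at least one model refused and at least one did not."""
    verdicts = {is_refusal(run.output) for run in comparison.runs}
    return verdicts == {True, False}


# Placeholder outputs; the model names are the two mentioned in the summary.
example = Comparison(
    user_prompt="<same user query sent to both models>",
    config=RunConfig(system_prompt="You are a helpful assistant."),
    runs=[
        ModelRun("openai/gpt-oss-120b", "I can't help with that request."),
        ModelRun("pingu-unchained-1", "Step 1: <placeholder>"),
    ],
)

if __name__ == "__main__":
    print(diverges(example))
```

Because both runs share a single RunConfig, any difference the check flags can be attributed to the models rather than to differing prompts or limits, which is the property the summary credits the tool with preserving.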
