AI Safety for Fleshy Humans: a whirlwind tour (aisafety.dance)

0 points 228 days ago ago | visit original

🤖 AI Summary

AI Safety for Fleshy Humans is a three-part explainer series (Parts 1 & 2 published in May and Aug 2024, with the finale due by Dec 2025) that packages core AI-safety debates into an accessible, slightly opinionated guide—complete with comics and optional flashcards. Its goal is to turn the hundred splintered arguments about AI risk into a coherent map: what the risks are, why they matter now (governments and leading researchers are paying attention), and what technical and governance responses might look like. The series stresses nuance over scare stories: many real risks don’t require “sentient” AI, but arise from how current architectures learn, fail, and pursue goals. Technically, the guide frames the field around two recurring tensions: Logic (step-by-step, verifiable reasoning) versus Intuition (all-at-once pattern recognition like deep learning), and Problems in the AI versus Problems in humans. It centers the Value Alignment Problem—how to make models robustly serve humane values—and the Technical Alignment Problem—how to make any system reliably pursue intended objectives. Failure modes covered include instrumental sub-goals (e.g., “don’t let anyone stop me”), biased or fragile learned “intuition,” unverifiable internal states, and the danger of competent systems pursuing corrupted goals. Concrete hazards range from AI-assisted bioengineering and scaled digital authoritarianism to automated cyberattacks. Proposed responses span technical fixes, governance, and societal norms, with emphasis that solutions remain uncertain and interdisciplinary work is essential.

Loading comments...

loading comments...