Whisper Leak: A novel side-channel attack on remote language models (www.microsoft.com)

🤖 AI Summary
Microsoft disclosed "Whisper Leak," a new side-channel attack that can infer the topic of conversations with remote, streaming language models by observing encrypted network traffic (packet sizes and timings), despite TLS. The attack exploits the autoregressive, token-by-token streaming behavior of modern chatbots and poses real privacy risks for users in sensitive contexts (healthcare, legal, political dissent): a network observer such as an ISP, a local Wi-Fi eavesdropper, or a nation-state could reliably flag conversations about specific topics even though the payloads are encrypted.

Technically, the researchers trained classifiers (LightGBM, Bi-LSTM, and a token-bucketed DistilBERT) on sequences of encrypted packet lengths and inter-arrival times to distinguish a target topic (e.g., money-laundering queries) from large background corpora. In controlled tests the classifiers achieved very high AUPRC (often >98%), and in a simulated 1-in-10,000 monitoring scenario could attain 100% precision while catching 5–50% of target conversations. Effectiveness improves with more training data and with richer multi-turn signals.

Vendors have responded: OpenAI, Microsoft Azure, Mistral, and xAI deployed mitigations such as streaming obfuscation (injecting random-length filler into streams to mask token lengths), and Microsoft verified that this reduces the attack's effectiveness to acceptable levels. Caveat: real-world performance depends on traffic diversity, but the finding highlights a practical, evolving privacy threat and underscores the need for protocol- and service-level defenses.
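To make the mitigation concrete, here is a minimal sketch of the streaming-obfuscation idea: pad each streamed chunk with a random-length filler field so the encrypted record size no longer tracks token length. The field name `pad`, the padding bounds, and the JSON chunk shape are assumptions for illustration, not any vendor's actual wire format.

```python
import json
import random
import string

def obfuscate_chunk(token_text: str, min_pad: int = 0, max_pad: int = 32) -> bytes:
    """Wrap a streamed token in a JSON chunk carrying random-length filler.

    Illustrative sketch of the streaming-obfuscation mitigation described
    in the write-up; the "pad" field and bounds are assumptions, not a
    real provider's protocol.
    """
    filler = "".join(
        random.choices(string.ascii_letters, k=random.randint(min_pad, max_pad))
    )
    return json.dumps({"content": token_text, "pad": filler}).encode()

# With per-chunk random padding, identical tokens produce records of
# varying size, so packet-length sequences carry far less topic signal.
sizes = {len(obfuscate_chunk("hi")) for _ in range(200)}
```

Note that size padding alone does not hide inter-arrival times; masking the timing channel would need a separate measure such as batching tokens or adding jitter.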