Hiding a message in my PyTorch weights (blog.gabornyeki.com)

🤖 AI Summary
Steganotorchy is a small tool that hides arbitrary messages inside neural network parameters by tweaking the low bits of floating-point mantissas (e.g., IEEE-754 32-bit floats). It writes a compact header (the message length encoded in ternary, with an 11 bit pair as terminator) followed by the message bytes into up to the lowest 8 bits of each parameter's mantissa. It reads and writes safetensors files, so it works with any framework that loads that format. Command examples show embedding ~5,182 bytes and inspecting capacity; extracting with the wrong bits-per-parameter setting simply yields garbage.

Because small changes to low mantissa bits often don't measurably degrade a model, this offers a high-capacity, stealthy carrier: 1 KiB fits in ~1,024 32-bit parameters at 8 bits per parameter. This matters for ML security, provenance, and forensics: weights can be abused for covert exfiltration, secret payload delivery, or undocumented metadata, and conversely used for watermarking or provenance labels.

Detection is possible but nontrivial. Embedding changes the distribution of zero bits in low mantissa positions (e.g., ASCII plaintext creates predictable zero-bit patterns), so statistical tests against an expected pre-embedding distribution can flag anomalies. However, realistic counterfactuals are hard to define because training RNGs and optimization already produce non-uniform bit distributions, so reliable detection, mitigation (quantization, pruning, noise injection), and policy responses deserve careful study.
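For concreteness, here is a minimal sketch of the mantissa-bit embedding described above, written in numpy rather than against Steganotorchy's actual implementation; the function names and the `n_bits` parameter are illustrative, not the tool's API.

```python
import numpy as np

def embed(params: np.ndarray, message: bytes, n_bits: int = 8) -> np.ndarray:
    """Overwrite the lowest n_bits of each float32's mantissa with message bits."""
    assert params.dtype == np.float32
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))  # MSB-first
    assert bits.size <= params.size * n_bits, "message exceeds carrier capacity"
    raw = params.view(np.uint32).copy()
    mask = np.uint32((1 << n_bits) - 1)
    for i in range(0, bits.size, n_bits):
        chunk = bits[i:i + n_bits]
        value = 0
        for b in chunk:  # pack up to n_bits message bits into one integer
            value = (value << 1) | int(b)
        value <<= n_bits - chunk.size  # left-align a final partial chunk
        raw[i // n_bits] = (raw[i // n_bits] & ~mask) | np.uint32(value)
    return raw.view(np.float32)

def extract(params: np.ndarray, n_bytes: int, n_bits: int = 8) -> bytes:
    """Read n_bytes of message back out of the low mantissa bits."""
    raw = params.view(np.uint32)
    bits = []
    for word in raw:
        for shift in range(n_bits - 1, -1, -1):
            bits.append((int(word) >> shift) & 1)
        if len(bits) >= n_bytes * 8:
            break
    return np.packbits(np.array(bits[:n_bytes * 8], dtype=np.uint8)).tobytes()
```

Round-tripping `b"hello"` through `embed` and `extract` on a random float32 vector recovers it, and overwriting the low 8 of 23 mantissa bits perturbs each value by at most roughly 2^-15 of its magnitude, which is why model quality is typically unaffected.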
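The length header is described only briefly. One plausible reading, sketched below as an assumption rather than the tool's actual format, is that each base-3 digit occupies a 2-bit pair (00, 01, 10) and the invalid pair 11 marks the end; the least-significant-digit-first order is likewise assumed.

```python
def encode_length(n: int) -> list[int]:
    """Encode n in ternary, one 2-bit pair per digit, terminated by the pair 11."""
    digits = []
    while True:
        digits.append(n % 3)
        n //= 3
        if n == 0:
            break
    bits = []
    for d in digits:  # least-significant digit first (an assumption)
        bits += [(d >> 1) & 1, d & 1]
    return bits + [1, 1]  # 11 is not a valid ternary digit, so it terminates

def decode_length(bits: list[int]) -> tuple[int, int]:
    """Return (decoded length, number of header bits consumed)."""
    digits, i = [], 0
    while True:
        pair = (bits[i] << 1) | bits[i + 1]
        i += 2
        if pair == 3:  # hit the 11 terminator
            break
        digits.append(pair)
    n = 0
    for d in reversed(digits):
        n = n * 3 + d
    return n, i
```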
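Finally, a rough sketch of the detection idea: compare the zero-bit fraction at each low mantissa position against an assumed pre-embedding baseline (naively taken here to be 0.5, which, as noted above, real trained weights need not satisfy).

```python
import math
import numpy as np

def per_position_zero_fraction(params: np.ndarray, n_bits: int = 8) -> list[float]:
    """Fraction of parameters with a zero at each of the low mantissa positions."""
    raw = params.view(np.uint32)
    return [
        float(np.mean(((raw >> np.uint32(shift)) & np.uint32(1)) == 0))
        for shift in range(n_bits)
    ]

def z_score(frac_zero: float, n: int, p0: float = 0.5) -> float:
    """Normal-approximation z-score of an observed zero fraction against p0."""
    return (frac_zero - p0) / math.sqrt(p0 * (1 - p0) / n)
```

With ASCII plaintext embedded at 8 bits per parameter, the high bit of every byte is 0, so one mantissa position's zero fraction approaches 1.0 over the embedded region and its z-score is extreme; the hard part, per the summary, is the baseline, since training already skews these fractions away from 0.5.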