No Prior, No Leakage: Reconstruction Attacks in Trained Neural Networks (arxiv.org)

🤖 AI Summary
This paper revisits reconstruction attacks that try to recover training examples from a trained neural network's parameters, and it argues both theoretically and empirically that such attacks are fundamentally limited unless they incorporate prior knowledge about the data. The authors prove that, without a prior, infinitely many alternative input-label sets are consistent with the trained model, and these sets can lie arbitrarily far from the true training set, so parameter-only reconstructions are non-unique and unreliable. Experiments support this: exact duplication of training examples in reconstructions appears to happen only by chance, not as a deterministic consequence of the learning dynamics. The work refines earlier concerns that implicit biases (e.g., margin maximization) make models inherently vulnerable; paradoxically, the paper finds that the stronger implicit bias from more extensive training can make models harder to reconstruct from. The practical implications are twofold: attackers must incorporate realistic data priors or auxiliary leakage to produce faithful reconstructions, and defenders may gain privacy benefits from training regimes that strengthen generalization bias. The results do not rule out all leakage: attacks that use side information, different model classes, or alternative assumptions may still succeed. They do, however, provide a rigorous baseline for when parameter-only reconstruction is intrinsically ambiguous, along with guidance for evaluating privacy risk and designing mitigations.
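
The non-uniqueness point is easy to see in a toy setting. The sketch below is purely illustrative and is not the paper's construction: for a one-parameter least-squares model, the trained parameter depends only on summary statistics of the data, so many different datasets, including ones arbitrarily far from the real training set, produce exactly the same trained parameter. The datasets and helper function here are invented for the example.

```python
# Illustrative sketch (NOT the paper's construction): for a 1-D linear model
# y ≈ w * x fit to its least-squares optimum, the trained parameter depends
# only on the ratio sum(x*y) / sum(x^2). Distinct training sets with the same
# ratio are indistinguishable from the parameter alone, so without a data
# prior a reconstruction attack cannot tell which one produced the model.
import numpy as np

def fit_w(x, y):
    """Closed-form least-squares fit of y ≈ w * x (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (x @ x))

# Three very different "training sets" that all yield the same parameter.
datasets = {
    "true":       ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),   # the "real" data
    "alt (near)": ([2.0, 4.0],      [4.0, 8.0]),        # different points
    "alt (far)":  ([1e6],           [2e6]),             # arbitrarily far away
}

for name, (x, y) in datasets.items():
    print(f"{name:>10}: w = {fit_w(x, y):.4f}")
# All three prints give w = 2.0000, so an attacker who sees only the trained
# parameter cannot identify the training set without prior knowledge.
```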