🤖 AI Summary
Researchers introduce a new membership inference attack (MIA) designed specifically for pre-trained large language models (LLMs), addressing why prior MIA approaches, built for classification models, fail when applied to generative token-sequence models. Instead of reducing an example to a single score, the paper adapts classical statistical MIA tests to the temporal "perplexity dynamics" of subsequences within an input: it analyzes how perplexity (derived from token-level log-probabilities) evolves across sliding subsequences and uses those dynamics as the membership signal. This context-aware framing captures where and how an LLM memorizes fragments of training data, and the method reportedly outperforms previous baseline MIAs on pre-trained LLMs.
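To make the idea concrete, here is a minimal sketch (not the paper's exact procedure) of how one might extract a per-window perplexity curve from a pre-trained causal LM; the model choice, window size, and stride are assumptions for illustration only:

```python
# Illustrative sketch (assumptions, not the paper's method): compute per-window
# perplexities over a candidate text with a pre-trained causal LM, producing a
# "perplexity dynamics" curve that a membership test could score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def window_perplexities(text: str, window: int = 32, stride: int = 16) -> list[float]:
    """Return one perplexity value per sliding token window."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    ppls = []
    for start in range(0, max(len(ids) - window, 0) + 1, stride):
        chunk = ids[start:start + window].unsqueeze(0)
        # With labels equal to inputs, the loss is the mean next-token NLL
        # over the window; exponentiating gives the window's perplexity.
        loss = model(chunk, labels=chunk).loss
        ppls.append(torch.exp(loss).item())
    return ppls

# A membership test would then compare the shape of this curve (e.g., dips where
# the model is unusually confident) between candidate members and non-members.
curve = window_perplexities("Some candidate training text goes here ...")
print(curve)
```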
The work is significant because it exposes a finer-grained, context-dependent leakage mode in generative models: memorization is not uniform across an example but concentrated in particular subsequences, which can render example-level defenses and privacy audits ineffective. Technical implications include the need to evaluate membership risk at the token/subsequence level, to rethink mitigation strategies (e.g., targeted regularization, sequence-level differential privacy, or memorization-aware filtering), and to update data-governance practices for pre-training corpora. The attack also gives researchers a new diagnostic tool for quantifying how much, and where, an LLM has memorized training content.
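As an illustration of what subsequence-level auditing could look like, the heuristic below flags windows whose perplexity is far below the sequence's typical value; the statistic and threshold are assumptions for the example, not the paper's test:

```python
# Illustrative heuristic (assumption, not the paper's test): flag windows whose
# perplexity is much lower than the median over the sequence, suggesting a
# locally memorized fragment worth auditing.
import numpy as np

def flag_memorized_windows(ppls: list[float], ratio: float = 0.5) -> list[int]:
    """Return indices of windows with perplexity below `ratio` * median."""
    arr = np.asarray(ppls, dtype=float)
    threshold = ratio * np.median(arr)
    return [i for i, p in enumerate(arr) if p < threshold]

# Example: windows 2 and 3 stand out as suspiciously easy for the model.
print(flag_memorized_windows([40.0, 35.0, 6.0, 5.5, 38.0]))  # -> [2, 3]
```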