🤖 AI Summary
Researchers released OpenFake, a large-scale, politically focused dataset and accompanying crowdsourced platform designed to push deepfake detection toward real-world, modern threats. The dataset pairs three million real images with descriptive captions, then uses those captions as prompts to generate 963k high-quality synthetic images from a mix of proprietary and open-source generative models. Motivated by social-media analysis and a human perception study showing that recent proprietary generators produce images increasingly indistinguishable from real photos, the project explicitly targets multiple dissemination modalities (not just single-face images) to better reflect how synthetic content spreads online.
Technically, OpenFake fills gaps in prior benchmarks by offering scale, caption-conditioned synthetic generation, and a continuously updated adversarial pipeline: contributors are incentivized to submit hard-to-detect fakes, keeping the benchmark aligned with evolving generative capabilities. For the AI/ML community this means richer training data and more realistic evaluation, exposing detectors to high-fidelity, multimodal attacks and enabling stress-testing against proprietary models, while also raising dual-use and privacy considerations that call for responsible access and governance. Overall, OpenFake aims to make detection methods more robust and adaptive to the current generation of generative models.
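The caption-conditioned pairing described above can be sketched as follows. This is a minimal illustration, not OpenFake's actual pipeline: the prompt template, function names, and file paths are all hypothetical, and the stand-in generator merely fabricates a path where a real pipeline would invoke a text-to-image model.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    image_path: str   # path to the image file
    caption: str      # descriptive caption shared by the real/fake pair
    label: int        # 0 = real photo, 1 = synthetic image

def caption_to_prompt(caption: str) -> str:
    """Turn a real image's caption into a generation prompt.
    (Assumption: OpenFake's exact prompt template is not given here.)"""
    return f"photorealistic news photo: {caption}"

def build_pairs(real_images: list, generate) -> list:
    """For each (path, caption) real image, emit the real sample plus a
    synthetic counterpart produced by generate(prompt) -> path."""
    samples = []
    for path, caption in real_images:
        samples.append(Sample(path, caption, label=0))
        fake_path = generate(caption_to_prompt(caption))
        samples.append(Sample(fake_path, caption, label=1))
    return samples

# Stand-in generator: a real pipeline would call a text-to-image model
# (proprietary or open-source) and save the resulting image to disk.
fake_gen = lambda prompt: f"synthetic/{abs(hash(prompt)) % 10**8}.png"

dataset = build_pairs([("real/rally.jpg", "crowd at a political rally")],
                      fake_gen)
```

Because each synthetic image inherits its source caption, a detector trained on such pairs sees real and fake depictions of the same scene, which is what makes the benchmark harder than datasets of unrelated real and synthetic images.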