Show HN: A 0.3B model that redacts PII in all 24 EU languages offline (huggingface.co)

0 points 49 days ago

🤖 AI Summary

A new model, "bardsai/eu-pii-anonimization-multilang", detects and redacts personally identifiable information (PII) in all 24 official EU languages, targeting compliance requirements under the GDPR and the AI Act. Many open-source PII models are trained primarily on English. This one is trained end-to-end on multilingual data and recognizes 36 entity classes, including biometric, genetic, and health data, which fall under the GDPR's special categories of personal data and matter for meeting regulatory expectations. For production use, it ships an ONNX export with quantized weights, so it runs on CPU-only infrastructure without a GPU and is practical for a wider range of deployments.

The model fills a gap in privacy and data-protection tooling built for the EU's strict regulatory environment. Compliance and privacy engineers can use it to sanitize datasets, redact sensitive information from various communication formats, and strengthen data governance. Because it suits both real-time redaction and dataset preparation for machine learning workflows, it could streamline compliance work while keeping the redacted data usable.
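To make the redaction use case concrete, here is a minimal sketch of the step that turns detected PII spans into placeholders. It assumes the model is used as a token classifier that returns character-offset spans, in the shape a Hugging Face `token-classification` pipeline produces with aggregation enabled (`entity_group`, `start`, `end`). Whether this particular model loads that way is an assumption, and the labels `PERSON` and `EMAIL` below are hypothetical stand-ins, not the model's actual 36 class names. The model call is left commented out so the block runs offline.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], placeholder: str = "[{label}]") -> str:
    """Replace each detected PII span with a label placeholder.

    Assumes each entity has character offsets `start`/`end` and a label under
    `entity_group` (hypothetical input shape, modeled on Hugging Face
    token-classification pipeline output with aggregation enabled).
    """
    # Process spans left to right; for equal starts, take the longer span first.
    spans = sorted(entities, key=lambda e: (e["start"], -e["end"]))
    out = []
    cursor = 0  # end of the text already emitted or redacted
    for ent in spans:
        start, end = ent["start"], ent["end"]
        if start < cursor:
            # Overlaps a span already redacted: extend the redacted region
            # instead of skipping it, so no PII characters leak through.
            cursor = max(cursor, end)
            continue
        out.append(text[cursor:start])
        out.append(placeholder.format(label=ent.get("entity_group", "PII")))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    text = "Kontakt: Anna Kowalska, anna@example.eu"
    # Hand-written spans standing in for model output (hypothetical labels).
    name_start = text.index("Anna Kowalska")
    mail_start = text.index("anna@example.eu")
    entities = [
        {"entity_group": "PERSON", "start": name_start, "end": name_start + len("Anna Kowalska")},
        {"entity_group": "EMAIL", "start": mail_start, "end": mail_start + len("anna@example.eu")},
    ]
    print(redact(text, entities))  # prints: Kontakt: [PERSON], [EMAIL]

    # Assumed wiring to the real model (requires `transformers` and a download):
    # from transformers import pipeline
    # ner = pipeline("token-classification",
    #                model="bardsai/eu-pii-anonimization-multilang",
    #                aggregation_strategy="simple")
    # print(redact(text, ner(text)))
```

Merging overlapping spans rather than dropping them is a deliberate choice for redaction: when two detections overlap, leaving the non-overlapping part visible would leak part of the identifier.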
