Data-Purpose Algebra: Modeling Data Usage Policies [pdf] (dig.csail.mit.edu)

0 points 3 hours ago | visit original

🤖 AI Summary

MIT CSAIL researchers introduced a Data-Purpose Algebra: a formal, compositional framework for annotating data with provenance (agent/source), category, and authorized purposes, and for computing how those usage restrictions change as data is processed, combined, or transferred. They use the algebra to model concrete legal constraints, notably parts of the U.S. Privacy Act (5 USC §552a) and Systems of Records Notices (SORNs), and show how the purposes authorized for a recipient can be derived as Z, the intersection of the item's purposes, the recipient's routine uses, and the SORN constraints. The approach flags violations (e.g., an item whose derived purpose set becomes empty) and propagates invalidity through subsequent inferences, enabling precise accountability audits.

Technically, each data item is an annotated tuple (content, agent, category, purposes). Unary and multi-input processes produce new items via functions on content, category, and purposes; transfer rules compute the applicable routine uses A(i,s,r) and the resulting authorized purposes R(i,s,r). The team implemented the algebra in Scheme and plans a system that converts XML logs to RDF, annotates provenance and purposes, and records derivations in PML for explainable proofs.

Practical implications: automated compliance checking, policy-aware data flow analysis across agencies or services, and clearer handling of anonymization, recombination, and legal provenance, all useful for privacy, ML pipeline governance, and accountable AI systems.
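The annotated tuple and the transfer rule described in the summary can be sketched in a few lines of Python. This is an illustration only, not the authors' Scheme implementation: the names `Item`, `transfer`, and `combine`, the `valid` flag, the example agencies and purposes, and the choice to intersect purposes when combining two items are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Item:
    """Annotated tuple (content, agent, category, purposes), plus a validity flag."""
    content: object
    agent: str
    category: str
    purposes: FrozenSet[str]
    valid: bool = True  # hypothetical flag used to propagate violations


def transfer(item: Item, recipient: str,
             routine_uses: FrozenSet[str],
             sorn_purposes: FrozenSet[str]) -> Item:
    """Derive the recipient's authorized purposes as the intersection of the
    item's purposes, the recipient's routine uses, and the SORN constraints."""
    authorized = item.purposes & routine_uses & sorn_purposes
    # An empty derived purpose set is treated as a violation.
    still_valid = item.valid and bool(authorized)
    return Item(item.content, recipient, item.category, authorized, still_valid)


def combine(a: Item, b: Item, agent: str, category: str,
            merge: Callable[[object, object], object]) -> Item:
    """Two-input process. Assumption: the result may only be used for purposes
    allowed on both inputs, and it is invalid if either input is invalid."""
    return Item(merge(a.content, b.content), agent, category,
                a.purposes & b.purposes, a.valid and b.valid)


# Hypothetical record, agencies, and purpose labels.
record = Item("J. Doe", "AgencyA", "identity",
              frozenset({"benefits", "audit", "law_enforcement"}))
sorn = frozenset({"benefits", "audit"})

ok = transfer(record, "AgencyB", frozenset({"audit", "statistics"}), sorn)
bad = transfer(record, "AgencyC", frozenset({"marketing"}), sorn)
merged = combine(ok, bad, "AgencyB", "identity", lambda x, y: (x, y))

print(sorted(ok.purposes), ok.valid)
print(sorted(bad.purposes), bad.valid)
print(merged.valid)
```

In this sketch the transfer to the hypothetical AgencyC leaves no authorized purpose, so the item is marked invalid, and anything later derived from it inherits that invalidity. That is the propagation behavior the summary attributes to the algebra.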
