The FSF considers large language models (lwn.net)

🤖 AI Summary
At the GNU Tools Cauldron session, the Free Software Foundation's Licensing and Compliance Lab dug into how large language models (LLMs) intersect with free-software licensing. The FSF is surveying projects to understand current practices and isn't moving to a GPLv4 yet; instead, it may first adjust the Free Software Definition. The core concerns are whether LLM-generated code can be copyrighted and thus legitimately covered by copyleft, whether model outputs infringe on the rights of training-data authors, and how proprietary models and training pipelines — often non-free — affect the ethics and legality of accepting LLM contributions.

Technically, the FSF highlighted several practical and legal friction points. Courts haven't resolved the copyrightability of purely machine-generated code, but adding human creative effort or "creative prompts" might make outputs copyrightable. Training-data leakage and prompt-driven "in the style of" requests risk reproducing copyrighted code; models trained only on permissively licensed material still fail to preserve copyright notices, and some model terms of service even claim rights over outputs.

Recommended project safeguards include requiring contributors to declare the LLM used (and its version), provide the prompt and any available training-data information, mark LLM-generated code clearly, and record any use restrictions — while balancing accessibility needs where assistive LLM use is necessary. Overall, the FSF concluded the community needs better provenance, contributor education, and metadata-driven workflows rather than immediate license rewrites.
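One lightweight way a project might capture that kind of provenance metadata is with Git commit-message trailers. The trailer names below are purely illustrative — neither the FSF nor any project has standardized them — but the pattern mirrors established trailers like `Signed-off-by`:

```text
Add retry logic to fetch_mirror()

Parts of this change were drafted with an LLM; details below.

Assisted-by-LLM: ExampleModel 2.1            # hypothetical model name
LLM-Prompt: "write a retry wrapper with exponential backoff"
LLM-Training-Data: vendor states "publicly available code"; licenses unknown
LLM-Output-Terms: vendor ToS claims no rights over generated output
```

Because trailers are machine-readable, a project could later audit them with standard tooling such as `git interpret-trailers` or `git log --format="%(trailers)"`, which fits the FSF's call for metadata-driven workflows rather than ad-hoc disclosure in prose.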