🤖 AI Summary
A simple, practical pattern has been proposed to eliminate multi-tenant data leakage when an LLM generates SQL: never expose raw tables or client_id columns to the model. Instead, the backend prepends a security layer of server-side Common Table Expressions (CTEs) that pre-select only the authenticated client's rows and drop client_id from the visible schema. The LLM then receives a system prompt that lists only those sanitized CTE names (e.g., orders, products, customers) and writes its business logic against them, so even a malicious prompt cannot reach other tenants' data.
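The two server-side pieces described above, the tenant-scoping WITH clause and the schema-only system prompt, can be generated from a single definition of what the model is allowed to see. The sketch below assumes a hypothetical `VISIBLE_SCHEMA` mapping and hypothetical helpers `build_security_prefix` and `build_system_prompt`; the column names are invented for illustration, while the `dataset` schema, the table names, and the `:client_id` parameter come from the summary.

```python
# Hypothetical visible schema: CTE name -> columns the model may see.
# client_id is deliberately absent; it exists only in the base tables.
VISIBLE_SCHEMA = {
    "orders": ["order_id", "customer_id", "product_id", "amount"],
    "products": ["product_id", "name", "price"],
    "customers": ["customer_id", "name", "region"],
}


def build_security_prefix(schema: dict[str, list[str]], base: str = "dataset") -> str:
    """Build a WITH clause that scopes every table to a single tenant.

    Identifiers come from trusted server-side config, never from user input.
    The tenant value is not interpolated: it stays the bound parameter
    :client_id, supplied by the backend when the query executes.
    """
    ctes = [
        f"{table} AS (SELECT {', '.join(cols)} FROM {base}.{table} "
        f"WHERE client_id = :client_id)"
        for table, cols in schema.items()
    ]
    return "WITH " + ",\n     ".join(ctes) + "\n"


def build_system_prompt(schema: dict[str, list[str]]) -> str:
    """List only the sanitized CTE names and their columns for the model."""
    lines = [
        "Write one SQL SELECT statement.",
        "You may reference only these tables:",
    ]
    lines += [f"- {table}({', '.join(cols)})" for table, cols in schema.items()]
    return "\n".join(lines)


print(build_security_prefix(VISIBLE_SCHEMA))
print(build_system_prompt(VISIBLE_SCHEMA))
```

Deriving both strings from one mapping keeps the prompt and the security layer from drifting apart: a column that is not in the mapping can neither be selected by the CTEs nor advertised to the model.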
Technically, the pattern uses parameterized CTEs such as WITH orders AS (SELECT ... FROM dataset.orders WHERE client_id = :client_id), with similar CTEs for related tables. Relationships are preserved either by joining the CTEs themselves or by joining already-filtered base tables inside a CTE, so the LLM can still produce complex analytics.

Benefits: prompt injection and human error can no longer produce a missing or wrong cross-tenant WHERE clause; queries become auditable for SOC 2/GDPR; schema changes are localized to the CTE definitions; and performance can improve because the dataset is reduced before heavy operations run. The approach cleanly separates security (data scoping) from business logic (LLM-generated SQL).
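The scoping and join behavior described above can be checked end to end with Python's built-in `sqlite3`. This is a minimal sketch, not the original author's code: it emulates the `dataset` schema with `ATTACH DATABASE`, and the tenant IDs (`acme`, `globex`), the columns, and the `run_scoped` helper are hypothetical. The "LLM-generated" query is hard-coded and joins two CTEs without any tenant filter of its own.

```python
import sqlite3

# Hypothetical base tables; only the backend ever sees client_id.
SETUP = """
CREATE TABLE dataset.orders (order_id INTEGER, client_id TEXT, product_id INTEGER, amount REAL);
CREATE TABLE dataset.products (product_id INTEGER, client_id TEXT, name TEXT);
INSERT INTO dataset.products VALUES (1, 'acme', 'Widget'), (2, 'globex', 'Gadget');
INSERT INTO dataset.orders VALUES
  (10, 'acme', 1, 50.0), (11, 'acme', 1, 25.0), (12, 'globex', 2, 999.0);
"""

# Security layer: one tenant-scoped CTE per table. client_id is used in
# the filter but left out of the projected columns.
SECURITY_PREFIX = """
WITH orders AS (
  SELECT order_id, product_id, amount FROM dataset.orders WHERE client_id = :client_id
),
products AS (
  SELECT product_id, name FROM dataset.products WHERE client_id = :client_id
)
"""


def run_scoped(conn: sqlite3.Connection, generated_sql: str, client_id: str) -> list:
    """Append the model's query to the security prefix and bind the tenant.

    Unqualified names like orders/products in generated_sql resolve to the
    CTEs, not to the dataset.* base tables.
    """
    return conn.execute(SECURITY_PREFIX + generated_sql, {"client_id": client_id}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dataset")
conn.executescript(SETUP)

# Stand-in for LLM output: pure business logic, no tenant filter.
llm_sql = """
SELECT p.name, SUM(o.amount) AS revenue
FROM orders o JOIN products p ON o.product_id = p.product_id
GROUP BY p.name
"""

print(run_scoped(conn, llm_sql, "acme"))    # [('Widget', 75.0)]
print(run_scoped(conn, llm_sql, "globex"))  # [('Gadget', 999.0)]
```

The same query text returns different rows per tenant because scoping lives entirely in the prefix, and a query that asks for `client_id` fails with "no such column" since the CTEs never expose it. This sketch only scopes unqualified table names, though: generated SQL that names `dataset.orders` directly would bypass the CTEs, so a deployment along these lines would also need to reject such references before execution.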