Friend or Foe: Delegating to an AI Whose Alignment Is Unknown (arxiv.org)

🤖 AI Summary
Researchers analyze how a decision-maker should selectively reveal patient attributes to an AI whose alignment with the decision-maker’s objectives is uncertain. In a treatment-allocation setting, giving the AI more information boosts outcomes if the AI is aligned, but also increases potential harm if it’s misaligned. The authors derive an optimal disclosure strategy that balances these opposing effects: reveal attributes that reliably identify rare subpopulations with high treatment need, while pooling (withholding differentiating information about) the remaining majority. This approach protects most patients from amplified errors or adversarial behavior while still leveraging the AI’s strengths for high-impact, identifiable cases.

Technically, the paper frames disclosure as a trade-off driven by the designer’s prior belief about AI reliability and the statistical structure of need across attributes. The optimal policy depends on how informative features are about high-need events and how rare those events are; informative features pinpointing scarce, high-value cases should be disclosed, whereas features that mainly refine predictions for common cases should be suppressed to limit downside risk.

For AI/ML practitioners, the work highlights selective feature disclosure and interface design as practical levers for robust delegation under alignment uncertainty, with implications for medical AI deployment, feature selection, privacy-preserving protocols, and policies that constrain system inputs to mitigate misalignment risk.
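The disclose-vs-pool trade-off can be illustrated with a toy expected-value calculation. This is a minimal sketch under simplifying assumptions of our own (a binary attribute, a linear payoff model, worst-case harm from a misaligned AI), not the paper's actual formulation:

```python
def disclose_feature(p_aligned: float,
                     subpop_freq: float,
                     need_if_flagged: float,
                     baseline_need: float) -> bool:
    """Decide whether to reveal a binary patient attribute to the AI.

    Toy model (illustrative assumptions, not the paper's):
    - An aligned AI (probability p_aligned) uses the attribute to target
      treatment, gaining (need_if_flagged - baseline_need) per flagged patient.
    - A misaligned AI uses the same attribute to misallocate, harming each
      flagged patient by up to need_if_flagged (worst case).
    Disclose only when the expected value is positive; otherwise pool.
    """
    gain_if_aligned = subpop_freq * (need_if_flagged - baseline_need)
    harm_if_misaligned = subpop_freq * need_if_flagged
    expected_value = (p_aligned * gain_if_aligned
                      - (1 - p_aligned) * harm_if_misaligned)
    return expected_value > 0


# A rare subpopulation with much higher treatment need than baseline:
# the per-patient gain nearly matches the per-patient downside, so
# disclosure survives even moderate doubt about alignment.
print(disclose_feature(p_aligned=0.6, subpop_freq=0.02,
                       need_if_flagged=10.0, baseline_need=1.0))  # True

# A feature that only mildly refines predictions for a common group:
# small marginal gain against large exposure, so it gets pooled.
print(disclose_feature(p_aligned=0.6, subpop_freq=0.5,
                       need_if_flagged=1.5, baseline_need=1.0))  # False
```

In this toy model the decision reduces to comparing the marginal treatment gain against the worst-case exposure per flagged patient, which mirrors the summary's qualitative rule: features with high value relative to their downside (rare, high-need cases) are disclosed, while common-case refinements are suppressed.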