When Preferences Fail to Become Incentives: A Utility-Behavior Gap in Large Language Models


This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.22974