RetailBench: Benchmarking long horizon reasoning and coherent decision making of LLM agents in realistic retail environments


This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.15862