OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

system (system) June 30, 2026, 04:00

This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.29537