RLM-Cascade: Response-Level Speculative Decoding for Cost-Efficient LLM API Serving

system (system) June 23, 2026, 4:00 AM

This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.22840