RLM-Cascade: Response-Level Speculative Decoding for Cost-Efficient LLM API Serving


This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.22840