DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
양민규, 최재영, 문기효, 장민성, 전은주
Conference/Journal
The 2025 IEEE International Conference on Big Data
Year
2025
Research Area
Cryptography and Privacy
Abstract
Speculative decoding accelerates large language model inference, but its reliance
on a fixed speculation length is suboptimal in large-batch serving environments with
diverse requests. This paper explores a new direction for dynamic adaptation by
investigating a novel class of post-hoc, diagnostic signals. We propose Dynamic
Speculative Decoding Engine (DSDE), a training-free framework built on two primary
components: (1) a predictive signal based on the variance of the Kullback-Leibler
divergence (KLD), which diagnoses the generation’s regional stability, and
(2) an adaptive speculation length cap to mitigate the straggler problem in
per-sequence decoding. Experiments demonstrate the potential of using KLD-based
stability signals for dynamic adaptation. An algorithm guided by these signals
achieves end-to-end latency competitive with leading baselines and exhibits superior
robustness across diverse workloads. This robustness is particularly valuable in
challenging low-acceptance-rate regimes, where the proposed signal maintains its
diagnostic utility. Collectively, these findings validate post-hoc signals as a valuable
component for building more robust and intelligent LLM inference systems, and
highlight a promising direction for future research on dynamic speculation length
adaptation.
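
The two components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: that the KLD is measured between draft-model and target-model token distributions, that its variance is taken over a short sliding window, that low variance maps to a longer speculation length, and that the batch cap is derived from the mean length. All function names, the window size, and the `scale` and `slack` parameters are hypothetical and are not taken from the paper.

```python
import math
from statistics import pvariance


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def kld_variance(kld_history, window=8):
    """Population variance of the most recent KLD values (assumed window)."""
    recent = kld_history[-window:]
    return pvariance(recent) if len(recent) >= 2 else 0.0


def next_speculation_length(kld_history, min_len=1, max_len=8,
                            scale=1.0, window=8):
    """Assumed mapping: a stable region (low variance) allows longer speculation."""
    var = kld_variance(kld_history, window)
    length = max_len / (1.0 + var / scale)
    return max(min_len, min(max_len, round(length)))


def cap_batch_lengths(lengths, slack=1):
    """Assumed cap: clip each sequence to floor(mean) + slack so that one
    long speculation does not stall the rest of the batch."""
    cap = max(1, math.floor(sum(lengths) / len(lengths)) + slack)
    return [min(length, cap) for length in lengths]


if __name__ == "__main__":
    stable = [0.1] * 8          # constant KLD, zero variance
    unstable = [0.0, 4.0] * 4   # oscillating KLD, variance 4.0
    print(next_speculation_length(stable))
    print(next_speculation_length(unstable))
    print(cap_batch_lengths([2, 2, 2, 8]))
```

In this sketch the cap plays the role the abstract assigns to the adaptive speculation length cap: when per-sequence lengths differ widely, the longest request is clipped so that the batch step is not dominated by a single straggler. A real serving engine would likely tune the window, mapping, and cap empirically.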