Quantizing Whisper-small: How design choices affect ASR performance

arXiv — cs.CL · Wednesday, November 12, 2025
The study, published on arXiv, examines how design choices in post-training quantization (PTQ) affect Whisper-small, a speech recognition model whose computational demands complicate deployment on resource-constrained hardware. Evaluating PTQ methods from libraries such as PyTorch and Optimum-Quanto, the authors identify dynamic int8 quantization as the most effective approach, reducing model size by 57% while also improving accuracy. Static quantization was less effective, likely due to the model's Transformer architecture, and more aggressive low-bit formats reached up to 71% compression but degraded accuracy in noisy conditions. These findings underscore that selecting the right PTQ method is key to deploying Whisper-small efficiently on edge devices and broadening access to advanced speech recognition technology.
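For readers who want to try the headline result, dynamic int8 quantization of a Transformer's linear layers takes only a few lines in PyTorch. The sketch below is illustrative rather than the paper's exact setup: the Hugging Face checkpoint id "openai/whisper-small" and the size_mb helper are assumptions for demonstration, and the paper may measure size and accuracy differently.

import io
import torch
from transformers import WhisperForConditionalGeneration

# Load the full-precision model (checkpoint id assumed for illustration).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Dynamic int8 PTQ: weights of nn.Linear layers are stored in int8 and
# activations are quantized on the fly at inference; no calibration data is needed.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Rough size comparison by serializing state dicts (illustrative, not the paper's method).
def size_mb(m):
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: {size_mb(model):.0f} MB, dynamic int8: {size_mb(quantized_model):.0f} MB")

Because dynamic quantization only rewrites linear layers and leaves the rest of the graph in floating point, it avoids the calibration step that static quantization requires, which is consistent with the paper's observation that static PTQ is a poorer fit for this Transformer-based model.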