arXiv:2601.08743v1 Announce Type: new 
Abstract: In Text-to-SQL tasks, existing LLM-based methods often include extensive database schemas in prompts, leading to long context lengths and increased prefilling latency. User queries typically focus on recurrent table sets, which offers an opportunity for KV cache sharing across queries. However, current inference engines such as SGLang and vLLM generate redundant prefix cache copies when processing user queries with varying table orders. To address this inefficiency, we propose precomputing table representations as KV caches offline and querying the required ones online. A key aspect of our approach is computing table caches while preserving primary key-foreign key relationships between tables. Additionally, we construct a Table Trie structure to facilitate efficient KV cache lookups during inference. To enhance cache performance, we introduce a cache management system with a query reranking strategy that improves cache hit rates and a computation loading pipeline that parallelizes model inference and cache loading. Experimental results show that our proposed TableCache achieves up to a 3.62x speedup in Time to First Token (TTFT) with negligible performance degradation.
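The abstract names a Table Trie for KV cache lookups but does not describe its layout. The sketch below is one plausible reading, not the authors' implementation: each path in the trie is a sequence of table names, and a node may hold a handle to a precomputed KV cache. The names `TrieNode`, `TableTrie`, `cache_handle`, and `longest_prefix` are all hypothetical, and a plain string stands in for the stored cache.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    # Child nodes keyed by table name (hypothetical layout).
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    # Stand-in for a reference to a precomputed KV cache; None if nothing is cached here.
    cache_handle: Optional[str] = None


class TableTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tables: list[str], cache_handle: str) -> None:
        """Register a cache handle for an ordered sequence of table names."""
        node = self.root
        for name in tables:
            node = node.children.setdefault(name, TrieNode())
        node.cache_handle = cache_handle

    def longest_prefix(self, tables: list[str]) -> tuple[int, Optional[str]]:
        """Return (tables matched, handle) for the longest cached prefix of `tables`."""
        node = self.root
        best: tuple[int, Optional[str]] = (0, None)
        for depth, name in enumerate(tables, start=1):
            node = node.children.get(name)
            if node is None:
                break
            if node.cache_handle is not None:
                best = (depth, node.cache_handle)
        return best


trie = TableTrie()
trie.insert(["customers"], "kv_customers")
trie.insert(["customers", "orders"], "kv_customers_orders")
print(trie.longest_prefix(["customers", "orders", "items"]))
```

A lookup walks the trie one table at a time, so its cost grows with the number of tables in the query rather than with the number of cached entries.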

A new approach called TableCache reduces latency in Text-to-SQL tasks by precomputing key-value (KV) caches for tables offline while preserving primary key-foreign key relationships between tables. It targets an inefficiency in existing inference engines such as SGLang and vLLM, which generate redundant prefix cache copies when queries reference the same tables in different orders.
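To see why table order matters for prefix caching, consider that a prefix cache can only reuse work up to the first position where two prompts differ. The sketch below is a simplified illustration that treats each table's schema as a single unit instead of a token sequence. The function `shared_prefix_len` and the table names are hypothetical.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading tables two schema orderings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


q1 = ["customers", "orders", "items"]
q2 = ["orders", "customers", "items"]     # same tables as q1, different order
q3 = ["customers", "orders", "payments"]  # same leading tables as q1

print(shared_prefix_len(q1, q2))  # 0: nothing reusable despite identical tables
print(shared_prefix_len(q1, q3))  # 2: the first two tables can be reused
```

Under this simplification, `q1` and `q2` share no reusable prefix even though they cover the same tables, which is the redundancy that caching per table is meant to avoid.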

TableCache: Primary Foreign Key Guided KV Cache Precomputation for Low Latency Text-to-SQL
