REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing

arXiv — cs.CL · Wednesday, November 12, 2025 at 5:00:00 AM
REIS marks a significant advance for artificial intelligence, particularly in the context of large language models (LLMs). Traditional LLMs are limited by their static training data, and Retrieval-Augmented Generation (RAG) addresses this by incorporating external knowledge at inference time. However, the retrieval stage of RAG often becomes a bottleneck due to the overheads incurred during Approximate Nearest Neighbor Search (ANNS). REIS takes a novel approach, employing In-Storage Processing (ISP) techniques that perform computations within the storage system itself, reducing data movement and accelerating retrieval operations. It is the first ISP system tailored specifically for RAG, addressing previous limitations and improving the overall efficiency of data retrieval. The implications are significant: REIS not only optimizes the performance of RAG but also paves the way for more effective integration of external knowledge into LLMs,…
— via World Pulse Now AI Editorial System
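For readers unfamiliar with the retrieval step that REIS targets, the sketch below shows a host-side, brute-force version of top-k vector retrieval in Python. The corpus size, embedding dimension, and top-k value are illustrative assumptions, not details from the paper; production systems replace the full scan with an ANNS index (e.g., IVF or HNSW), and REIS's contribution is running that search inside the storage device.

```python
# Minimal host-side sketch of the RAG retrieval step (illustrative only).
# A real deployment would use an ANNS index instead of the brute-force scan
# below; REIS's goal is to perform this search inside the SSD so the stored
# embedding vectors never have to be moved to the host.
import numpy as np

def retrieve_top_k(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 5):
    """Return indices of the k corpus vectors most similar to the query."""
    # Cosine similarity between the query and every stored embedding.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(scores)[::-1][:k]

# Illustrative corpus: 100k documents embedded into 768-dimensional vectors.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)
top_docs = retrieve_top_k(query, corpus)  # passage IDs fed to the LLM as context
```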


Recommended Readings
GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
Positive · Artificial Intelligence
The article presents GMAT, a new framework that enhances Multiple Instance Learning (MIL) for whole slide image (WSI) classification. By integrating vision-language models (VLMs), GMAT aims to generate clinical descriptions that are more expressive and medically specific. This addresses a limitation of existing methods, which rely on large language models (LLMs) for description generation and often lack domain grounding and detailed medical specificity, and thereby improves alignment with visual features.
Silenced Biases: The Dark Side LLMs Learned to Refuse
Negative · Artificial Intelligence
Safety-aligned large language models (LLMs) are increasingly used in sensitive applications where fairness is crucial. Evaluating their fairness is complex, often relying on standard question-answer methods that misinterpret refusal responses as indicators of fairness. This paper introduces the concept of silenced biases, which are unfair preferences hidden within the models' latent space, masked by safety-alignment. Previous methods have limitations, prompting the need for new approaches to uncover these biases effectively.
Fair In-Context Learning via Latent Concept Variables
Positive · Artificial Intelligence
The paper titled 'Fair In-Context Learning via Latent Concept Variables' explores the in-context learning (ICL) capabilities of large language models (LLMs) in handling tabular data. It highlights the potential for LLMs to inherit biases from pre-training data, which can lead to discrimination in high-stakes applications. The authors propose an optimal demonstration selection method using latent concept variables to enhance task adaptation and fairness, alongside data augmentation strategies to minimize correlations between sensitive variables and predictive outcomes.
Modeling and Predicting Multi-Turn Answer Instability in Large Language Models
Neutral · Artificial Intelligence
The paper titled 'Modeling and Predicting Multi-Turn Answer Instability in Large Language Models' discusses the evaluation of large language models (LLMs) in terms of their robustness during user interactions. The study employs multi-turn follow-up prompts to assess changes in model answers and accuracy dynamics using Markov chains. Results indicate vulnerabilities in LLMs, with a 10% accuracy drop for Gemini 1.5 Flash after a 'Think again' prompt over nine turns, and a 7.5% drop for Claude 3.5 Haiku with a reworded question. The findings suggest that accuracy can be modeled over time.
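One simple way to picture the accuracy dynamics described above is a two-state Markov chain over answer correctness across turns. The sketch below is illustrative only: the transition probabilities are made up, not the values fitted in the paper.

```python
# Illustrative two-state Markov chain over answer correctness across turns.
# States: index 0 = incorrect answer, index 1 = correct answer. The transition
# matrix is hypothetical; the paper fits such dynamics from observed multi-turn
# interactions (e.g., after "Think again" follow-up prompts).
import numpy as np

P = np.array([
    [0.90, 0.10],  # incorrect -> incorrect, incorrect -> correct
    [0.08, 0.92],  # correct   -> incorrect, correct   -> correct
])

state = np.array([0.0, 1.0])  # start from a correct first-turn answer
for turn in range(9):         # nine follow-up turns, as in the summary
    state = state @ P
print(f"P(correct) after 9 follow-ups: {state[1]:.3f}")
```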
Identifying and Analyzing Performance-Critical Tokens in Large Language Models
Neutral · Artificial Intelligence
The paper titled 'Identifying and Analyzing Performance-Critical Tokens in Large Language Models' explores how large language models (LLMs) utilize in-context learning (ICL) for few-shot learning. It categorizes tokens in ICL prompts into content, stopword, and template tokens, aiming to identify those that significantly impact LLM performance. The study reveals that template and stopword tokens have a greater influence on performance than informative content tokens, challenging existing assumptions about human attention to informative words.
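As a rough illustration of the three categories, the sketch below splits a toy ICL demonstration into template, stopword, and content tokens. The example prompt, whitespace tokenization, and stopword list are assumptions for illustration, not the paper's actual setup.

```python
# Rough illustration of splitting an ICL demonstration into template,
# stopword, and content tokens (categories from the paper; the concrete
# prompt, tokenizer, and stopword list here are illustrative assumptions).
TEMPLATE_MARKERS = {"Review:", "Sentiment:"}        # scaffolding of each demo
STOPWORDS = {"the", "was", "and", "a", "of", "it"}  # small ad-hoc list

demo = "Review: the movie was gripping and moving Sentiment: positive"

template, stopword, content = [], [], []
for tok in demo.split():
    if tok in TEMPLATE_MARKERS:
        template.append(tok)
    elif tok.lower() in STOPWORDS:
        stopword.append(tok)
    else:
        content.append(tok)

print("template:", template)  # the tokens the paper finds most performance-critical
print("stopword:", stopword)
print("content :", content)
```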
LDC: Learning to Generate Research Idea with Dynamic Control
Positive · Artificial Intelligence
Recent advancements in large language models (LLMs) highlight their potential in automating scientific research ideation. Current methods often produce ideas that do not meet expert standards of novelty, feasibility, and effectiveness. To address these issues, a new framework is proposed that combines Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL) to enhance the quality of generated research ideas through a two-stage approach.
A Multifaceted Analysis of Negative Bias in Large Language Models through the Lens of Parametric Knowledge
Neutral · Artificial Intelligence
A recent study published on arXiv examines the phenomenon of negative bias in large language models (LLMs), which refers to their tendency to generate negative responses in binary decision tasks. The research highlights that previous studies have primarily focused on identifying negative attention heads that contribute to this bias. The authors introduce a new evaluation pipeline that categorizes responses based on the model's parametric knowledge, revealing that the format of prompts significantly influences the responses more than the semantics of the content itself.
Beyond the Surface: Probing the Ideological Depth of Large Language Models
Positive · Artificial Intelligence
Large language models (LLMs) exhibit distinct political leanings, but their consistency in representing these orientations varies. This study introduces the concept of ideological depth, defined by a model's ability to follow political instructions reliably and the richness of its internal political representations, assessed using sparse autoencoders. The research compares Llama-3.1-8B-Instruct and Gemma-2-9B-IT, revealing that Gemma is significantly more steerable and activates approximately 7.3 times more distinct political features than Llama.