Sliced ReLU attention: Quasi-linear contextual expressivity via sorting

arXiv, cs.LG · Monday, December 15, 2025 at 5:00:00 AM
  • A new attention mechanism, sliced ReLU attention, operates on one-dimensional projections of key-query differences and uses sorting to achieve quasi-linear complexity. Unlike traditional softmax and ReLU-based attention, it can be computed in O(n log n) time, which makes it suitable for processing very long contexts.
  • Sliced ReLU attention retains strong theoretical expressive power: it is shown to be capable of complex sequence-to-sequence tasks while remaining computationally efficient. This could benefit a range of applications in natural language processing and machine learning.
  • The work reflects an ongoing trend in artificial intelligence research toward more efficient and effective attention mechanisms. It aligns with broader efforts to address challenges in large language models and multi-intent spoken language understanding, where computational cost has to be balanced against expressive capability.
— via World Pulse Now AI Editorial System
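
The summary above does not give the paper's exact formula, so the following is only a minimal sketch of how sorting can turn a ReLU kernel on one-dimensional projections into an O(n log n) computation. It assumes a single hypothetical projection direction `theta`, an unnormalized output `out[i] = sum_j ReLU(s[i] - t[j]) * v[j]` with `s = q @ theta` and `t = k @ theta`, and illustrative function names; the actual method may differ in sign conventions, normalization, and the number of projections it uses.

```python
import numpy as np

def relu_attention_naive(s, t, v):
    """O(n^2) reference: out[i] = sum_j relu(s[i] - t[j]) * v[j]."""
    weights = np.maximum(s[:, None] - t[None, :], 0.0)
    return weights @ v

def relu_attention_sorted(s, t, v):
    """Same result in O(n log n) via one sort plus prefix sums.

    For query i, only keys with t[j] <= s[i] contribute, and for those
    relu(s[i] - t[j]) = s[i] - t[j], so
        out[i] = s[i] * sum(v[j]) - sum(t[j] * v[j])
    over that prefix of the keys sorted by t.
    """
    order = np.argsort(t)                      # O(n log n)
    t_sorted, v_sorted = t[order], v[order]
    zero = np.zeros((1, v.shape[1]))
    # Prefix sums with a leading zero row so index 0 means "no keys".
    cum_v = np.vstack([zero, np.cumsum(v_sorted, axis=0)])
    cum_tv = np.vstack([zero, np.cumsum(t_sorted[:, None] * v_sorted, axis=0)])
    # Number of keys with t[j] <= s[i], found by binary search.
    idx = np.searchsorted(t_sorted, s, side="right")
    return s[:, None] * cum_v[idx] - cum_tv[idx]

# Assumed toy setup: random queries, keys, values, and one random direction.
rng = np.random.default_rng(0)
n, d = 256, 8
q, k, v = rng.normal(size=(3, n, d))
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
s, t = q @ theta, k @ theta                    # 1-D projections

fast = relu_attention_sorted(s, t, v)
slow = relu_attention_naive(s, t, v)
print(np.allclose(fast, slow))
```

In this sketch the sort and the binary search are the only super-linear steps; everything else is a linear pass. Averaging the result over several projection directions would be one way to read the word "sliced", but that is an assumption rather than something the summary states.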

Continue Reading
RecTok: Reconstruction Distillation along Rectified Flow
Positive · Artificial Intelligence
RecTok has been introduced as a novel approach to enhance high-dimensional visual tokenizers in diffusion models, addressing the inherent trade-off between dimensionality and generation quality. By employing flow semantic distillation and reconstruction-alignment distillation, RecTok aims to improve the semantic richness of the forward flow used in training diffusion transformers.
Event Camera Meets Mobile Embodied Perception: Abstraction, Algorithm, Acceleration, Application
Neutral · Artificial Intelligence
A comprehensive survey has been conducted on event-based mobile sensing, highlighting its evolution from 2014 to 2025. The study emphasizes the challenges posed by high data volume, noise, and the need for low-latency processing in mobile applications, particularly in the context of event cameras that offer high temporal resolution.
How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection
Neutral · Artificial Intelligence
A recent study published on arXiv explores how low-level bitwise perturbations, or fault injections, in large language models (LLMs) can affect the semantic meaning of generated image captions while maintaining grammatical integrity. This research highlights the vulnerability of transformers to subtle hardware bit flips, which can significantly alter the narratives produced by AI systems.
Inference Time Feature Injection: A Lightweight Approach for Real-Time Recommendation Freshness
Positive · Artificial Intelligence
A new approach called Inference Time Feature Injection has been introduced to improve real-time recommendation systems in long-form video streaming. It selectively injects recent user watch history at inference time, overcoming the limitation of static user features that are updated only daily. The technique has shown a statistically significant 0.47% increase in user engagement metrics.
How Does Fourier Analysis Network Work? A Mechanism Analysis and a New Dual-Activation Layer Proposal
Positive · Artificial Intelligence
The Fourier Analysis Network (FAN) has been proposed as a method to enhance neural network performance by integrating sine and cosine functions in place of some ReLU activations. Research indicates that while sine functions contribute positively to performance, cosine functions may hinder it. This study clarifies that the benefits arise from the sine function's local behavior, particularly near zero, which helps address the vanishing-gradient problem.
Low-rank MMSE filters, Kronecker-product representation, and regularization: a new perspective
Positive · Artificial Intelligence
A new method has been proposed for efficiently determining the regularization parameter for low-rank MMSE filters using a Kronecker-product representation. This approach highlights the importance of selecting the correct regularization parameter, which is closely tied to rank selection, and demonstrates significant improvements over traditional methods through simulations.
Neural Modular Physics for Elastic Simulation
Positive · Artificial Intelligence
A new approach called Neural Modular Physics (NMP) has been introduced for elastic simulation, combining the strengths of neural networks with the reliability of traditional physics simulators. This method decomposes elastic dynamics into meaningful neural modules, allowing for direct supervision of intermediate quantities and physical constraints.
Joint Learning of Unsupervised Multi-view Feature and Instance Co-selection with Cross-view Imputation
Positive · Artificial Intelligence
A novel method for joint learning of unsupervised multi-view feature and instance co-selection with cross-view imputation has been proposed, addressing the challenges of missing data in multi-view datasets. This approach enhances the interaction between co-selection and imputation processes, aiming to improve the effectiveness of data analysis in scenarios where some samples are incomplete.