Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions

arXiv — cs.LG•Wednesday, November 19, 2025 at 5:00:00 AM

NeutralArtificial Intelligence

The research delves into the optimality and NP
The findings underscore the limitations of current transformer models, suggesting that while they are powerful, they struggle with complex structured functions, which could impact their application in real
The exploration of transformer capabilities aligns with ongoing discussions in AI regarding model efficiency and interpretability, as seen in emerging frameworks like X

— via World Pulse Now AI Editorial System

Read Original

Was this article worth reading? Share it

Recommended Readings

arXiv — cs.CV19 hours ago

Attention Via Convolutional Nearest Neighbors

PositiveArtificial Intelligence

The article introduces Convolutional Nearest Neighbors (ConvNN), a framework that unifies Convolutional Neural Networks (CNNs) and Transformers by viewing convolution and self-attention as neighbor selection and aggregation methods. ConvNN allows for a systematic exploration of the spectrum between these two architectures, serving as a drop-in replacement for convolutional and attention layers. The framework's effectiveness is validated through classification tasks on CIFAR-10 and CIFAR-100 datasets.

Read full article

via arXiv — cs.CV

arXiv — cs.LG19 hours ago

Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models

PositiveArtificial Intelligence

Recent advancements in DNA large language models (LLMs) have led to the introduction of FOCUS, a near-lossless model compression technique. This innovation addresses the challenges of high computational costs and memory requirements during autoregressive decoding, which have previously limited the effectiveness of LLMs in processing ultra-long genomic sequences. By integrating a progressive context-compression module, FOCUS enhances the ability of these models to retain distant information, thereby improving their performance in DNA sequence modeling.

Read full article

via arXiv — cs.LG

arXiv — stat.ML19 hours ago

Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer

PositiveArtificial Intelligence

The article discusses a new approach to attention mechanisms in artificial intelligence, inspired by biological synaptic plasticity. This method aims to improve energy efficiency in spiking neural networks (SNNs) compared to traditional Transformers, which rely on dot-product similarity. The research highlights the limitations of current spiking attention models and proposes a biologically inspired spiking neuromorphic transformer that could reduce the carbon footprint associated with large language models (LLMs) like GPT.

Read full article

via arXiv — stat.ML

arXiv — cs.LG19 hours ago

Benchmark on Drug Target Interaction Modeling from a Drug Structure Perspective

PositiveArtificial Intelligence

The article discusses advancements in predicting drug-target interactions, a critical aspect of drug discovery and design. Recent methods utilizing deep learning technologies, particularly graph neural networks (GNNs) and Transformers, have shown remarkable performance by effectively extracting structural information. However, the benchmarking of these methods varies significantly, affecting algorithmic progress. The authors conducted a comprehensive survey and benchmark to integrate various structure learning algorithms for improved modeling.

Read full article

via arXiv — cs.LG

arXiv — cs.CV19 hours ago

Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers

PositiveArtificial Intelligence

This study explores a novel approach to enhance vision transformers (ViTs) by pretraining them on procedurally-generated data that lacks visual or semantic content. Utilizing simple algorithms, the research aims to instill generic biases in ViTs, allowing them to internalize abstract computational priors. The findings indicate that this warm-up phase, followed by standard image-based training, significantly boosts data efficiency, convergence speed, and overall performance, with notable improvements observed on ImageNet-1k.

Read full article

via arXiv — cs.CV

arXiv — cs.LG2 days ago

X-VMamba: Explainable Vision Mamba

PositiveArtificial Intelligence

The X-VMamba model introduces a controllability-based interpretability framework for State Space Models (SSMs), particularly the Mamba architecture. This framework aims to clarify how Vision SSMs process spatial information, which has been a challenge due to the absence of transparent mechanisms. The proposed methods include a Jacobian-based approach for any SSM architecture and a Gramian-based method for diagonal SSMs, both designed to enhance understanding of internal state dynamics while maintaining computational efficiency.

Read full article

via arXiv — cs.LG

arXiv — cs.LG2 days ago

Decoupling Positional and Symbolic Attention Behavior in Transformers

NeutralArtificial Intelligence

The article discusses the independent encoding of positional and symbolic information in language understanding and production, particularly within Transformers. It highlights the use of Positional Encodings (PEs), focusing on the Rotary PE (RoPE), which has shown empirical success. The authors argue that RoPE's effectiveness stems from its ability to encode robust positional and semantic information through varying frequencies. The study explores the dichotomy of attention head behavior, providing definitions, proving mutual exclusivity, and developing a metric for quantification.

Read full article

via arXiv — cs.LG