SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

arXiv — cs.LG · Thursday, November 20, 2025 at 5:00:00 AM
  • The introduction of SLA (Sparse-Linear Attention) brings a fine-tunable attention mechanism to diffusion transformers, moving beyond plain sparsity by combining sparse and linear attention.
  • This development is crucial because it enables more efficient processing of long sequences, potentially improving the performance and applicability of video generation technologies in various fields.
  • The evolution of attention mechanisms such as SLA reflects ongoing efforts in the AI community to enhance model efficiency and effectiveness.
— via World Pulse Now AI Editorial System
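The efficiency gain mentioned above comes from the linear-attention side of the design. As a rough illustration only (a textbook kernelized linear attention, not SLA's actual formulation; the `elu(x)+1` feature map is an assumption borrowed from common linear-attention variants), the quadratic softmax product can be reordered into a linear-time computation:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention (illustrative, not SLA itself):
    replaces softmax(Q K^T) V with phi(Q) @ (phi(K)^T @ V), reducing the
    cost in sequence length n from O(n^2 d) to O(n d^2)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 feature map
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                    # (d, d_v) summary over all keys/values
    z = Qf @ Kf.sum(axis=0)          # per-query normalizer
    return (Qf @ kv) / (z[:, None] + eps)
```

Because `Kf.T @ V` is computed once and reused for every query, long video token sequences no longer pay a quadratic cost.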


Continue Reading
NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering
Positive · Artificial Intelligence
The introduction of Neighborhood Attention Filtering (NAF) represents a significant advancement in the field of Vision Foundation Models (VFMs), allowing for zero-shot feature upsampling without the need for retraining. This innovative method utilizes Cross-Scale Neighborhood Attention and Rotary Position Embeddings to adaptively learn spatial and content weights from high-resolution images, outperforming existing VFM-specific upsamplers across various tasks.
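Rotary Position Embeddings, one of the components named above, are a widely used technique that rotates pairs of feature channels by position-dependent angles so that attention scores depend on relative position. A minimal generic sketch (not NAF's exact cross-scale variant; the pairing convention and `base` value are common defaults, assumed here):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding (generic sketch). Each pair of channels
    (x1[i], x2[i]) is rotated by an angle proportional to the token's
    position, preserving vector norms while encoding position."""
    n, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # one frequency per channel pair
    angles = np.outer(np.arange(n), freqs)      # (n, d/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=1)
```

Because each rotation is norm-preserving, dot products between rotated queries and keys depend only on their relative offset, which is what makes the encoding useful for attention-based filtering.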
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
Positive · Artificial Intelligence
MammothModa2, a new unified autoregressive-diffusion framework, has been introduced to enhance multimodal understanding and generation. This framework aims to bridge the gap between discrete semantic reasoning and high-fidelity visual synthesis, utilizing a serial design that couples autoregressive semantic planning with diffusion-based generation.
MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
Positive · Artificial Intelligence
A new method called Mask-Integrated Negative Attention Diffusion (MINDiff) has been proposed to tackle overfitting in text-to-image personalization, particularly when learning from limited images. This approach introduces negative attention to suppress subject influence in irrelevant areas, enhancing semantic control and text alignment during inference. Users can adjust a scale parameter to balance subject fidelity and text alignment.
Matching-Based Few-Shot Semantic Segmentation Models Are Interpretable by Design
Positive · Artificial Intelligence
A new study has introduced an innovative method for interpreting Few-Shot Semantic Segmentation (FSS) models, which are designed to segment novel classes with minimal labeled examples. The Affinity Explainer approach utilizes structural properties of matching-based FSS models to generate attribution maps, highlighting the contribution of support images to query segmentation predictions.
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Neutral · Artificial Intelligence
The introduction of BackdoorVLM marks a significant advancement in the evaluation of backdoor attacks on vision-language models (VLMs), addressing a critical gap in the understanding of these threats within multimodal machine learning systems. This benchmark categorizes backdoor threats into five distinct types, including targeted refusal and perceptual hijack, providing a structured approach to analyze their impact on tasks like image captioning and visual question answering.
Importance-Weighted Non-IID Sampling for Flow Matching Models
Positive · Artificial Intelligence
A new framework for importance-weighted non-IID sampling has been proposed to enhance flow-matching models, which are crucial for accurately representing complex distributions. This method addresses the challenge of estimating expectations from limited samples, particularly in scenarios where rare outcomes significantly influence results.
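The core idea of estimating expectations under one distribution from samples of another is classical importance sampling. A self-normalized sketch of that textbook method (illustrative only, not the paper's proposed framework; the log-density arguments are assumed to be available up to a constant):

```python
import numpy as np

def self_normalized_is(f, samples, target_logpdf, proposal_logpdf):
    """Self-normalized importance sampling: estimate E_p[f(x)] from samples
    drawn from a proposal q, reweighting each sample by w = p(x)/q(x).
    Normalizing the weights lets unnormalized log-densities be used."""
    logw = target_logpdf(samples) - proposal_logpdf(samples)
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    w /= w.sum()                    # normalize: constants in p, q cancel
    return np.sum(w * f(samples))
```

Heavier weights on rare but influential outcomes are exactly what makes such estimators attractive when those outcomes dominate the expectation.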
PoETa v2: Toward More Robust Evaluation of Large Language Models in Portuguese
Positive · Artificial Intelligence
The PoETa v2 benchmark has been introduced as the most extensive evaluation of Large Language Models (LLMs) for the Portuguese language, comprising over 40 tasks. This initiative aims to systematically assess more than 20 models, highlighting performance variations influenced by computational resources and language-specific adaptations. The benchmark is accessible on GitHub.
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
Positive · Artificial Intelligence
The SciPostLayoutTree dataset has been introduced to enhance the structural analysis of scientific posters, comprising approximately 8,000 annotated posters that detail reading order and parent-child relationships. This initiative addresses a significant gap in research, as previous studies predominantly focused on academic papers rather than posters, which are crucial for visual communication in academia.