VeLU: Variance-enhanced Learning Unit for Deep Neural Networks

arXiv — cs.CV · Wednesday, December 3, 2025 at 5:00:00 AM
  • VeLU, a Variance-enhanced Learning Unit, is introduced to address the limitations of traditional activation functions in deep neural networks, particularly ReLU, which suffers from gradient sparsity and dead neurons. VeLU combines ArcTan-ArcSin transformations with adaptive scaling based on local activation variance to improve training stability and gradient flow (a hedged sketch of such an activation follows this summary).
  • This development is significant because activation-function behavior remains a persistent obstacle to optimizing neural network performance. By making the activation adapt to its inputs, VeLU could enable more efficient training and better generalization across deep learning applications.
  • The ongoing exploration of activation functions reflects a broader trend in artificial intelligence research, where enhancing model performance and reducing computational inefficiencies are paramount. This aligns with recent studies addressing similar challenges in neural network architectures, emphasizing the need for innovative approaches to improve inference speed and reduce latency in applications like Private Inference and generalized estimators.
— via World Pulse Now AI Editorial System
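
A minimal PyTorch sketch of what an ArcTan-ArcSin activation with variance-driven scaling could look like is given below. The class name VeLULike, the learnable alpha parameter, and the exact formula are illustrative assumptions made for this sketch, not the formulation published in the paper.

```python
import torch
import torch.nn as nn


class VeLULike(nn.Module):
    """Illustrative sketch of a VeLU-style activation (assumed formula, not the paper's)."""

    def __init__(self, alpha: float = 1.0, eps: float = 1e-5):
        super().__init__()
        # alpha controls how strongly local variance rescales the output
        # (hypothetical parameter; the paper's parameterization may differ).
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ArcTan-ArcSin transform: arctan maps into (-pi/2, pi/2); dividing by
        # pi/2 keeps the arcsin argument inside its [-1, 1] domain.
        t = torch.asin(torch.atan(x) / (torch.pi / 2))
        # Local activation variance, computed per sample over the feature dim.
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        # Adaptive scaling: larger local variance damps the output, one plausible
        # way to keep gradients well behaved.
        scale = 1.0 / torch.sqrt(1.0 + self.alpha * var + self.eps)
        return x * t * scale


if __name__ == "__main__":
    act = VeLULike()
    print(act(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Dividing arctan(x) by pi/2 keeps the value inside arcsin's domain, and the variance term only rescales magnitudes, so the sketch stays smooth and bounded in slope; the actual VeLU may differ in both respects.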

Continue Reading
Cross-Cancer Knowledge Transfer in WSI-based Prognosis Prediction
Positive · Artificial Intelligence
A new study introduces CROPKT, a framework for cross-cancer prognosis knowledge transfer using Whole-Slide Images (WSI). This approach challenges the traditional cancer-specific model by leveraging a large dataset (UNI2-h-DSS) that includes 26 different cancers, aiming to enhance prognosis predictions, especially for rare tumors.
UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making
Positive · Artificial Intelligence
The introduction of UCAgents, a hierarchical multi-agent framework, aims to enhance medical decision-making by enforcing unidirectional convergence through structured evidence auditing, addressing the reasoning detachment seen in Vision-Language Models (VLMs). This framework is designed to mitigate biases from single-model approaches by limiting agent interactions to targeted evidence verification, thereby improving clinical trust in AI diagnostics.
Superpixel Attack: Enhancing Black-box Adversarial Attack with Image-driven Division Areas
Positive · Artificial Intelligence
A new method called Superpixel Attack has been proposed to enhance black-box adversarial attacks in deep learning models, particularly in safety-critical applications like automated driving and face recognition. This approach utilizes superpixels instead of simple rectangles to apply perturbations, improving the effectiveness of adversarial attacks and defenses.
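
As a rough illustration of the image-driven division idea, the sketch below assigns one random +/-epsilon offset per SLIC superpixel instead of per rectangle. It assumes a float RGB image with values in [0, 1]; the function name and default values are hypothetical, and the paper's actual search over perturbation signs is not reproduced here.

```python
import numpy as np
from skimage.segmentation import slic


def superpixel_perturbation(image: np.ndarray, epsilon: float = 8 / 255,
                            n_segments: int = 100, seed: int = 0) -> np.ndarray:
    """Perturb an (H, W, 3) float image in [0, 1] region-by-region (illustrative only)."""
    rng = np.random.default_rng(seed)
    # SLIC groups pixels into roughly uniform, boundary-respecting regions.
    segments = slic(image, n_segments=n_segments, compactness=10.0, start_label=0)
    # One random +/-1 sign per superpixel (the paper optimizes these; here they are sampled).
    signs = rng.choice([-1.0, 1.0], size=int(segments.max()) + 1)
    # Broadcast each region's sign over all of its pixels and channels.
    perturbation = epsilon * signs[segments][..., np.newaxis]
    return np.clip(image + perturbation, 0.0, 1.0)
```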
Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective
Neutral · Artificial Intelligence
Recent research has introduced ReMindView-Bench, a benchmark designed to evaluate how Vision-Language Models (VLMs) construct and maintain spatial mental models across multiple viewpoints. This initiative addresses the challenges VLMs face in achieving geometric coherence and cross-view consistency in spatial reasoning tasks, which are crucial for understanding 3D environments.
ContourDiff: Unpaired Medical Image Translation with Structural Consistency
Positive · Artificial Intelligence
The introduction of ContourDiff, a novel framework for unpaired medical image translation, aims to enhance the accuracy of translating images between modalities like Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). This framework utilizes Spatially Coherent Guided Diffusion (SCGD) to maintain anatomical fidelity, which is crucial for clinical applications such as segmentation models.
APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation
Positive · Artificial Intelligence
The APTx Neuron has been introduced as a novel neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression, derived from the APTx activation function. This architecture eliminates the need for separate activation layers, enhancing optimization efficiency. Validation on the MNIST dataset demonstrated a test accuracy of 96.69% within 11 epochs using approximately 332K trainable parameters.
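
Because this blurb states the core idea concretely (activation and linear transformation fused into one trainable expression), a short sketch may help. It assumes the APTx form (alpha + tanh(beta*x)) * gamma * x applied per input and summed with a bias; the class name APTxNeuronSketch and this exact parameterization are assumptions, not necessarily the paper's definition.

```python
import torch
import torch.nn as nn


class APTxNeuronSketch(nn.Module):
    """Sketch of an APTx-style neuron: per-input trainable activation fused with the linear step."""

    def __init__(self, in_features: int):
        super().__init__()
        # Per-input trainable parameters for the assumed form (alpha + tanh(beta*x)) * gamma * x.
        self.alpha = nn.Parameter(torch.ones(in_features))
        self.beta = nn.Parameter(torch.ones(in_features))
        self.gamma = nn.Parameter(torch.ones(in_features))
        self.delta = nn.Parameter(torch.zeros(1))  # bias term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activation and weighting happen in one expression, so no separate
        # activation layer follows this unit.
        terms = (self.alpha + torch.tanh(self.beta * x)) * self.gamma * x
        return terms.sum(dim=-1, keepdim=True) + self.delta


if __name__ == "__main__":
    neuron = APTxNeuronSketch(in_features=8)
    print(neuron(torch.randn(2, 8)).shape)  # torch.Size([2, 1])
```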
WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd Counting
Positive · Artificial Intelligence
A new method for multi-view crowd counting, named WSCF-MVCC, has been proposed, which operates without the need for camera calibrations or extensive crowd annotations. This weakly-supervised approach utilizes crowd count as supervision for the single-view counting module, employing a self-supervised ranking loss to enhance accuracy.
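
The self-supervised ranking loss is the most concrete detail here. The sketch below assumes the nested-crop constraint common in ranking-based counting work (a crop contained inside a larger crop cannot hold more people); whether WSCF-MVCC uses exactly this formulation is an assumption, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def nested_crop_ranking_loss(count_large: torch.Tensor,
                             count_small: torch.Tensor,
                             margin: float = 0.0) -> torch.Tensor:
    """Penalize predictions where an inner crop's count exceeds its containing crop's count."""
    # target = 1 means the first argument (the larger crop) should rank higher.
    target = torch.ones_like(count_large)
    return F.margin_ranking_loss(count_large, count_small, target, margin=margin)


if __name__ == "__main__":
    big = torch.tensor([12.0, 30.0])    # predicted counts for containing crops (toy values)
    small = torch.tensor([14.0, 25.0])  # predicted counts for crops inside them
    print(nested_crop_ranking_loss(big, small))  # only the first pair violates the ordering
```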
OmniBench: Towards The Future of Universal Omni-Language Models
Neutral · Artificial Intelligence
OmniBench has been introduced as a benchmark to evaluate the performance of omni-language models (OLMs) in processing visual, acoustic, and textual inputs simultaneously, highlighting the limitations of current open-source multimodal large language models (MLLMs) in instruction-following and reasoning tasks.