Coordinate Descent for Network Linearization

arXiv — stat.ML · Tuesday, November 18, 2025 at 5:00:00 AM
  • The paper presents an approach to deciding which ReLU activations in ResNet networks can be replaced with linear operations for Private Inference, where these non-linearities are responsible for significant inference latency. By framing the selection as a Coordinate Descent problem, the method aims to reduce ReLU counts while maintaining network performance.
  • This development is significant because ReLU evaluation is a common bottleneck in Private Inference systems, so cutting ReLU counts can substantially reduce latency while aiming to preserve accuracy (a hedged sketch of one possible scheme follows the byline below). The method's state
— via World Pulse Now AI Editorial System
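
The summary gives no implementation details, but the core idea of sweeping over ReLUs one coordinate at a time and linearizing those the network can spare can be illustrated with a toy sketch. Everything below (the GatedReLU module, the tolerance budget, the greedy accept/revert loop) is an assumption for illustration, not the paper's actual algorithm.

```python
# Toy sketch only: a greedy coordinate-descent sweep over per-layer ReLU
# gates. GatedReLU, eval_loss, and tolerance are illustrative assumptions,
# not the paper's formulation.
import torch
import torch.nn as nn

class GatedReLU(nn.Module):
    """A ReLU that can be switched off, i.e. replaced by the identity."""
    def __init__(self):
        super().__init__()
        self.active = True          # one coordinate of the search

    def forward(self, x):
        return torch.relu(x) if self.active else x

def coordinate_descent_linearize(model, eval_loss, tolerance=0.01):
    """One sweep of coordinate descent over the ReLU gates.

    eval_loss(model) should return a validation loss; a gate stays
    linearized only if the loss increase is within `tolerance`.
    """
    gates = [m for m in model.modules() if isinstance(m, GatedReLU)]
    reference = eval_loss(model)
    for gate in gates:              # visit one coordinate at a time
        gate.active = False         # tentatively linearize this ReLU
        candidate = eval_loss(model)
        if candidate - reference > tolerance:
            gate.active = True      # revert: performance drop too large
        else:
            reference = candidate   # accept the linearization
    remaining = sum(g.active for g in gates)
    return remaining, len(gates)
```

A real pipeline would likely operate at a finer granularity than whole layers and fine-tune the network between sweeps; the sketch only shows the coordinate-wise accept/revert structure that the term "Coordinate Descent" refers to.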

Continue Reading
DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions
Positive · Artificial Intelligence
A new framework named DF-Mamba has been introduced for 3D hand pose estimation, addressing the severe occlusions that arise during hand interactions. The model uses deformable state space modeling to enhance feature extraction beyond traditional convolutional methods, aiming to improve pose estimation accuracy in complex interaction scenarios.
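
The summary is high-level, so purely as an illustration of what deformable feature sampling combined with a state-space-style scan could look like, here is a minimal PyTorch sketch. The DeformableSampler and SimpleSSMScan modules, their shapes, and the fixed decay are assumptions and do not reproduce the DF-Mamba architecture.

```python
# Minimal sketch, not DF-Mamba: learned offsets deform the sampling grid of a
# feature map, and a toy linear recurrence scans the flattened result.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSampler(nn.Module):
    """Predict per-location offsets and resample the feature map there."""
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2, kernel_size=3, padding=1)

    def forward(self, feat):                        # feat: (B, C, H, W)
        b, _, h, w = feat.shape
        ys = torch.linspace(-1, 1, h, device=feat.device)
        xs = torch.linspace(-1, 1, w, device=feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=-1).expand(b, h, w, 2)
        off = self.offset(feat).permute(0, 2, 3, 1)  # (B, H, W, 2)
        grid = base + 0.1 * torch.tanh(off)          # bounded deformation
        return F.grid_sample(feat, grid, align_corners=True)

class SimpleSSMScan(nn.Module):
    """Toy linear recurrence over the flattened spatial sequence."""
    def __init__(self, channels, decay=0.9):
        super().__init__()
        self.decay = decay
        self.proj = nn.Linear(channels, channels)

    def forward(self, feat):                        # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        seq = feat.flatten(2).transpose(1, 2)       # (B, H*W, C)
        state = torch.zeros(b, c, device=feat.device)
        outputs = []
        for t in range(seq.size(1)):                # sequential state update
            state = self.decay * state + self.proj(seq[:, t])
            outputs.append(state)
        out = torch.stack(outputs, dim=1)           # (B, H*W, C)
        return out.transpose(1, 2).reshape(b, c, h, w)
```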
VeLU: Variance-enhanced Learning Unit for Deep Neural Networks
Positive · Artificial Intelligence
The introduction of VeLU, a Variance-enhanced Learning Unit, aims to address the limitations of traditional activation functions in deep neural networks, particularly the ReLU, which is known for issues like gradient sparsity and dead neurons. VeLU employs a combination of ArcTan-ArcSin transformations and adaptive scaling to enhance training stability and optimize gradient flow based on local activation variance.
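
As a concrete illustration of what a variance-adaptive ArcTan/ArcSin activation might look like, here is a short PyTorch sketch; the exact composition, the learnable gain alpha, and the per-sample variance scaling are assumptions rather than the published VeLU definition.

```python
# Illustrative sketch only; not the published VeLU formula.
import math
import torch
import torch.nn as nn

class VeLUSketch(nn.Module):
    """Smooth gated activation built from ArcTan/ArcSin with variance scaling."""
    def __init__(self, eps=1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # learnable gain (assumed)
        self.eps = eps

    def forward(self, x):                          # x: (B, ...) activations
        # adaptive scale from the per-sample activation variance
        var = x.flatten(1).var(dim=1, unbiased=False)
        scale = self.alpha / torch.sqrt(var + self.eps)
        scale = scale.view(-1, *([1] * (x.dim() - 1)))  # broadcast over x
        # bounded ArcTan -> tanh -> ArcSin composition used as a smooth gate
        gate = 0.5 * (1.0 + torch.asin(torch.tanh(torch.atan(scale * x))) / (math.pi / 2))
        return x * gate
```

Unlike ReLU, every input here receives a non-zero gradient, which is the kind of behavior the summary credits to VeLU (avoiding dead neurons); the actual transformation would have to be taken from the paper.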
SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features
Positive · Artificial Intelligence
A novel framework named SemImage has been introduced, which represents text documents as two-dimensional semantic images for processing by convolutional neural networks (CNNs). Each word is depicted as a pixel in a 2D image, with distinct color encodings for linguistic features such as topic, sentiment, and intensity. This innovative approach aims to enhance the representation of linguistic data in machine learning models.
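
To make the word-as-pixel idea concrete, here is a toy NumPy sketch that lays words out row by row and fills the three channels with placeholder topic, sentiment, and intensity scores; the scoring functions, image width, and layout are illustrative assumptions, not SemImage's actual encoding.

```python
# Toy sketch: render a document as a "semantic image", one word per pixel,
# with RGB channels holding topic, sentiment, and intensity scores.
# The per-word scoring functions are placeholders, not SemImage's.
import math
import numpy as np

def word_features(word):
    """Placeholder scores in [0, 1]; a real system would use trained models."""
    topic = (hash(word) % 97) / 96.0           # stand-in topic id, normalized
    sentiment = 0.5                            # stand-in neutral sentiment
    intensity = min(len(word) / 12.0, 1.0)     # stand-in emphasis proxy
    return topic, sentiment, intensity

def text_to_semantic_image(text, width=32):
    words = text.split()
    height = math.ceil(len(words) / width)
    img = np.zeros((height, width, 3), dtype=np.float32)
    for i, w in enumerate(words):
        img[i // width, i % width] = word_features(w)  # one word -> one pixel
    return img                                 # a 3-channel map a CNN can consume

image = text_to_semantic_image("a novel framework represents text as images")
print(image.shape)                             # (1, 32, 3) for this short example
```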