Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates

arXiv — cs.CL · Thursday, December 4, 2025 at 5:00:00 AM
  • A novel method called Dual LoRA has been proposed to enhance the performance of Low-Rank Adaptation (LoRA) in fine-tuning large language models (LLMs). The method splits the low-rank matrices into two distinct groups: a magnitude group that controls the extent of parameter updates and a direction group that determines the update direction, thereby improving the adaptation process (a minimal sketch follows this summary).
  • The introduction of Dual LoRA is significant as it addresses the limitations of traditional LoRA methods, which often yield unsatisfactory results due to their low-rank assumptions. By incorporating an inductive bias, this approach aims to better simulate full fine-tuning processes, potentially leading to improved model performance in various applications.
  • This development reflects a broader trend in AI research towards enhancing parameter-efficient fine-tuning methods. Innovations such as ILoRA and AuroRA also seek to overcome challenges associated with LoRA, including client heterogeneity and low-rank bottlenecks. These advancements highlight the ongoing efforts to refine fine-tuning techniques, ensuring that large language models can be effectively adapted for diverse tasks while maintaining efficiency.
— via World Pulse Now AI Editorial System
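
To make the magnitude/direction split concrete, here is a minimal PyTorch sketch. It assumes the direction group is a column-normalized low-rank product and the magnitude group is a learned per-column scale; the class and parameter names are hypothetical, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualLoRALinear(nn.Module):
    """Minimal sketch of a magnitude/direction-split LoRA update.

    The split into a 'direction group' (normalized low-rank product) and a
    'magnitude group' (learned per-column scale) is an assumption based on
    the summary above, not the paper's exact formulation.
    """
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base                      # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        # direction group: a standard LoRA pair, normalized column-wise below
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        # magnitude group: one learned scale per input column, zero at init
        self.m = nn.Parameter(torch.zeros(d_in))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.B @ self.A               # (d_out, d_in) low-rank update
        direction = F.normalize(delta, dim=0) # unit-norm columns: direction only
        update = direction * self.m           # magnitude group rescales each column
        return self.base(x) + x @ update.T
```

Because B and m start at zero, the patched layer matches the frozen base exactly at step 0, which is the usual LoRA convention.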

Continue Reading
Network of Theseus (like the ship)
Positive · Artificial Intelligence
The Network of Theseus (NoT) introduces a novel approach in deep learning, allowing for the gradual transformation of a trained or untrained neural network architecture into a different target architecture while maintaining performance. This method challenges the traditional assumption that the architecture used during training must remain unchanged during inference.
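The summary does not detail the transformation mechanism; the sketch below assumes a Theseus-style interpolation in which each old sub-module is gradually blended into its replacement, in the spirit of BERT-of-Theseus. All names and the annealing scheme are hypothetical.

```python
import torch
import torch.nn as nn

class TheseusBlock(nn.Module):
    """Gradually swaps an old sub-module for a new one during training (sketch)."""
    def __init__(self, old_block: nn.Module, new_block: nn.Module):
        super().__init__()
        self.old_block = old_block
        self.new_block = new_block
        self.register_buffer("alpha", torch.tensor(0.0))  # 0 = all old, 1 = all new

    def step_alpha(self, increment: float = 1e-3):
        # anneal toward the target architecture over training
        self.alpha.add_(increment).clamp_(0.0, 1.0)

    def forward(self, x):
        a = self.alpha.item()
        return (1.0 - a) * self.old_block(x) + a * self.new_block(x)
```

Once alpha reaches 1 the old block contributes nothing and can be dropped, leaving only the target architecture at inference.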
MORPH: PDE Foundation Models with Arbitrary Data Modality
Positive · Artificial Intelligence
MORPH has been introduced as a modality-agnostic, autoregressive foundation model designed for partial differential equations (PDEs), utilizing a convolutional vision transformer backbone to manage diverse spatiotemporal datasets across various resolutions and data modalities. The model incorporates advanced techniques such as component-wise convolution and inter-field cross-attention to enhance its predictive capabilities.
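As a rough illustration of component-wise processing with inter-field attention, the sketch below embeds each physical field with a shared convolution and then lets all field tokens attend to one another. The shapes, patch size, and module choices are assumptions based on the summary, not MORPH's actual design.

```python
import torch
import torch.nn as nn

class ComponentWiseEmbed(nn.Module):
    """Sketch: embed each PDE field independently with a shared conv, then
    exchange information across fields via attention."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # shared conv applied per field -> agnostic to the number of fields
        self.embed = nn.Conv2d(1, dim, kernel_size=4, stride=4)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, n_fields, H, W) -- each field is embedded separately
        b, f, h, w = u.shape
        tokens = self.embed(u.reshape(b * f, 1, h, w))  # (b*f, dim, H/4, W/4)
        tokens = tokens.flatten(2).transpose(1, 2)      # (b*f, T, dim)
        t = tokens.shape[1]
        tokens = tokens.reshape(b, f * t, -1)           # all fields as one sequence
        out, _ = self.cross(tokens, tokens, tokens)     # inter-field attention
        return out
```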
Optimizing Fine-Tuning through Advanced Initialization Strategies for Low-Rank Adaptation
Positive · Artificial Intelligence
Recent advancements in fine-tuning methodologies have led to the introduction of IniLoRA, a novel initialization strategy designed to optimize Low-Rank Adaptation (LoRA) for large language models. IniLoRA initializes low-rank matrices to closely approximate original model weights, addressing limitations in performance seen with traditional LoRA methods. Experimental results demonstrate that IniLoRA outperforms LoRA across various models and tasks, with two additional variants, IniLoRA-α and IniLoRA-β, further enhancing performance.
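One natural reading of "initializes low-rank matrices to closely approximate original model weights" is a truncated-SVD factorization whose residual is folded into the frozen base, so the model's function is unchanged at step 0. The sketch below takes that reading; the paper's actual initialization may differ.

```python
import torch
import torch.nn as nn

def approx_lora_init(linear: nn.Linear, rank: int = 8):
    """Sketch: factor B @ A to be close to the pretrained weight W (here via
    torch.svd_lowrank) and keep the residual W - B @ A in the frozen base.
    The SVD choice is an assumption, not necessarily IniLoRA's scheme."""
    W = linear.weight.data                    # (d_out, d_in)
    U, S, V = torch.svd_lowrank(W, q=rank)    # rank-r approximation of W
    B = U * S.sqrt()                          # (d_out, r)
    A = (V * S.sqrt()).T                      # (r, d_in), so B @ A ≈ W
    linear.weight.data = W - B @ A            # residual stays in the frozen base
    return nn.Parameter(B), nn.Parameter(A)   # trainable low-rank pair
```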
NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
Positive · Artificial Intelligence
The introduction of NAS-LoRA represents a significant advancement in the adaptation of the Segment Anything Model (SAM) for specialized tasks, particularly in medical and agricultural imaging. This new Parameter-Efficient Fine-Tuning (PEFT) method integrates a Neural Architecture Search (NAS) block to enhance SAM's performance by addressing its limitations in acquiring high-level semantic information due to the lack of spatial priors in its Transformer encoder.
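A common way to make an adapter searchable is DARTS-style mixing: softmaxed architecture weights blend candidate operations, and the strongest branch is kept after search. The candidate set below (skip, pointwise MLP, depthwise convolution as a spatial prior) is a hypothetical example, not NAS-LoRA's actual search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SearchableAdapter(nn.Module):
    """Sketch of a NAS-style adapter mixing candidate ops (DARTS-style)."""
    def __init__(self, dim: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                  # skip connection
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()),  # pointwise MLP
            nn.Conv1d(dim, dim, 3, padding=1, groups=dim),  # depthwise conv (spatial prior)
        ])
        self.arch = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); the conv branch expects (batch, dim, tokens)
        w = F.softmax(self.arch, dim=0)
        conv_out = self.ops[2](x.transpose(1, 2)).transpose(1, 2)
        branches = [self.ops[0](x), self.ops[1](x), conv_out]
        return sum(wi * b for wi, b in zip(w, branches))
```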
Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning
Positive · Artificial Intelligence
The Idea-Gated Transformer has been introduced as a novel architecture aimed at addressing the issue of 'Topic Drift' in autoregressive large language models (LLMs) during text generation. This model separates semantic planning from syntactic generation by utilizing an auxiliary 'Idea Head' that predicts future context, allowing for real-time vocabulary pruning to enhance coherence in generated text.
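One plausible realization of differentiable vocabulary pruning is to add a log-gate from the auxiliary head to the language-model logits, so off-topic words are softly suppressed while gradients still flow. The sketch below takes that form; the head names and the sigmoid gate are assumptions based on the summary.

```python
import torch
import torch.nn as nn

class IdeaGatedHead(nn.Module):
    """Sketch: an auxiliary head scores how on-topic each vocabulary item is,
    and its log-gate is added to the LM logits (soft, differentiable pruning)."""
    def __init__(self, d_model: int, vocab: int):
        super().__init__()
        self.lm_head = nn.Linear(d_model, vocab)
        self.idea_head = nn.Linear(d_model, vocab)  # predicts future-context relevance

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        logits = self.lm_head(h)
        gate = torch.sigmoid(self.idea_head(h))     # in (0, 1): keep vs. prune
        # log-gate keeps everything differentiable; gate -> 0 drives logits to -inf
        return logits + torch.log(gate + 1e-9)
```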
LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes
Negative · Artificial Intelligence
A recent study highlights the vulnerabilities of proactive defenses against deepfakes, revealing that these defenses often lack the necessary robustness and reliability. The research introduces LoRA patching, a technique that bypasses existing defenses by injecting adaptable low-rank patches into deepfake generators. The method also includes a Multi-Modal Feature Alignment loss to ensure semantic consistency in outputs.
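Mechanically, LoRA patching amounts to attaching trainable low-rank branches to frozen generator layers. The sketch below shows a generic conv patch plus a plain MSE stand-in for the feature-alignment term; the factorization and loss form are assumptions based on the summary, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAConvPatch(nn.Module):
    """Sketch: a low-rank residual branch on a frozen conv layer."""
    def __init__(self, conv: nn.Conv2d, rank: int = 4):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad_(False)
        # down-projection mirrors the base conv's geometry so shapes match
        self.down = nn.Conv2d(conv.in_channels, rank, conv.kernel_size,
                              stride=conv.stride, padding=conv.padding, bias=False)
        self.up = nn.Conv2d(rank, conv.out_channels, 1, bias=False)
        nn.init.zeros_(self.up.weight)            # patch starts as a no-op

    def forward(self, x):
        return self.conv(x) + self.up(self.down(x))

def feature_alignment_loss(feat_patched, feat_reference):
    # keep patched features semantically consistent with reference features
    return F.mse_loss(feat_patched, feat_reference)
```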
Scaling Multimodal Search and Recommendation with Small Language Models via Upside-Down Reinforcement Learning
Positive · Artificial Intelligence
A recent study has demonstrated the potential of small language models (SLMs) to effectively support multimodal search and recommendation tasks, utilizing a framework that integrates upside-down reinforcement learning and synthetic data distillation from larger models like Llama-3. The 100M-parameter GPT-2 model achieved relevance and diversity scores comparable to larger counterparts while significantly reducing inference latency and memory overhead.
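Upside-down RL treats the reward as an input rather than an optimization target: training examples are serialized together with their score, the model is trained with ordinary next-token prediction, and at inference it is prompted with a high target score. The token format below is hypothetical.

```python
def make_udrl_example(query: str, response: str, reward: float) -> str:
    """Serialize a (query, response, reward) triple for return-conditioned
    supervised fine-tuning; the special-token names here are hypothetical."""
    return f"<reward={reward:.2f}> <query> {query} </query> <response> {response}"

# Training: next-token prediction on serialized triples from the teacher data.
# Inference: condition on a high target reward to steer output quality.
prompt = make_udrl_example("wireless earbuds", "", reward=1.0)
```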
Delta Sampling: Data-Free Knowledge Transfer Across Diffusion Models
Positive · Artificial Intelligence
Delta Sampling (DS) has been introduced as a novel method for enabling data-free knowledge transfer across different diffusion models, particularly addressing the challenges faced when upgrading base models like Stable Diffusion. This method operates at inference time, utilizing the delta between model predictions before and after adaptation, thus facilitating the reuse of adaptation components across varying architectures.
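In pseudocode terms, the idea is to measure the adapter's effect as a prediction delta on the old base model and add that delta to the new base's prediction at each denoising step. The function signatures below are hypothetical; a real sampler would feed the result into its scheduler's update rule.

```python
import torch

@torch.no_grad()
def delta_sampling_step(eps_new_base, eps_old_base, eps_old_adapted,
                        x_t: torch.Tensor, t: torch.Tensor, scale: float = 1.0):
    """One noise prediction under delta sampling (sketch)."""
    delta = eps_old_adapted(x_t, t) - eps_old_base(x_t, t)  # effect of the adaptation
    return eps_new_base(x_t, t) + scale * delta             # reused on the new base
```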