GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers

arXiv — cs.LG · Friday, December 5, 2025 at 5:00:00 AM
  • A new framework called GRASP (GRouped Activation Shared Parameterization) has been introduced for parameter-efficient fine-tuning of transformers, adapting large pre-trained models by updating only a small subset of parameters. The method partitions token representations into groups and learns shared scaling and shifting vectors for each group, preserving model performance while sharply reducing the number of trainable parameters (see the sketch after this summary).
  • GRASP is significant because it offers a scalable way to adapt large language models such as RoBERTa and GPT-2 to specific tasks without extensive computational resources. That efficiency can broaden access to advanced AI models and make them more practical to deploy in real-world applications.
  • This advancement aligns with ongoing trends in AI research focusing on optimizing model performance while minimizing resource consumption. Techniques such as the Length-MAX tokenizer and adaptive optimizers like AdamHD are also emerging, reflecting a collective effort in the AI community to improve the efficiency and robustness of language models, which are increasingly vital in applications ranging from natural language processing to multimodal tasks.
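To make the grouped scale-and-shift idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: token positions are partitioned into contiguous groups, and every token in a group is modulated by that group's shared scaling and shifting vectors while the backbone stays frozen. The class name, grouping scheme, and parameter shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GroupedScaleShift(nn.Module):
    """Hypothetical GRASP-style adapter (an assumed reading of the
    abstract, not the paper's code): tokens are split into
    `num_groups` contiguous groups along the sequence, and every
    token in a group shares that group's scale and shift vectors."""

    def __init__(self, hidden_dim: int, num_groups: int):
        super().__init__()
        self.num_groups = num_groups
        # Only 2 * num_groups * hidden_dim parameters are trainable;
        # the frozen transformer backbone is untouched.
        self.scale = nn.Parameter(torch.ones(num_groups, hidden_dim))
        self.shift = nn.Parameter(torch.zeros(num_groups, hidden_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        b, s, d = x.shape
        # Map each position to a group id in [0, num_groups).
        group_ids = torch.arange(s, device=x.device) * self.num_groups // s
        return x * self.scale[group_ids] + self.shift[group_ids]

# Example: 2 * 4 * 768 = 6,144 trainable parameters per adapted layer.
adapter = GroupedScaleShift(hidden_dim=768, num_groups=4)
y = adapter(torch.randn(2, 16, 768))  # (batch=2, seq=16, hidden=768)
```

In a real fine-tuning run, adapters like this would be interleaved with frozen transformer layers and only their parameters passed to the optimizer.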
— via World Pulse Now AI Editorial System


Continue Reading
Network of Theseus (like the ship)
Positive · Artificial Intelligence
The Network of Theseus (NoT) introduces a novel approach in deep learning by allowing the transformation of a guide network architecture into a different target architecture while maintaining performance. This method challenges the traditional assumption that the architecture used during training must remain unchanged during inference, thereby offering flexibility in model design and optimization.
Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates
Positive · Artificial Intelligence
A novel method called Dual LoRA has been proposed to improve Low-Rank Adaptation (LoRA) for fine-tuning large language models (LLMs). It introduces two distinct groups within the low-rank matrices: a magnitude group that controls how large each parameter update is, and a direction group that determines where the update points, refining the adaptation process (a rough sketch follows).
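As a loose illustration of separating magnitude from direction in a low-rank update, the sketch below normalizes the rows of the LoRA product to obtain a direction and scales them with a separately learned magnitude. The module name, initialization, and row-wise normalization are assumptions based only on the summary, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualLoRALinear(nn.Module):
    """Illustrative magnitude/direction split for a low-rank update
    (an assumed sketch, not the Dual LoRA paper's implementation)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the backbone weight stays frozen
        out_f, in_f = base.weight.shape
        # Direction group: low-rank factors that define where to move.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Magnitude group: per-row step size along that direction.
        self.magnitude = nn.Parameter(torch.zeros(out_f, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.B @ self.A                # raw low-rank update
        direction = F.normalize(delta, dim=1)  # unit-norm rows
        w = self.base.weight + self.magnitude * direction
        return F.linear(x, w, self.base.bias)

layer = DualLoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 16, 768))
```

Decoupling the two groups lets the optimizer adjust how far to move independently of which direction to move in, which is the intuition the summary describes.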
Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning
Positive · Artificial Intelligence
The Idea-Gated Transformer has been introduced as a novel architecture aimed at addressing 'Topic Drift' in autoregressive large language models (LLMs) during text generation. The model separates semantic planning from syntactic generation via an auxiliary 'Idea Head' that predicts future context, enabling real-time vocabulary pruning that keeps generated text coherent (illustrated below).
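One way to picture the gating step: if the Idea Head scores every vocabulary token for the upcoming context, its top-scoring tokens can define a mask that prunes the language model's logits at each decoding step. The function below is a hypothetical sketch of that pruning, not the paper's architecture; the keep fraction and penalty value are invented for illustration.

```python
import torch

def idea_gated_logits(lm_logits: torch.Tensor,
                      idea_logits: torch.Tensor,
                      keep_frac: float = 0.3,
                      penalty: float = -1e9) -> torch.Tensor:
    """Prune next-token logits to the idea head's on-topic tokens.
    lm_logits, idea_logits: (batch, vocab_size). Hypothetical sketch."""
    vocab = lm_logits.size(-1)
    k = max(1, int(keep_frac * vocab))
    # Tokens the (assumed) idea head considers on-topic survive;
    # everything else is pushed to effectively zero probability.
    topk = idea_logits.topk(k, dim=-1).indices
    mask = torch.full_like(lm_logits, penalty)
    mask.scatter_(-1, topk, 0.0)
    return lm_logits + mask
```

During decoding, sampling from `idea_gated_logits(...)` instead of the raw logits would restrict generation to the predicted topic.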
Scaling Multimodal Search and Recommendation with Small Language Models via Upside-Down Reinforcement Learning
Positive · Artificial Intelligence
A recent study has demonstrated that small language models (SLMs) can effectively support multimodal search and recommendation, using a framework that combines upside-down reinforcement learning with synthetic data distilled from larger models such as Llama-3. A 100M-parameter GPT-2 model achieved relevance and diversity scores comparable to larger counterparts while significantly reducing inference latency and memory overhead (the reward-conditioning idea is sketched below).
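Upside-down reinforcement learning inverts the usual setup: instead of learning to maximize a reward, the model is trained with ordinary supervised learning to produce outputs conditioned on a desired reward. The toy sketch below shows how such reward-conditioned training pairs might be formed; the prompt format and field names are invented for illustration and are not from the paper.

```python
def make_training_example(query: str, response: str, relevance: float) -> dict:
    """Build one reward-conditioned example: the observed relevance
    score is prepended as a 'command', so the model learns what a
    response at that quality level looks like. Hypothetical format."""
    prompt = f"<score={relevance:.1f}> {query}"
    return {"input": prompt, "target": response}

# Training: condition on whatever score the logged response earned.
example = make_training_example("winter hiking boots", "Try insulated...", 0.7)

# Inference: ask for the best behavior by requesting a high score.
inference_prompt = "<score=1.0> winter hiking boots"
```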
What Signals Really Matter for Misinformation Tasks? Evaluating Fake-News Detection and Virality Prediction under Real-World Constraints
Neutral · Artificial Intelligence
An evaluation-driven study has examined two key misinformation tasks: fake-news detection and virality prediction. Using the EVONS and FakeNewsNet datasets, it compares models including RoBERTa and a GRU baseline, finding that textual content is a strong discriminator for fake-news detection while numeric features remain a viable alternative under real-world constraints.