Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

arXiv — cs.LG · Wednesday, December 3, 2025
  • The introduction of the Language model-initialized Prompt Decision Transformer (LPDT) framework marks a significant advancement in offline reinforcement learning (RL) by enhancing the few-shot prompt ability of Decision Transformers. This framework utilizes pre-trained language models to improve performance on unseen tasks, addressing challenges related to data collection and the limitations of traditional Prompt-DT methods.
  • This development is crucial as it allows for more efficient use of pre-collected datasets in RL tasks, potentially reducing the costs and risks associated with data collection in specific environments. By improving the prompt capabilities of Decision Transformers, LPDT could lead to better performance in various RL applications.
  • The evolution of RL methodologies, including the integration of pre-trained language models and frameworks like LPDT, reflects a broader trend towards enhancing model efficiency and adaptability. This shift is underscored by ongoing research into parameter-efficient fine-tuning techniques and the exploration of new architectures, which aim to optimize performance while minimizing resource demands.
— via World Pulse Now AI Editorial System
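The core idea described above is to start the Decision Transformer's backbone from pre-trained language-model weights rather than from scratch. The paper's exact initialization procedure is not given here, but a minimal sketch of one plausible mechanism is to copy every pre-trained parameter whose name and shape match the policy network, leaving RL-specific modules (state/action/return embeddings) freshly initialized. All names below are illustrative, not from the paper.

```python
import numpy as np

def init_from_lm(dt_weights, lm_weights):
    """Copy pre-trained LM weights into a Decision Transformer backbone
    wherever parameter names and shapes match; leave the rest (e.g. new
    state/action/return embeddings) at their fresh initialization."""
    initialized = []
    for name, w in dt_weights.items():
        if name in lm_weights and lm_weights[name].shape == w.shape:
            dt_weights[name] = lm_weights[name].copy()
            initialized.append(name)
    return dt_weights, initialized

# Toy example: the transformer blocks are shared with the LM,
# while the state embedding is specific to the RL policy.
lm = {"block0.attn": np.ones((4, 4)), "block0.mlp": np.ones((4, 8))}
dt = {"block0.attn": np.zeros((4, 4)),
      "block0.mlp": np.zeros((4, 8)),
      "embed_state": np.zeros((3, 4))}
dt, names = init_from_lm(dt, lm)
```

After this step, only the transformer blocks carry language-model knowledge; the RL-specific embeddings are trained from scratch during offline fine-tuning.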


Continue Reading
MORPH: PDE Foundation Models with Arbitrary Data Modality
Positive · Artificial Intelligence
MORPH has been introduced as a modality-agnostic, autoregressive foundation model designed for partial differential equations (PDEs), utilizing a convolutional vision transformer backbone to manage diverse spatiotemporal datasets across various resolutions and data modalities. The model incorporates advanced techniques such as component-wise convolution and inter-field cross-attention to enhance its predictive capabilities.
Optimizing Fine-Tuning through Advanced Initialization Strategies for Low-Rank Adaptation
Positive · Artificial Intelligence
Recent advancements in fine-tuning methodologies have led to the introduction of IniLoRA, a novel initialization strategy designed to optimize Low-Rank Adaptation (LoRA) for large language models. IniLoRA initializes low-rank matrices to closely approximate original model weights, addressing limitations in performance seen with traditional LoRA methods. Experimental results demonstrate that IniLoRA outperforms LoRA across various models and tasks, with two additional variants, IniLoRA-α and IniLoRA-β, further enhancing performance.
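Standard LoRA initializes its factors so the initial update B @ A is zero; the summary says IniLoRA instead initializes them to approximate the original weight W. One natural way to do that (an assumption here, not necessarily the paper's scheme) is a truncated SVD of W:

```python
import numpy as np

def inilora_init(W, r):
    """Initialize LoRA factors A, B of rank r so that B @ A approximates
    the original weight W. Truncated SVD is one way to get the best
    rank-r approximation; the paper's exact scheme may differ."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :r] * S[:r]   # shape (out_dim, r), columns scaled by singular values
    A = Vt[:r, :]          # shape (r, in_dim)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
A, B = inilora_init(W, r=16)          # rank 16 = full rank of W here
err = np.linalg.norm(W - B @ A)       # exact reconstruction at full rank
```

With r smaller than the rank of W, B @ A becomes the best rank-r approximation in Frobenius norm, which is the sense in which the factors "closely approximate" the original weights.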
Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates
Positive · Artificial Intelligence
A novel method called Dual LoRA has been proposed to enhance the performance of Low-Rank Adaptation (LoRA) in fine-tuning large language models (LLMs). This method introduces two distinct groups within low-rank matrices: a magnitude group for controlling the extent of parameter updates and a direction group for determining the update direction, thereby improving the adaptation process.
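The magnitude/direction split described above can be sketched numerically: the low-rank factors supply only a direction (normalized here row-wise to unit norm), while a separate magnitude parameter rescales each row of the update. This is an illustrative decomposition in the spirit of the summary, not the paper's exact formulation.

```python
import numpy as np

def dual_lora_update(A_dir, B_dir, m, eps=1e-8):
    """Compose a weight update from two parameter groups: a direction
    group (low-rank factors B_dir, A_dir, normalized row-wise) and a
    magnitude group m controlling the size of each row's update."""
    D = B_dir @ A_dir                                         # raw low-rank update
    D = D / (np.linalg.norm(D, axis=1, keepdims=True) + eps)  # keep direction only
    return m[:, None] * D                                     # rescale by magnitude

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 32))   # direction group, rank 4
B = rng.standard_normal((16, 4))
m = np.full(16, 0.5)               # magnitude group, one scale per output row
dW = dual_lora_update(A, B, m)
row_norms = np.linalg.norm(dW, axis=1)
```

Decoupling the two lets the optimizer adjust how far to move each row independently of which direction to move it, analogous to the weight decomposition used in methods like DoRA.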
NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
Positive · Artificial Intelligence
The introduction of NAS-LoRA represents a significant advancement in the adaptation of the Segment Anything Model (SAM) for specialized tasks, particularly in medical and agricultural imaging. This new Parameter-Efficient Fine-Tuning (PEFT) method integrates a Neural Architecture Search (NAS) block to enhance SAM's performance by addressing its limitations in acquiring high-level semantic information due to the lack of spatial priors in its Transformer encoder.
SkillFactory: Self-Distillation For Learning Cognitive Behaviors
Positive · Artificial Intelligence
SkillFactory has introduced a method for fine-tuning language models to learn cognitive skills through a supervised fine-tuning stage before reinforcement learning, utilizing samples from the model itself to create effective training data. This approach aims to enhance the reasoning capabilities of models that do not initially exhibit these skills.
LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes
Negative · Artificial Intelligence
A recent study highlights the vulnerabilities of proactive defenses against deepfakes, revealing that these defenses often lack the necessary robustness and reliability. The research introduces a novel technique called Low-Rank Adaptation (LoRA) patching, which effectively bypasses existing defenses by injecting adaptable patches into deepfake generators. This method also includes a Multi-Modal Feature Alignment loss to ensure semantic consistency in outputs.
Delta Sampling: Data-Free Knowledge Transfer Across Diffusion Models
Positive · Artificial Intelligence
Delta Sampling (DS) has been introduced as a novel method for enabling data-free knowledge transfer across different diffusion models, particularly addressing the challenges faced when upgrading base models like Stable Diffusion. This method operates at inference time, utilizing the delta between model predictions before and after adaptation, thus facilitating the reuse of adaptation components across varying architectures.
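The inference-time delta described above can be sketched directly: compute what the adaptation changed on the old base model, then add that change to the upgraded base model's prediction. The sketch below uses noise predictions as the transferred quantity, which is an assumption; the paper may apply the delta to a different intermediate.

```python
import numpy as np

def delta_sample_step(eps_new_base, eps_old_base, eps_old_adapted, scale=1.0):
    """Transfer an adaptation (e.g. a LoRA fine-tuned on an old base
    diffusion model) to an upgraded base model at inference time, by
    adding the prediction delta the adaptation induces."""
    delta = eps_old_adapted - eps_old_base   # what the adaptation changed
    return eps_new_base + scale * delta      # reuse that change on the new base

# Toy check: a constant adaptation offset carries over exactly.
x = np.linspace(0.0, 1.0, 8)
old_base = x                       # old base model's prediction
new_base = x + 1.0                 # upgraded base model's prediction
old_adapted = old_base + 0.25      # old base + adaptation
out = delta_sample_step(new_base, old_base, old_adapted)
```

Because only predictions are combined, no training data or weight surgery is needed, which is what makes the transfer "data-free" and architecture-agnostic at this level.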
Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
Positive · Artificial Intelligence
Recent research in imitation learning (IL) has demonstrated that interactive methods can outperform traditional Behavior Cloning (BC) when annotation costs are measured per state. The study introduces algorithms like Stagger and Warm Stagger, which leverage both offline demonstrations and interactive annotations to enhance learning efficiency.