Model-Based Learning of Whittle Indices

arXiv — cs.LG · Wednesday, November 26, 2025 at 5:00:00 AM
  • A new model-based algorithm named BLINQ learns the Whittle indices of an indexable, communicating, and unichain Markov Decision Process (MDP). It builds an empirical estimate of the MDP and computes the indices of that estimate with an enhanced version of an existing index-computation algorithm, and is shown to converge while remaining computationally efficient; a hedged sketch of this estimate-then-index pipeline appears after this list.
  • BLINQ's significance lies in outperforming Q-learning-based approaches: it needs fewer samples to reach accurate index approximations while incurring lower computational cost, which could improve decision-making in various applications.
  • This development highlights a growing trend in reinforcement learning, where advancements in algorithms like BLINQ and frameworks addressing non-stationary environments, such as Non-stationary and Varying-discounting MDPs, are reshaping the landscape of AI. The integration of these methodologies could lead to more robust and adaptable systems in complex environments.
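The abstract suggests a two-stage pipeline: estimate the MDP from data, then compute Whittle indices on the estimate. The sketch below illustrates that pipeline for a single restless arm; the function names, Laplace smoothing, discounted criterion, and bisection tolerance are all illustrative assumptions, not BLINQ's actual algorithm (which the abstract describes only as an enhanced version of an existing index-computation method).

```python
import numpy as np

# Hedged sketch: (1) build an empirical MDP from samples, (2) recover each
# state's Whittle index by bisection on the passive-action subsidy.
# Everything below is an illustrative assumption, not the paper's method.

def estimate_model(samples, n_states, n_actions=2):
    """Empirical (P, R) from (s, a, r, s') tuples."""
    counts = np.ones((n_actions, n_states, n_states))  # Laplace smoothing
    rew_sum = np.zeros((n_actions, n_states))
    rew_cnt = np.zeros((n_actions, n_states))
    for s, a, r, s2 in samples:
        counts[a, s, s2] += 1.0
        rew_sum[a, s] += r
        rew_cnt[a, s] += 1.0
    P = counts / counts.sum(axis=2, keepdims=True)
    R = rew_sum / np.maximum(rew_cnt, 1.0)
    return P, R

def q_values(P, R, subsidy, gamma=0.95, iters=400):
    """Q-values of the subsidized MDP, where passive action 0 earns +subsidy."""
    V = np.zeros(R.shape[1])
    for _ in range(iters):
        q_pass = R[0] + subsidy + gamma * (P[0] @ V)
        q_act = R[1] + gamma * (P[1] @ V)
        V = np.maximum(q_pass, q_act)
    return q_pass, q_act

def whittle_index(P, R, s, lo=-50.0, hi=50.0, tol=1e-4):
    """Smallest subsidy making the passive action optimal in state s."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pass, q_act = q_values(P, R, mid)
        if q_pass[s] < q_act[s]:
            lo = mid   # subsidy too low: staying active is still better
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is justified by indexability, which the paper assumes: once the subsidy crosses the index, the passive action becomes and remains preferable.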
— via World Pulse Now AI Editorial System

Continue Reading
Complexity Reduction Study Based on RD Costs Approximation for VVC Intra Partitioning
Neutral · Artificial Intelligence
A recent study targets intra partitioning in the Versatile Video Codec (VVC), aiming to reduce the complexity of its Rate-Distortion Optimization (RDO) process. It proposes two machine learning techniques that use the Rate-Distortion costs of neighboring blocks to prune the exhaustive partition search typically required in video coding.
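As a rough illustration of how neighboring RD costs could gate the search (the paper's two techniques are not reproduced here; the logistic form, log features, and threshold are assumptions):

```python
import numpy as np

def should_skip_splits(neighbor_rd_costs, w, b, threshold=0.9):
    """Return True when a lightweight model is confident no split helps."""
    x = np.log1p(np.asarray(neighbor_rd_costs, dtype=float))  # compress scale
    p_skip = 1.0 / (1.0 + np.exp(-(w @ x + b)))               # logistic score
    return p_skip > threshold
```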
Value Improved Actor Critic Algorithms
Positive · Artificial Intelligence
A newly proposed Actor Critic framework decouples the acting policy from the policy the critic evaluates, permitting more aggressive, greedier updates to the critic while the acting policy changes slowly enough to remain stable. The aim is to improve learning in decision-making problems by balancing greedification against stability.
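One hedged reading of that decoupling, in tabular form: the critic bootstraps from a sharply greedified softmax policy while the actor takes only small steps toward the critic's preferences. The temperatures and step sizes below are illustrative, not the paper's framework.

```python
import numpy as np

def softmax(z, beta=1.0):
    z = beta * (z - z.max())
    e = np.exp(z)
    return e / e.sum()

def critic_update(Q, s, a, r, s2, alpha=0.5, gamma=0.99, beta=10.0):
    """Bootstrap from a sharply greedified evaluation policy (aggressive)."""
    eval_policy = softmax(Q[s2], beta)
    Q[s, a] += alpha * (r + gamma * eval_policy @ Q[s2] - Q[s, a])

def actor_update(pi, Q, s, lr=0.05):
    """Move the acting policy only slightly toward the critic (stable)."""
    pi[s] = (1.0 - lr) * pi[s] + lr * softmax(Q[s], beta=1.0)
```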
Physical Reinforcement Learning
Neutral · Artificial Intelligence
Recent work on Contrastive Local Learning Networks (CLLNs) demonstrates their potential for reinforcement learning (RL) in energy-limited settings. The study applied Q-learning to simulated CLLNs, showing robustness and low power consumption compared to traditional digital systems.
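For reference, the tabular Q-learning update at the heart of such a study looks as follows; the simulated CLLN environment itself is not modelled here, and the hyperparameters are illustrative.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s2, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update toward the TD target."""
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q
```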
Reinforcement Learning for Self-Healing Material Systems
Positive · Artificial Intelligence
A recent study frames the self-healing of material systems as a Reinforcement Learning (RL) problem within a Markov Decision Process (MDP), showing that RL agents can autonomously derive policies that maintain structural integrity while managing resource consumption. Continuous-action agents, particularly TD3 (Twin Delayed Deep Deterministic Policy Gradient), achieved near-complete material recovery and outperformed traditional heuristic methods.
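A hypothetical version of that MDP framing, showing how a reward can trade recovered integrity against resource spend (the dynamics and coefficients are assumptions, not the study's environment):

```python
def heal_step(damage, resource, action, k_cost=0.1):
    """One environment step: heal by `action`, pay a resource cost."""
    spend = max(0.0, min(action, resource))   # cannot spend more than stored
    new_damage = max(damage - spend, 0.0)
    reward = (damage - new_damage) - k_cost * spend
    return (new_damage, resource - spend), reward
```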
Non-stationary and Varying-discounting Markov Decision Processes for Reinforcement Learning
Positive · Artificial Intelligence
The Non-stationary and Varying-discounting Markov Decision Processes (NVMDP) framework addresses the limitations of traditional stationary MDPs in non-stationary environments. It allows discount rates to vary over time and across transitions, and applies to both finite- and infinite-horizon tasks.
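A minimal sketch of what transition-dependent discounting can look like in backward induction, assuming a finite horizon and a discount tensor gamma[t, s, a, s'] (the NVMDP paper's exact formulation may differ):

```python
import numpy as np

def finite_horizon_values(P, R, gamma):
    """P[t,s,a,s2], R[t,s,a], gamma[t,s,a,s2] -> optimal values V[t,s]."""
    T, S, A, _ = P.shape
    V = np.zeros((T + 1, S))
    for t in range(T - 1, -1, -1):
        # Q(s,a) = R[t,s,a] + sum_s2 P[t,s,a,s2] * gamma[t,s,a,s2] * V[t+1,s2]
        Q = R[t] + np.einsum("sax,sax,x->sa", P[t], gamma[t], V[t + 1])
        V[t] = Q.max(axis=1)
    return V[:T]
```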
AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
Positive · Artificial Intelligence
AVA-VLA is a newly proposed framework aimed at enhancing Vision-Language-Action (VLA) models by integrating Active Visual Attention (AVA) to improve visual processing in dynamic decision-making contexts. This approach addresses the limitations of traditional VLA models that operate independently at each timestep, which can hinder effective contextual understanding in sequential tasks.
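One toy way to make visual attention "active", i.e. conditioned on task history rather than recomputed identically at each timestep (the names and shapes are assumptions; the AVA-VLA architecture is not reproduced here):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(patches, hidden, Wq, Wh):
    """patches: (N, d) image features; hidden: (h,) recurrent task state."""
    query = Wq @ hidden                  # (d,) query shaped by history
    weights = softmax(patches @ query)   # (N,) attention over patches
    context = weights @ patches          # (d,) history-dependent pooling
    hidden = np.tanh(Wh @ np.concatenate([hidden, context]))
    return context, hidden
```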
First-order Sobolev Reinforcement Learning
Positive · Artificial Intelligence
A new refinement in temporal-difference learning has been proposed, emphasizing first-order Bellman consistency. This approach trains the learned value function to align with both the Bellman targets and their derivatives, enhancing the stability and convergence of reinforcement learning algorithms like Q-learning and actor-critic methods.
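The idea is directly expressible as a Sobolev-style regression: fit the value function to targets and target derivatives jointly. The sketch below uses a linear-in-features model with stand-in targets so the derivative matching is explicit; in deep RL the derivatives would come from automatic differentiation.

```python
import numpy as np

def phi(s):    # polynomial features [1, s, s^2, s^3]
    return np.stack([np.ones_like(s), s, s**2, s**3], axis=-1)

def dphi(s):   # analytic feature derivatives d(phi)/ds
    return np.stack([np.zeros_like(s), np.ones_like(s), 2 * s, 3 * s**2], axis=-1)

s = np.linspace(-1.0, 1.0, 32)
v_target = np.sin(2.0 * s)        # stand-in Bellman targets
g_target = 2.0 * np.cos(2.0 * s)  # stand-in target derivatives

# Sobolev least squares: match values and derivatives in one system.
A = np.concatenate([phi(s), dphi(s)])
b = np.concatenate([v_target, g_target])
w, *_ = np.linalg.lstsq(A, b, rcond=None)
```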
Q-Learning-Based Time-Critical Data Aggregation Scheduling in IoT
Positive · Artificial Intelligence
A novel Q-learning framework has been proposed for time-critical data aggregation scheduling in Internet of Things (IoT) networks, aiming to reduce latency in applications such as smart cities and industrial automation. This approach integrates aggregation tree construction and scheduling into a unified model, enhancing efficiency and scalability.
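A small illustrative fragment of the scheduling side: epsilon-greedy selection of which ready node transmits in the current slot, from a learned Q-table (the paper's unified tree-construction-plus-scheduling model is not reproduced):

```python
import numpy as np

def pick_sender(Q, state, ready_nodes, eps=0.1, rng=np.random.default_rng(0)):
    """Choose the next transmitter among nodes ready to forward data."""
    if rng.random() < eps:
        return int(rng.choice(ready_nodes))        # explore
    ready = np.asarray(ready_nodes)
    return int(ready[np.argmax(Q[state, ready])])  # exploit learned values
```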