False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Negative · Artificial Intelligence
- The study reveals that probing-based detection of malicious inputs fails to generalize, offering only a false sense of security.
- This finding is significant as it underscores the potential risks associated with deploying LLMs in sensitive contexts, where misinterpretation of harmful instructions could lead to serious consequences.
- These challenges in ensuring the reliability and safety of LLMs echo ongoing discussions about their applications, such as music recommendation systems, and the need for improved evaluation frameworks.
— via World Pulse Now AI Editorial System
