Adversarial versification in portuguese as a jailbreak operator in LLMs

arXiv — cs.CL•Thursday, December 18, 2025 at 5:00:00 AM

NeutralArtificial Intelligence

Recent research indicates that versification of prompts serves as an effective adversarial mechanism against aligned large language models (LLMs), demonstrating that poetic instructions can lead to significantly higher safety failures compared to prose. The study highlights that manually crafted poems achieve an approximate 62% attack success rate, while automated versions reach about 43%, with some models exceeding 90% in single-turn interactions.
This development is crucial as it exposes vulnerabilities in LLMs that have been trained with reinforcement learning from human feedback (RLHF) and other advanced methodologies. The findings suggest that the structural integrity of these systems can be compromised through minimal changes in prompt formatting, raising concerns about their reliability and safety in real-world applications.
The implications of this research resonate with ongoing discussions about prompt fairness and disparities in LLM responses, as well as the effectiveness of various prompting strategies across languages. The exploration of adversarial techniques, such as those introduced in this study, contributes to a broader understanding of how LLMs can be manipulated, highlighting the need for improved guardrails and evaluation frameworks to ensure equitable and safe interactions.

— via World Pulse Now AI Editorial System

Read Original

Was this article worth reading? Share it

LucidQuery AI

Combines diffusion reasoning with autoregressive LLM for advanced AI analysis.

AI & DataView app details

Langfuse

Debug, monitor, and improve your complex LLM applications with ease.

Tech & Developer ToolsView app details

Langtail

Build and deploy robust LLM applications quickly with your team.

Business & ProductivityView app details

ConsoleX

Connect to all major LLMs in one unified development playground.

Business & ProductivityView app details

Mockmaster

Practice coding interviews with realistic questions and personalized feedback.

Business & ProductivityView app details

EloquentAI

Elevate your writing with AI-powered suggestions for clarity and impact.

Business & ProductivityView app details

Continue Readings

Ars Technica — Alla day ago

LLMs’ impact on science: Booming publications, stagnating quality

NegativeArtificial Intelligence

Recent studies indicate that the rise of large language models (LLMs) has led to an increase in the number of published research papers, yet the quality of these publications remains stagnant. Researchers are increasingly relying on LLMs for their work, which raises concerns about the depth and rigor of scientific inquiry.

Read full article

via Ars Technica — All

arXiv — cs.LG2 days ago

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

PositiveArtificial Intelligence

The introduction of 3DLLM-Mem marks a significant advancement in the capabilities of Large Language Models (LLMs) by integrating long-term spatial-temporal memory for enhanced reasoning in dynamic 3D environments. This model is evaluated using the 3DMem-Bench, which includes over 26,000 trajectories and 2,892 tasks designed to test memory utilization in complex scenarios.

Read full article

via arXiv — cs.LG

arXiv — cs.LG2 days ago

INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT

PositiveArtificial Intelligence

A novel framework named INFORM-CT has been proposed to enhance the management of incidental findings in abdominal CT scans by integrating large language models (LLMs) and vision-language models (VLMs). This approach automates the detection, classification, and reporting processes, significantly improving efficiency compared to traditional manual inspections by radiologists.

Read full article

via arXiv — cs.LG

arXiv — cs.CL2 days ago

Learning inflection classes using Adaptive Resonance Theory

NeutralArtificial Intelligence

A recent study explores the learnability of verbal inflection classes using Adaptive Resonance Theory (ART), a neural network model. The research focuses on unsupervised clustering of lexemes into inflection classes, applied to languages such as Latin, Portuguese, and Estonian. The findings indicate that the effectiveness of clustering varies with the complexity of the inflectional system, with optimal performance observed at a specific generalization parameter.

Read full article

via arXiv — cs.CL

arXiv — cs.CL2 days ago

Evaluating Metrics for Safety with LLM-as-Judges

NeutralArtificial Intelligence

Large Language Models (LLMs) are being increasingly integrated into critical information processes, such as patient care and nuclear facility operations, raising concerns about their reliability and safety. The paper discusses the need for robust evaluation metrics to ensure LLMs can safely replace human roles in these contexts.

Read full article

via arXiv — cs.CL

arXiv — cs.CL2 days ago

SoMe: A Realistic Benchmark for LLM-based Social Media Agents

NeutralArtificial Intelligence

A new benchmark called SoMe has been introduced to evaluate large language model (LLM)-based social media agents, addressing the need for comprehensive assessment of their capabilities in understanding media content and user behavior. SoMe includes 8 tasks, over 9 million posts, and nearly 7,000 user profiles, making it a significant resource for researchers and developers in the field of AI and social media.

Read full article

via arXiv — cs.CL

arXiv — cs.CL2 days ago

Revisiting the Reliability of Language Models in Instruction-Following

NeutralArtificial Intelligence

Recent research highlights the limitations of advanced large language models (LLMs) in reliably following nuanced instructions, despite achieving high accuracy on benchmarks like IFEval. The study introduces a new metric, reliable@k, and reveals that performance can drop by up to 61.8% with subtle prompt variations.

Read full article

via arXiv — cs.CL

arXiv — cs.CL2 days ago

TaP: A Taxonomy-Guided Framework for Automated and Scalable Preference Data Generation

PositiveArtificial Intelligence

The TaP framework has been introduced to automate and scale the generation of preference datasets for large language models (LLMs), addressing the challenges of resource-intensive dataset construction and the predominance of English datasets. This framework is based on a structured taxonomy that ensures diversity and comprehensive coverage in dataset composition.

Read full article

via arXiv — cs.CL

Ready to build your own newsroom?

Subscribe to unlock a personalised feed, podcasts, newsletters, and notifications tailored to the topics you actually care about