On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts
Neutral · Artificial Intelligence
- A recent study highlights the challenges of automatically transcribing stuttered speech, finding that current end-to-end automatic speech recognition (ASR) frameworks often overlook dysfluencies and fluency-shaping artifacts and therefore produce non-verbatim transcriptions. The researchers propose a parameter-efficient adaptation method to better decode these speech patterns, evaluated on both simulated and natural stuttered speech datasets (a minimal sketch of such an adaptation follows after this list).
- This development matters because it addresses a critical gap: transcriptions that omit dysfluencies have limited clinical and research value. By introducing a multi-step fine-tuning strategy with language-adaptive pretraining (see the second sketch after this list), the study aims to improve ASR performance, particularly for non-English languages such as German, which have historically been underrepresented in ASR training data.
- The findings resonate with ongoing discussions in AI about the biases inherent in language models and the importance of inclusive training data. As demand grows for more accurate and diverse speech recognition technologies, this research underscores the need for adaptive techniques that extend ASR systems' capabilities across languages and dialects.
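
The summary does not specify which parameter-efficient method the authors use, so the following is a minimal sketch of one representative choice: LoRA adapters attached to a pretrained Whisper-style ASR model via the Hugging Face peft library. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: parameter-efficient adaptation of a pretrained ASR model.
# ASSUMPTION: the paper's exact method is unspecified in the summary; LoRA
# adapters are shown here as one representative technique.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import LoraConfig, get_peft_model

# Load a multilingual pretrained checkpoint (hypothetical choice of size).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="german", task="transcribe"
)

# Attach low-rank adapters to the attention projections; only these small
# matrices are trained, while the vast majority of weights stay frozen.
lora_config = LoraConfig(
    r=8,                                  # rank of the update (assumed value)
    lora_alpha=16,                        # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],  # Whisper attention projections
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# Typically well under 1% of parameters remain trainable.
model.print_trainable_parameters()
```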
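
Likewise, the summary names a multi-step fine-tuning strategy with language-adaptive pretraining but gives no stage details. The sketch below assumes one plausible curriculum: continued training on fluent German speech, then fine-tuning on simulated dysfluent speech, then on natural stuttered speech. The dataset identifiers and the load_prepared_dataset helper are hypothetical.

```python
# Minimal sketch of a multi-step fine-tuning schedule with language-adaptive
# pretraining. Stage ordering, learning rates, and dataset names are all
# ASSUMPTIONS for illustration; the summary does not specify them.
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
)

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Hypothetical stages: each one moves closer to the target domain.
stages = [
    # (stage name, dataset id, learning rate)
    ("language_adaptive_pretraining", "fluent_german_speech",     1e-5),
    ("simulated_dysfluency_finetune", "simulated_stuttering",     5e-6),
    ("natural_dysfluency_finetune",   "natural_stuttered_speech", 1e-6),
]

for name, dataset_id, lr in stages:
    train_set = load_prepared_dataset(dataset_id)  # hypothetical helper
    args = Seq2SeqTrainingArguments(
        output_dir=f"checkpoints/{name}",
        learning_rate=lr,                # smaller steps as data gets scarcer
        num_train_epochs=3,              # assumed budget per stage
        per_device_train_batch_size=8,
    )
    trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_set)
    trainer.train()                      # the same model carries over stages
```

Staging the data this way reflects one common rationale for multi-step schedules: abundant fluent data adapts the model to the language first, so the scarce stuttered-speech data only has to teach the dysfluency patterns.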
— via World Pulse Now AI Editorial System
