Sub-exponential Growth of New Words and Names Online: A Piecewise Power-Law Model

arXiv — cs.CL · Wednesday, November 12, 2025 at 5:00:00 AM
A recent study of the sub-exponential growth of new words and names online introduces a piecewise power-law model that challenges traditional S-shaped (logistic) growth models. Analyzing roughly one billion Japanese blog articles linked to Wikipedia vocabulary, the researchers found that 55% of the examined diffusion patterns showed sub-exponential growth, a phenomenon previously overlooked in broader social contexts. The study shows that sub-exponential growth is prevalent, with the mode of the shape parameter near 0.5, and that the peak diffusion scale is determined primarily by the growth rate. A systematic analysis of 2,963 items further reveals consistent patterns in web search trends across English, Spanish, and Japanese, suggesting that the dynamics of language evolution in the digital realm are more complex than previously understood.
— via World Pulse Now AI Editorial System
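To make the reported growth form concrete, the sketch below simulates a two-regime piecewise power law for cumulative mentions, n(t) ∝ t^α, and recovers the shape parameter from a log-log fit. The break point, exponents, and continuity construction here are illustrative assumptions for this sketch, not the paper's fitted model or values; only the α ≈ 0.5 mode and the piecewise power-law idea come from the summary above.

```python
import numpy as np

def piecewise_power_law(t, t_break=100.0, alpha1=0.5, alpha2=0.1, c=1.0):
    """Two-regime piecewise power law (hypothetical parameters):
    exponent alpha1 before t_break, alpha2 after, joined
    continuously at t_break."""
    early = c * np.power(t, alpha1)
    c2 = c * np.power(t_break, alpha1 - alpha2)  # enforce continuity at t_break
    late = c2 * np.power(t, alpha2)
    return np.where(t < t_break, early, late)

t = np.arange(1.0, 366.0)          # days since the word first appears
n = piecewise_power_law(t)

# A power law is a straight line on log-log axes, so fitting
# log n against log t over the early regime recovers alpha1.
early = t < 100.0
slope = np.polyfit(np.log(t[early]), np.log(n[early]), 1)[0]
print(round(slope, 2))  # ≈ 0.5, the shape-parameter mode reported in the study
```

The log-log fit is the standard diagnostic separating sub-exponential (power-law) spread from exponential spread: an exponential curve bends upward on log-log axes, while a power law stays linear with slope equal to the shape parameter.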


Recommended Readings
Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages: A Cross-Lingual Benchmark Across Cantonese, Japanese, and Turkish
Neutral · Artificial Intelligence
A recent study evaluates seven advanced large language models (LLMs) on low-resource and morphologically rich languages, specifically Cantonese, Japanese, and Turkish. The benchmark covers open-domain question answering, document summarization, translation, and culturally grounded dialogue. Despite impressive results in high-resource languages, the study finds that LLM effectiveness in these less-studied languages remains underexplored.
LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models
Positive · Artificial Intelligence
LaoBench is a newly introduced large-scale benchmark dataset aimed at evaluating large language models (LLMs) in the Lao language. It consists of over 17,000 curated samples that assess knowledge application, foundational education, and bilingual translation among Lao, Chinese, and English. The dataset is designed to enhance the understanding and reasoning capabilities of LLMs in low-resource languages, addressing the current challenges faced by models in mastering Lao.
Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs
Positive · Artificial Intelligence
The study addresses Referring Expression Comprehension (REC), the task of localizing objects in images from natural-language descriptions. Despite the global need for multilingual applications, existing research has been largely English-centric. This work introduces a unified multilingual dataset covering 10 languages, created by expanding 12 English benchmarks through machine translation, yielding about 8 million expressions across 177,620 images and 336,882 annotated objects. A new attention-anchored neural architecture is also proposed to improve REC performance.
TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English
Positive · Artificial Intelligence
The TEDxTN project introduces the first publicly available speech translation dataset for Tunisian Arabic to English. This dataset includes 108 TEDx talks, totaling 25 hours of speech, featuring code-switching and various regional accents from Tunisia. The corpus aims to address the data scarcity issue for Arabic dialects and is accompanied by publicly available annotation guidelines, enabling future expansions.