IndicGEC: Powerful Models, or a Measurement Mirage?

arXiv — cs.CL · Thursday, November 20, 2025 at 5:00:00 AM
  • TeamNRC participated in the BHASHA-Task 1 Grammatical Error Correction shared task, achieving strong results in Telugu and Hindi while exploring the effectiveness of smaller language models across five Indian languages.
  • This development highlights the growing capabilities of language models in addressing grammatical errors, which is crucial for improving language processing technologies in diverse linguistic contexts.
  • The findings resonate with ongoing discussions about the efficacy of large versus small language models, as well as the importance of high-quality datasets and appropriate evaluation metrics for Indian languages (a minimal scoring sketch follows below).
— via World Pulse Now AI Editorial System
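The point about appropriate evaluation metrics is worth making concrete: word-level overlap metrics can understate correction quality in Indic scripts, where a single orthographic change alters an entire word token. Below is a minimal character-level scoring sketch in Python; it is an illustrative metric for this discussion, not the metric TeamNRC actually used.

```python
# A minimal sketch of character-level GEC scoring, a style of metric
# often preferred over word-level matching for Indic scripts.
# Illustrative only (assumption): not the TeamNRC evaluation setup.

def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def gec_score(hypothesis: str, reference: str) -> float:
    """1.0 means the corrected sentence matches the reference exactly."""
    if not reference:
        return float(hypothesis == reference)
    dist = char_edit_distance(hypothesis, reference)
    return max(0.0, 1.0 - dist / len(reference))

# Example with a (hypothetical) Telugu correction pair:
print(gec_score("నేను బడికి వెళ్తాను", "నేను బడికి వెళ్తాను"))  # 1.0
```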

Recommended Readings
Investigating Hallucination in Conversations for Low Resource Languages
Neutral · Artificial Intelligence
Large Language Models (LLMs) have shown exceptional ability in text generation but often produce factually incorrect statements, known as 'hallucinations'. This study investigates hallucinations in conversational data across three low-resource languages: Hindi, Farsi, and Mandarin. The analysis of various LLMs, including GPT-3.5 and GPT-4o, reveals that while Mandarin has few hallucinated responses, Hindi and Farsi exhibit significantly higher rates of inaccuracies.
HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples
Neutral · Artificial Intelligence
HinTel-AlignBench is a newly proposed framework aimed at evaluating multilingual Vision-Language Models (VLMs) in Indian languages, specifically Hindi and Telugu, with English-aligned samples. The framework addresses limitations in current evaluations, such as reliance on unverified translations and narrow task coverage. It includes a semi-automated dataset creation process that combines back-translation and human verification, contributing to the advancement of equitable AI for low-resource languages.
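The semi-automated creation process mentioned above lends itself to a short sketch: candidate translations are back-translated into English and compared with the original source, and pairs that drift too far are queued for human verification. The similarity measure, the 0.7 threshold, and the `back_translate` callable here are illustrative assumptions, not details from the HinTel-AlignBench paper.

```python
# A minimal sketch of a back-translation filter: candidates whose
# back-translation drifts too far from the English source are routed
# to human verification. Threshold and similarity are assumptions.

from typing import Callable

def token_f1(a: str, b: str) -> float:
    """Crude token-overlap F1 between two English sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    overlap = len(ta & tb)
    p, r = overlap / len(tb), overlap / len(ta)
    return 2 * p * r / (p + r) if p + r else 0.0

def filter_pair(src_en: str,
                candidate_hi: str,
                back_translate: Callable[[str], str],
                threshold: float = 0.7):
    """Return ('accept', score) or ('needs_human_review', score)."""
    back = back_translate(candidate_hi)
    score = token_f1(src_en, back)
    status = "accept" if score >= threshold else "needs_human_review"
    return status, score

# Demo with a stand-in back-translator that echoes a fixed string:
print(filter_pair("The cat sat on the mat.",
                  "बिल्ली चटाई पर बैठी।",
                  lambda hi: "The cat sat on the mat."))
# -> ('accept', 1.0)
```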
Automatic Fact-checking in English and Telugu
Neutral · Artificial Intelligence
The research paper explores the challenge of false information and the effectiveness of large language models (LLMs) in verifying factual claims in English and Telugu. It presents a bilingual dataset and evaluates various approaches for classifying the veracity of claims. The study aims to enhance the efficiency of fact-checking processes, which are often labor-intensive and time-consuming.
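One common approach to the claim-classification step described above is to prompt an LLM directly. The sketch below assumes a generic `llm` completion callable and a three-way label set; both are hypothetical and not taken from the paper.

```python
# A minimal sketch of prompt-based claim-veracity classification.
# The `llm` callable and label set are assumptions for illustration.

from typing import Callable

LABELS = ("true", "false", "not enough evidence")

def classify_claim(claim: str, evidence: str,
                   llm: Callable[[str], str]) -> str:
    prompt = (
        "Given the evidence, label the claim as one of "
        f"{', '.join(LABELS)}.\n"
        f"Evidence: {evidence}\nClaim: {claim}\nLabel:"
    )
    answer = llm(prompt).strip().lower()
    # Fall back to the abstaining label if the model's output
    # does not match any expected label.
    return next((l for l in LABELS if l in answer), LABELS[-1])

# Demo with a stub "LLM" that always abstains:
print(classify_claim("The sky is green.", "No relevant evidence found.",
                     lambda p: "not enough evidence"))
```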
Segmentation Beyond Defaults: Asymmetrical Byte Pair Encoding for Optimal Machine Translation Performance
Positive · Artificial Intelligence
Current research in Machine Translation (MT) typically employs symmetric Byte Pair Encoding (BPE) for word segmentation, applying the same number of merge operations to both source and target languages. This study reveals that such an approach does not ensure optimal performance across various language pairs and data sizes. By utilizing asymmetric BPE, which allows different merge operations for source and target languages, significant improvements in MT performance were observed, particularly in low-resource settings.
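The asymmetric setup is straightforward to express with the sentencepiece library: train separate BPE models for the source and target sides with different vocabulary sizes, and hence different numbers of merges. The file names and the 8k/32k sizes below are illustrative assumptions, not the paper's tuned values.

```python
# A minimal sketch of asymmetric BPE with sentencepiece: the source
# and target sides get separately trained models with different
# vocabulary sizes, instead of one shared symmetric setting.

import sentencepiece as spm

# Smaller vocabulary for a low-resource source language ...
spm.SentencePieceTrainer.train(
    input="train.src.txt", model_prefix="bpe_src",
    vocab_size=8000, model_type="bpe")

# ... and a larger one for the higher-resource target language.
spm.SentencePieceTrainer.train(
    input="train.tgt.txt", model_prefix="bpe_tgt",
    vocab_size=32000, model_type="bpe")

src_sp = spm.SentencePieceProcessor(model_file="bpe_src.model")
tgt_sp = spm.SentencePieceProcessor(model_file="bpe_tgt.model")

print(src_sp.encode("ఒక ఉదాహరణ వాక్యం", out_type=str))      # source-side pieces
print(tgt_sp.encode("an example sentence", out_type=str))  # target-side pieces
```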