CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis

arXiv, cs.CL · Wednesday, November 12, 2025 at 5:00:00 AM
The CC30k dataset targets a narrow kind of sentiment analysis: how citing authors talk about the reproducibility of the machine learning work they cite. It contains 30,734 citation contexts, each labeled Positive, Negative, or Neutral according to the sentiment expressed about the cited work's reproducibility. Of these labels, 25,829 were produced through crowdsourcing, and the dataset reports a labeling accuracy of 94%. CC30k fills a gap in existing resources for computational reproducibility studies, and it is reported to improve the performance of large language models on this sentiment classification task. It also lets researchers study how citation sentiment correlates with reproducibility, which can help them assess the validity of published findings.
— via World Pulse Now AI Editorial System
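
To put the two counts above in proportion, here is a minimal sketch of the arithmetic. The constant and function names are illustrative and do not come from the dataset's own tooling. The summary does not say how the remaining labels were produced, so the code only reports their count.

```python
# Figures quoted in the summary above.
TOTAL_CONTEXTS = 30_734
CROWDSOURCED_LABELS = 25_829


def crowdsourced_share(total: int, crowdsourced: int) -> float:
    """Return the fraction of labels that came from crowdsourcing."""
    if total <= 0 or not 0 <= crowdsourced <= total:
        raise ValueError("need total > 0 and 0 <= crowdsourced <= total")
    return crowdsourced / total


# Labels not attributed to crowdsourcing in the summary.
remaining = TOTAL_CONTEXTS - CROWDSOURCED_LABELS
share = crowdsourced_share(TOTAL_CONTEXTS, CROWDSOURCED_LABELS)

print(f"crowdsourced: {share:.1%}, other: {remaining}")
# prints "crowdsourced: 84.0%, other: 4905"
```

Roughly five in six labels are crowdsourced, so the reported accuracy figure mostly reflects the quality of the crowdsourced annotations.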

