CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis
The CC30k dataset addresses a gap in sentiment analysis: how the machine learning community talks about the reproducibility of the work it cites. It contains 30,734 citation contexts, each labeled Positive, Negative, or Neutral according to the sentiment it expresses about the reproducibility of the cited work. Of these labels, 25,829 were produced through crowdsourcing, and the labeling accuracy is reported as 94%. The dataset fills a gap in existing resources for computational reproducibility studies and improves the performance of large language models on this sentiment classification task. By systematically studying how these sentiments correlate with reproducibility, researchers can better assess the validity of published findings and strengthen trust in the scientific literature.
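To put the two counts in proportion, the short Python sketch below works out what share of the labels came from crowdsourcing. It uses only the figures quoted in the summary. The constant and function names are illustrative and are not part of the dataset or its tooling, and the summary does not say how the remaining labels were produced.

```python
# Figures quoted in the summary; names here are illustrative only.
TOTAL_CONTEXTS = 30_734
CROWDSOURCED_LABELS = 25_829


def crowdsourced_share(total: int, crowdsourced: int) -> float:
    """Return the fraction of labels that came from crowdsourcing."""
    if total <= 0:
        raise ValueError("total must be positive")
    return crowdsourced / total


# Labels not attributed to crowdsourcing in the summary.
other_labels = TOTAL_CONTEXTS - CROWDSOURCED_LABELS
share = crowdsourced_share(TOTAL_CONTEXTS, CROWDSOURCED_LABELS)

print(f"Crowdsourced share: {share:.1%}")
print(f"Labels from other sources: {other_labels}")
```

By this arithmetic, roughly five in six labels were crowdsourced, and 4,905 came from some other source that the summary does not describe.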
— via World Pulse Now AI Editorial System
