Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale
- The Estonian Subjectivity Dataset was created to assess document-level subjectivity in Estonian. It comprises 1,000 documents rated by four annotators on a scale from 0 (objective) to 100 (subjective). Initial experiments on automatic subjectivity analysis with the large language model (LLM) GPT-5 showed promising results, although some discrepancies with the human annotations were noted.
- The dataset offers a graded resource for studying subjectivity in Estonian texts and could benefit applications in natural language processing, sentiment analysis, and automated content evaluation. The findings may also inform how LLMs are developed for and applied to language-specific tasks.
- The creation of this dataset aligns with ongoing efforts to improve language processing tools for underrepresented languages, such as Estonian and Basque. These initiatives highlight the importance of developing localized datasets that can enhance the performance of AI models in diverse linguistic contexts, addressing challenges in translation and automated scoring systems.
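
As a rough illustration of the setup described above (several annotators scoring each document from 0 to 100, then a comparison with model scores), here is a minimal Python sketch. The function names, the document IDs, and every rating below are hypothetical and made up for the example; they are not taken from the dataset, and the mean absolute gap is just one simple way to measure disagreement, not necessarily the measure the authors used.

```python
from statistics import mean

def mean_rating(ratings):
    """Average one document's annotator ratings (each expected in 0..100)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 0 <= r <= 100:
            raise ValueError(f"rating out of range: {r}")
    return mean(ratings)

def mean_absolute_gap(human_means, model_scores):
    """Mean absolute difference between human averages and model scores."""
    if len(human_means) != len(model_scores):
        raise ValueError("inputs must have the same length")
    return mean([abs(h - m) for h, m in zip(human_means, model_scores)])

# Hypothetical ratings from four annotators for two made-up documents.
human = {"doc_a": [10, 20, 15, 15], "doc_b": [80, 90, 70, 80]}
# Hypothetical model scores for the same documents.
model = {"doc_a": 25, "doc_b": 75}

doc_ids = sorted(human)
human_means = [mean_rating(human[d]) for d in doc_ids]
gap = mean_absolute_gap(human_means, [model[d] for d in doc_ids])
print(human_means, gap)
```

A larger gap would point to the kind of discrepancy between model and human judgments that the summary mentions; a rank correlation would be a reasonable alternative if only the ordering of documents matters.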
— via World Pulse Now AI Editorial System


