Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy

arXiv — cs.CL · Monday, November 17, 2025 at 5:00:00 AM
- The study focuses on the automatic classification of exam questions and learning outcomes using Bloom's Taxonomy, processing a dataset of 600 sentences across six cognitive categories. This research highlights the effectiveness of machine learning models, particularly Support Vector Machines, which achieved a notable 94% accuracy, indicating a significant advancement in educational assessment methodologies. The findings underscore the challenges of training complex models on limited data, reflecting a broader trend in AI research towards optimizing performance while managing overfitting.
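The classification setup described above can be sketched as a standard text-classification pipeline: TF-IDF features feeding a linear Support Vector Machine that assigns each sentence to one of Bloom's six cognitive levels. The toy sentences and labels below are illustrative stand-ins, not drawn from the paper's 600-sentence dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One illustrative sentence per Bloom's level (hypothetical examples)
sentences = [
    "Define the term photosynthesis.",             # Remember
    "Explain why the sky appears blue.",           # Understand
    "Use Ohm's law to compute the current.",       # Apply
    "Compare mitosis and meiosis.",                # Analyze
    "Justify your choice of sorting algorithm.",   # Evaluate
    "Design an experiment to test the hypothesis." # Create
]
labels = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

# TF-IDF vectorization followed by a linear SVM, as in the paper's best model
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(sentences, labels)

pred = clf.predict(["Define the term photosynthesis."])[0]
print(pred)
```

In practice the paper's 94% accuracy would come from a much larger training set and a proper cross-validation split; this sketch only shows the shape of the pipeline.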
— via World Pulse Now AI Editorial System


Recommended Readings
SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics
Positive · Artificial Intelligence
Small and medium-sized enterprises (SMEs) constitute 99.9% of U.S. businesses but face challenges in accessing AI technologies due to their limited data. The paper presents SmallML, a Bayesian transfer learning framework that enables accurate predictions with small datasets of 50-200 observations. It features a three-layer architecture that combines transfer learning, hierarchical Bayesian modeling, and conformal prediction to enhance predictive analytics for SMEs.
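Of SmallML's three layers, the conformal-prediction layer is the easiest to illustrate in isolation: given a small calibration set, split conformal prediction wraps any point predictor in an interval with a finite-sample coverage guarantee, which is exactly what makes it attractive at 50-200 observations. The data and stand-in model below are synthetic, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cal = 100
x_cal = rng.uniform(0, 10, n_cal)
y_cal = 2.0 * x_cal + rng.normal(0, 1, n_cal)   # synthetic SME-style data

predict = lambda x: 2.0 * x                     # stand-in for the fitted model

# Nonconformity scores: absolute residuals on the held-out calibration set
scores = np.abs(y_cal - predict(x_cal))

# Conformal quantile for target 90% coverage (alpha = 0.1)
alpha = 0.1
q_hat = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

x_new = 5.0
lo, hi = predict(x_new) - q_hat, predict(x_new) + q_hat
print(f"90% prediction interval at x=5: [{lo:.2f}, {hi:.2f}]")
```

The hierarchical Bayesian and transfer-learning layers would replace the toy `predict` above with a model whose priors are pooled across related SMEs.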
Spectral Neuro-Symbolic Reasoning II: Semantic Node Merging, Entailment Filtering, and Knowledge Graph Alignment
Positive · Artificial Intelligence
The report on Spectral Neuro-Symbolic Reasoning II introduces enhancements to the existing framework, focusing on three key areas: transformer-based node merging to reduce redundancy, sentence-level entailment validation for improved edge quality, and alignment with external knowledge graphs to provide additional context. These modifications aim to enhance the fidelity of knowledge graphs while maintaining the spectral reasoning pipeline. Experimental results indicate accuracy gains of up to 3.8% across various benchmarks, including ProofWriter and CLUTRR.
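The node-merging step mentioned above can be illustrated in miniature: embed each node label, then collapse pairs whose cosine similarity exceeds a threshold onto a canonical representative. The three-dimensional vectors below are toy embeddings; the paper uses transformer-based encoders.

```python
import numpy as np

# Toy node embeddings (hypothetical); real ones come from a transformer
embeddings = {
    "car":        np.array([0.90, 0.10, 0.00]),
    "automobile": np.array([0.88, 0.12, 0.01]),
    "banana":     np.array([0.00, 0.20, 0.95]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

threshold = 0.98
nodes = list(embeddings)
merged = {}  # redundant node -> canonical representative
for i, u in enumerate(nodes):
    for v in nodes[i + 1:]:
        if cosine(embeddings[u], embeddings[v]) >= threshold:
            merged[v] = merged.get(u, u)

print(merged)  # {'automobile': 'car'}
```

Entailment filtering would then apply an analogous threshold to edge candidates, keeping only those a sentence-level entailment model validates.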
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Neutral · Artificial Intelligence
The study examines the performance of pretrained transformer-encoder models, specifically ModernBERT and DeBERTaV3. While ModernBERT claims improved performance on various benchmarks, the lack of shared training data complicates the assessment of these gains. A controlled study pretraining ModernBERT on the same dataset as CamemBERTaV2 reveals that DeBERTaV3 outperforms ModernBERT in sample efficiency and overall benchmark performance, although ModernBERT offers advantages in long context support and training speed.
Analysing Personal Attacks in U.S. Presidential Debates
Positive · Artificial Intelligence
Personal attacks have increasingly characterized U.S. presidential debates, influencing public perception during elections. This study presents a framework for analyzing such attacks using manual annotation of debate transcripts from the 2016, 2020, and 2024 election cycles. By leveraging advancements in deep learning, particularly BERT and large language models, the research aims to enhance the detection of harmful language in political discourse, providing valuable insights for journalists and the public.
Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English
Negative · Artificial Intelligence
Automated emotion detection systems are increasingly utilized in various fields, including mental health and hiring. However, these models often fail to accurately recognize emotional expressions in dialects like African American Vernacular English (AAVE) due to reliance on dominant cultural norms. A study analyzing 2.7 million tweets from Los Angeles found that emotion recognition models exhibited significantly higher false positive rates for anger in AAVE compared to General American English (GAE), highlighting the limitations of current emotion AI technologies.
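The disparity reported above reduces to comparing per-dialect false positive rates for the "anger" label: the fraction of genuinely non-angry texts the model flags as angry, computed separately for each dialect group. The tiny arrays below are synthetic, chosen only to show the metric.

```python
import numpy as np

# Ground truth: 0 = not angry for all examples; pred: 1 = model flags anger
truth = np.array([0, 0, 0, 0, 0, 0, 0, 0])
pred  = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["AAVE"] * 4 + ["GAE"] * 4)

def false_positive_rate(truth, pred, mask):
    negatives = (truth == 0) & mask
    return float(((pred == 1) & negatives).sum() / negatives.sum())

fpr_aave = false_positive_rate(truth, pred, group == "AAVE")  # 3/4 = 0.75
fpr_gae  = false_positive_rate(truth, pred, group == "GAE")   # 1/4 = 0.25
print(fpr_aave, fpr_gae)
```

At the study's scale of 2.7 million tweets, a persistent gap between these two numbers is the statistical signature of the stereotype-reinforcing behavior the paper describes.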
AttentiveGRUAE: An Attention-Based GRU Autoencoder for Temporal Clustering and Behavioral Characterization of Depression from Wearable Data
Positive · Artificial Intelligence
The study introduces AttentiveGRUAE, an attention-based gated recurrent unit (GRU) autoencoder aimed at temporal clustering and predicting depression outcomes from wearable data. The model optimizes three objectives: learning a compact latent representation of daily behaviors, predicting end-of-period depression rates, and identifying behavioral subtypes through Gaussian Mixture Model (GMM) clustering. Evaluated on longitudinal sleep data from 372 participants, AttentiveGRUAE outperformed baseline models in clustering quality and depression classification metrics.
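The subtype-identification stage can be sketched on its own: once the autoencoder has compressed each participant's daily behavior into a latent vector, a Gaussian Mixture Model groups those vectors into behavioral subtypes. The latent vectors below are synthetic and the GRU encoder is omitted entirely.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D latent vectors for 100 hypothetical participants,
# drawn from two well-separated behavioral subtypes
rng = np.random.default_rng(42)
latent = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(50, 2)),  # subtype A
    rng.normal(loc=[3, 3], scale=0.3, size=(50, 2)),  # subtype B
])

# GMM clustering over the latent space, as in the paper's third objective
gmm = GaussianMixture(n_components=2, random_state=0).fit(latent)
subtypes = gmm.predict(latent)
print(subtypes[:5], subtypes[-5:])
```

In the full model this clustering objective is optimized jointly with reconstruction and depression prediction, so the latent space is shaped to make such subtypes separable.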
Learn to Select: Exploring Label Distribution Divergence for In-Context Demonstration Selection in Text Classification
Positive · Artificial Intelligence
The article discusses a novel approach to in-context learning (ICL) for text classification, emphasizing the importance of selecting appropriate demonstrations. Traditional methods often prioritize semantic similarity, neglecting label distribution alignment, which can impact performance. The proposed method, TopK + Label Distribution Divergence (L2D), utilizes a fine-tuned BERT-like small language model to generate label distributions and assess their divergence. This dual focus aims to enhance the effectiveness of demonstration selection in large language models (LLMs).
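The core measurement behind L2D can be illustrated directly: score each candidate demonstration by the KL divergence between its predicted label distribution and the test input's, then prefer the closest one. The distributions below are made up; the paper obtains them from a fine-tuned BERT-like small language model.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete label distributions, with smoothing."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

test_dist = [0.7, 0.2, 0.1]            # test input's predicted label distribution
candidates = {
    "demo_a": [0.65, 0.25, 0.10],      # closely aligned with the test input
    "demo_b": [0.10, 0.10, 0.80],      # semantically similar but misaligned
}

best = min(candidates, key=lambda k: kl_divergence(test_dist, candidates[k]))
print(best)  # demo_a
```

In the full method this divergence score is combined with a TopK semantic-similarity retrieval step, so both relevance and label alignment inform the final demonstration choice.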
How Data Quality Affects Machine Learning Models for Credit Risk Assessment
Positive · Artificial Intelligence
Machine Learning (ML) models are increasingly used for credit risk evaluation, with their effectiveness dependent on data quality. This research investigates the impact of data quality issues such as missing values, noisy attributes, outliers, and label errors on the predictive accuracy of ML models. Using an open-source dataset, the study assesses the robustness of ten commonly used models, including Random Forest, SVM, and Logistic Regression, revealing significant differences in model performance based on data degradation.
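A minimal version of the study's protocol looks like this: train the same model once on clean data and once on a copy with one injected data-quality defect (here, flipped labels), then compare held-out accuracy. The dataset, model choice, and 30% corruption rate below are synthetic stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic credit-risk-style binary classification task
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Inject label errors: flip 30% of the training labels
rng = np.random.default_rng(0)
y_noisy = y_tr.copy()
flip = rng.random(len(y_noisy)) < 0.3
y_noisy[flip] = 1 - y_noisy[flip]

acc_clean = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
acc_noisy = LogisticRegression().fit(X_tr, y_noisy).score(X_te, y_te)
print(f"clean: {acc_clean:.3f}  noisy labels: {acc_noisy:.3f}")
```

The study repeats this comparison across ten models and several defect types (missing values, noisy attributes, outliers, label errors), which is how it surfaces the differences in robustness.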