Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images

arXiv — cs.LG · Tuesday, December 2, 2025 at 5:00:00 AM
  • A comparative analysis was conducted on three image-based methods (VGG16, ViT-B/16, and CoAtNet-Tiny) to classify mental health conditions such as depression and schizophrenia from actigraphy-derived images. The study converted wrist-worn activity signals from the Psykose and Depresjon datasets into images for evaluation (a sketch of such a signal-to-image pipeline follows the attribution below). CoAtNet-Tiny emerged as the most reliable method, achieving the highest average accuracy and the greatest stability across data folds.
  • The findings are significant as they highlight the potential of advanced machine learning architectures in improving mental health diagnostics. The superior performance of CoAtNet-Tiny, particularly in identifying underrepresented classes like depression and schizophrenia, suggests that such technologies could enhance clinical assessments and interventions in mental health care.
  • This research aligns with ongoing advancements in artificial intelligence applications in healthcare, particularly in image recognition and classification. The integration of Vision Transformers and convolutional networks reflects a broader trend towards utilizing sophisticated algorithms for more accurate medical assessments, echoing similar developments in areas like histopathology and noisy-image recognition.
— via World Pulse Now AI Editorial System
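
For orientation, here is a minimal sketch of the kind of signal-to-image pipeline the study describes: minute-level activity counts are tiled into a days × minutes grid and fed to a pretrained backbone. The tiling scheme and the torchvision VGG16 head are illustrative assumptions; the paper's exact image encoding is not detailed in the summary above.

```python
# Minimal sketch: encode a 1-D actigraphy series as a 2-D image and
# classify it with a pretrained backbone. The days x minutes tiling and
# the VGG16 head are assumptions for illustration, not the paper's
# documented pipeline.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

MINUTES_PER_DAY = 1440

def actigraphy_to_image(counts: np.ndarray) -> torch.Tensor:
    """Tile minute-level activity counts into a (days, 1440) grid,
    min-max normalize, and resize to a 3x224x224 tensor."""
    n_days = len(counts) // MINUTES_PER_DAY
    grid = counts[: n_days * MINUTES_PER_DAY].reshape(n_days, MINUTES_PER_DAY)
    grid = (grid - grid.min()) / (np.ptp(grid) + 1e-8)        # scale to [0, 1]
    img = torch.from_numpy(grid).float().unsqueeze(0)          # 1 x days x 1440
    img = transforms.functional.resize(img, [224, 224], antialias=True)
    return img.repeat(3, 1, 1)                                 # replicate to RGB

# Binary head on a pretrained VGG16 (condition vs. control).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[-1] = nn.Linear(4096, 2)

counts = np.random.randint(0, 500, size=7 * MINUTES_PER_DAY).astype(np.float32)
logits = model(actigraphy_to_image(counts).unsqueeze(0))
print(logits.shape)  # torch.Size([1, 2])
```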


Continue Reading
LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework
Positive · Artificial Intelligence
A new framework named LightHCG has been introduced for glaucoma detection, leveraging HSIC disentanglement and advanced AI models like Vision Transformers and VGG16. This model aims to enhance the accuracy of glaucoma diagnosis by analyzing retinal images, addressing the limitations of traditional diagnostic methods that rely heavily on subjective assessments and manual measurements.
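
The summary names HSIC disentanglement without spelling out the statistic. For orientation, the standard biased empirical HSIC with RBF kernels, which approaches zero when two feature sets are independent, can be computed as below; how LightHCG folds it into a training loss is not covered here.

```python
# Minimal sketch of the statistic behind HSIC-based disentanglement:
# the biased empirical HSIC with RBF kernels. Its use inside LightHCG
# is an assumption; the summary above gives no implementation details.
import numpy as np

def rbf_kernel(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """RBF Gram matrix for row vectors in x."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(x: np.ndarray, y: np.ndarray) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    k, l = rbf_kernel(x), rbf_kernel(y)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 8))
print(hsic(a, rng.normal(size=(64, 8))))  # near 0: independent features
print(hsic(a, a))                         # larger: dependent features
```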
Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction
Positive · Artificial Intelligence
A recent study on Vision Transformers (ViTs) highlights the effectiveness of two parameter-reduction strategies, GroupedMLP and ShallowMLP, which improve model accuracy and training stability while reducing the number of parameters by 32.7%. The GroupedMLP variant achieved 81.47% top-1 accuracy, while ShallowMLP reached 81.25% accuracy with increased inference throughput. Both models surpassed the baseline accuracy of 81.05% for ViT-B/16 trained on ImageNet-1K.
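
The study's exact GroupedMLP design is not described above. One common way to cut feed-forward parameters in a ViT block, sketched here as an assumption, is to split the hidden channels into independent groups via grouped 1×1 convolutions, which divides the MLP parameter count roughly by the number of groups.

```python
# Minimal sketch of a grouped feed-forward block for a ViT. This is an
# assumed construction, not the study's documented GroupedMLP: grouped
# 1x1 convolutions act as per-group linear layers over the tokens.
import torch
import torch.nn as nn

class GroupedMLP(nn.Module):
    def __init__(self, dim: int = 768, hidden: int = 3072, groups: int = 4):
        super().__init__()
        self.fc1 = nn.Conv1d(dim, hidden, kernel_size=1, groups=groups)
        self.act = nn.GELU()
        self.fc2 = nn.Conv1d(hidden, dim, kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> convolve over the channel dimension
        x = x.transpose(1, 2)
        x = self.fc2(self.act(self.fc1(x)))
        return x.transpose(1, 2)

# Compare against a dense ViT-B/16-style MLP (768 -> 3072 -> 768).
dense = sum(p.numel() for p in nn.Sequential(
    nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768)).parameters())
grouped = sum(p.numel() for p in GroupedMLP().parameters())
print(f"dense MLP: {dense:,} params, grouped: {grouped:,} params")
```

With four groups, this construction keeps roughly a quarter of the dense MLP's weights, which is the same order of reduction the study reports, though the mechanism it actually uses may differ.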