Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images

arXiv — cs.LG · Tuesday, December 2, 2025 at 5:00:00 AM
  • A comparative analysis was conducted on three image-based methods (VGG16, ViT-B/16, and CoAtNet-Tiny) to classify mental health conditions such as depression and schizophrenia from actigraphy-derived images. The study converted wrist-worn activity signals from the Psykose and Depresjon datasets into images for evaluation (a sketch of such a signal-to-image pipeline follows the attribution below). CoAtNet-Tiny emerged as the most reliable method, achieving the highest average accuracy and the greatest stability across data folds.
  • The findings are significant as they highlight the potential of advanced machine learning architectures in improving mental health diagnostics. The superior performance of CoAtNet-Tiny, particularly in identifying underrepresented classes like depression and schizophrenia, suggests that such technologies could enhance clinical assessments and interventions in mental health care.
  • This research aligns with ongoing advancements in artificial intelligence applications in healthcare, particularly in image recognition and classification. The integration of Vision Transformers and convolutional networks reflects a broader trend towards utilizing sophisticated algorithms for more accurate medical assessments, echoing similar developments in areas like histopathology and noisy-image recognition.
— via World Pulse Now AI Editorial System
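
For orientation, here is a minimal sketch of the kind of signal-to-image pipeline the study describes: minute-level activity counts are tiled into a days × minutes grid and fed to a pretrained backbone. The tiling scheme and the torchvision VGG16 head are illustrative assumptions; the paper's exact image encoding is not detailed in the summary above.

```python
# Minimal sketch: encode a 1-D actigraphy series as a 2-D image and
# classify it with a pretrained backbone. The days x minutes tiling and
# the VGG16 head are assumptions for illustration, not the paper's
# documented pipeline.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

MINUTES_PER_DAY = 1440

def actigraphy_to_image(counts: np.ndarray) -> torch.Tensor:
    """Tile minute-level activity counts into a (days, 1440) grid,
    min-max normalize, and resize to a 3x224x224 tensor."""
    n_days = len(counts) // MINUTES_PER_DAY
    grid = counts[: n_days * MINUTES_PER_DAY].reshape(n_days, MINUTES_PER_DAY)
    grid = (grid - grid.min()) / (np.ptp(grid) + 1e-8)        # scale to [0, 1]
    img = torch.from_numpy(grid).float().unsqueeze(0)          # 1 x days x 1440
    img = transforms.functional.resize(img, [224, 224], antialias=True)
    return img.repeat(3, 1, 1)                                 # replicate to RGB

# Binary head on a pretrained VGG16 (condition vs. control).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[-1] = nn.Linear(4096, 2)

counts = np.random.randint(0, 500, size=7 * MINUTES_PER_DAY).astype(np.float32)
logits = model(actigraphy_to_image(counts).unsqueeze(0))
print(logits.shape)  # torch.Size([1, 2])
```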


Continue Reading
LightHCG: a Lightweight yet powerful HSIC Disentanglement based Causal Glaucoma Detection Model framework
Positive · Artificial Intelligence
A new framework named LightHCG has been introduced for glaucoma detection, leveraging HSIC disentanglement and advanced AI models like Vision Transformers and VGG16. This model aims to enhance the accuracy of glaucoma diagnosis by analyzing retinal images, addressing the limitations of traditional diagnostic methods that rely heavily on subjective assessments and manual measurements.
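
The summary names HSIC disentanglement without spelling out the statistic. For orientation, the standard biased empirical HSIC with RBF kernels, which approaches zero when two feature sets are independent, can be computed as below; how LightHCG folds it into a training loss is not covered here.

```python
# Minimal sketch of the statistic behind HSIC-based disentanglement:
# the biased empirical HSIC with RBF kernels. Its use inside LightHCG
# is an assumption; the summary above gives no implementation details.
import numpy as np

def rbf_kernel(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """RBF Gram matrix for row vectors in x."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(x: np.ndarray, y: np.ndarray) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    k, l = rbf_kernel(x), rbf_kernel(y)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 8))
print(hsic(a, rng.normal(size=(64, 8))))  # near 0: independent features
print(hsic(a, a))                         # larger: dependent features
```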
Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction
Positive · Artificial Intelligence
A recent study on Vision Transformers (ViTs) highlights the effectiveness of two parameter-reduction strategies, GroupedMLP and ShallowMLP, which improve model accuracy and training stability while reducing the number of parameters by 32.7%. The GroupedMLP variant achieved 81.47% top-1 accuracy, while ShallowMLP reached 81.25% accuracy with increased inference throughput. Both models surpassed the baseline accuracy of 81.05% for ViT-B/16 trained on ImageNet-1K.
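
The study's exact GroupedMLP design is not described above. One common way to cut feed-forward parameters in a ViT block, sketched here as an assumption, is to split the hidden channels into independent groups via grouped 1×1 convolutions, which divides the MLP parameter count roughly by the number of groups.

```python
# Minimal sketch of a grouped feed-forward block for a ViT. This is an
# assumed construction, not the study's documented GroupedMLP: grouped
# 1x1 convolutions act as per-group linear layers over the tokens.
import torch
import torch.nn as nn

class GroupedMLP(nn.Module):
    def __init__(self, dim: int = 768, hidden: int = 3072, groups: int = 4):
        super().__init__()
        self.fc1 = nn.Conv1d(dim, hidden, kernel_size=1, groups=groups)
        self.act = nn.GELU()
        self.fc2 = nn.Conv1d(hidden, dim, kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> convolve over the channel dimension
        x = x.transpose(1, 2)
        x = self.fc2(self.act(self.fc1(x)))
        return x.transpose(1, 2)

# Compare against a dense ViT-B/16-style MLP (768 -> 3072 -> 768).
dense = sum(p.numel() for p in nn.Sequential(
    nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768)).parameters())
grouped = sum(p.numel() for p in GroupedMLP().parameters())
print(f"dense MLP: {dense:,} params, grouped: {grouped:,} params")
```

With four groups, this construction keeps roughly a quarter of the dense MLP's weights, which is the same order of reduction the study reports, though the mechanism it actually uses may differ.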