An Improved Ensemble-Based Machine Learning Model with Feature Optimization for Early Diabetes Prediction

arXiv — cs.LG • Wednesday, December 3, 2025 at 5:00:00 AM

Positive • Artificial Intelligence

A new machine learning model has been developed for early diabetes prediction using the BRFSS dataset, which includes 253,680 records. The study applies several supervised learning techniques, including ensemble methods such as stacking; models such as Random Forest, XGBoost, CatBoost, and LightGBM achieve a ROC-AUC of approximately 0.96.
The work matters because it improves both the accuracy and the comprehensibility of diabetes classification, which supports timely clinical decision-making and effective intervention in diabetes, a major global health concern.
The development reflects a broader trend in healthcare analytics, where machine learning techniques are increasingly applied to predict various health risks, including cancer and cardiovascular diseases, highlighting the importance of early detection and intervention across multiple health domains.
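For readers unfamiliar with stacking, the sketch below shows the general idea with scikit-learn. It is an illustration under stated assumptions and not the paper's pipeline: the data are synthetic (generated with `make_classification`, not BRFSS), scikit-learn's `GradientBoostingClassifier` stands in for the boosted-tree libraries named above, and the helper name `fit_stacked_model` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def fit_stacked_model(X_train, y_train, seed=0):
    """Fit a two-level stacked classifier (hypothetical helper).

    Level 0: tree ensembles. Level 1: logistic regression trained on the
    base learners' out-of-fold predicted probabilities.
    """
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        # Stand-in for XGBoost / CatBoost / LightGBM (assumption).
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=3,  # out-of-fold predictions feed the meta-learner
    )
    return stack.fit(X_train, y_train)


# Synthetic, imbalanced binary data standing in for survey features.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=8,
    weights=[0.85, 0.15],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = fit_stacked_model(X_train, y_train)
test_scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
auc = roc_auc_score(y_test, test_scores)
print(f"held-out ROC-AUC: {auc:.3f}")
```

The printed score describes only this synthetic data and says nothing about the 0.96 reported in the summary. ROC-AUC is computed from predicted probabilities on a held-out split, which is why the test set is kept separate from the data used to fit the stack.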

— via World Pulse Now AI Editorial System

Continue Reading

arXiv — stat.ML • a day ago

EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting

Positive • Artificial Intelligence

EcoCast is a newly proposed spatio-temporal model aimed at continual biodiversity and climate risk forecasting, particularly in ecologically diverse regions like Africa. This model leverages multisource satellite imagery, climate data, and citizen science records to predict near-term shifts in species distributions using advanced machine learning techniques.

arXiv — cs.LG • a day ago

Hybrid (Penalized Regression and MLP) Models for Outcome Prediction in HDLSS Health Data

Positive • Artificial Intelligence

A recent study introduced a hybrid machine learning model combining penalized regression and a multilayer perceptron (MLP) for predicting diabetes status using NHANES health survey data. This model outperformed traditional methods like logistic regression and random forest in terms of area under the curve (AUC) and balanced accuracy, showcasing its effectiveness in handling high-dimensional low-sample-size (HDLSS) data.

arXiv — cs.LG • a day ago

Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions

Positive • Artificial Intelligence

A recent study has introduced a novel approach to stock market trading predictions by integrating Long Short-Term Memory (LSTM) networks with Random Forest and Gradient Boosting algorithms. This combination aims to enhance trading systems by utilizing both financial and microeconomic data, demonstrating statistically significant advantages over traditional methods.

arXiv — cs.LG • 2 days ago

Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financial Tabular Classification

Neutral • Artificial Intelligence

A recent study evaluated the performance of Large Language Models (LLMs) in financial tabular classification tasks, revealing discrepancies between LLMs' self-explanations of feature importance and their SHAP values. This divergence raises concerns about the reliability of LLMs in high-stakes applications like financial risk assessment, where accuracy is critical.

arXiv — cs.LG • 2 days ago

Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction

Positive • Artificial Intelligence

A new deep learning architecture named Rep3Net has been proposed to enhance molecular bioactivity prediction in early-stage drug discovery. This model integrates traditional molecular descriptor data with spatial and relational information through graph-based representations and contextual embeddings generated by ChemBERTa from SMILES strings. The model has shown reliable predictions on the Poly [ADP-ribose] polymerase 1 (PARP-1) dataset, which is vital for DNA damage repair in cancer therapies.

arXiv — cs.LG • 2 days ago

Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI

Positive • Artificial Intelligence

A new machine learning framework has been developed to optimize stroke risk prediction, utilizing ensemble modeling and explainable AI techniques. This framework achieved an impressive accuracy of 99.09% on the Stroke Prediction Dataset through a comprehensive evaluation of various models and data preprocessing methods, including Random Over-Sampling to address class imbalance.

Nature — Machine Learning • 2 days ago

Establishment and validation of a diagnostic model for cholangiocarcinoma based on LightGBM machine-learning algorithm

Neutral • Artificial Intelligence

A new diagnostic model for cholangiocarcinoma has been established and validated using the LightGBM machine-learning algorithm, as reported in Nature — Machine Learning. This model aims to enhance the accuracy of diagnosing this challenging cancer type, which often presents late and is associated with poor prognosis.
