LungEEO: An Optimized Explainable Ensemble Framework for Lung Cancer Prediction

Towhidul Islam; Safa Asgar; Sajjad Mahmood

doi:10.20944/preprints202512.2410.v1

Submitted: 25 December 2025

Posted: 26 December 2025

Abstract

Lung cancer remains one of the leading causes of cancer-related mortality worldwide, which makes early detection important for improving patient survival. However, current machine learning approaches for lung cancer prediction often rely on suboptimal model configurations, lack systematic ensemble comparisons, and offer insufficient interpretability. This study introduces a novel framework, called Lung Explainable Ensemble Optimizer (LungEEO), that integrates three methodological advances: (1) comprehensive hyperparameter optimization across 50 configurations of nine machine learning algorithms for base model selection, (2) a systematic comparison of Hybrid Majority Voting strategies (unweighted hard voting, weighted hard voting, and soft voting) with an ensemble stacking approach, and (3) a dual explainable AI (XAI) layer based on SHAP and LIME that provides parallel global and local explanations. Experiments conducted on two heterogeneous lung cancer datasets indicate that ensemble approaches consistently outperform individual models. Weighted hard voting achieved the best performance on Dataset 1 (Accuracy: 89.04%, F1-Score: 89.04%), whereas ensemble stacking produced superior outcomes on Dataset 2 (Accuracy: 87.95%, F1-Score: 87.95%). Following extensive hyperparameter tuning, Random Forest and Multi-Layer Perceptron performed consistently well as base learners on both datasets. In addition, integrating SHAP with LIME offers further insight into model behavior, improving the interpretability of ensemble predictions and strengthening their potential clinical applicability. To the best of our knowledge, the combined use of these interpretability techniques within an ensemble framework has received limited attention in existing lung cancer prediction studies. Overall, the proposed LungEEO framework offers a promising balance between predictive performance and interpretability, supporting its potential use in clinical decision support.
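The three voting strategies named in the abstract can differ on the same set of base-model outputs. The following is a minimal sketch in plain Python, not the authors' implementation: the function names `hard_vote` and `soft_vote`, the example predictions, and the weights are all hypothetical, and the abstract does not state how LungEEO derives its voting weights.

```python
from collections import defaultdict


def hard_vote(labels, weights=None):
    """Return the class label with the largest (optionally weighted) vote total."""
    if weights is None:
        weights = [1.0] * len(labels)  # unweighted: one vote per model
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    # Ties are broken in favour of the smallest label so the result is deterministic.
    return min(totals, key=lambda lab: (-totals[lab], lab))


def soft_vote(probabilities):
    """Average per-class probabilities across models and return the argmax class."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Hypothetical outputs of three base models for one patient
# (0 = no cancer, 1 = cancer). These values are illustrative only.
labels = [1, 0, 0]
weights = [0.9, 0.4, 0.4]  # assumed weights, chosen for illustration
probas = [[0.05, 0.95], [0.55, 0.45], [0.60, 0.40]]

print("unweighted hard vote:", hard_vote(labels))
print("weighted hard vote:  ", hard_vote(labels, weights))
print("soft vote:           ", soft_vote(probas))
```

In this example the unweighted majority selects class 0, while the weighted and soft variants select class 1 because one confident, highly weighted model outweighs the other two. A stacking ensemble would instead train a meta-learner on the base-model outputs, which this sketch does not cover.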

Keywords: lung cancer; tabular data; machine learning; classification; confusion matrix; heat map

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
