
Breast cancer classification along with feature prioritization using machine learning algorithms
Category:- Journal; Year:- 2022
Discipline:- Electronics and Communication Engineering Discipline
School:- Science, Engineering & Technology School
Abstract
Breast Cancer (BC) is one of the most lethal diseases and causes a large number of deaths among women around the world. Prevention and diagnosis are the best options for reducing cancer deaths, and both can be supported by regular examination of a few health indicators such as the levels of Glucose, Insulin, HOMA, and Leptin. Based on such measurements, this work classifies Breast Cancer and non-Breast Cancer patients using state-of-the-art Machine Learning (ML) techniques and analyzes the features that influence the model to predict a certain class. We used several ML models, namely Gradient Boosting (GB), XGBoost (XGB), CatBoost (CB), and Light Gradient Boosting Machine (LGBM), to classify BC and find the feature importance. To interpret the ML models and find each feature's contribution to the BC prediction, we used SHapley Additive exPlanations (SHAP). In addition, several filter-based and wrapper-based feature selection and prioritization algorithms were used to rank the features. Because different algorithms give the features different precedence, we combined their rankings with the traditional Borda method to reach a conclusion in a democratic manner. The results show that GB performs best among the selected gradient boosting algorithms, with 82.85% accuracy, 80.00% precision, 88.89% recall, and 84.21% F1-Score. The Borda method concludes that Glucose is the most influential parameter for distinguishing Breast Cancer patients from non-Breast Cancer patients.
In this study, we classified BC and found that the GB classifier achieved a higher accuracy than the CB, XGB, and LGBM classifiers. Using the feature selection techniques, SHAP, and the Borda method, we found that Glucose is the most influential parameter for the detection of BC. We also presented and analyzed the samples that were misclassified by the GB classifier.
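
The abstract does not spell out how the Borda method combines the per-algorithm feature rankings, so the following is only a minimal sketch of a standard Borda count, not the authors' implementation. The function name `borda_count` and the three input rankings are hypothetical and made up for illustration; only the feature names (Glucose, Insulin, HOMA, Leptin) come from the abstract. Each ranking is assumed to list the same features from most to least important, and a feature in position i of an n-item ranking earns n - 1 - i points.

```python
from collections import defaultdict


def borda_count(rankings):
    """Aggregate several best-first rankings with a standard Borda count.

    With n items in a ranking, the item at 0-based position i earns
    n - 1 - i points. Points are summed over all rankings.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical rankings (NOT results from the study), e.g. one per
# feature-prioritization algorithm, each ordered from most to least important.
rankings = [
    ["Glucose", "Insulin", "HOMA", "Leptin"],
    ["Glucose", "HOMA", "Leptin", "Insulin"],
    ["Insulin", "Glucose", "Leptin", "HOMA"],
]

consensus = borda_count(rankings)
for feature, points in consensus:
    print(f"{feature}: {points}")
```

In this toy input no single ranking needs to agree with the others; the feature with the highest summed score is taken as the consensus winner, which is the sense in which the aggregation is "democratic".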