Email:
benojir@stat.ku.ac.bd
Contact:
Mobile: +61 476 855 648 Whatsapp: +88 01714960969
Imputation and Analysis of Missing Values Using Different Data Mining Techniques
Abstract
Background and objectives: Chronic kidney disease (CKD) is a slow, progressive loss of kidney function that imposes a high economic cost on health systems and is an independent risk factor for cardiovascular disease. About 10% of the world's population is affected by CKD, and millions die each year because they do not have access to affordable treatment. The main objectives of this study are to impute the missing values of a CKD dataset using different imputation techniques, to classify CKD patients using various data mining techniques, and to compare the classifiers.
Methods and materials: The CKD dataset is taken from the UCI Machine Learning Repository and contains missing values, which are imputed using well-known imputation techniques. Numerical missing values are imputed by the mean, median, and linear trend, and nominal or categorical missing values are imputed using a random number generator. The R packages “mice” and “Amelia” are also used to impute missing observations. Four classifiers are used to classify CKD patients: logistic regression (LR), support vector machine (SVM), random forest (RF), and linear discriminant analysis (LDA). 70% of the dataset is taken as the training set and the rest as the test set, and this procedure is repeated 1000 times. The performance of the classifiers is evaluated by accuracy (ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), F-measure, and area under the curve (AUC).
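The simple imputation rules described above can be illustrated with a minimal Python sketch. It is not the study's code (the study mentions R packages), and it covers only mean, median, and random-draw imputation, not linear trend, mice, or Amelia. The function names, the use of `None` for a missing value, and the choice to draw categorical replacements from the observed values are all assumptions made for illustration.

```python
import random
import statistics


def impute_numeric(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]


def impute_categorical(values, rng):
    """Replace None entries with a random draw from the observed categories.

    Assumption: draws follow the observed category frequencies. The source
    only says a random number generator is used, not how it is applied.
    """
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical toy columns, not taken from the CKD dataset.
ages = [48.0, None, 62.0, 50.0]
appetite = ["good", None, "poor", "good"]

rng = random.Random(0)  # fixed seed so the random draws are reproducible
ages_mean = impute_numeric(ages, "mean")
ages_median = impute_numeric(ages, "median")
appetite_filled = impute_categorical(appetite, rng)
```

Drawing from the observed values keeps the imputed categories within the set that actually occurs in the data, which is one plausible reading of random imputation for nominal variables.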
Results: At most 38% of the observations are missing in any one variable. All of the missing-value imputation techniques and classifiers performed well. Among them, SVM gives 100% ACC, SE, SP, PPV, NPV, F-measure, and AUC for the dataset without missing values. Among the classifiers compared under MRNG imputation, SVM gives the highest SP (99.47%) and PPV (99.63%); under MeRNG imputation, SVM gives the highest ACC (98.82%) and AUC (99.96%). With missing values imputed by the mice package, SVM gives the highest AUC (100%), and with missing values imputed by Amelia, SVM gives the highest AUC (99.99%).
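For readers unfamiliar with the reported measures, the following Python sketch shows how ACC, SE, SP, PPV, NPV, and the F-measure are conventionally computed from the counts of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the study. AUC is omitted because it depends on ranked classifier scores rather than on a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    se = tp / (tp + fn)    # sensitivity (recall)
    sp = tn / (tn + fp)    # specificity
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "SE": se,
        "SP": sp,
        "PPV": ppv,
        "NPV": npv,
        "F": 2 * ppv * se / (ppv + se),  # harmonic mean of PPV and SE
    }


# Hypothetical counts for one test split, for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```

In a repeated hold-out design such as the one described, these values would typically be computed on each test split and then averaged across the repetitions.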
Conclusion: We may conclude that all of the missing-value imputation techniques perform very well; among them, Amelia gives higher accuracy, and SVM is the best classifier compared to the others.
| Details | |
|---|---|
| Role | Supervisor |
| Class / Degree | Masters |
| Students | Rafayat Zakia Mim, Student No. MS-162008, Session 2015-2016, Examination: 2018 |
| Start Date | 2017 |
| End Date | 2018 |