Technical review: performance of existing imputation methods for missing data in SVM ensemble creation
Ali, Shahid; Dacey, Simon
Date
2017
Citation:
Ali, S., & Dacey, S. (2017). Technical Review: Performance of Existing Imputation Methods for Missing Data in SVM Ensemble Creation. International Journal of Data Mining & Knowledge Management Process (IJDKP), 7(6), 75-91. doi:10.5121/ijdkp.2017.7606
Permanent link to Research Bank record:
https://hdl.handle.net/10652/4342
Abstract
Incomplete data is present in many studies. This incomplete or uncollected information is called missing data (values) and is considered a significant problem by many researchers. The problem is especially common in air pollution monitoring, where data is collected from multiple monitoring stations spread across various locations. Various imputation methods for missing data have been proposed in the literature; in this research, however, we considered only existing imputation methods and recorded their performance in ensemble creation. The five existing imputation methods deployed in this research are the series mean method, mean of nearby points, median of nearby points, linear trend at a point, and linear interpolation. The series mean (SM) method performed better than the other imputation methods, with the least mean absolute error and better accuracy for SVM ensemble creation on the CO data set using bagging and boosting algorithms.
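
To make the five imputation methods named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the definitions commonly used by statistical packages for these method names; the function names, the use of `None` to mark a missing value, and the default span of two observed neighbours on each side are all assumptions made for illustration. The `mean_absolute_error` helper shows one plausible way to score imputed values against known true values.

```python
from statistics import mean, median

# Missing values are represented as None (an assumption of this sketch).
# Each function assumes the series contains at least two observed values.

def series_mean(series):
    """Replace each gap with the mean of all observed values."""
    m = mean([v for v in series if v is not None])
    return [m if v is None else v for v in series]

def _nearby(series, i, span):
    """Up to `span` nearest observed values on each side of index i."""
    before = [v for v in series[:i] if v is not None][-span:]
    after = [v for v in series[i + 1:] if v is not None][:span]
    return before + after

def mean_of_nearby(series, span=2):
    """Replace each gap with the mean of its nearby observed values."""
    return [mean(_nearby(series, i, span)) if v is None else v
            for i, v in enumerate(series)]

def median_of_nearby(series, span=2):
    """Replace each gap with the median of its nearby observed values."""
    return [median(_nearby(series, i, span)) if v is None else v
            for i, v in enumerate(series)]

def linear_interpolation(series):
    """Fill interior gaps on a straight line between the observed neighbours.

    Leading and trailing gaps have no neighbour on one side and stay None.
    """
    out = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for lo, hi in zip(observed, observed[1:]):
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            out[i] = series[lo] + frac * (series[hi] - series[lo])
    return out

def linear_trend(series):
    """Fill gaps with predictions from a least-squares line fitted on index."""
    pts = [(i, v) for i, v in enumerate(series) if v is not None]
    x_bar = mean([i for i, _ in pts])
    y_bar = mean([v for _, v in pts])
    slope = (sum((i - x_bar) * (v - y_bar) for i, v in pts)
             / sum((i - x_bar) ** 2 for i, _ in pts))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * i if v is None else v
            for i, v in enumerate(series)]

def mean_absolute_error(truth, imputed, gap_indices):
    """Average absolute difference between true and imputed values at the gaps."""
    return mean([abs(truth[i] - imputed[i]) for i in gap_indices])

# Hypothetical toy series with one missing reading at index 2.
readings = [2.0, 4.0, None, 10.0, 4.0]
filled = {
    "series_mean": series_mean(readings),
    "mean_of_nearby": mean_of_nearby(readings),
    "median_of_nearby": median_of_nearby(readings),
    "linear_trend": linear_trend(readings),
    "linear_interpolation": linear_interpolation(readings),
}
```

The toy series is invented for illustration and is unrelated to the CO data set used in the paper. Which method gives the lowest error depends on the data, so this sketch says nothing about the ranking reported above.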