Support Vector Machine (SVM) aggregation modelling for spatio-temporal air pollution analysis

Authors
Ali, Shahid
Degree
Doctor of Computing
Grantor
Unitec Institute of Technology
Date
2019
Supervisors
Ramirez-Prado, Guillermo
Dacey, Simon
Pang, Shaoning
Type
Doctoral Thesis
Keyword
Auckland (N.Z.)
air pollution analysis
support vector machine (SVM)
SVM
ensemble learning
SVM Ensemble Learning Method (SELM)
spatio-temporal
aggregation (machine learning)
New Zealand
Citation
Ali, S. (2019). Support Vector Machine (SVM) aggregation modelling for spatio-temporal air pollution analysis. (Unpublished document submitted in partial fulfilment of the requirements for the degree of Doctor of Computing). Unitec Institute of Technology, Auckland, New Zealand. Retrieved from https://hdl.handle.net/10652/4547
Abstract
RESEARCH QUESTIONS:

1. Dealing with long-term historical spatio-temporal data is always challenging. SVM ensembles and other methods are used to handle such data, but they become slow and less accurate as the data size increases. Can large data sets be processed more efficiently, and with higher model accuracy, than with the SVM ensemble method? Air pollution data are available in huge volumes that need to be stored and processed, and the problem is compounded when processing long-term historical data. The meaning of the data can change over time depending on events and on the locations from which the data were captured. Dealing with such long-term spatio-temporal data is indeed a challenging task.

2. Air pollution is a spatio-temporal problem, and the data are distributed across multiple locations, which is difficult for the SVM ensemble and other techniques to manage. How can the distributed nature of spatio-temporal air pollution data be handled efficiently and with better classification than the SVM ensemble method? Air pollution data are physically distributed, decentralised and collected at various monitoring stations. For example, the Auckland region has 19 air pollution monitoring stations. One can design a computational system for analysing a single monitoring station's data, but designing a system that processes the distributed data of all those stations is a complex task, since the data are available in huge volumes. Centralised analysis of these data, however, leads to processing and resource challenges.

3. Air pollution data often contain missing values, and any analysis of such data will not give a true picture of the underlying problem. How accurate can the analysis of air pollution data with missing values be, compared to the SVM ensemble method? As air pollution varies regionally, it is comparatively easy to compute air pollution levels for a single location, but it is difficult to derive regional air pollution data from the computations of various monitoring stations. Region-specific information will be useful for formulating a data aggregation strategy.

4. Can SVM aggregation and knowledge fusion over spatio-temporal dimensions predict air pollution more accurately than the SVM ensemble method? Analysing the results of long-term historical spatio-temporal data is a tedious and time-consuming task. Fusing spatio-temporal dimensions via the same SVM representation is achievable but remains complex, and therefore warrants a specific research question. We envisage this question to be more focused on prediction, and any solutions to it will be significant.

This research addresses the spatio-temporal air pollution analysis problem. Existing air pollution studies often simplify the problem and fail to consider the fact that air pollution is both a spatial and a temporal problem. More specifically, previous approaches are optimal for temporally rich data; however, environmental data are more likely to be collected over a large geographical area and at different periods of time. This research proposes an approach based on a decentralised computational technique named Scalable SVM Ensemble Learning Method (SSELM) for classifying hourly air pollution data collected in Auckland in 2010. Special consideration is given to the distributed ensemble in order to resolve the spatio-temporal data collection problem. The proposed approach has been compared with SVM ensemble learning for air pollution analysis in the Auckland region. Experiments demonstrated that the proposed SSELM approach outperforms SVM ensemble learning in efficiency and accuracy.
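
The decentralised idea described in the abstract can be illustrated with a small sketch. It is not the SSELM method itself, whose details are given in the thesis and not in this abstract. It assumes that each monitoring station trains its own linear SVM locally and that only the fitted models are combined, here by a simple majority vote. The station names, the standardised readings, the hyperparameters and the voting rule are all hypothetical choices made for illustration.

```python
import random

def train_linear_svm(xs, ys, lr=0.1, lam=0.001, epochs=100, seed=0):
    """Fit a 1-D linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = ys[i] * (w * xs[i] + b)
            w -= lr * lam * w            # subgradient step for the L2 penalty
            if margin < 1:               # hinge loss is active for this sample
                w += lr * ys[i] * xs[i]
                b += lr * ys[i]
    return w, b

def predict(model, x):
    """Classify one reading as +1 (polluted) or -1 (clean)."""
    w, b = model
    return 1 if w * x + b >= 0 else -1

def aggregate_predict(models, x):
    """Combine station-level models by majority vote (an assumed aggregation rule)."""
    votes = sum(predict(m, x) for m in models)
    return 1 if votes > 0 else -1

# Hypothetical standardised hourly readings per station; label +1 = polluted, -1 = clean.
station_data = {
    "station_a": ([-1.2, -0.8, -1.0, -0.6, 0.7, 1.1, 0.9, 1.3], [-1, -1, -1, -1, 1, 1, 1, 1]),
    "station_b": ([-1.5, -0.9, -0.7, -1.1, 0.6, 1.0, 1.4, 0.8], [-1, -1, -1, -1, 1, 1, 1, 1]),
    "station_c": ([-0.8, -1.3, -0.6, -1.0, 1.2, 0.7, 0.9, 1.5], [-1, -1, -1, -1, 1, 1, 1, 1]),
}

# Each station fits its model on local data only; raw readings are never pooled.
models = [train_linear_svm(xs, ys) for xs, ys in station_data.values()]

print(aggregate_predict(models, 2.0), aggregate_predict(models, -2.0))
```

Because only the pair `(w, b)` leaves each station, the cost of combining models does not grow with the number of raw readings, which is the kind of saving a decentralised scheme aims for. A real system would use multi-dimensional features and a tested SVM library.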
Copyright holder
Author
Copyright notice
All rights reserved