Show simple record

dc.contributor.authorZhu, Lei
dc.description.abstractIncremental and parallel are two capabilities for machine learning algorithms to accommodate data from real world applications. Incremental learning addresses streaming data by constructing a learning model that is updated continuously in response to newly arrived samples. To solve the computational problems posed by large data sets, parallel learning distributes the computational efforts among multiple nodes within a cloud or cluster to speed up the calculation. With the rise of BigData, data become simultaneously large scale and streaming, which is the motivation to address incremental and parallel incremental (PI) learning in this work. Incremental and parallel are two capabilities for machine learning algorithms to This research first considers the incremental learning alone, in the graph max-flow/min-cut problem. An augmenting path based incremental max-flow algorithm is proposed. The proposed algorithm handles graph changes in a chunking manner, updating residual graph via augmentation and de-augmentation in response to edge capacity increase, decrease, edge/node adding and removal. The theoretical guarantee of our algorithm is that incremental max-flow is always equal to batch retraining. Experiments show the deterministic computational cost save (i.e., gain) of our algorithm with respect to batch retraining in handling graph edge adding. The proposed incremental max-flow is then applied to upgrade an existing batch semi-supervised learning algorithm known as graph minicuts to be incremental. In batch graph minicuts, a graph is learned from input labeled and unlabeled data, and then a min-cut is conducted on that graph to make the classification decision. In the proposed modification, the graph is updated dynamically for accommodating online data adding and retiring. Then the proposed incremental max-flow algorithm is adopted to learn min-cut from the resulting non-stationary graph. Empirical evaluation on real world data reveals that the proposed algorithm outperforms state-of-the-art stream classification algorithms. In the incremental max-flow, the training speed is not satisfactory when the data set is huge. A straightforward solution is to combine parallel data processing with incremental learning. Previously, parallel and incremental learning are often treated as two separate problems and solved one after another. Alternatively in this work, these two learning problems are solved in one process (i.e., PI integration). To simplify the learning, this research considers a base model in which incremental learning can be implemented by merging knowledge from incoming data and parallel learning can be performed by merging knowledge from simultaneous learners (i.e., in knowledge mergeable condition). As a result, this work develops a parallel incremental wESVM (weighted Extreme Support Vector Machine) algorithm, in which the parallel incremental learning of the base model is completed within a single process of knowledge merging. Specifically, the wESVM is reformulated such that knowledge from subsets of training data can be merged via simple matrix addition. As such, the proposed algorithm is able to conduct parallel incremental learning by merging knowledge from data slices arriving at each incremental stage. Both theoretical and experimental studies show the equivalence of the proposed algorithm to batch wESVM in terms of learning effectiveness. In particular, the algorithm demonstrates desired scalability and clear speed advantages to batch retraining. In the field of data stream knowledge discovery, this work investigates incremental machine learning and invents a wESVM-based parallel learning and incremental learning integrated system. The limitation of this work is that PI integration applies only to models that satisfy the knowledge mergeable condition. Future work should investigate how to release this constraint and expand PI integration to other models such as SVM and neural network.en_NZ
dc.subjectweighted Extreme Support Vector Machine (wESVM)en_NZ
dc.subjectSupport Vector Machine (SVM)en_NZ
dc.subjectparallel incremental (PI) learningen_NZ
dc.subjectincremental learningen_NZ
dc.subjectparallel learningen_NZ
dc.subjectmachine learningen_NZ
dc.titleIncremental and parallel learning algorithms for data stream knowledge discoveryen_NZ
dc.typeDoctoral Thesisen_NZ of Computingen_NZ Institute of Technologyen_NZ
dc.subject.marsden170203 Knowledge Representation and Machine Learning
dc.identifier.bibliographicCitationZhu, L. (2018). Incremental and parallel learning algorithms for data stream knowledge discovery. Unpublished doctoral thesis submitted in partial fulfillment of the requirements of the D. Comp. Program, Unitec Institute of Technology, New Zealanden_NZ
dc.contributor.affiliationUnitec Institute of Technologyen_NZ
unitec.advisor.principalPang, Shaoning
unitec.advisor.associatedSarrafzadeh, Hossein

Files in this item


This item appears in

Show simple record

© Unitec Institute of Technology, Private Bag 92025, Victoria Street West, Auckland 1142