
    Incremental and parallel learning algorithms for data stream knowledge discovery

    Zhu, Lei

    View fulltext online
    DComp_2018_Lei Zhu +.pdf (3.758Mb)
    Date:
    2018-02
    Citation:
    Zhu, L. (2018). Incremental and parallel learning algorithms for data stream knowledge discovery. Unpublished doctoral thesis submitted in partial fulfillment of the requirements of the D. Comp. Program, Unitec Institute of Technology, New Zealand
    Permanent link to Research Bank record:
    https://hdl.handle.net/10652/4308
    Abstract
    Incremental learning and parallel learning are two capabilities that allow machine learning algorithms to accommodate data from real-world applications. Incremental learning addresses streaming data by constructing a learning model that is updated continuously as new samples arrive. To cope with the computational problems posed by large data sets, parallel learning distributes the computational effort among multiple nodes within a cloud or cluster to speed up the calculation. With the rise of Big Data, data have become simultaneously large-scale and streaming, which motivates this work to address incremental and parallel incremental (PI) learning.

    This research first considers incremental learning alone, in the context of the graph max-flow/min-cut problem. An augmenting-path-based incremental max-flow algorithm is proposed. The algorithm handles graph changes in a chunking manner, updating the residual graph via augmentation and de-augmentation in response to edge capacity increases and decreases and to edge/node additions and removals (a schematic sketch follows the abstract). The theoretical guarantee is that the incremental max-flow is always equal to the flow obtained by batch retraining. Experiments show a deterministic computational cost saving (i.e., gain) over batch retraining when handling edge additions.

    The proposed incremental max-flow algorithm is then applied to upgrade an existing batch semi-supervised learning algorithm, known as graph mincuts, to an incremental one. In batch graph mincuts, a graph is learned from the labelled and unlabelled input data, and a min-cut of that graph yields the classification decision. In the proposed modification, the graph is updated dynamically to accommodate online data addition and retirement, and the incremental max-flow algorithm then learns the min-cut of the resulting non-stationary graph. Empirical evaluation on real-world data reveals that the proposed algorithm outperforms state-of-the-art stream classification algorithms.

    For the incremental max-flow approach alone, training speed is not satisfactory when the data set is huge. A straightforward solution is to combine parallel data processing with incremental learning. Previously, parallel and incremental learning were often treated as two separate problems and solved one after the other; in this work, the two are solved in a single process (i.e., PI integration). To simplify the learning, this research considers a base model in which incremental learning can be implemented by merging knowledge from incoming data and parallel learning can be performed by merging knowledge from simultaneous learners (i.e., the knowledge-mergeable condition). On this basis, a parallel incremental wESVM (weighted Extreme Support Vector Machine) algorithm is developed, in which parallel incremental learning of the base model is completed within a single process of knowledge merging. Specifically, wESVM is reformulated such that knowledge from subsets of the training data can be merged via simple matrix addition (also sketched after the abstract). As such, the proposed algorithm conducts parallel incremental learning by merging knowledge from the data slices arriving at each incremental stage. Both theoretical and experimental studies show that the proposed algorithm is equivalent to batch wESVM in terms of learning effectiveness; in particular, it demonstrates the desired scalability and a clear speed advantage over batch retraining.
    In the field of data stream knowledge discovery, this work investigates incremental machine learning and develops a wESVM-based system that integrates parallel and incremental learning. The limitation of this work is that PI integration applies only to models that satisfy the knowledge-mergeable condition. Future work should investigate how to relax this constraint and extend PI integration to other models, such as SVMs and neural networks.
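
    As a rough, non-authoritative illustration of the augmenting-path updating described in the abstract, the sketch below keeps max-flow state as a residual-capacity dictionary and absorbs an edge addition or capacity increase by searching for new augmenting paths from the current residual graph, rather than recomputing the flow from scratch. The class and method names are assumptions for illustration only, not the thesis implementation; capacity decreases (de-augmentation) and node removal are omitted.

```python
# Illustrative sketch only (hypothetical names, not the thesis implementation):
# an Edmonds-Karp style max-flow whose residual capacities are kept between
# calls, so that an edge addition or capacity increase is handled by
# re-augmenting the existing residual graph instead of retraining from scratch.
from collections import defaultdict, deque


class IncrementalMaxFlow:
    def __init__(self, source, sink):
        self.s, self.t = source, sink
        self.residual = defaultdict(lambda: defaultdict(int))  # residual capacities
        self.flow_value = 0

    def increase_capacity(self, u, v, delta):
        """Add a new edge or raise an existing capacity, then re-augment."""
        self.residual[u][v] += delta
        self._augment()  # only the newly created capacity needs new augmenting paths

    def _augment(self):
        """Push flow along shortest augmenting paths in the current residual graph."""
        while True:
            # breadth-first search for an s-t path with positive residual capacity
            parent = {self.s: None}
            queue = deque([self.s])
            while queue and self.t not in parent:
                u = queue.popleft()
                for v, cap in self.residual[u].items():
                    if cap > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if self.t not in parent:  # no augmenting path left: flow is maximal
                return
            # recover the path and its bottleneck capacity
            path, v = [], self.t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            bottleneck = min(self.residual[u][v] for u, v in path)
            for u, v in path:  # update forward and reverse residual capacities
                self.residual[u][v] -= bottleneck
                self.residual[v][u] += bottleneck
            self.flow_value += bottleneck


# Toy usage: grow a graph edge by edge and keep the max-flow value up to date.
mf = IncrementalMaxFlow("s", "t")
mf.increase_capacity("s", "a", 3)
mf.increase_capacity("a", "t", 2)
print(mf.flow_value)               # 2
mf.increase_capacity("a", "t", 5)  # raising a capacity only triggers re-augmentation
print(mf.flow_value)               # 3 (now limited by the s->a edge)
```

    The knowledge-merging idea behind the PI wESVM can be pictured in the same hedged spirit. The sketch below uses a regularised least-squares classifier on a fixed random feature map as a stand-in base model (an ELM-style assumption, not the actual wESVM reformulation): each data slice contributes a pair of matrices that combine by simple addition, so parallel workers and successive incremental stages merge knowledge identically, and the merged model coincides with batch training on the pooled data. All function names and parameters here are illustrative assumptions.

```python
# Illustrative sketch only: parallel incremental (PI) learning by merging additive
# "knowledge" matrices. The base model is a regularised least-squares classifier
# on a fixed random feature map (an ELM-style stand-in, NOT the thesis's wESVM
# reformulation); the point is that per-slice knowledge merges by matrix addition.
import numpy as np


def random_features(X, W, bias):
    """Hypothetical fixed feature map shared by all workers and stages."""
    return np.tanh(X @ W + bias)


def slice_knowledge(X, y, W, bias):
    """Knowledge extracted from one data slice: two additive matrices."""
    H = random_features(X, W, bias)
    return H.T @ H, H.T @ y


def merge(k1, k2):
    """Merging knowledge is simple matrix addition (parallel or incremental)."""
    return k1[0] + k2[0], k1[1] + k2[1]


def solve(knowledge, ridge=0.1):
    """Turn accumulated knowledge into output weights."""
    A, b = knowledge
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), b)


# Toy usage: two workers process the stage-1 slices in parallel, then a stage-2
# slice arrives and is merged incrementally into the same accumulated state.
rng = np.random.default_rng(0)
W, bias = rng.normal(size=(5, 50)), rng.normal(size=50)
X1, y1 = rng.normal(size=(100, 5)), rng.integers(0, 2, 100).astype(float)
X2, y2 = rng.normal(size=(100, 5)), rng.integers(0, 2, 100).astype(float)
X3, y3 = rng.normal(size=(80, 5)), rng.integers(0, 2, 80).astype(float)

stage1 = merge(slice_knowledge(X1, y1, W, bias),            # worker 1
               slice_knowledge(X2, y2, W, bias))            # worker 2
stage2 = merge(stage1, slice_knowledge(X3, y3, W, bias))    # incremental update

batch = slice_knowledge(np.vstack([X1, X2, X3]),
                        np.concatenate([y1, y2, y3]), W, bias)
assert np.allclose(solve(stage2), solve(batch))  # PI result matches batch training
```

    The stand-in is only meant to show why additive per-slice knowledge makes parallel and incremental learning the same merge operation; the abstract's claim is that the thesis establishes this property for wESVM itself.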
    Keywords:
    weighted Extreme Support Vector Machine (wESVM), Support Vector Machine (SVM), algorithms, parallel incremental (PI) learning, incremental learning, parallel learning, machine learning
    ANZSRC Field of Research:
    170203 Knowledge Representation and Machine Learning
    Degree:
    Doctor of Computing (Computing), Unitec Institute of Technology
    Supervisors:
    Pang, Shaoning; Sarrafzadeh, Hossein
    Copyright Holder:
    Author

    Copyright Notice:
    All rights reserved
    Rights:
    This digital work is protected by copyright. It may be consulted by you, provided you comply with the provisions of the Act and the following conditions of use. These documents or images may be used for research or private study purposes. Whether they can be used for any other purpose depends upon the Copyright Notice above. You will recognise the author's and publisher's rights and give due acknowledgement where appropriate.
    This item appears in
    • Computing Dissertations and Theses [90]
