
    Automatic assessment of dysarthric severity level using audio-video cross-modal approach in deep learning

    Tong, Han

    View fulltext online
    MComp_(2020)_Han Tong +.pdf (1.30 MB)
    Date
    2020
    Citation:
    Tong, H. (2020). Automatic assessment of dysarthric severity level using audio-video cross-modal approach in deep learning. (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Computing). Unitec Institute of Technology, Auckland, New Zealand. Retrieved from https://hdl.handle.net/10652/5006
    Permanent link to Research Bank record:
    https://hdl.handle.net/10652/5006
    Abstract
    Dysarthria is a motor speech disorder that can have a significant impact on a person's daily life. Early detection of the disorder allows patients to begin therapy sessions sooner. Researchers have established various approaches to detect the disorder automatically. Traditional computational approaches commonly analyse acoustic features such as Mel-Frequency Cepstral Coefficients (MFCC), spectral centroid, Linear Prediction Cepstral (LPC) coefficients and Perceptual Linear Prediction (PLP) coefficients from patients' speech samples to detect dysarthric speech characteristics such as slow speech rate, short pauses and mis-articulated sounds. Recent research has shown that some machine learning algorithms can also be deployed to extract speech features and detect the severity level automatically. In machine learning, feature extraction is a crucial step in classification and prediction problems, and different well-established frameworks have been developed to extract and classify features from different data formats. For example, in an image or video data processing system, a Convolutional Neural Network (CNN) can provide the underlying network structure for analysing the video data to obtain visual features, whereas for an audio data processing system, Natural Language Processing (NLP) algorithms can be applied to obtain acoustic features. The selection of framework therefore depends mainly on the modality of the input. As early steps in the development of machine learning approaches for automatic assessment of dysarthric patients, classification systems based on audio features have been considered in the literature; however, recent research efforts in other fields have shown that an audio-video cross-modal framework can improve the performance of classification systems.
In this thesis, for the first time, an audio-video cross-modal framework is proposed in which a deep-learning network takes both audio and video data as input to detect the severity level of dysarthria. Within the same deep-learning framework, we also propose two network architectures that use audio-only or video-only input to detect dysarthria severity levels automatically. Compared with current single-modality systems, the deep-learning framework yields satisfactory results. More importantly, compared with systems based only on audio data for automatic dysarthria severity level assessment, the audio-video cross-modal system proposed in this research can accelerate training, improve accuracy and reduce the amount of training data required.
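
The cross-modal idea described in the abstract can be sketched in a few lines: encode each modality separately, concatenate the embeddings, and classify the fused vector into severity levels. The sketch below is a toy illustration under stated assumptions, not the architecture from the thesis; all layer sizes, the four-class severity scale, and the random weights are hypothetical.

```python
# Toy sketch of late-fusion audio-video classification (illustrative only;
# dimensions and weights are assumptions, not the thesis architecture).
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """One-layer toy encoder: linear projection followed by ReLU."""
    return np.maximum(x @ w, 0.0)

# Assumed dimensions: 13 MFCC-like audio features, 64 visual features,
# and 4 severity classes (e.g. very low / low / medium / high).
w_audio = rng.normal(size=(13, 8))   # audio branch weights
w_video = rng.normal(size=(64, 8))   # video branch weights
w_out = rng.normal(size=(16, 4))     # fused embedding -> class scores

def predict_severity(audio_feats, video_feats):
    """Fuse the two modality embeddings and return class probabilities."""
    fused = np.concatenate([encode(audio_feats, w_audio),
                            encode(video_feats, w_video)])
    scores = fused @ w_out
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

probs = predict_severity(rng.normal(size=13), rng.normal(size=64))
```

In a trained system the random projections would be replaced by learned CNN encoders, but the fusion step itself is just this concatenation before the classifier.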
    Keywords:
    dysarthria, motor speech disorders, dysarthric patients, assessment, audio data processing systems, video data processing systems, deep-learning algorithms, algorithms
    ANZSRC Field of Research:
    080108 Neural, Evolutionary and Fuzzy Computation, 1199 Other Medical and Health Sciences
    Degree:
    Master of Computing, Unitec Institute of Technology
    Supervisors:
    Sharifzadeh, Hamid; McLoughlin, Ian
    Copyright Holder:
    Author

    Copyright Notice:
    All rights reserved
    Rights:
    This digital work is protected by copyright. It may be consulted by you, provided you comply with the provisions of the Act and the following conditions of use. These documents or images may be used for research or private study purposes. Whether they can be used for any other purpose depends upon the Copyright Notice above. You must recognise the author's and publisher's rights and give due acknowledgement where appropriate.
    This item appears in
    • Computing Dissertations and Theses [90]

    Te Pūkenga

    Research Bank is part of Te Pūkenga - New Zealand Institute of Skills and Technology

    • About Te Pūkenga
    • Privacy Notice

    Copyright ©2022 Te Pūkenga

    Usage:
    Downloads, last 12 months: 168

