Humanizing AI chatbots: The role of speech emotion recognition (SER) with deep learning for improved user experience
Authors
Kannangara, Madawa Gihan
Degree
Master of Applied Technologies (Computing)
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2025
Supervisors
Ramirez-Prado, Guillermo
Barmada, Bashar
Type
Masters Thesis
Keyword
chatbots
human-computer interaction
emotion recognition
artificial emotional intelligence
speech processing systems
artificial intelligence (AI)
natural language processing (computer science)
Citation
Kannangara, M.G. (2025). Humanizing AI chatbots: The role of speech emotion recognition (SER) with deep learning for improved user experience (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology.
https://hdl.handle.net/10652/6847
Abstract
This research explores the enhancement of AI chatbots through the integration of Speech Emotion Recognition (SER) capabilities, aiming to create more emotionally intelligent and responsive systems. By using advanced Deep Learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), the study seeks to improve the accuracy and robustness of SER models in recognizing and interpreting human emotions from speech. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) serves as the primary dataset, providing a diverse range of emotional speech samples. The study uses comprehensive data preparation and augmentation techniques, including noise injection, speed variation, and pitch shifting, to simulate real-world conditions and enhance model performance. Key features such as Zero Crossing Rate (ZCR), Chroma, Mel-Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel Spectrogram are extracted to capture the essence of emotional speech.
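To make the pipeline concrete, the sketch below shows how the augmentation steps (noise injection, speed variation, pitch shifting) and the named features (ZCR, Chroma, MFCC, RMS, Mel spectrogram) might be computed with librosa. The parameter values and the file name are illustrative assumptions, not the settings reported in the thesis.

```python
import numpy as np
import librosa

def augment(y, sr):
    """Return augmented variants of a waveform (illustrative parameter values)."""
    noisy = y + 0.005 * np.random.randn(len(y))                   # noise injection
    faster = librosa.effects.time_stretch(y, rate=1.1)            # speed variation
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)    # pitch shifting
    return [noisy, faster, shifted]

def extract_features(y, sr):
    """Concatenate the features named in the abstract into one vector."""
    zcr = np.mean(librosa.feature.zero_crossing_rate(y=y))
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr), axis=1)
    rms = np.mean(librosa.feature.rms(y=y))
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    return np.hstack([zcr, chroma, mfcc, rms, mel])

# Hypothetical RAVDESS clip: original plus augmented copies become training samples.
y, sr = librosa.load("ravdess_sample.wav", sr=None)
features = [extract_features(v, sr) for v in [y, *augment(y, sr)]]
```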
A detailed literature review sets the stage by summarizing notable contributions and advancements in the field of SER, highlighting the evolution and integration of deep learning methods. The methodological framework outlines the systematic approach to data preparation, feature extraction, model construction and evaluation. Regularization techniques such as Batch Normalization and L2 Regularization are used to prevent overfitting and ensure robust model performance.
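As an indication of how such a model might be assembled, the following Keras sketch combines convolutional and recurrent layers with Batch Normalization and L2 Regularization. The layer sizes, the eight-class output, and the 162-dimensional feature input are assumptions for illustration, not the architecture reported in the thesis.

```python
from tensorflow.keras import layers, models, regularizers

def build_model(input_len=162, num_classes=8):
    """Illustrative CNN + LSTM classifier with Batch Normalization and L2 regularization."""
    model = models.Sequential([
        layers.Input(shape=(input_len, 1)),   # each feature vector reshaped to (length, 1)
        layers.Conv1D(64, kernel_size=5, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4)),
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4)),
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),                       # recurrent layer for temporal structure
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4)),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```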
The study’s experiments demonstrate significant improvements in model accuracy, with the best test accuracy reaching 87.5%. This performance is noteworthy when compared with previous studies: Mustaqeem and Kwon reported an accuracy of approximately 80.2% [1], and Kapoor and Kumar reported 82.29% [2]. These comparisons highlight the substantial advancement our model represents in the field of Speech Emotion Recognition.
The results are visualized through training history plots, demonstrating the model’s learning behaviour and generalization capabilities. The findings highlight the immense potential of SER-enhanced chatbots in applications such as customer service and mental health support, where they enable more empathetic and personalized interactions.
In conclusion, this research advances the field of Speech Emotion Recognition by developing highly accurate and reliable models. The integration of SER capabilities into AI chatbots promises to revolutionize human-technology interactions, making them more meaningful and supportive. This study lays the groundwork for future explorations into emotionally intelligent AI systems, highlighting their potential to significantly enhance user experiences across diverse domains.
Copyright holder
Author
Copyright notice
All rights reserved
