Hybrid machine learning model for comparative opinion mining in brand reputation monitoring

Ondara, Bernard Omoi
2025-02-27
2025-02-27
2024-11
https://ir-library.ku.ac.ke/handle/123456789/29668
A thesis submitted in fulfillment of the requirements for the award of the degree of Doctor of Philosophy (Computer Science) in the School of Pure and Applied Sciences of Kenyatta University, November 2024. Supervisors: Dr. Stephen T. Waithaka, Dr. John M. Kandiri, Dr. Lawrence Muchemi
Social media platforms like X (formerly Twitter) and online review websites like Amazon Reviews allow people to express their opinions about a brand’s products or services. To obtain competitive intelligence, brands can apply opinion mining to this online user-generated content and extract insights that help them monitor their online reputation. Existing methods of brand reputation monitoring are mostly manual, or are automated only to perform direct opinion mining with respect to a specific brand. In contrast, comparative opinions convey much more precise opinions about a specific brand relative to its competitors. Research in comparative opinion mining is rapidly gaining traction because of its extensive range of applications in areas such as brand reputation monitoring. Past studies utilizing machine-learning approaches have largely focused on applying single machine-learning models to perform direct opinion mining, targeting opinions about single entities. Results from the resulting tools are often misleading because they disregard opinions expressed towards other entities in comparative opinion data. Mentioning multiple entities in a comparative text potentially alters the polarity of opinions towards a target brand. Typically, existing models were built and tested using a limited number of comparative opinion labels and datasets, and were applied to only a couple of domains. Consequently, their reported performance may not be optimal for multi-label classification problems, comparative opinion mining, other application domains, or larger datasets.
Attempts at comparative opinion mining have largely focused on comparative sentence extraction using single machine-learning models, and so have not leveraged the benefits of hybrid machine-learning models. In contrast, multi-label classification and hybrid models that combine machine-learning and/or deep learning models have shown improvements in model accuracy, transfer learning, data sparsity handling, domain adaptation, robustness, and generalization, even on large and complex datasets. Through systematic literature analysis, data analysis, empirical analysis, and statistical analysis methods, the researcher developed and validated a hybrid machine-learning model for comparative opinion mining using datasets from multiple domains. The model was applied to brand reputation monitoring for target brands as a proof of concept. The Multilayer Perceptron (MLP), which is a deep learning model, served as the base model because of its improved flexibility in feature extraction, minimization of prediction errors, and ease of integration with single models like Random Forest (RF), which served as the top-level model. The hybrid models outperformed the single models in accuracy and F1-score across multiple datasets, leveraging count vectors and trigram features. The lowest classification accuracy was 92.1%, while the highest was 93.0%. The MLP and RF hybrid model outperformed the other hybrid models and had a prediction efficiency of 0.1 milliseconds. The statistical tests show a significant difference between the performance (accuracy) of hybrid models and single models. Validation of the hybrid model by three human experts indicated that the hybrid models were generally more accurate and efficient than the single models. This is because hybrid models leverage the strengths of single models while diminishing their weaknesses.
Therefore, hybrid models are more suitable for applications like brand reputation monitoring.
en
Thesis
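
The abstract names count vectors and trigram features as inputs to the models but does not say how they were computed. The following is a minimal sketch of one plausible reading, not the thesis's actual pipeline: it assumes word-level trigrams, whitespace tokenization, and lowercasing. The function names and the two example sentences are hypothetical and are not taken from the thesis datasets.

```python
from collections import Counter


def word_trigrams(text):
    """Return the word-level trigrams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def build_vocabulary(texts):
    """Map every trigram seen in the corpus to a fixed column index."""
    grams = sorted({gram for text in texts for gram in word_trigrams(text)})
    return {gram: index for index, gram in enumerate(grams)}


def count_vector(text, vocabulary):
    """Count how often each vocabulary trigram occurs in the text."""
    counts = Counter(word_trigrams(text))
    # Dict iteration follows insertion order, which matches the column indices.
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical comparative opinions about two placeholder products.
texts = [
    "phone a is better than phone b",
    "phone b is cheaper than phone a",
]
vocabulary = build_vocabulary(texts)
vectors = [count_vector(text, vocabulary) for text in texts]
```

Each row of `vectors` has one column per trigram in the corpus vocabulary and could then be passed to a classifier. In scikit-learn, `CountVectorizer(ngram_range=(3, 3))` produces a comparable representation, although its default tokenizer differs from the plain whitespace split assumed here.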