DSpace Repository :: Browsing by Author "Gikera, Rufus Kinyua"

K-means: critical analysis on the techniques used to determine the optimal value of k in high-dimensional datasets
(Kenyatta University, 2024-12) Gikera, Rufus Kinyua
Clustering is one of the main goals of exploratory data analysis, with a long and rich history in a variety of fields. The methods used to perform clustering have evolved over time. Among these methods, k-means remains the most popular clustering algorithm because of its ability to adapt to new examples and to scale to large datasets. It is also easy to understand and implement, and is computationally faster and more efficient than many other algorithms. However, selecting the correct k hyperparameter, i.e. the number of clusters in a dataset, has been a long-standing challenge with k-means and has a significant effect on the clustering results. Although a number of k-hyperparameter tuning techniques for high-dimensional space clustering have been proposed to help in selecting the correct k value, these techniques still face performance limitations across a variety of high-dimensional datasets and dimensionality reduction methods. This leaves the k-hyperparameter tuning problem intractable and an open research challenge. In light of this, this research first investigates the existing k-hyperparameter tuning techniques in high-dimensional space clustering through a literature review. Second, it investigates, by the same process, the dimensionality reduction methods used with high-dimensional spaces. The results of these first two steps provide key findings and a conceptual framework that serves as the road map and foundation for the empirical investigations in the third step. These investigations are guided by a comprehensive methodology based on mixed research methods for validation triangulation. Experiments are conducted on techniques that demonstrate methodological rigour and novelty, across a variety of datasets and dimensionality reduction methods. An empirical research design guides these experiments.
Analysis of the experimental data shows the significance of the feature extraction process as a critical leverage point for effective k-hyperparameter tuning in high dimensions. This finding guides the implementation of a novel, generalizable technique through a multi-methodological system development methodology. The technique is then validated against the existing ones, using similar metrics, to evaluate its effectiveness. Statistical significance tests, using ANOVA and the Kruskal-Wallis H statistic, demonstrate that the new technique is superior. This is also shown by improved internal index scores, cluster visualizations, and shorter whiskers and higher median (Q2) values in the box-and-whisker plots across a variety of datasets. The new technique handles a variety of datasets using an improved self-adapting autoencoder based on an unsupervised transfer learning strategy and a careful configuration of both the architectural and training-related hyperparameter settings. This makes it effective in handling the data sparsity and curse-of-dimensionality limitations inherent in high-dimensional spaces. Future research aims at evaluating its efficacy in wider application domains, including a further comparative analysis of hybrid sets of the best-performing dimensionality reduction methods.
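For readers unfamiliar with the Kruskal-Wallis H statistic named in the abstract above, the following is a minimal sketch of how it can be computed for several groups of scores. It is not the thesis's code: the function names (average_ranks, kruskal_wallis_h) and the three score lists are assumptions made up purely for illustration, and the numbers are not results from the thesis.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over tied runs of size t.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else 0.0


# Hypothetical internal-index scores for three techniques (made-up values).
technique_a = [0.41, 0.45, 0.39, 0.44, 0.42]
technique_b = [0.47, 0.52, 0.49, 0.50, 0.48]
technique_c = [0.61, 0.58, 0.63, 0.60, 0.62]

h_stat = kruskal_wallis_h(technique_a, technique_b, technique_c)
# With 3 groups, H is compared against a chi-square distribution with
# 2 degrees of freedom; its 5% critical value is about 5.991. For samples
# this small the chi-square comparison is only an approximation.
print(f"H = {h_stat:.3f}, exceeds 5.991: {h_stat > 5.991}")
```

Because the test works on ranks rather than raw values, it does not assume normally distributed scores, which is the usual reason for reporting it alongside ANOVA.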
Trends and Advances on The K-Hyperparameter Tuning Techniques In High-Dimensional Space Clustering
(IJAIDM, 2023-09) Gikera, Rufus Kinyua; Mwaura, Jonathan; Maina, Elizaphan; Mambo, Shadrack
Clustering is one of the tasks performed during exploratory data analysis, with a long and rich history in a variety of disciplines. Computational medicine is one area in which the application of clustering has proliferated in the recent past. K-means algorithms are the most popular because of their ability to adapt to new examples and to scale to large datasets. They are also easy to understand and implement. However, with k-means algorithms, k-hyperparameter tuning is a long-standing challenge. The sparse and redundant nature of high-dimensional datasets makes k-hyperparameter tuning in high-dimensional space clustering even more challenging. Proper k-hyperparameter tuning has a significant effect on the clustering results. A number of state-of-the-art k-hyperparameter tuning techniques in high-dimensional space have been proposed. However, these techniques perform differently across high-dimensional datasets and data dimensionality reduction methods. This article uses a five-step methodology to investigate the trends and advances in state-of-the-art k-hyperparameter tuning techniques in high-dimensional space clustering, the data dimensionality reduction methods used with these techniques, their tuning strategies, the nature of the datasets to which they are applied, and the challenges associated with cluster analysis in high-dimensional spaces. The metrics used in evaluating these techniques are also reviewed. The results of this review, elaborated in the discussion section, make it easier for data science researchers to undertake an empirical study of these techniques, a study that subsequently forms the basis for creating improved solutions to the k-hyperparameter tuning problem.
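To make the k-hyperparameter tuning problem concrete, the sketch below scans candidate k values and keeps the one with the highest mean silhouette coefficient. It is only an illustration under stated assumptions: the silhouette coefficient is assumed here as an example of an internal index, the farthest-first seeding is a simplification chosen to keep the run deterministic, the data are synthetic, and all function names are hypothetical. It is not claimed to be one of the techniques the article reviews, and it omits the dimensionality reduction step that would normally precede clustering in high-dimensional spaces.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_centres(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centre."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centres = farthest_first_centres(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        # Mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it drops out of the sum).
        a = sum(dist(p, points[j]) for j in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_k(points, k_values):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Synthetic data: three well-separated Gaussian blobs in 10 dimensions.
rng = random.Random(42)
points = [[centre + rng.gauss(0, 1) for _ in range(10)]
          for centre in (0, 10, 20) for _ in range(20)]

best_k, scores = select_k(points, range(2, 7))
print("selected k:", best_k)
print({k: round(s, 3) for k, s in scores.items()})
```

On sparse, redundant high-dimensional data, distance-based indices such as this one tend to become less discriminative, which is one reason the choice of dimensionality reduction method matters for k selection.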
