DSpace Repository :: Browsing by Author "Mwaura, Jonathan"

Now showing 1 - 4 of 4

K Hyperparameter Tuning in High Dimensional Genomics Using Joint Optimization of Deep Differential Evolutionary Algorithm and Unsupervised Transfer Learning from Intelligent GenoUMAP Embeddings
(Int. j. inf. tecnol, 2024-07) Gikera, Rufus; Maina, Elizaphan; Mambo, Shadrack Maina; Mwaura, Jonathan
K-hyperparameter optimization in high-dimensional genomics remains a critical challenge, impacting the quality of clustering. Improved quality of clustering can enhance models for predicting patient outcomes and identifying personalized treatment plans. Subsequently, these enhanced models can facilitate the discovery of biomarkers, which can be essential for early diagnosis, prognosis, and treatment response in cancer research. Our paper addresses this challenge through a four-fold approach. Firstly, we empirically evaluate the k-hyperparameter optimization algorithms in genomics analysis using a correlation-based feature selection method and a stratified k-fold cross-validation strategy. Secondly, we evaluate the performance of the best optimization algorithm in the first step using a variety of the dimensionality reduction methods applied for reducing the hyperparameter search spaces in genomics. Building on the two, we propose a novel algorithm for this optimization problem in the third step, employing a joint optimization of Deep-Differential-Evolutionary Algorithm and Unsupervised Transfer Learning from Intelligent GenoUMAP (Uniform Manifold Approximation and Projection). Finally, we compare it with the existing algorithms and validate its effectiveness. Our approach leverages UMAP pre-trained special autoencoder and integrates a deep-differential-evolutionary algorithm in tuning k. These choices are based on empirical analysis results. The novel algorithm balances population size for exploration and exploitation, helping to find diverse solutions and the global optimum. The learning rate balances iterations and convergence speed, leading to stable convergence towards the global optimum. UMAP’s superior performance, demonstrated by short whiskers and higher median values in the comparative analysis, informs its choice for training the special autoencoder in the new algorithm.
The algorithm enhances clustering by balancing reconstruction accuracy, local structure preservation, and cluster compactness. The comprehensive loss function optimizes clustering quality, promotes hyperparameter diversity, and facilitates effective knowledge transfer. This algorithm’s multi-objective joint optimization makes it effective in genomics data analysis. Validation of this algorithm on three genomic datasets demonstrates superior clustering scores. Additionally, the convergence plots indicate relatively smoother curves and an excellent fitness landscape. These findings hold significant promise for advancing cancer research and computational genomics at large.
K-Hyperparameter Tuning in High-Dimensional Space Clustering: Solving Smooth Elbow Challenges Using an Ensemble Based Technique of a Self-Adapting Autoencoder and Internal Validation Indexes
(Tech Science Press, 2023-10-26) Gikera, Rufus; Mwaura, Jonathan; Muuro, Elizaphan; Mambo, Shadrack
k-means is a popular clustering algorithm because of its simplicity and scalability to handle large datasets. However, one of its setbacks is the challenge of identifying the correct k-hyperparameter value. Tuning this value correctly is critical for building effective k-means models. The use of the traditional elbow method to help identify this value has a long-standing literature. However, when using this method with certain datasets, smooth curves may appear, making it challenging to identify the k-value due to its unclear nature. On the other hand, various internal validation indexes, which are proposed as a solution to this issue, may be inconsistent. Although various techniques for solving smooth elbow challenges exist, k-hyperparameter tuning in high-dimensional spaces remains intractable and an open research issue. In this paper, we have first reviewed the existing techniques for solving smooth elbow challenges. The identified research gaps are then utilized in the development of the new technique. The new technique, referred to as the ensemble-based technique of a self-adapting autoencoder and internal validation indexes, is then validated in high-dimensional space clustering. The optimal k-value, tuned by this technique using a voting scheme, is a trade-off between the number of clusters visualized in the autoencoder’s latent space, the k-value from the ensemble internal validation index score, and one that generates a value of 0 or close to 0 on the derivative f'''(k)(1 + f'(k)^2) − 3 f''(k)^2 f'(k), at the elbow. Experimental results based on the Cochran’s Q test, ANOVA, and McNemar’s score indicate a relatively good performance of the newly developed technique in k-hyperparameter tuning.
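The derivative expression in the abstract above reads as the numerator of dκ/dk for the curvature κ(k) = f''(k) / (1 + f'(k)^2)^(3/2) of the elbow curve f, so a value near 0 would mark a point of extreme curvature. The following is a minimal sketch of that reading, not the authors' implementation. It assumes evenly spaced k values, central finite differences, and both axes rescaled to [0, 1]. All function names and the inertia values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def derivatives(f: Sequence[float], i: int, h: float) -> Tuple[float, float]:
    """Central finite-difference estimates of f' and f'' at index i."""
    d1 = (f[i + 1] - f[i - 1]) / (2.0 * h)
    d2 = (f[i + 1] - 2.0 * f[i] + f[i - 1]) / (h * h)
    return d1, d2


def third_derivative(f: Sequence[float], i: int, h: float) -> float:
    """Central finite-difference estimate of f''' at index i (needs i-2..i+2)."""
    return (f[i + 2] - 2.0 * f[i + 1] + 2.0 * f[i - 1] - f[i - 2]) / (2.0 * h ** 3)


def curvature(d1: float, d2: float) -> float:
    """Signed curvature of the graph of f, given f' and f''."""
    return d2 / (1.0 + d1 * d1) ** 1.5


def curvature_derivative_numerator(d1: float, d2: float, d3: float) -> float:
    """f'''(1 + f'^2) - 3 f''^2 f', which is 0 where the curvature is extremal."""
    return d3 * (1.0 + d1 * d1) - 3.0 * d2 * d2 * d1


def elbow_by_curvature(ks: Sequence[int], inertia: Sequence[float]) -> int:
    """Return the interior k whose estimated curvature is largest."""
    y = normalize(inertia)
    h = 1.0 / (len(ks) - 1)  # spacing of k after rescaling to [0, 1]
    best = max(range(1, len(ks) - 1),
               key=lambda i: curvature(*derivatives(y, i, h)))
    return ks[best]


# Hypothetical inertia curve with a visible bend at k = 3.
ks = list(range(1, 10))
inertia = [1000, 520, 230, 190, 165, 148, 135, 125, 118]
print(elbow_by_curvature(ks, inertia))  # prints 3
```

On sampled data the expression rarely hits 0 exactly, so the sketch picks the interior k with the largest estimated curvature instead. A smooth elbow would show up as several k values with nearly equal curvature, which is the situation the abstract's voting scheme is meant to resolve.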
Optimized K-Means clustering algorithm using an intelligent stable-plastic variational autoencoder with self-intrinsic cluster validation mechanism
(ICONIC, 2020-09-24) Gikera, Rufus; Mambo, Shadrack; Mwaura, Jonathan
Clustering is one of the most important tasks in exploratory data analysis [1, 55, 59]. K-means algorithms are the most popular clustering algorithms [51, 61]. This is because of their ability to adapt to new examples and to scale up to large datasets. They are also easily understandable and computationally faster [57, 60, 3, 62]. However, the number of clusters, K, has to be specified by the user [50]. The norm is to search for an appropriate number of clusters by a random process, until convergence [53, 5]. Several variants of the k-means algorithm have been proposed, geared towards optimal selection of K [8, 48]. The objective of this paper is to analyze the scaling-up problems associated with these variants for optimizing K in the k-means clustering algorithms. Finally, a more enhanced hybrid autoencoder-based k-means will be developed and evaluated against the existing variants.
Trends and Advances on The K-Hyperparameter Tuning Techniques In High-Dimensional Space Clustering
(IJAIDM, 2023-09) Gikera, Rufus Kinyua; Mwaura, Jonathan; Maina, Elizaphan; Mambo, Shadrack
Clustering is one of the tasks performed during exploratory data analysis, with an extensive and rich history in a variety of disciplines. Computational medicine is one area in which the application of clustering has proliferated in the recent past. K-means algorithms are the most popular because of their ability to adapt to new examples and to scale up to large datasets. They are also easy to understand and implement. However, with k-means algorithms, k-hyperparameter tuning is a long-standing challenge. The sparse and redundant nature of high-dimensional datasets makes k-hyperparameter tuning in high-dimensional space clustering a more challenging task. Proper k-hyperparameter tuning has a significant effect on the clustering results. A number of state-of-the-art k-hyperparameter tuning techniques in high-dimensional space have been proposed. However, these techniques perform differently across a variety of high-dimensional datasets and data-dimensionality reduction methods. This article uses a five-step methodology to investigate the trends and advances in the state-of-the-art k-hyperparameter tuning techniques in high-dimensional space clustering, the data dimensionality reduction methods used with these techniques, their tuning strategies, the nature of the datasets applied with them, as well as the challenges associated with cluster analysis in high-dimensional spaces. The metrics used in evaluating these techniques are also reviewed. The results of this review, elaborated in the discussion section, make it efficient for data science researchers to undertake an empirical study among these techniques; a study that subsequently forms the basis for creating improved solutions to this k-hyperparameter tuning problem.
