Comparison of crisp and fuzzy classification trees using Gini index impurity measure on simulated data
Crisp classification trees have been used to model many situations, such as disease classification. With the introduction of fuzzy theory, fuzzy classification trees are gaining popularity, especially in data mining. Very little work has been done on comparing crisp and fuzzy classification trees. This paper compares the two using the Gini index as the impurity measure. The objective is to determine which type of tree makes fewer classification errors. The data consisted of two sets of observations simulated from multivariate normal distributions. The first set came from two 3-variate normal populations with different mean vectors and a common dispersion matrix. From each of the two populations, 5000 observations were generated; 1000 of these were used to build the trees, and the remaining 4000 from each population were used to test them. The second set came from three 4-variate normal populations with different mean vectors and a common dispersion matrix. The same sampling and testing procedure as for the first set was employed. Computations were implemented in the R statistical package. The test results showed that fuzzy classification trees allocated observations to the correct population with fewer errors than crisp classification trees did.
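To illustrate how the Gini index can score a split in both settings, here is a minimal sketch in Python (not the R code used in the paper). It assumes the usual definition of the Gini index, 1 minus the sum of squared class proportions. For the fuzzy case it assumes each observation contributes to both child nodes in proportion to a membership degree; the abstract does not state the paper's exact fuzzy formulation, so that weighting, the linear membership function, and all names (`gini`, `split_gini`, `linear_membership`) are hypothetical.

```python
from collections import defaultdict


def gini(class_weights):
    """Gini index 1 - sum_k p_k^2, from per-class (possibly fractional) weights."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in class_weights.values())


def split_gini(values, labels, threshold, membership=None):
    """Size-weighted Gini index of the two child nodes of a split.

    membership maps a value to its degree of membership in the left child,
    a number in [0, 1]. If it is None, the split is crisp: an observation
    belongs fully to the left child when value <= threshold, else the right.
    """
    if membership is None:
        membership = lambda x: 1.0 if x <= threshold else 0.0
    left = defaultdict(float)
    right = defaultdict(float)
    for x, y in zip(values, labels):
        m = membership(x)
        left[y] += m            # share of the observation sent left
        right[y] += 1.0 - m     # remaining share sent right
    n_left = sum(left.values())
    n_right = sum(right.values())
    n = n_left + n_right
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)


def linear_membership(threshold, width):
    """Assumed fuzzy boundary: left-child membership falls linearly from 1 to 0
    over the interval [threshold - width, threshold + width]."""
    def mu(x):
        return min(1.0, max(0.0, (threshold + width - x) / (2.0 * width)))
    return mu


# Toy one-variable example with two classes.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["A", "A", "B", "B"]
crisp_score = split_gini(values, labels, threshold=2.5)
fuzzy_score = split_gini(values, labels, threshold=2.5,
                         membership=linear_membership(2.5, 1.5))
print(crisp_score, fuzzy_score)
```

In this toy example the crisp split separates the classes perfectly, so its score is zero. The fuzzy split shares the observations near the threshold between both children and therefore reports a nonzero impurity. A tree builder would evaluate such scores over candidate thresholds and variables and keep the split with the lowest value.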