Dynamic Construction of Outlier Detector Ensembles with Bisecting K-means Clustering
Yassine, Inas A.
Wahed, Manal Abdel
Madete, June K.
Outlier detection (OD) is a key problem for which numerous solutions have been proposed. To deal with the difficulties that outlier detection poses across domains and data characteristics, ensembles of outlier detectors have recently been employed to improve on the performance of individual detectors. In this paper, we follow an ensemble outlier detection approach in which good outlier detectors are selected through an enhanced clustering-based dynamic selection (CBDS) method. In this method, a bisecting K-means clustering algorithm partitions the input data into clusters, each of which defines a local region of competence. From the initial pool of detectors, the outputs of the detectors with the most competent local performance are combined through four possible schemes to produce the final OD results. We evaluated our method against four variants of locally selective combination in parallel (LSCP) outlier ensembles. The CBDS-based schemes compare well with the LSCP-based ones on 16 public benchmark datasets and incur considerably lower computational costs. The CBDS method consistently achieved superior average scores of the area under the curve (AUC) of the receiver operating characteristic (ROC) and, in particular, outperformed the LSCP method on nine of the 16 datasets in terms of the AUC score. In addition, while the CBDS and LSCP methods have similar computational costs on small datasets, the CBDS method achieves significant time savings over the LSCP method on large datasets.
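As a rough illustration of the selection step described in the abstract, the following Python sketch partitions a toy dataset with a simple bisecting 2-means routine, ranks a small detector pool within each cluster, and averages the scores of the best-ranked detectors. Several pieces are assumptions rather than details taken from the abstract: the kNN-distance detectors stand in for the paper's detector pool, competence is measured as the Pearson correlation with a pseudo ground truth (the mean of the standardized scores, a device borrowed from LSCP-style ensembles), averaging the top detectors is only one possible combination scheme, and all function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def sse(points):
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(idx, data, iters=20):
    """Split the rows listed in `idx` into two groups with plain 2-means."""
    pts = [data[i] for i in idx]
    mid = centroid(pts)
    # Deterministic seeding (an assumption): two mutually distant points.
    c0 = max(pts, key=lambda p: sq_dist(p, mid))
    c1 = max(pts, key=lambda p: sq_dist(p, c0))
    for _ in range(iters):
        left = [i for i in idx if sq_dist(data[i], c0) <= sq_dist(data[i], c1)]
        right = [i for i in idx if sq_dist(data[i], c0) > sq_dist(data[i], c1)]
        c0 = centroid([data[i] for i in left])
        c1 = centroid([data[i] for i in right])
    return left, right

def bisecting_kmeans(data, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(range(len(data)))]
    while len(clusters) < k:
        worst = max(clusters, key=lambda idx: sse([data[i] for i in idx]))
        if sse([data[i] for i in worst]) == 0:
            break  # every remaining cluster is a single repeated point
        clusters.remove(worst)
        clusters.extend(two_means(worst, data))
    return clusters

def zscore(values):
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - m) / sd if sd > 0 else 0.0 for v in values]

def pearson(a, b):
    return sum(x * y for x, y in zip(zscore(a), zscore(b))) / len(a)

def knn_detector(k):
    """Stand-in detector: mean distance to the k nearest neighbours."""
    def score(data):
        out = []
        for p in data:
            d = sorted(sq_dist(p, q) ** 0.5 for q in data)
            out.append(sum(d[1:k + 1]) / k)  # d[0] is the point itself
        return out
    return score

def cbds_scores(data, detectors, n_clusters=2, top=2):
    """Per cluster, keep the `top` most competent detectors and average them."""
    all_scores = [zscore(det(data)) for det in detectors]
    # Pseudo ground truth (assumption): mean standardized score per point.
    pseudo = [sum(col) / len(col) for col in zip(*all_scores)]
    final = [0.0] * len(data)
    for idx in bisecting_kmeans(data, n_clusters):
        target = [pseudo[i] for i in idx]
        ranked = sorted(
            range(len(detectors)),
            key=lambda d: pearson([all_scores[d][i] for i in idx], target),
            reverse=True,
        )
        best = ranked[:top]
        for i in idx:
            final[i] = sum(all_scores[d][i] for d in best) / len(best)
    return final

# Toy data: two tight groups plus one isolated point at (5, 20).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11), (5, 20)]
pool = [knn_detector(k) for k in (1, 2, 3)]
scores = cbds_scores(data, pool, n_clusters=2, top=2)
print(max(range(len(data)), key=scores.__getitem__))  # highest-scoring index
```

Because each cluster is scored as a whole, competence is estimated once per cluster rather than once per test point, which is one plausible reading of why a clustering-based selection could be cheaper than a per-point neighbourhood search on large datasets.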