Model Combination
=================

Outlier detection often suffers from model instability due to its unsupervised
nature. It is therefore recommended to combine the outputs of several detectors,
e.g., by averaging, to improve robustness. Detector combination is a subfield of
outlier ensembles; refer to :cite:`b-kalayci2018anomaly` for more information.


Four score combination mechanisms are shown in this demo:


#. **Average**: average scores of all detectors.
#. **Maximization**: maximum score across all detectors.
#. **Average of Maximum (AOM)**: divide base detectors into subgroups and take the maximum score for each subgroup. The final score is the average of all subgroup scores.
#. **Maximum of Average (MOA)**: divide base detectors into subgroups and take the average score for each subgroup. The final score is the maximum of all subgroup scores.


"examples/comb_example.py" illustrates the API for combining the output of multiple base detectors
(\ `comb_example.py <https://github.com/yzhao062/pyod/blob/master/examples/comb_example.py>`_\ ,
`Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_\ ). For Jupyter Notebooks,
please navigate to **"/notebooks/Model Combination.ipynb"**.


1. Import models and generate sample data.

   .. code-block:: python

       import numpy as np
       from pyod.models.knn import KNN  # kNN detector
       from pyod.models.combination import aom, moa, average, maximization
       from pyod.utils.data import generate_data
       from pyod.utils.utility import standardizer

       # train/test split with ground truth labels for evaluation;
       # n_train must exceed the largest k used in step 2 (200), otherwise
       # the kNN detector cannot find enough neighbors in the training set
       X_train, X_test, y_train, y_test = generate_data(
           n_train=1000, n_test=100, contamination=0.1)

       # standardize features before fitting
       X_train_norm, X_test_norm = standardizer(X_train, X_test)


2. Initialize 20 kNN outlier detectors with different k (10 to 200), and get the outlier scores.

   .. code-block:: python

       # initialize 20 base detectors for combination
       k_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
                 150, 160, 170, 180, 190, 200]
       n_clf = len(k_list)  # number of base detectors being trained

       # one column of scores per detector
       train_scores = np.zeros([X_train.shape[0], n_clf])
       test_scores = np.zeros([X_test.shape[0], n_clf])

       for i in range(n_clf):
           k = k_list[i]
           clf = KNN(n_neighbors=k, method='largest')
           clf.fit(X_train_norm)

           train_scores[:, i] = clf.decision_scores_
           test_scores[:, i] = clf.decision_function(X_test_norm)

3. The output scores are then standardized to zero mean and unit standard deviation
   before combination. This step is crucial because it brings the detector outputs
   onto the same scale.

   .. code-block:: python

       # standardizer was imported in step 1;
       # scores have to be normalized before combination
       train_scores_norm, test_scores_norm = standardizer(train_scores, test_scores)

4. Four different combination algorithms are applied as described above:

   .. code-block:: python

       comb_by_average = average(test_scores_norm)
       comb_by_maximization = maximization(test_scores_norm)
       comb_by_aom = aom(test_scores_norm, 5)  # 5 groups
       comb_by_moa = moa(test_scores_norm, 5)  # 5 groups

5. Finally, all four combination methods are evaluated by ROC and Precision
   @ Rank n. Sample output (exact values depend on the data):

   .. code-block:: bash

       Combining 20 kNN detectors
       Combination by Average ROC:0.9194, precision @ rank n:0.4531
       Combination by Maximization ROC:0.9198, precision @ rank n:0.4688
       Combination by AOM ROC:0.9257, precision @ rank n:0.4844
       Combination by MOA ROC:0.9263, precision @ rank n:0.4688
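
To make the grouping behind AOM and MOA concrete, the four combination rules
can be sketched with plain NumPy on a toy score matrix. This is an illustration
only, not pyod's implementation: the helper ``combine_in_groups`` is hypothetical,
and it splits detectors into contiguous groups as a simplifying assumption, whereas
pyod's ``aom`` and ``moa`` may assign detectors to groups differently.

.. code-block:: python

    import numpy as np


    def combine_in_groups(scores, n_groups, inner, outer):
        """Hypothetical helper: apply `inner` within each detector group,
        then `outer` across the per-group results.

        scores: array of shape (n_samples, n_detectors).
        Assumption: groups are contiguous blocks of columns.
        """
        groups = np.array_split(scores, n_groups, axis=1)
        per_group = np.column_stack([inner(g, axis=1) for g in groups])
        return outer(per_group, axis=1)


    # toy scores: 3 samples (rows) x 4 detectors (columns),
    # assumed to be on a comparable scale already
    scores = np.array([[0.0, 1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0, 1.0],
                       [4.0, 0.0, 0.0, 2.0]])

    avg = scores.mean(axis=1)  # Average
    mx = scores.max(axis=1)    # Maximization

    # AOM: maximum within each of the 2 groups, then average over groups
    aom_sketch = combine_in_groups(scores, 2, np.max, np.mean)

    # MOA: average within each of the 2 groups, then maximum over groups
    moa_sketch = combine_in_groups(scores, 2, np.mean, np.max)

    print(avg, mx, aom_sketch, moa_sketch)

For the third sample, the two groups are ``[4, 0]`` and ``[0, 2]``: AOM averages
the group maxima (4 and 2) to 3, while MOA takes the larger of the group means
(2 and 1), giving 2. AOM and MOA thus sit between plain averaging and plain
maximization.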