Model Combination
=================

Outlier detection often suffers from model instability because of its unsupervised
nature. It is therefore recommended to combine the outputs of several detectors, e.g., by averaging,
to improve robustness. Detector combination is a subfield of outlier ensembles;
refer to :cite:`b-kalayci2018anomaly` for more information.


Four score combination mechanisms are shown in this demo:


#. **Average**: the average score across all detectors.
#. **Maximization**: the maximum score across all detectors.
#. **Average of Maximum (AOM)**: divide base detectors into subgroups and take the maximum score for each subgroup. The final score is the average of all subgroup scores.
#. **Maximum of Average (MOA)**: divide base detectors into subgroups and take the average score for each subgroup. The final score is the maximum of all subgroup scores.

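As a rough illustration of what the four mechanisms compute, the sketch below applies
them to a tiny hand-made score matrix using plain NumPy. It is not PyOD's implementation:
the helper names (``split_into_groups``, ``aom_sketch``, ``moa_sketch``) are hypothetical,
and the subgroups are assumed to be contiguous blocks of detector columns, whereas a
library may form subgroups differently (for example, at random).

.. code-block:: python

    import numpy as np

    # Toy standardized scores: 3 samples (rows) x 4 detectors (columns).
    scores = np.array([
        [0.0, 2.0, 1.0, 3.0],
        [1.0, 1.0, 1.0, 1.0],
        [4.0, 0.0, 2.0, 2.0],
    ])


    def split_into_groups(scores, n_groups):
        """Split detector columns into contiguous subgroups (an assumption)."""
        return np.array_split(scores, n_groups, axis=1)


    def aom_sketch(scores, n_groups):
        """Average of Maximum: max within each subgroup, then average."""
        group_max = [g.max(axis=1) for g in split_into_groups(scores, n_groups)]
        return np.mean(group_max, axis=0)


    def moa_sketch(scores, n_groups):
        """Maximum of Average: mean within each subgroup, then maximum."""
        group_mean = [g.mean(axis=1) for g in split_into_groups(scores, n_groups)]
        return np.max(group_mean, axis=0)


    by_average = scores.mean(axis=1)      # Average
    by_maximization = scores.max(axis=1)  # Maximization
    by_aom = aom_sketch(scores, 2)        # subgroups: columns {0, 1} and {2, 3}
    by_moa = moa_sketch(scores, 2)

For the first sample, the subgroup maxima are 2 and 3, so AOM gives 2.5; the subgroup
means are 1 and 2, so MOA gives 2.
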

"examples/comb_example.py" illustrates the API for combining the output of multiple base detectors
(\ `comb_example.py <https://github.com/yzhao062/pyod/blob/master/examples/comb_example.py>`_\ ,
`Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_\ ). For Jupyter Notebooks,
please navigate to **"/notebooks/Model Combination.ipynb"**.


1. Import models and generate sample data.

    .. code-block:: python

        import numpy as np
        from pyod.models.knn import KNN  # kNN detector
        from pyod.models.combination import aom, moa, average, maximization
        from pyod.utils.data import generate_data
        from pyod.utils.utility import standardizer

        # train/test split with ground truth labels for evaluation;
        # keep n_train above the largest k used in step 2 (200) so that
        # every kNN detector has enough training neighbors
        X_train, X_test, y_train, y_test = generate_data(
            n_train=500, n_test=100, contamination=0.1)

        # standardize features before fitting
        X_train_norm, X_test_norm = standardizer(X_train, X_test)


2. Initialize 20 kNN outlier detectors with different k (10 to 200), and get the outlier scores.

    .. code-block:: python

        # initialize 20 base detectors for combination
        k_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
                  150, 160, 170, 180, 190, 200]
        n_clf = len(k_list)  # number of base detectors being trained

        # one column of outlier scores per detector
        train_scores = np.zeros([X_train.shape[0], n_clf])
        test_scores = np.zeros([X_test.shape[0], n_clf])

        for i, k in enumerate(k_list):
            clf = KNN(n_neighbors=k, method='largest')
            clf.fit(X_train_norm)

            train_scores[:, i] = clf.decision_scores_
            test_scores[:, i] = clf.decision_function(X_test_norm)

3. The output scores are then standardized to zero mean and unit standard deviation before combination.
   This step is crucial because it puts the detector outputs on the same scale.

    .. code-block:: python

        from pyod.utils.utility import standardizer

        # scores have to be normalized before combination
        train_scores_norm, test_scores_norm = standardizer(train_scores, test_scores)

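   The standardization in this step can be pictured as a column-wise z-score whose mean
   and standard deviation are estimated on the training scores and then reused for the
   test scores. The sketch below is a plain NumPy approximation under that assumption,
   not PyOD's code; ``zscore_fit_apply`` is a hypothetical helper name.

    .. code-block:: python

        import numpy as np


        def zscore_fit_apply(train, test):
            """Z-score both arrays using training-set statistics per column."""
            mean = train.mean(axis=0)
            std = train.std(axis=0)
            std[std == 0] = 1.0  # leave constant columns unscaled
            return (train - mean) / std, (test - mean) / std


        # Two toy detectors whose raw scores live on very different scales.
        train_raw = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
        test_raw = np.array([[2.0, 400.0]])

        train_z, test_z = zscore_fit_apply(train_raw, test_raw)

   After standardization the two training columns are identical, so neither detector
   dominates an average or a maximum merely because of the scale of its raw scores.
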
4. Four different combination algorithms are applied as described above:

    .. code-block:: python

        comb_by_average = average(test_scores_norm)
        comb_by_maximization = maximization(test_scores_norm)
        comb_by_aom = aom(test_scores_norm, 5)  # 5 groups
        comb_by_moa = moa(test_scores_norm, 5)  # 5 groups

5. Finally, all four combination methods are evaluated by ROC AUC and precision
   @ rank n:

    .. code-block:: bash

        Combining 20 kNN detectors
        Combination by Average ROC:0.9194, precision @ rank n:0.4531
        Combination by Maximization ROC:0.9198, precision @ rank n:0.4688
        Combination by AOM ROC:0.9257, precision @ rank n:0.4844
        Combination by MOA ROC:0.9263, precision @ rank n:0.4688

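   The evaluation code itself is not shown in this section. As a rough sketch of the two
   metrics, the NumPy snippet below computes ROC AUC through its rank formulation (the
   fraction of outlier/inlier pairs in which the outlier scores higher, with ties counting
   one half) and precision @ rank n as the fraction of true outliers among the n
   highest-scored samples, where n is assumed to be the number of true outliers. The
   helper names and the toy labels and scores are hypothetical, and library
   implementations may treat ties and thresholds differently.

    .. code-block:: python

        import numpy as np


        def roc_auc_sketch(y_true, scores):
            """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly."""
            y_true = np.asarray(y_true)
            scores = np.asarray(scores, dtype=float)
            pos = scores[y_true == 1]
            neg = scores[y_true == 0]
            diff = pos[:, None] - neg[None, :]
            # correctly ordered pairs count 1, tied pairs count 0.5
            return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


        def precision_at_rank_n_sketch(y_true, scores):
            """Precision among the n top-scored samples, n = number of true outliers."""
            y_true = np.asarray(y_true)
            scores = np.asarray(scores, dtype=float)
            n = int(y_true.sum())
            top_n = np.argsort(-scores)[:n]
            return y_true[top_n].mean()


        # Toy example: 1 marks an outlier, a higher score means more anomalous.
        y_toy = np.array([0, 0, 0, 0, 1, 1])
        scores_toy = np.array([0.1, 0.2, 0.3, 0.9, 0.8, 0.4])

        roc = roc_auc_sketch(y_toy, scores_toy)
        prn = precision_at_rank_n_sketch(y_toy, scores_toy)
        print('Toy detector ROC:{:.4f}, precision @ rank n:{:.4f}'.format(roc, prn))

   In the toy data, 6 of the 8 outlier/inlier pairs are ordered correctly, giving a ROC AUC
   of 0.75, and one of the two top-ranked samples is a true outlier, giving a precision
   @ rank n of 0.5.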