Figure 2 The green line shows the normalized feature importance, sorted from highest to lowest. The blue line is the cumulative sum of the green line and shows that 80% of the total importance lies in the first 1000 features.
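The two curves in Figure 2 can be reproduced from any vector of raw feature importances. The sketch below is illustrative only: the function names and the toy input are hypothetical, and the importances behind the figure are not reproduced here.

```python
from itertools import accumulate

def cumulative_importance(importances):
    """Return (normalized importances sorted high to low, their running sum)."""
    total = sum(importances)
    normalized = sorted((v / total for v in importances), reverse=True)
    return normalized, list(accumulate(normalized))

def features_needed(cumulative, threshold=0.8):
    """Smallest number of top-ranked features whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return count
    return len(cumulative)

# Hypothetical toy importances, not the values behind Figure 2.
raw = [4.0, 2.0, 1.0, 1.0]
normalized, cumulative = cumulative_importance(raw)  # green line, blue line
n_top = features_needed(cumulative, threshold=0.8)
print(normalized, cumulative, n_top)
```

Applied to the real importance vector, `features_needed(cumulative, 0.8)` would give the feature count at which the blue line crosses 80%.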
D. Ng et al.
Figure 3 Bins (features) found to be most useful for discriminating between positive and negative cases
Figure 4 Receiver operating characteristic (ROC) curves for each cross-validation fold (6-fold in this case). The mean ROC curve has an AUC of 0.97.
Figure 5a First principal component back-plotted into 78 projections
Figure 5b Second principal component back-plotted into 78 projections
Figure 5c Third principal component back-plotted into 78 projections
Figure 5d Fourth principal component back-plotted into 78 projections
Figure 5e Fifth principal component back-plotted into 78 projections
Figure 6 Sorted eigenvalues derived from the PCA of the full data matrix (144 × 196,000)
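For a matrix with far fewer rows than columns, such as 144 × 196,000, the nonzero PCA eigenvalues can be obtained from the small rows-by-rows Gram matrix rather than the full feature covariance. The sketch below assumes rows are samples and uses a small synthetic matrix as a stand-in. The function name is hypothetical, and this is not necessarily how the eigenvalues in Figure 6 were computed.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows = samples), largest first."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)             # center each feature (column)
    gram = Xc @ Xc.T / (n - 1)          # n x n; same nonzero eigenvalues as the p x p covariance
    eigvals = np.linalg.eigvalsh(gram)  # returned in ascending order
    return eigvals[::-1]                # sort from highest to lowest

# Synthetic stand-in data (assumption): 12 samples, 500 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 500))
eigvals = pca_eigenvalues(X)
print(eigvals)
```

Because centering removes one degree of freedom, at most n - 1 eigenvalues are nonzero, so a 144-row matrix yields at most 143 nonzero eigenvalues.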