AUTOMATIC SEGMENTATION OF BRAIN TISSUE AND WHITE MATTER LESIONS IN MRI

Renske de Boer 1,2,3, Fedde van der Lijn 1,2, Henri A. Vrooman 1,2, Meike W. Vernooij 1,3, M. Arfan Ikram 3, Monique M.B. Breteler 3, Wiro J. Niessen 1,2

Erasmus MC, 1 Department of Radiology, 2 Department of Medical Informatics, 3 Department of Epidemiology & Biostatistics, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
ABSTRACT

A method to automatically segment cerebrospinal fluid, gray matter, white matter and white matter lesions is presented. The method uses magnetic resonance brain images from proton density, T1-weighted and fluid-attenuated inversion recovery sequences. It is based on an automatically trained k-nearest neighbour classifier, extended with an additional step for the segmentation of white matter lesions. On six datasets, segmentations are quantitatively compared with manual segmentations carried out by two expert observers. For the tissues, similarity indices between the method and the observers approximate those between the manual segmentations. Reasonably good lesion segmentation results are obtained compared to the interobserver variability.
Index Terms— Automatic image segmentation, brain tissue, white matter lesions, magnetic resonance imaging

1. INTRODUCTION

The enormous quantity of data generated in population-based brain MRI studies increases the need for automated segmentation methods. Several methods for segmenting cerebrospinal fluid (CSF), gray matter (GM) and white matter (WM) have been proposed. Supervised segmentation using the k-nearest neighbour (kNN) classifier is commonly used. This classifier assigns a voxel to the class of the majority of the k nearest training samples in feature space. The training samples for the kNN classifier can be derived in different ways, but most approaches require a laborious training stage. Moreover, this training stage typically needs to be repeated if the method is applied to data acquired with different scanners or with MR sequences with different parameter settings. To avoid this training stage, there is large interest in methods that automate training. In [1], Vrooman et al. compared training based on manual expert segmentations with automatic training based on atlas registration. The latter method was proposed by Cocosco et al. [2] and derives training samples from locations obtained by rigid registration of an atlas to the subject. Because its training set is not created from different subjects, it is independent of intersubject intensity variations. However, the results in [1] showed that the similarity index (SI) of the automatically trained kNN classifier is lower than both the interobserver SI and the SI of the manually trained kNN segmentation.

White matter lesions (WML) are commonly found in elderly subjects and are believed to be related to cognitive decline, dementia and late-onset depression [3]. Anbeek et al. segmented WML by kNN classification, using location and MR intensities as features [4]. The location of WML varies widely between subjects, and this spatial variation is very difficult to capture in a training set. At some locations, especially periventricular, WML are seen more frequently, but they can appear throughout the white matter. The WML segmentation proposed in our method is therefore independent of a training set or atlas.

The contribution of this paper is twofold. First, automatic training for the kNN classifier is improved by replacing rigid atlas registration with non-rigid multiple-atlas registration. Second, the method is extended with fully automatic WML segmentation.

2. METHODS

2.1. Pre-processing
The method uses proton density (PD) and T1-weighted MR images for the CSF, GM and WM segmentation and a fluid-attenuated inversion recovery (FLAIR) scan for the WML segmentation. All scans are registered to the T1 image by rigid registration and resampled by trilinear interpolation to the T1 resolution. All scans are corrected for intensity non-uniformity within a brain mask using the method described by Sled et al. [5]. A simple intensity range matching is performed on the PD and T1 images by excluding the 4% of voxels with the lowest intensities and the 4% with the highest intensities, and rescaling the remaining intensities between 0 and 1. This gives the features in the kNN classifier similar weights.

To automate the construction of the kNN training set, non-rigid atlas-based registration is used to extract training samples. This is achieved as follows. From a previous study, 11 datasets that were manually segmented into background (BG), CSF, GM and WM are used as atlases. These atlases are all registered to the dataset to be segmented by applying an affine transformation followed by a non-rigid transformation [6]. The registered label images are averaged to create tissue probability maps (TPMs) for all four labels. These TPMs are thresholded in order to obtain candidate training samples with a predefined probability of belonging to a specific label. In our method a threshold of 0.7 is chosen, which was shown in [2] and [1] to be a good threshold value for these tissue types. The CSF TPM thresholded at this value masks not only the ventricles but also parts of the subarachnoid space, contrary to higher thresholds, and will therefore result in more variation in the CSF samples.
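As an illustration only (not the implementation used for the results in this paper), the intensity range matching and the selection of candidate training locations from the thresholded TPMs can be sketched as follows. The sketch assumes the bias-corrected images, the brain mask and the averaged tissue probability maps are available as NumPy arrays; the function names are ours.

```python
import numpy as np

def range_match(image, brain_mask, tail=0.04):
    """Discard the 4% darkest and 4% brightest voxels and rescale to [0, 1]."""
    values = image[brain_mask > 0]   # assumption: matching is computed within the brain mask
    lo, hi = np.quantile(values, [tail, 1.0 - tail])
    return (np.clip(image, lo, hi) - lo) / (hi - lo)

def candidate_locations(tpm, threshold=0.7):
    """Voxel coordinates where a tissue probability map reaches the 0.7 threshold."""
    return np.argwhere(tpm >= threshold)

# The PD and T1 channels are normalised in the same way, so both kNN features
# get similar weights; candidate samples are then drawn from these locations.
# pd_norm, t1_norm = range_match(pd_img, mask), range_match(t1_img, mask)
# csf_candidates = candidate_locations(tpm_csf)
```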
2.2. Segmentation of CSF, GM and WM

In the first stage of the algorithm, CSF, GM and WM are segmented. For each of the four classes BG, CSF, GM and WM, 7500 candidate training samples are randomly drawn from the spatial locations masked by the thresholded TPMs. The features of the samples are the intensity values of the PD and T1 images at the sample locations. To remove samples with incorrect labels, the initial sample set is pruned as follows [2]. A minimal spanning tree of the samples in feature space is created. In an iterative process, the algorithm removes edges that are longer than a threshold value multiplied by the average length of the other edges of a sample; this threshold is decreased at every iteration. The process continues until a unique main cluster is found for every class, a main cluster being defined as the cluster containing the most samples of a certain class. The final step removes all samples that are not connected or that are not in their main cluster. A k-nearest neighbour classifier performs the final classification based on the pruned sample set. A value of k = 45 is used, similar to [2]. Our kNN implementation uses a fast nearest neighbour lookup library (http://www.cs.umd.edu/˜mount/ANN/).

2.3. Segmentation of WML

After the first stage of the algorithm, in which CSF, GM and WM are segmented, possible WML are misclassified as GM with a ‘halo’ of WM. In the FLAIR image the WML are clearly visible as hyperintensities (figure 1c). Therefore a histogram is created of the FLAIR intensities of the voxels classified as GM. This histogram is smoothed by convolution with a Gaussian (σG = 4 FLAIR intensity units) and its peak, corresponding to the bulk of the true gray matter voxels, is located; the peak location is defined as the histogram bin containing the most voxels. The peak is approximated by a Gaussian function with the peak location as mean µ and a standard deviation σ derived from the full width at half maximum. The threshold T for the WML is then defined as

T = µ + ασ,

with α an optimized threshold parameter. The WML segmentation is obtained by thresholding the FLAIR image at T (a sketch of this step follows Section 2.4).

2.4. Post-processing

The automatic segmentation result consists of CSF, GM, WM and WML. Occasionally, voxels at the border between GM and subarachnoid CSF are inaccurately classified as WML. Therefore, a morphological post-processing step is performed within a brain mask to remove WML that are not adjacent to WM: the white matter in the segmentation is dilated by one voxel in all directions, and only WML that overlap the dilated WM are kept; all others are reclassified as GM.
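To make the lesion step concrete, Sections 2.3 and 2.4 can be sketched as below. This is an illustrative re-implementation, not the original code: the label values are arbitrary, ‘WML adjacent to WM’ is interpreted here as connected lesion components that overlap the dilated WM, and the one-voxel dilation uses a full 3 × 3 × 3 neighbourhood.

```python
import numpy as np
from scipy import ndimage

def wml_threshold(flair, gm_mask, alpha=2.4, sigma_g=4.0, n_bins=256):
    """Estimate T = mu + alpha * sigma from the FLAIR histogram of GM voxels."""
    counts, edges = np.histogram(flair[gm_mask], bins=n_bins)
    bin_width = edges[1] - edges[0]
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Smooth with a Gaussian; sigma_g is given in FLAIR intensity units,
    # so it is converted to histogram bins here.
    smoothed = ndimage.gaussian_filter1d(counts.astype(float), sigma_g / bin_width)
    peak = int(np.argmax(smoothed))          # bin containing most (smoothed) GM voxels
    mu = centers[peak]
    # Full width at half maximum of the peak -> standard deviation sigma.
    half = smoothed[peak] / 2.0
    left = right = peak
    while left > 0 and smoothed[left - 1] >= half:
        left -= 1
    while right < n_bins - 1 and smoothed[right + 1] >= half:
        right += 1
    sigma = (right - left + 1) * bin_width / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return mu + alpha * sigma

def segment_wml(flair, labels, gm_label=2, wm_label=3, alpha=2.4):
    """Threshold the FLAIR image and keep only lesions adjacent to white matter."""
    wml = flair > wml_threshold(flair, labels == gm_label, alpha=alpha)
    # Post-processing: dilate WM by one voxel in all directions and keep only
    # lesion components that overlap the dilated WM; the rest becomes GM again.
    wm_dilated = ndimage.binary_dilation(labels == wm_label,
                                         structure=np.ones((3, 3, 3), dtype=bool))
    components, _ = ndimage.label(wml)
    keep = np.unique(components[wml & wm_dilated])
    return np.isin(components, keep[keep > 0])
```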
3. EVALUATION

Segmentations are evaluated by calculating the similarity index (SI) between the automatic segmentation and the manual segmentations,

SI = 2(S1 ∩ S2) / (S1 + S2),

where Si is a segmented volume and (S1 ∩ S2) is the volume of the overlap of S1 and S2. The SI is also used as a measure of the interobserver variability.
The true positive fraction (TPF), or sensitivity, and the extra fraction (EF) are also used for the evaluation. The extra fraction is a measure of oversegmentation and is defined in terms of false positives (FP), true positives (TP) and false negatives (FN) as

EF = FP / (TP + FN).
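For completeness, the three measures can be computed from a pair of binary masks as in the short sketch below; the function name is ours and non-empty manual segmentations are assumed.

```python
import numpy as np

def overlap_metrics(automatic, manual):
    """Similarity index (SI), true positive fraction (TPF) and extra fraction (EF)."""
    automatic, manual = automatic.astype(bool), manual.astype(bool)
    tp = np.count_nonzero(automatic & manual)
    fp = np.count_nonzero(automatic & ~manual)
    fn = np.count_nonzero(~automatic & manual)
    si = 2.0 * tp / (automatic.sum() + manual.sum())   # SI = 2(S1 ∩ S2)/(S1 + S2)
    tpf = tp / (tp + fn)                               # sensitivity
    ef = fp / (tp + fn)                                # oversegmentation
    return si, tpf, ef
```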
4. EXPERIMENTS

For the evaluation of the method, imaging data from the Rotterdam Scan Study II [7, 3] are used. This population-based cohort study is aimed at investigating determinants of various chronic diseases among elderly persons. The subjects were scanned with acquisition matrices of 416 × 256 for the PD and T1-weighted sequences and 320 × 224 for FLAIR. The field of view is 25 × 25 cm. All images are reconstructed with 512 × 512 voxels per slice, resulting in in-plane voxel sizes of 0.49 × 0.49 mm. The T1 scan has a slice thickness of 1.6 mm interpolated to 0.8 mm; for PD the slice thickness is 1.6 mm and for FLAIR 2.5 mm.

The threshold parameter α was determined by parameter optimization on six subjects. For these subjects, WML similarity indices between the automatic method and a manual segmentation were calculated for α ranging from 2.0 to 3.5 in steps of 0.1. All six subjects had their maximum SI within this range. The sum of the SIs was maximal at α = 2.4, so this value is used for the evaluation of the method (a sketch of this grid search is given below).

The evaluation is done on six different subjects and their manual segmentations by two neuroanatomy experts. Similar to [1], the segmentation method is evaluated for all classified voxels in the corresponding manual segmentation. Examples of preprocessed MR images of two subjects with different lesion load and their corresponding automatic segmentations can be seen in figures 1 and 2. Visual inspection of the six automatic segmentations indicated no large segmentation errors.

Table 1 shows the automatic segmentation and interobserver SIs, averaged over the two observers and the subjects, for every tissue type. The interobserver SI is high for CSF, GM and WM, but the manual segmentations for WML show a lower interobserver SI. Figure 3 shows the absolute normalized WML volume differences between observers and automatic segmentations against their average total volumes. Especially subjects with low volumes show relatively large disagreements in volume; for these subjects, volume differences in the range of the average volume and higher are observed, which result in low SIs. Visual inspection indicated that low WML volumes correspond to small lesions. The SIs for subjects with medium and large lesion load are higher. The WML SIs for all subjects can be found in table 2. The true positive fraction and extra fraction, averaged over subjects and observers, can be seen in table 3. Again, CSF, GM and WM show good results. For WML the EF is higher, but this is mainly caused by the two subjects with the lowest WML volume; TPF and EF without these subjects are shown in the column WML∗.
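The grid search over α mentioned above amounts to the following sketch. It is illustrative only: si_per_subject is an assumed callback that returns one WML similarity index per optimization subject for a given α, for example by combining the segment_wml and overlap_metrics sketches given earlier.

```python
import numpy as np

def optimise_alpha(si_per_subject, alphas=np.arange(2.0, 3.51, 0.1)):
    """Return the alpha that maximises the summed WML SI over the subjects."""
    totals = [sum(si_per_subject(alpha)) for alpha in alphas]
    return float(alphas[int(np.argmax(totals))])
```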
Tissue          CSF    GM     WM     WML
Automatic       0.88   0.90   0.92   0.66
Interobserver   0.89   0.93   0.95   0.80

Table 1. Average similarity index for different tissues.
Fig. 1. Axial slice of the MR images of subject 1 ((a) PD, (b) T1, (c) FLAIR) and (d) the corresponding automatic segmentation. From dark to bright the segmentation shows background or ‘not brain’ (black), CSF, gray matter, white matter and white matter lesions (white).

Fig. 2. Axial slice of the MR images of subject 2 ((a) PD, (b) T1, (c) FLAIR) and (d) the corresponding automatic segmentation. From dark to bright the segmentation shows background or ‘not brain’ (black), CSF, gray matter, white matter and white matter lesions (white).
5. DISCUSSION AND CONCLUSION

A fully automatic method for the segmentation of CSF, GM, WM and WML is presented which requires no prior training. The commonly used k-nearest neighbour classifier is used for the segmentation of CSF, GM and WM. As the method uses no training subjects, it is independent of interscanner intensity variations and less laborious. Non-rigid registration of tissue probability maps and a pruning step make the procedure robust against anatomical variability. WML are segmented by thresholding of the FLAIR images. The threshold is automatically computed from the GM segmentation, resulting in a WML segmentation independent of location and intersubject variations. The calculated similarity indices approximate the interobserver similarity indices for CSF, GM and WM, like the manually trained classifier in [1]. The average similarity index for WML is lower, but still within an acceptable range for subjects with medium and high WML volumes. Interobserver SIs for WML are lower than for the other tissues, as is confirmed by other studies [8, 9]. Overall, the method gives good segmentation results and has the advantage of being independent of a training set. The method therefore has the potential to be readily applied to imaging data acquired with other scanners or with new scanner parameter settings.
Fig. 3. WML volume differences divided by average volume, plotted against average volume (mm³); the legend identifies subjects 1–6. The markers distinguish the difference between the two observers, between observer 1 and the automatic method (△), and between observer 2 and the automatic method.
Subject                   1          2          3          4          5          6
Average volume (mm³)   1.14·10⁴   6.45·10²   1.86·10⁴   9.75·10³   1.74·10³   7.13·10²
SI observer 1             0.87       0.30       0.91       0.87       0.65       0.46
SI observer 2             0.79       0.42       0.91       0.86       0.63       0.22
SI interobserver          0.83       0.77       0.99       0.95       0.89       0.37

Table 2. SI of the automatic segmentation compared to the segmentations of both observers, and interobserver SI, for WML per subject. The given volume is averaged over both observers.
        CSF    GM         WM     WML    WML∗
TPF     0.89   0.87       0.97   0.85   0.83
EF      0.12   6.0·10⁻²   0.13   1.4    0.22

Table 3. Average true positive fraction and extra fraction for different tissues. WML∗ is calculated without the two subjects with the lowest WML volume (subjects 2 and 6).
6. REFERENCES

[1] H.A. Vrooman, C.A. Cocosco, R. Stokking, M.A. Ikram, M.W. Vernooij, M.M.B. Breteler, and W.J. Niessen, “kNN-based multi-spectral MRI brain tissue classification: manual training versus automated atlas-based training,” in Proc. SPIE Vol. 6144, Medical Imaging 2006: Image Processing, 2006.

[2] C.A. Cocosco, A.P. Zijdenbos, and A.C. Evans, “A fully automatic and robust brain MRI tissue classification method,” Med Image Anal, vol. 7, no. 4, pp. 513–527, Dec 2003.

[3] F.E. de Leeuw, J.C. de Groot, E. Achten, M. Oudkerk, L.M. Ramos, R. Heijboer, A. Hofman, J. Jolles, J. van Gijn, and M.M. Breteler, “Prevalence of cerebral white matter lesions in elderly people: a population based magnetic resonance imaging study. The Rotterdam Scan Study,” J Neurol Neurosurg Psychiatry, vol. 70, no. 1, pp. 9–14, Jan 2001.

[4] P. Anbeek, K.L. Vincken, M.J.P. van Osch, R.H.C. Bisschops, and J. van der Grond, “Automatic segmentation of different-sized white matter lesions by voxel probability estimation,” Med Image Anal, vol. 8, no. 3, pp. 205–215, Sep 2004.

[5] J.G. Sled, A.P. Zijdenbos, and A.C. Evans, “A nonparametric method for automatic correction of intensity nonuniformity in MRI data,” IEEE Trans Med Imaging, vol. 17, no. 1, pp. 87–97, Feb 1998.

[6] D. Rueckert, L.I. Sonoda, C. Hayes, D.L.G. Hill, M.O. Leach, and D.J. Hawkes, “Nonrigid registration using free-form deformations: application to breast MR images,” IEEE Trans Med Imaging, vol. 18, no. 8, pp. 712–721, Aug 1999.

[7] A. Hofman, D.E. Grobbee, P.T. de Jong, and F.A. van den Ouweland, “Determinants of disease and disability in the elderly: the Rotterdam Elderly Study,” Eur J Epidemiol, vol. 7, no. 4, pp. 403–422, Jul 1991.

[8] A. Zijdenbos, R. Forghani, and A. Evans, “Automatic quantification of MS lesions in 3D MRI brain data sets: validation of INSECT,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI ’98, 1998, pp. 439–448.

[9] K. Van Leemput, F. Maes, D. Vandermeulen, A. Colchester, and P. Suetens, “Automated segmentation of multiple sclerosis lesions by model outlier detection,” IEEE Trans Med Imaging, vol. 20, no. 8, pp. 677–688, Aug 2001.