
Classification of Weather Radar Images using Linguistic Decision Trees with Conditional Labelling

Daniel R. McCulloch, Jonathan Lawry, M. A. Rico-Ramirez and I. D. Cluckie

Daniel R. McCulloch is with the AI Group, Department of Engineering Maths, University of Bristol, Bristol, BS8 1TR, UK (email: [email protected]). Jonathan Lawry is with the AI Group, Department of Engineering Maths, University of Bristol, Bristol, BS8 1TR, UK (email: [email protected], phone: +44 117 928 8184).

Abstract— This paper focuses on the application of LID3 (the Linguistic Decision Tree induction algorithm) to the classification of weather radar images. In radar analysis a phenomenon known as the Bright Band occurs: an amplification in reflectivity caused by melting snow, which leads to overestimation of precipitation. It is therefore beneficial to detect the Bright Band region and apply the appropriate corrections. This paper uses LID3 to identify the Bright Band region pixel by pixel in real time, which is not possible with the differencing methods currently used for Bright Band detection. LID3 also allows us to infer a set of linguistic rules that further our understanding of the relationship between radar measurements and the classification of the Bright Band. A new idea called Conditional Labelling is proposed, which attempts to ensure a more efficiently partitioned space, omitting the relatively sparse branches caused by attribute dependencies.

I. INTRODUCTION

The quantitative use of radar-based precipitation estimates in hydrological modelling for flood forecasting has been limited by the different sources of uncertainty in the rainfall estimation process. The factors that affect radar rainfall estimation are well known and have been discussed by several authors [9], [8], [11], [10]. They include radar calibration, signal attenuation, clutter and anomalous propagation, variation of the Vertical Profile of Reflectivity (VPR), range effects, Z-R relationships, variation of the drop size distribution, vertical air motions, the beam overshooting shallow precipitation and sampling issues, among others.

The VPR is an important source of uncertainty in the estimation of precipitation using radar. The variation is largely due to factors such as the growth or evaporation of precipitation, the thermodynamic phase of the hydrometeors, melting and wind effects. As the range from the radar increases, the radar beam sits at an increasing height above the ground, while the radar sampling volume grows and is unlikely to be homogeneously filled by hydrometeors. For example, the lower part of the volume could be in rain whereas the upper part of the same volume could be filled with snow, or even be free of echo. This variability affects reflectivity measurements, and the resulting precipitation estimate may not represent the rainfall rate at the ground. Snowflakes are generally low-density aggregates, and when they start to melt they appear to the radar as large raindrops, resulting in larger values of reflectivity than expected below the melting layer [9]. This phenomenon is called the 'Bright Band', and the intersection of the radar beam with melting snowflakes can cause significant overestimates of precipitation, by up to a factor of 5. When the radar beam is above the Bright Band, precipitation can be underestimated by up to a factor of 4 per kilometre above the Bright Band [12]. The Bright Band can be seen as the very dark region in RHI scans (see Figure 1). The power reflected back to the radar is related to the rainfall intensity, so radar beams striking this melting layer of snow cause overestimation of precipitation; the Bright Band therefore needs to be detected and corrected for. In addition, when estimating precipitation intensity, determining which hydrometeors (i.e. rain or snow) the beam intersects is crucial to the calculation.

Fig. 1. A typical RHI scan for a stratiform event.

A. Data

RHI scans from the Chilbolton weather radar have been used for this analysis. The Chilbolton radar is operated by the Radio Communications Research Unit (RCRU). It is an S-band (9.75 cm wavelength) weather radar developed to study the effects of rain on communication systems [13]. It is currently the largest steerable meteorological radar in the world, with a 25 m diameter antenna allowing very high resolution measurements of precipitation particles, a very narrow beam width of 0.25 degrees and a gate size of 300 m. The Chilbolton radar has a dual-polarisation capability, which allows the size, shape, phase and orientation of the hydrometeors to be studied. Our interest is in the vertical reflectivity profiles obtained from S-band RHI scans from the Chilbolton radar (see [14], [15] for further details). The measurements obtained are the Reflectivity Factor (Zh), the Differential Reflectivity (Zdr), the Linear Depolarisation Ratio (Ldr) and the height of the measurement (H0). The estimated Bright Band boundaries can be determined by performing a vertical search for the largest differential in reflectivity; the image cannot be classified in real time this way, as the computation can only be performed once the whole image is available. Rico-Ramirez's rotation algorithm [16] performs this maximum reflectivity-differential search in order to determine the estimated boundaries of the Bright Band. Each example in the dataset is a single pixel from a single image, taken from a set of 1354 images, classed as either rain, snow or Bright Band.
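As a rough illustration of the differencing idea described above (a hedged sketch, not the algorithm of [16]), the following Python fragment estimates Bright Band boundaries from a single vertical reflectivity column by locating the steepest increase and the subsequent steepest decrease in reflectivity with height; the function and array names are assumptions.

import numpy as np

def bright_band_boundaries(zh_column, heights):
    """Crude Bright Band boundary estimate for one vertical column.

    zh_column : reflectivity factor (dBZ) at increasing heights
    heights   : corresponding heights (e.g. km)
    Assumes the enhanced layer is not at the very top of the column.
    """
    dz = np.diff(zh_column)                     # change between adjacent gates
    bottom = int(np.argmax(dz)) + 1             # first gate after the steepest increase
    top = bottom + int(np.argmin(dz[bottom:]))  # last gate before the steepest decrease
    return heights[bottom], heights[top]

# Toy column: reflectivity is enhanced around 2.0-2.5 km.
zh = np.array([30.0, 31.0, 32.0, 40.0, 41.0, 33.0, 25.0, 24.0])
h = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
print(bright_band_boundaries(zh, h))  # -> (2.0, 2.5)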

Fig. 2. An example of three uniformly distributed trapezoidal fuzzy sets ({short}, {average}, {tall}) with 50% overlap, giving the focal sets {short}, {short, average}, {average}, {average, tall}, {tall}.

B. The Classifier

LID3 [2] is a decision tree rule induction algorithm based on Label Semantics, as described below. The LID3 algorithm has been adopted here in order to find a set of linguistic rules with which to classify the image pixel by pixel in real time. A brief outline of the LID3 algorithm is given. We then discuss the rationale for using an LID3 version of Quinlan's Gain Ratio measure rather than Information Gain for attribute selection during the evolution of the tree. Next we propose a new idea called Conditional Labelling, adopting one supervised and one unsupervised discretisation method, namely Entropy Minimisation Partitioning and K-Means Clustering respectively. Conditional Labelling attempts to create a more efficiently partitioned space, omitting the relatively sparse branches caused by attributes with probabilistic dependencies. The attributes Zh, Zdr and Ldr are highly correlated, and it is hoped that Conditional Labelling will improve on the original LID3 algorithm.

C. Label Semantics

Label Semantics, proposed by Lawry [1], is a framework for modelling with linguistic expressions, or labels, such as small, medium and large. Such labels are defined by overlapping fuzzy sets which cover the universe of a continuous variable [2]. Consider an element x in a continuous universe Ω, described by a set of linguistic labels LA = {L_1, ..., L_b}. Fuzzy set theory, introduced by Zadeh [4], allows these labels to overlap, so that x may have partial membership in more than one label. For instance, if the membership of element x in fuzzy label L_i is 0.8, this is denoted \mu_{L_i}(x) = 0.8. An element may in general have partial membership in several labels; in our case we only allow an element partial membership in two labels, using fuzzy sets with 50% overlap of the continuous universe (see Figure 2). We then interpret the membership of element x in the fuzzy label L_i as the probability that L_i is an appropriate label for x:

\mu_{L_i}(x) = P(L_i | x)
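To make the appropriateness degrees concrete, the following sketch (a minimal illustration with made-up label geometry, not the discretisations used in the paper) defines three overlapping trapezoidal labels over a toy universe and evaluates \mu_L(x):

# Each label is a trapezoid (a, b, c, d): membership rises over [a, b],
# equals 1 on [b, c] and falls over [c, d].
LABELS = {
    "short":   (float("-inf"), float("-inf"), 10.0, 15.0),
    "average": (10.0, 15.0, 20.0, 25.0),
    "tall":    (20.0, 25.0, float("inf"), float("inf")),
}

def mu(x, trap):
    """Appropriateness degree mu_L(x) of a trapezoidal label for value x."""
    a, b, c, d = trap
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# A value in an overlap region is partially appropriate for two labels:
print({label: mu(12.0, trap) for label, trap in LABELS.items()})
# -> {'short': 0.6, 'average': 0.4, 'tall': 0.0}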

Fig. 3. The mass assignment m_x obtained from the labels in Figure 2 using the consonance mapping.

Since the labels overlap we cannot define a probability distribution directly on LA = {L_1, ..., L_b}, as more than one label can be appropriate for an example x. Instead we define a set of focal elements corresponding to atomic expressions which are both exclusive and exhaustive. For example, if LA = {s, a, t} (short, average, tall) then there are 8 possible atoms of the form s ∧ ¬a ∧ ¬t, s ∧ a ∧ ¬t, ¬s ∧ a ∧ ¬t, and so on. Notice that some combinations of labels cannot occur; for example, x can never be both short and tall. The label sets which can occur are referred to as focal sets and are defined by

FS = \{F : \exists x \; m_x(F) > 0\}

where m_x(F) is the mass assigned to focal element F for element x. We now apply the consonance assumption in order to determine a unique mass assignment (such as that in Figure 3) from the appropriateness degrees, as follows. Given non-zero appropriateness degrees \mu_{L_i}(x), i = 1, ..., n, ordered such that \mu_{L_i}(x) \geq \mu_{L_{i+1}}(x), the consonant mass assignment is

m_x(\{L_1, ..., L_n\}) = \mu_{L_n}(x), \quad m_x(\emptyset) = 1 - \mu_{L_1}(x),
m_x(\{L_1, ..., L_t\}) = \mu_{L_t}(x) - \mu_{L_{t+1}}(x) \ \text{for} \ t < n.

Figure 3 shows the resulting mass assignments on the focal set {{s}, {s, a}, {a}, {a, t}, {t}}. Now suppose we have a database DB = \{\langle x_1(i), ..., x_n(i)\rangle : i = 1, ..., N\}, where N is the number of examples, n is the number of attributes, and the focal set on attribute j is FS_j = \{F_{j,1}, ..., F_{j,h_j}\}.
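A minimal sketch of the consonance mapping just described, taking a dictionary of appropriateness degrees to a mass assignment over nested label sets (the function name and data representation are ours):

def consonant_mass_assignment(degrees):
    """Map appropriateness degrees {label: mu} to masses on sets of labels."""
    ordered = sorted((lab for lab, m in degrees.items() if m > 0),
                     key=lambda lab: degrees[lab], reverse=True)
    if not ordered:
        return {frozenset(): 1.0}
    mass = {}
    # mass on the empty set if even the best label is not fully appropriate
    if degrees[ordered[0]] < 1.0:
        mass[frozenset()] = 1.0 - degrees[ordered[0]]
    for t, lab in enumerate(ordered):
        mu_t = degrees[lab]
        mu_next = degrees[ordered[t + 1]] if t + 1 < len(ordered) else 0.0
        if mu_t - mu_next > 0:
            mass[frozenset(ordered[:t + 1])] = mu_t - mu_next
    return mass

# Degrees in the style of the Figure 2 labels:
print(consonant_mass_assignment({"short": 0.6, "average": 0.4, "tall": 0.0}))
# -> masses 0.4 on {}, 0.2 on {short} and 0.4 on {short, average}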

We now obtain a linguistic database LD from the linguistic translation:

LD = \{\langle A_1(i), ..., A_n(i)\rangle : i = 1, ..., N\},
A_j(i) = \langle m_{x_j(i)}(F_{j,1}), ..., m_{x_j(i)}(F_{j,h_j})\rangle

where m_{x_j(i)}(F_{j,r}) is the mass associated with focal element F_{j,r} as the set of appropriate labels for the value x_j(i).

D. LID3

The ID3 classifier described by Quinlan [5] is a very well known and widely used decision tree algorithm for datasets with discrete attributes. Qin and Lawry [2] propose the LID3 classifier, which incorporates Label Semantics in an attempt to increase the robustness of ID3, leaving it less susceptible to misclassification due to crisp discretisation. LID3 also allows imprecise concepts to be modelled by decision rules. The full details of the algorithm are given in [2]. A linguistic decision tree is a set of branches with associated conditional class probabilities

\langle Pr(C_1 | B), ..., Pr(C_m | B)\rangle

where each branch B with k nodes is defined by a set of focal elements, one for each attribute in the branch:

B = \langle F_{j_1}, ..., F_{j_k}\rangle.

In the original version of the LID3 algorithm outlined in [2], the attribute selected for expansion is evaluated by determining the expected entropy of expanding a branch B with attribute x_j, denoted EE(B, x_j). The information gain of expanding branch B with attribute x_j is then

IG(B, x_j) = E(B) - EE(B, x_j)

where E(B) is the entropy of the branch. The attribute with the highest information gain is chosen, and the process is repeated at the next node until the entire tree is built.

E. Gain Ratio for LID3

Because information gain is logarithmic in nature, a high number of focal elements tends to yield a high information gain, so attributes partitioned into many focal elements are favoured over those partitioned into fewer. To overcome this we adopt a version of Quinlan's gain ratio [5]:

GR(B, x_j) = \frac{IG(B, x_j)}{-\sum_{r=1}^{h_j} P(F_{j,r} | B) \log_2 P(F_{j,r} | B)}

where GR denotes the Gain Ratio and P(F_{j,r} | B) is the probability of the newly appended focal element F_{j,r} given the branch B. The addition of this normalising term gives a measure that eliminates the bias towards attributes with many focal elements.
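As an illustration of the branch-expansion measures above, the sketch below computes information gain and gain ratio for one candidate attribute from class distributions that are assumed to be supplied by the caller (e.g. estimated from the masses flowing through the branch); the function names are ours.

import math

def entropy(class_probs):
    """Shannon entropy of a class distribution."""
    return -sum(p * math.log2(p) for p in class_probs if p > 0)

def gain_ratio(branch_class_probs, expansions):
    """Information gain and gain ratio of expanding a branch with one attribute.

    branch_class_probs : class distribution Pr(C|B) at the current branch
    expansions         : list of (P(F|B), class distribution after appending F),
                         one entry per focal element F of the candidate attribute
    """
    expected_entropy = sum(w * entropy(probs) for w, probs in expansions)
    ig = entropy(branch_class_probs) - expected_entropy
    split_info = -sum(w * math.log2(w) for w, _ in expansions if w > 0)
    return ig, (ig / split_info if split_info > 0 else 0.0)

# Toy example: three classes, an attribute with focal elements {l}, {l,h}, {h}.
ig, gr = gain_ratio(
    branch_class_probs=[0.5, 0.3, 0.2],
    expansions=[(0.40, [0.90, 0.05, 0.05]),
                (0.35, [0.30, 0.50, 0.20]),
                (0.25, [0.10, 0.20, 0.70])],
)
print(round(ig, 3), round(gr, 3))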

F. Conditional Labelling

There are various discretisation methods for automatically defining the focal elements on an attribute x_j. In [2], Uniform Discretisation, Entropy Minimisation Partitioning and Percentile Discretisation were compared. In this paper one supervised and one unsupervised discretisation method are considered, namely Entropy Minimisation Partitioning and K-Means Clustering respectively. Unsupervised discretisation methods consider only the values of the given attribute. Supervised discretisation methods consider the attribute values together with their corresponding classes, producing partitions that tend to be dominated by a particular class and thereby, in theory, aiding learning.

Previously, we determined focal elements using discretisation methods at the preprocessing stage, and these focal elements were kept constant throughout every node of the tree. This has one fundamental drawback. Consider a branch B of an LID3 tree with depth 2, whose root node is split into 3 focal elements {l}, {l, h} and {h} on attribute 1. On focal element {l} we apply LID3's attribute selection and attribute 2 is selected. The focal elements on attribute 2 were determined at the preprocessing stage using all training examples, even though only a proportion of those examples flow through focal element {l}. If attributes 1 and 2 are completely independent, the probability distribution of the examples flowing through the node is likely to be similar to that of the entire training set, so a good estimate of the optimum focal elements is possible. When the attributes do contain dependencies, however, the distribution of the examples flowing through a node can be very different from the distribution over the whole training set, and the discretisation method would therefore yield rather different optimum focal elements. So if we do have dependencies between attributes, nodes further down the tree are likely to have rather sparse focal elements, by which we mean that the majority of examples flow through only a few of them. To overcome this we discretise at each node, determining a unique focal-element profile for each free attribute from the examples flowing through that node. This should ultimately produce a better partition of the data and reduce the number of sparse focal elements.

The two LID3 decision trees below illustrate the difference. Figure 4 shows an LID3 decision tree with Standard Labelling: attribute Ldr has focal elements {l} ({low}), {l, h} ({low, high}) and {h} ({high}), and attribute Height has focal elements {s} ({short}), {s, t} ({short, tall}) and {t} ({tall}). Now consider LID3 with Conditional Labelling in Figure 5. Ldr has the same focal elements as in Figure 4, since it is the root attribute; however, the labels for Height are very different and vary with each branch. We denote these focal elements {F_{r_j} | B}.

Fig. 4. LID3 with Standard Labelling: the root attribute Ldr is split into focal elements {l}, {l, h} and {h}; beneath each, attribute Height is split into {s}, {s, t} and {t}, giving leaves L1-L7.

Fig. 5. LID3 with Conditional Labelling: the root attribute Ldr is split as in Figure 4, but the focal elements of Height are conditioned on the branch, e.g. {s|{l}}, {s, t|{l}}, {t|{l}} and {s|{h}}, {s, t|{h}}, {t|{h}}, giving leaves L1-L7.

G. Adaptive K-Means Clustering for LID3

We now give an unsupervised clustering algorithm that adopts Conditional Labelling and can be applied at each node. We will refer to it as the Adaptive K-Means Algorithm for LID3, which proceeds as follows:
• k points are chosen at random as cluster centres.
• Each instance is assigned to its nearest cluster centre.
• The weighted mean of the instances in each cluster is calculated.
• The new cluster centres are set to these weighted means.
• This process is repeated until every instance belongs to the same cluster centre on successive iterations.
The weighted mean is calculated by

WM = \frac{\sum_{i=1}^{N} P(x(i) | B) \cdot x(i)}{\sum_{i=1}^{N} P(x(i) | B)}

where WM is the weighted mean,

P(x(i) | B) = \frac{P(x(i)) \times P(B | x(i))}{P(B)}, \qquad P(B | x(i)) = \prod_{r=1}^{k} m_{x_{j_r}(i)}(F_{j_r})

and m_{x_{j_r}(i)}(F_{j_r}) is the mass assignment of the focal element of branch B for attribute j_r at depth r, for an example x(i).
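A minimal sketch of this weighted clustering step for a single attribute at one node, assuming the branch probabilities P(x(i)|B) are supplied by the caller; initialisation, convergence handling and names are simplified and ours.

import numpy as np

def adaptive_kmeans(values, branch_probs, k, max_iter=100, seed=0):
    """1-D K-Means in which each example is weighted by P(x(i)|B).

    values       : attribute values of the examples reaching the node
    branch_probs : P(x(i)|B) for each example (same length as values)
    Returns the sorted cluster centres, from which focal elements can be built.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    weights = np.asarray(branch_probs, dtype=float)
    centres = rng.choice(values, size=k, replace=False)
    assignment = None
    for _ in range(max_iter):
        # assign each instance to its nearest centre
        new_assignment = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break  # no instance changed cluster: converged
        assignment = new_assignment
        # move each centre to the weighted mean of its instances
        for c in range(k):
            mask = assignment == c
            if weights[mask].sum() > 0:
                centres[c] = np.average(values[mask], weights=weights[mask])
    return np.sort(centres)

# Toy usage: values concentrated near 5 and 20, all with equal branch probability.
print(adaptive_kmeans([4, 5, 6, 19, 20, 21], [1, 1, 1, 1, 1, 1], k=2))  # -> roughly [ 5. 20.]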

H. Adaptive Entropy Minimisation Partitioning for LID3

We now give a supervised discretisation algorithm that adopts Conditional Labelling. We will refer to it as the Adaptive Entropy Minimisation Partitioning Algorithm for LID3. In this approach we aim to partition the attribute space in order to maximise the information gain. Suppose we have a set of data points of a particular attribute S = \{x(1), ..., x(N)\}, q partitions (q - 1 cut points), T classes and p_t, the proportion of instances belonging to class t. The information gain is determined as follows:

IG = E(S) - \sum_{v=1}^{q} \frac{|S_v|}{|S|} E(S_v)

where S_v is the set of data points belonging to partition v,

|S| = \sum_{i=1}^{N} P(x(i) | B), \qquad E(S) = \sum_{t=1}^{T} -p_t \log_2 p_t, \qquad p_t = \frac{\sum_{DB_t} P(x(i) | B)}{\sum_{DB} P(x(i) | B)}.
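To illustrate the weighted information-gain criterion above, the sketch below scores a single candidate cut point (q = 2); a full Entropy Minimisation Partitioning pass would search over candidate cut points and recurse, which is omitted here. The probabilities P(x(i)|B) are assumed to be supplied, and the names are ours.

import math
from collections import defaultdict

def weighted_entropy(examples):
    """Entropy of a class distribution where each example carries weight P(x(i)|B)."""
    total = sum(w for _, w in examples)
    if total == 0:
        return 0.0
    by_class = defaultdict(float)
    for cls, w in examples:
        by_class[cls] += w
    return -sum((w / total) * math.log2(w / total) for w in by_class.values() if w > 0)

def info_gain_of_cut(values, classes, weights, cut):
    """Weighted information gain of splitting the attribute values at `cut`."""
    everything = list(zip(classes, weights))
    left = [(c, w) for v, c, w in zip(values, classes, weights) if v <= cut]
    right = [(c, w) for v, c, w in zip(values, classes, weights) if v > cut]
    total = sum(weights)
    gain = weighted_entropy(everything)
    for part in (left, right):
        part_weight = sum(w for _, w in part)
        gain -= (part_weight / total) * weighted_entropy(part)
    return gain

# Toy usage: a cut at 10 separates the two classes perfectly, so the gain is 1 bit.
vals = [2, 4, 6, 14, 16, 18]
cls = ["rain", "rain", "rain", "snow", "snow", "snow"]
wts = [1.0] * 6
print(round(info_gain_of_cut(vals, cls, wts, cut=10), 3))  # -> 1.0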

II. MAIN RESULTS

The Bright Band data set consists of 191,235 examples, 4 continuous attributes (Zh, Zdr, Ldr and Height) and three classes, namely rain, snow and Bright Band. Accuracy is based on ten-fold cross-validation. First we compare the LID3 algorithm, with Uniform, K-Means and Entropy Minimisation Partitioning discretisation (with Standard Labelling), against a variety of machine learning algorithms from Weka, namely ID3 (with uniform discretisation), Naive Bayes and a back-propagation neural network.

TABLE I
COMPARISON OF RESULTS WITH LID3 AND MACHINE LEARNING ALGORITHMS FROM WEKA

Algorithm           | Rain  | Snow  | Bright Band | Average
Naive Bayes         | 75.3% | 68.5% | 98.6%       | 80.8%
Neural Network      | 85.2% | 75.8% | 99.0%       | 86.7%
ID3 (Uniform)       | 87.0% | 84.0% | 97.3%       | 89.4%
LID3 (Uniform)      | 89.4% | 85.2% | 99.3%       | 91.3%
LID3 (Entropy-Min)  | 92.3% | 87.5% | 99.4%       | 93.1%
LID3 (K-Means)      | 93.5% | 89.0% | 99.4%       | 94.0%

Figure 6 shows a typical RHI scan before hydrometeor classification; the corresponding LID3 classification is shown in Figure 7. The three regions seen in Figure 7 are rain (lower region), snow (upper region) and Bright Band (middle region). Overall, LID3 was the leading performer by some margin; when discretised with K-Means Clustering an overall accuracy of 94% was achieved. Adaptive K-Means Clustering and Adaptive Entropy Minimisation Partitioning were introduced in order to incorporate Conditional Labelling into the LID3 algorithm. Figure 8 shows the focal set profiles for both Standard Labelling and Conditional Labelling on attribute Zh.

Fig. 6. Typical RHI scan from the Chilbolton radar.

Fig. 7. Classification of the scan with LID3 Adaptive K-Means.

Figure 8(a) shows the profile of the focal set on attribute Zh for Standard Labelling. Figures 8(b), 8(c) and 8(d) show the profiles of the focal sets on attribute Zh for Conditional Labelling. Notice that Figures 8(b) and 8(c) are skewed towards the left-hand side. This is because the vast majority of Zh values for which Ldr is {low} or {low, high} are small, resulting in a high concentration of focal elements in this region. Having incorporated Conditional Labelling, we now compare the relative performance of Standard and Conditional Labelling in the forms of Adaptive K-Means Clustering and Adaptive Entropy Minimisation Partitioning. The results are shown in Table II, where KM and AKM denote K-Means and Adaptive K-Means respectively, and EMP and AEMP denote Entropy Minimisation Partitioning and Adaptive Entropy Minimisation Partitioning respectively.

TABLE II
COMPARISON OF RESULTS WITH LID3 (K-MEANS) AND LID3 ADAPTIVE K-MEANS (AKM) WITH OPTIMALLY SELECTED FOCAL ELEMENTS

Algorithm    | Rain  | Snow  | Bright Band | Average
LID3 (KM)    | 93.5% | 89.0% | 99.4%       | 94.0%
LID3 (AKM)   | 93.6% | 91.6% | 99.4%       | 94.9%
LID3 (EMP)   | 92.3% | 87.5% | 99.4%       | 93.1%
LID3 (AEMP)  | 93.5% | 92.7% | 99.4%       | 95.2%

Fig. 8. Focal set distributions on attribute Zh for Standard and Conditional Labelling: (a) Standard Labelling, F_j; (b) Conditional Labelling given Ldr is {low}, F_j|{l}; (c) Conditional Labelling given Ldr is {low, high}, F_j|{l, h}; (d) Conditional Labelling given Ldr is {high}, F_j|{h}.

III. CONCLUSIONS

LID3 not only achieved very good classification accuracy; it also has the added value of rule generation in terms of linguistic labels, as well as the ability to determine attribute rankings for a particular class. For instance, it was very interesting to discover that Zh was the least important attribute for classifying Bright Band, even though it was used to initially label the training set. When assessing the resulting LID3 decision tree, the single most important attribute for classifying Bright Band instances was Ldr, followed by Height. In many instances the Bright Band is determined solely by the value of Ldr, at which point the tree terminates and all other attributes are ignored. It is this rule induction and attribute ranking that make decision trees such as LID3 a valuable analytical tool for mining large databases, such as RHI scans, in order to understand further the main properties of the Bright Band phenomenon. This is a significant advantage over most machine learning algorithms, which are often black boxes lacking the transparency of LID3.

In terms of discretisation methods, both K-Means and Entropy Minimisation Partitioning clearly outperformed Uniform Discretisation. In addition, Entropy Minimisation Partitioning and K-Means required 30% fewer discretisations than the uniform method, creating a more efficiently partitioned attribute space.

This greatly reduces the computational expense as well as increasing the robustness of the model, reducing the chance of over-fitting. In Section I-E we discussed the introduction of the Gain Ratio. The selection of the number of focal elements is crucial to the overall accuracy of the tree, and it is often the case that some attributes require many more focal elements than others. As mentioned, Information Gain is logarithmic in nature, and the larger the number of focal elements the larger the information gain; introducing the Gain Ratio was therefore important in order to eradicate this bias.

Conditional Labelling was introduced in an attempt to accommodate varying granularity in different regions of the attribute space and to increase robustness for datasets with even slight dependencies between attributes. Figure 8 shows the focal set distributions for Conditional and Standard Labelling. Each focal set profile for Conditional Labelling varies greatly, and all appear very different from the focal set profile for Standard Labelling. Bear in mind that this is only at depth 2; at depth 4 these skewed distributions become even more extreme. This is the basis of our argument in favour of Conditional Labelling. For many tree learning problems, the examples reaching a node lower in the tree occupy only a fraction of the original input space described by the complete dataset. In order to discretise this space sufficiently using Standard Labelling, a very large number of focal elements is needed. With Conditional Labelling, where the discretisation depends on the conditional probability of each instance given the branch, it is possible to split the input space efficiently using only a few focal elements.

The Adaptive K-Means Clustering and Adaptive Entropy Minimisation Partitioning algorithms both show improvements over their respective Standard Labelling equivalents. The greatest improvement was seen with Entropy Minimisation Partitioning, and this algorithm showed the best performance with an overall accuracy of 95.2%. It may be argued that the transparency of LID3 is somewhat reduced through Conditional Labelling; however, it is the authors' opinion that this approach is a far more humanistic one, and it has proven to be a more efficient way of learning, increasing classification accuracy and reducing the size of the resulting LID3 decision tree.

ACKNOWLEDGMENTS

Daniel R. McCulloch is funded by an FRMRC research studentship.

REFERENCES

[1] J. Lawry, Label Semantics: a formal framework for modelling with words, Symbolic and Quantitative Approaches to Reasoning with Uncertainty, LNAI 2873, Springer-Verlag, 2005.
[2] Z. Qin and J. Lawry, Decision tree learning with fuzzy labels, Information Sciences, vol. 172, pp. 91-129, 2004.
[3] Z. Qin and J. Lawry, Linguistic rule induction on a random set semantics, in Fuzzy Logic, Soft Computing and Computational Intelligence: Proceedings of the IFSA World Congress (IFSA-05), pp. 1398-1404, Springer-Tsinghua, 2005.
[4] L. A. Zadeh, Fuzzy sets, Information and Control, 8(3):338-353, 1965.

[5] J. R. Quinlan, Induction of decision trees, Machine Learning, 1:81-106, 1986.
[6] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition, Morgan Kaufmann, 2005.
[7] K. Giles, K.-M. Bryson and Q. Weng, Proceedings of the 34th Hawaii International Conference on System Sciences, 2001.
[8] P. M. Austin, Relation between measured radar reflectivity and surface rainfall, Monthly Weather Review, 115:1053-1070, 1987.
[9] L. J. Battan, Radar Observation of the Atmosphere, The University of Chicago Press, 1973.
[10] C. G. Collier, Applications of Weather Radar Systems, John Wiley & Sons, 1996.
[11] R. J. Doviak and D. S. Zrnic, Doppler Radar and Weather Observations, Academic Press, 1993.
[12] J. Joss and A. Waldvogel, Precipitation measurement and hydrology, in D. Atlas, editor, Radar in Meteorology: Battan Memorial and 40th Anniversary Radar Meteorology Conference, pp. 577-606, American Meteorological Society, 1990.
[13] J. W. Goddard, J. D. Eastment and M. Thurai, The Chilbolton Advanced Meteorological Radar: a tool for multidisciplinary atmospheric research, Electronics & Communication Engineering Journal, 6:77-86, 1994.
[14] M. A. Rico-Ramirez, Quantitative Weather Radar and the Effects of the Vertical Reflectivity Profile, PhD thesis, University of Bristol, 2004.
[15] M. A. Rico-Ramirez, I. D. Cluckie and D. Han, Correction of the Bright Band using dual-polarisation radar, Atmospheric Science Letters, vol. 6, pp. 40-46, 2005.
[16] M. A. Rico-Ramirez and I. D. Cluckie, Bright Band detection from radar vertical reflectivity profiles, International Journal of Remote Sensing, in press, 2007.
[17] M. A. Rico-Ramirez, Quantitative Weather Radar and the Effects of the Vertical Reflectivity Profile, PhD thesis, University of Bristol, 2004.