
Pattern Recognition Letters 24 (2003) 911–920 www.elsevier.com/locate/patrec

Classification of meteorological volumetric radar data using rough set methods

J.F. Peters a,*, Z. Suraj b, S. Shan a, S. Ramanna a, W. Pedrycz c, N. Pizzi d

a Department of Electrical and Computer Engineering, University of Manitoba, 15 Gillson Street, ENGR 504, Winnipeg, MB, Canada R3T 5V6
b University of Information Technology and Management, H. Sucharskiego 2, 35-225 Rzeszów, Poland
c Computer Engineering, University of Alberta, Edmonton, AB, Canada T6G 2G7
d Institute for Biodiagnostics, National Research Council, Winnipeg, MB, Canada R3B 1Y6

Abstract

This paper reports on a rough set approach to classifying meteorological volumetric radar data used to detect storm events responsible for summer severe weather. The classification of storm cells is a difficult problem due to the complex evolution of storm cells, the high dimensionality of the weather data, and the imprecision and incompleteness of the data. A rough set approach is used to classify different types of meteorological storm events. A considerable number of different classification strategies and techniques have been considered and compared to determine which approach best classifies the volumetric storm cell data coming from the Radar Decision Support System database of Environment Canada. The criterion for comparison is the accuracy coefficient of the classification over testing data. The contribution of this paper is a new application of rough set theory in classifying meteorological radar data.
© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Classification; Data mining; Decision classes; Pattern recognition; Rough sets; Meteorological volumetric radar data; Storm event

* Corresponding author. Tel.: +1-204-474-9603; fax: +1-204-261-4639. E-mail address: [email protected] (J.F. Peters).

1. Introduction

In this paper, the rough set approach (Pawlak, 1991; Bazan, 1998; Ohrn, 1999; Ohrn et al., 1998; Wroblewski, 1995; Komorowski et al., 1998; Nguyen, 1999) is used to classify different types of storm events. Several classification strategies and preprocessing techniques are applied (Fayyad et al., 1996). This work also compares the rough set classification strategies with those presented in (Alexiuk et al., 1999; Alexiuk, 1999; Li et al., 1999; Li et al., 2000; Ramirez et al., 2000) to determine which approach best classifies the volumetric storm cell data coming from the Radar Decision Support System (RDSS) database of Environment Canada (Westmore, 1999). A radar data processing system gathers meteorological


volumetric radar data by conducting a volume scan. Meteorologists use these radar data to detect thunderstorms. A radar subsystem exists that allows operational meteorologists to focus their attention on regions of interest, known as storm cells, within the volumetric radar scan. When a storm cell is found, a number of parameters known as products are derived, including the cell's geographical location, volume, vertically integrated reflectivity, precipitation accumulation, maximum wind gust potentials, gradient profiles (Lakshmanan and Witt, 1997) and bounded weak echo regions. However, it is difficult to classify detected storm cells into specific types of storm events due to a number of confounding factors (Denoux and Rizand, 1995), such as incomplete data, the complex evolution of storm cells, and the high dimensionality of the data. Our objective is to identify patterns (rules) in the data that indicate, with a high degree of accuracy, the onset of a severe weather event using either the derived features of matched-cell files from RDSS (Westmore, 1999) or the raw data of the volume scans.

Several approaches, including fuzzy clustering (Alexiuk, 1999), neural networks (Li et al., 1999; Alexiuk et al., 1999), genetic algorithms (Li et al., 2000), and the support vector machine (Ramirez et al., 2000), have been attempted for classifying volumetric storm cells. Alexiuk et al. (1999) used group averaging to represent each storm cell. They reported a testing data classification accuracy of 75% using neural networks with fuzzy labels to deal with the imprecision in the sampling and confirmation of the cells. This result is quite similar to the 75.54% obtained by Ramirez et al. (2000) for the ten decision classes with the neural network approach. The best result presented by Alexiuk et al. (1999) was 80% accuracy using fuzzy c-means clustering with the option of rejecting any data point assigned to a cluster whose ratio of dominant vectors to total vectors was small. This result is outperformed by the support vector machine approach, which, without any rejection, achieves an accuracy of 80.95% for the ten decision classes.

In this paper, the rough set approach is used to classify either four types of storm events (hail, heavy rain, tornado and wind) or ten types of storm events (hail, heavy rain, tornado, wind, hail or rain, hail or tornado, hail or wind, rain or tornado, rain or wind, and tornado or wind). This work also compares the rough set classification strategies with those presented in (Ramirez et al., 2000) to determine which approach best classifies the volumetric storm cell data coming from the RDSS database of Environment Canada. The criterion for comparison is the accuracy coefficient of the classification over testing data. The results obtained with the rough set approach based on the decision tree method show that it is slightly better, in terms of accuracy in volumetric storm cell classification, than the approaches presented in Alexiuk et al. (1999) and Ramirez et al. (2000).

The structure of this paper is as follows. Volumetric storm cell data are described in Section 2. Section 3 gives a short description of the investigated methods for volumetric storm cell classification, together with the methodology and experimental results.

2. Data acquisition and derived features

The data studied in this paper came from Environment Canada's radar stations in Broadview, SK, and Vivian, MB, between May 1997 and September 1999. A total of 577 storm cells were localized by the Vivian radar in Manitoba for the summer of 1997 and by the Broadview radar for July of 1998. These data are referred to as raw radar data. In order to gain meaningful insight into the weather events from the raw radar data, post-processing analysis must be performed. Environment Canada currently uses software developed by InfoMagnetics Technologies Corporation, known as RDSS, which reads the raw radar data and generates matched-cell files based on the continuity of statistical parameters of the raw data over a period of time and across a region of space.

2.1. Raw radar data acquisition

The raw weather radar data are currently gathered by a meteorological volumetric radar scan. The process of collecting the data is described in detail in (Westmore, 1999). Briefly, a radar station


transmits a burst of microwave radiation in a particular direction, and then receives energy reflected back from particles in the atmosphere. When a burst of radar microwave radiation encounters a particle (usually water droplets, ice crystals or dust), some of that energy is absorbed by the particle; the rest is scattered in all directions, and some of it is reflected back in the direction of the source. The bigger the particle, or the better its scattering characteristics, the more energy is returned to the source. The energy received back at the source is measured, and the reflectivity factor z is calculated. The radar reflectivity factor z is measured in decibels as dBZ (a unit of z per volume). Large dBZ values imply a high density of particles per unit volume, which increases the probability of heavy rain, hail or snow. The radar data are gathered by scanning 360° around the azimuth at a prescribed number of angles of elevation, which is known as a volume scan. Each volume scan takes a certain amount of time to complete, which for our data is 5 min. Within each volume scan, there is an entry for each angle of azimuth and elevation giving the dBZ value and the range (distance to the reflector). The raw radar data indicating the dBZ value of the reflectivity at a measured distance (range) for each angle of azimuth and elevation within the volume scan are called the direct features of a weather event.

The volumetric scan data are processed by the RDSS system. RDSS is a knowledge-based pattern recognition weather radar decision support tool for meteorologists responsible for real-time detection and analysis of storms. RDSS defines a storm cell to be a spatial cluster of high dBZ values. A storm cell snapshot is a collection of features of a storm cell observed at a specific instant in time. RDSS maintains a collection of storm cells obtained from volumetric radar scans with a minimum reflectivity threshold (47 dBZ) that is indicative of storm severity.

2.2. Derived features

By applying a meteorological analysis to the dBZ values in the volume scans, RDSS is able to derive additional features of the weather event. These derived features give the meteorologist a


different view of the information contained in the raw data, enabling them to make more accurate predictions about the weather. It was by considering the wealth of information available in the derived features that researchers saw an opportunity to apply advanced pattern recognition and further post-processing techniques to these data for the purpose of identifying more subtle relationships between the derived features and severe weather. There are numerous features derived by RDSS, some of them two-dimensional, but this paper is primarily concerned with the features listed in Table 1 (a detailed description is given in (Shan, 2001)).

As RDSS calculates the derived features of a weather event, it identifies the growth, movement and dissolution of storm cells. This process can identify several storm cells in a single volume scan, and then track the lifetimes of these storm cells from the first volume scan, where a cell is identified, to the last volume scan, where it has dissolved to the point where the dBZ values have fallen below the

Table 1
Listing of 22 derived features and one decision, with data type and description

No.  Feature             Data type      Description
1    Z value             R              Height offset (km)
2    X value             N+             X extent of a cell (km)
3    Y value             N+             Y extent of a cell (km)
4    Core volume         N+             Volume of a cell core (km³)
5    Core height         R+             Height of a cell core
6    Supercell severity  {0, 1, 2, 3}   Heuristic
7    Wind gust severity  {0, 1, 2, 3}   Heuristic
8    Hail occurrence     {0, 1, 2}      Heuristic
9    Core tilt angle     R+             Core fitted angle
10   Supercell flag      {0, 1}         Heuristic
11   Joint count         {0, 1}         Has a cell joined?
12   Split count         {0, 1}         Has a cell split?
13   Tilt X              R+             Centroid parameter
14   Tilt Y              R+             Centroid parameter
15   Tilt Z              R+             Centroid parameter
16   Velocity set flag   {0, 1}         Next field available?
17   Velocity X          R+             (km/h)
18   Velocity Y          R+             (km/h)
19   Velocity Z          R+             (km/h)
20   Core X              R+             Size of a cell core
21   Core Y              R+             Size of a cell core
22   Orientation         R+             Orientation of a vector
23   Cell type           {1, 2, 3, 4}   Decision


threshold. This information is grouped together by RDSS and output as a matched-cell file. The matched-cell files are saved as unsigned characters, so much of a file cannot be interpreted by an ASCII text viewer. In order to make the matched-cell files readily accessible to further post-processing, we extract the data from these matched-cell files using the modified Project EC-NRC program files (Li et al., 1999; Dietrich, 2000).

2.3. Analysis of radar data

To correlate the list of features in Table 1 with a weather condition, one must have reference to a list of observed events. Such a list is called a groundtruth event table (a sample groundtruth table for hail is given in Fig. 1). Groundtruth tables are compiled by Environment Canada based on reports from weather stations and volunteers. The groundtruth events are subdivided into categories, such as (1) hail, (2) rain, (3) tornado, and (4) wind. Indeed, one of the biggest hindrances to this project is the lack of groundtruth observations compared to the number of matched-cell files indicating that there may be an observable event. Having a list of desired features and a groundtruth table, we generate data for further analysis by cross-referencing the list of groundtruth events with all of the matched-cell files based on the time and location of the groundtruth event. This means choosing a suitable ''window'' in time and space to which both the groundtruth event and the matched-cell files belong. Usually, one or more matched-cell files pertain to each groundtruth observation. Sometimes, the same

matched-cell file may be associated with more than one groundtruth event (hail and rain, for example). This could be minimized by matching the groundtruth event to the specific cell snapshot within the matched-cell file. The cross-reference process is realized by a Matlab® program whose output is a text file with a header and two columns. The header is simply the integer number of row entries to follow. The first column of data is the filename, and the second column is the integer giving the groundtruth category {1, 2, 3, 4}. The data for further analysis are generated by the C++ program StormIT (Li et al., 1999), which uses the Matlab program's output text file as input. The analysis output is a text file of 23 columns, whose first 22 columns are the 22 features listed in Table 1 and whose 23rd column is the groundtruth classification. There will be as many rows in the output as there are cell snapshots in all of the matched-cell files. The output files can be imported into the RSES system (RSES, 2001) and other platforms for subsequent analyses (e.g. Rosetta, 2001; Ohrn et al., 1998).

The volumetric data used in this work were obtained from the Vivian radar in Manitoba for the summer of 1997 and from the Broadview radar for July of 1998. A time and distance range of 10 min, 0.25° latitude and 0.45° longitude of the groundtruth event is used.

Fig. 1. Sample groundtruth event table for hail, May 1997.

2.4. Consistency analysis

The decision table constructed on the basis of the meteorological volumetric radar data is inconsistent for about 25% of all considered objects (Ramirez et al., 2000). There are 146 objects with two different decision values, distributed in the following way: 20 objects are either hail or rain, 10 objects are either hail or tornado, 33 objects are either rain or tornado, 33 objects are either rain or wind, and 50 objects are either tornado or wind. There are 431 objects with only one decision value. In order to cover the objects with two decision values, we have added six more decision classes to the original four (hail, rain, tornado, wind); see Table 3.

3. Methodology and experimental results

There are a considerable number of identical cells (that is, cells whose 22 describing features have the same values) with two different labels. This happened for about 25% of the cells: there are 146 cells with two different labels and 431 cells with only one label. This indicates the uncertainty of the label assignments. To solve this problem, we have added six more classes to the original four (hail, rain, tornado, wind) in order to cover the cells with two labels.

For both the 4-class decision and the 10-class decision, the data have been randomly resampled into a training set and a testing set: 75% of the data form the training set and the remaining 25% form the test set, for the following two reasons: (1) our data set is not large (a sufficiently large training set is needed to discover the hidden patterns that represent the knowledge in the whole data set); (2) we keep the same resampling conditions as in the paper by Ramirez et al. (2000), so that we can compare the results obtained with different methods. This resampling was performed five times over the whole data set in order to obtain five groups of training and testing sets for each type of classification. The experiments have also been tried with a 50%/50% split between training and testing data, and similar results were obtained; however, the stability of the results for the 75%/25% split appears slightly better than for the 50%/50% split, which indicates that a sufficiently large training set should be provided. The mean pattern distribution over the five groups for the 4-class and 10-class decisions is shown in Tables 2 and 3, respectively.
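The resampling procedure above can be sketched as follows. This is a minimal illustration under stated assumptions (a plain shuffle-and-cut split; the function name and seed are hypothetical, and the original study may have stratified or otherwise constrained the split).

```python
import random

def resample(objects, train_fraction=0.75, groups=5, seed=0):
    """Randomly split the data into training/testing sets several times:
    75% training, 25% testing, five independent resamplings, as described
    in the text. Illustrative sketch, not the authors' exact procedure."""
    rng = random.Random(seed)
    splits = []
    for _ in range(groups):
        shuffled = objects[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        splits.append((shuffled[:cut], shuffled[cut:]))
    return splits

# 577 storm cells, as in this study.
splits = resample(list(range(577)))
train, test = splits[0]
print(len(train), len(test))  # 433 144
```

Note that 75% of 577 objects gives the 433/144 training/testing totals reported in Tables 2 and 3.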


Table 2
Pattern distribution for four decision classes

Decision class, Vd   Class name   Object count   Training set size   Testing set size
1                    Hail         166            126                 40
2                    Rain         54             36                  18
3                    Tornado      265            207                 58
4                    Wind         92             64                  28
Total                             577            433                 144

Table 3
Pattern distribution for ten decision classes

Decision class, Vd   Class name        Object count   Training set size   Testing set size
1                    Hail              150            112                 38
2                    Rain              22             17                  5
3                    Tornado           207            155                 52
4                    Wind              52             46                  6
5                    Hail or rain      20             16                  4
6                    Hail or tornado   10             6                   4
7                    Hail or wind      0              0                   0
8                    Rain or tornado   33             25                  8
9                    Rain or wind      33             23                  10
10                   Tornado or wind   50             33                  17
Total                                  577            433                 144
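The relabelling that produces decision classes 5–10 in Table 3 can be written down directly. The sketch below is illustrative (the dictionary and helper are hypothetical names); the class numbering follows Table 3.

```python
# Combined decision classes for cells that carry two labels
# (numbering as in Table 3; 1 = hail, 2 = rain, 3 = tornado, 4 = wind).
COMBINED = {
    frozenset({1, 2}): 5,   # hail or rain
    frozenset({1, 3}): 6,   # hail or tornado
    frozenset({1, 4}): 7,   # hail or wind
    frozenset({2, 3}): 8,   # rain or tornado
    frozenset({2, 4}): 9,   # rain or wind
    frozenset({3, 4}): 10,  # tornado or wind
}

def relabel(labels):
    """Map the set of groundtruth labels attached to one cell to a single
    decision value in the 10-class scheme (sketch of the relabelling idea)."""
    labels = frozenset(labels)
    if len(labels) == 1:
        return next(iter(labels))
    return COMBINED[labels]

print(relabel({3}))     # 3 (tornado)
print(relabel({2, 4}))  # 9 (rain or wind)
```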

In supervised learning, we are given a set of labeled example objects in a decision system S, and we want to construct a classifier d̂ that maps elements of U to elements of Vd using only the attributes contained in A. Since, in practice, S is almost always a finite and limited collection of possible examples, it is customary to randomly divide the examples in S into two disjoint subsets, a training set and a test set. The training set is used to construct d̂, while the test set is used to assess its performance. Under the assumption that the two sets comprise independent samples, this ensures that the performance estimate is unbiased.

The Rosetta toolset utilizes a confusion matrix that summarizes the performance of a classifier d̂ applied to the objects in a decision system S (Ohrn, 1999). A confusion matrix C is a card(Vd) × card(Vd) matrix with integer entries. The entry C(i, j) counts the number of objects that really belong to decision class i but were classified by d̂ as belonging to decision class j, i.e.,

C(i, j) = card({u ∈ U : d(u) = i and d̂(u) = j}).

Of course, it is desirable for the diagonal entries to be as large as possible. The accuracy of a classifier is represented by a probability obtained from a confusion matrix C as in (1):

Pr(d(u) = d̂(u)) = (Σ_i C(i, i)) / (Σ_i Σ_j C(i, j)).   (1)

We assume that Vd is the set of integers {1, 2, 3, 4} for the 4-class decision and {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} for the 10-class decision, as given in Tables 2 and 3.

3.1. Experimental results

In this section, the results of the experiments, performed independently for the 4-class and 10-class decisions over the five randomly resampled groups, are presented. The following abbreviations are used in this section: FR (full reducts), OOR (object-oriented reducts), GR (genetic reducts), DR (dynamic reducts), and DT (decomposition tree). The results presented include (1) averages over the five groups for the training and testing confusion matrices, and (2) a performance evaluation of the classifiers used. Tables 4 and 5 summarize the Rosetta classification results with four and ten decision classes, respectively. Tables 6 and 7 summarize the RSES classification results with four and ten decision classes, respectively. Tables 8 and 9 summarize the classification results for the 4- and 10-class cases using other methods, namely 1-nearest neighbor (1-NN), neural networks (NNET), radial basis functions (RBF), support vector machines (1-v-R), and decision directed acyclic graphs (DDAG), as reported in (Alexiuk, 1999; Ramirez et al., 2000).

Rules have been generated using various reduct and decision tree methods during this study. A detailed explanation of these methods is given in Suraj et al. (2001). We briefly explain these

Table 4
Rosetta classification results with four decision classes

Method      FR              OOR             GR-FR           GR-OOR          DT
            Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.
Training    0.912   0.927   0.912   0.920   0.895   0.910   0.893   0.905   0.912   0.914
Testing     0.563   0.596   0.729   0.763   0.611   0.641   0.639   0.671   0.750   0.811

Table 5
Rosetta classification results with ten decision classes

Method      FR              OOR             GR-FR           GR-OOR          DT
            Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.
Training    1.000   1.000   1.000   1.000   0.952   1.000   0.952   1.000   0.963   1.000
Testing     0.618   0.649   0.743   0.781   0.708   0.743   0.611   0.642   0.708   0.735

Table 6
RSES classification results with four decision classes

Method      FR              OOR             GR-FR           GR-OOR          DT
            Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.
Training    0.897   0.912   0.904   0.912   0.893   0.895   0.893   0.895   1.000   1.000
Testing     0.615   0.652   0.594   0.622   0.606   0.644   0.603   0.652   0.659   0.741


Table 7
RSES classification results with ten decision classes

Method      FR              OOR             GR-FR           GR-OOR          DT
            Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.    Avg.    Max.
Training    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
Testing     0.751   0.789   0.751   0.789   0.754   0.780   0.751   0.780   0.820   0.877
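The training and testing figures in Tables 4–7 (and in Tables 8 and 9 below) are instances of the accuracy coefficient of Eq. (1). A minimal sketch of computing it from a confusion matrix follows; the 4×4 matrix shown is made up for illustration and is not taken from the study.

```python
def accuracy(confusion):
    """Accuracy coefficient from a card(Vd) x card(Vd) confusion matrix C,
    as in Eq. (1): sum of the diagonal entries over the sum of all entries."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical 4-class confusion matrix
# (rows: true class, columns: assigned class; row sums 40, 18, 58, 28
# mirror the testing set sizes of Table 2).
C = [
    [35, 2, 2, 1],
    [3, 12, 2, 1],
    [4, 1, 50, 3],
    [2, 1, 3, 22],
]
print(round(accuracy(C), 3))  # 0.826
```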

Table 8
Classification results (%) with four decision classes using other methods

Method      1-NN    NNET    RBF     1-v-R   DDAG
Training    100     91.67   91.81   88.35   91.32
Testing     63.56   63.90   54.86   67.26   68.63

Table 9
Classification results (%) with ten decision classes using other methods

Method      1-NN    NNET    RBF     1-v-R   DDAG
Training    100     95.71   99.95   98.32   99.51
Testing     78.58   75.54   77.77   78.99   80.95

methods here. Using the FR method, rules are selected relative to a set of condition attributes that are necessary and sufficient to discern objects from different decision classes to the same degree as using all conditions. Using the k-OOR method, rules are selected relative to a set of condition attributes that are necessary and sufficient to discern object k from all objects with other decision values. The DR method is used to select rules relative to attributes that form a reduct of some kind (FR or k-OOR) on most of the subtables obtained by resampling an input table. The GR method is used to select rules relative to a set of attributes selected by a genetic algorithm using the FR or k-OOR methods.

It should be added that the SVM is a classifier based on the generation of optimal hyperplanes or optimal surfaces. After interpreting the objects of a decision table as points of the space R^n (where n is the number of numeric conditions), the hyperplanes (surfaces) separating the decision classes in the best way are sought. ''Best'' here means as few misclassified objects as possible and as large as possible a distance between the separating hyperplanes (surfaces) and the decision classes. For data with more than two decision classes, a set of hyperplanes or surfaces that separates the decision classes according to a specific graph (DDAG) has been used.

The same proportion of training table size to test table size used in (Alexiuk, 1999; Ramirez et al., 2000) has been preserved in the study reported in this paper. The experimental data have been randomly chosen five times. The mean distribution of patterns is presented in Tables 10 and 11.

3.2. Observations

In the four decision class case, the decision tree (DT) method, with 81% accuracy, significantly outperforms the other rough set methods used in Table 4 and all of the methods reported in Table 8 (Alexiuk, 1999; Ramirez et al., 2000). The results from training and testing with ten decision classes are mixed. That is, from Table 5, it can be seen that the OOR and GR-FR methods produce slightly better results than the DT method using

Table 10
Distribution of patterns for the four decision class table

Class   Training pattern size    Test pattern size        Training pattern avg. size   Test pattern avg. size
        (Ramirez et al., 2000)   (Ramirez et al., 2000)   (Peters et al., 2001a)       (Peters et al., 2001b)
1       124                      42                       126                          40
2       40                       14                       36                           18
3       198                      67                       207                          58
4       69                       23                       64                           28


Table 11
Distribution of patterns for the ten decision class table

Class   Training pattern size    Test pattern size        Training pattern avg. size   Test pattern avg. size
        (Ramirez et al., 2000)   (Ramirez et al., 2000)   (Peters et al., 2001a)       (Peters et al., 2001b)
1       112                      38                       112                          38
2       16                       6                        17                           5
3       155                      52                       155                          52
4       39                       13                       46                           6
5       15                       5                        16                           4
6       7                        3                        6                            4
7       0                        0                        0                            0
8       24                       9                        25                           8
9       24                       9                        23                           10
10      37                       13                       33                           17

Rosetta. By contrast, from Table 7, we found that the DT method yielded significantly better results than all of the other rough set methods using RSES. In addition, the DT method, with an accuracy of 87%, outperformed each of the methods with ten decision classes reported in (Alexiuk, 1999; Ramirez et al., 2000). Coincidentally, it is the DDAG method (80.95% accuracy) in Table 9 that comes closest to the performance of the DT method in Table 7. One can conclude from what we have seen so far that rough set methods (i.e., the DT method) evidently provide better classification results than what has been reported previously. These results have been corroborated by a more recent study (Suraj et al., 2001).

Although our approach based on rough set theory looks quite promising, as demonstrated by the results of our experiments, more experiments with the data are needed. Therefore, more radar data and groundtruth information need to be acquired to provide researchers with a rich source of data. One should also consider new representations of the data that would facilitate a finer discretization: instead of using whole numbers like 0, 1, 2, 3, one might consider fractional or real values in the interval [0, 3]. The idea would be to strive for greater refinement and care in the representation of the data. It is also desirable that meteorologists introduce refinements into the classification of the weather data. For example, it would be very helpful if the decision class ''hail mixed with rain'' (instead of ''hail or rain'') were introduced at the source of the data, rather than as a result of data mining. More radar observations using such a refined observation system are needed to improve the weather data classification. Finally, in the event that meteorologists introduce new decision classes and/or new features of the radar observations, the classification system needs to be recalibrated and retested.

Further experiments with the same weather data have recently been carried out by Suraj et al. (2001)

Fig. 2. Performance comparison (4-class case).


Fig. 3. Performance comparison (10-class case).

and Suraj and Rzasa (2001), where three rough set toolsets were used, namely Rosetta, RSES, and LERS (Grzymala-Busse, 1997). These more recent experiments have shown that the GR-OOR and OOR methods produce the best results in classifying the weather data in the four decision class case (see Fig. 2). By contrast, the DT method outperforms the other methods in the ten decision class case (see Fig. 3).
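The full-reduct (FR) idea that underlies several of the methods compared above can be illustrated with a brute-force sketch. This is only an illustration under stated assumptions: it assumes a consistent toy table, it exhaustively enumerates attribute subsets (feasible only for tiny tables), and it returns minimum-cardinality reducts; Rosetta and RSES use far more efficient discernibility-based algorithms.

```python
from itertools import combinations

def is_superreduct(table, attrs):
    """Check whether the attribute subset still determines the decision:
    no two objects agree on all attributes in `attrs` yet carry different
    decisions. Assumes the full table is consistent."""
    seen = {}
    for *conds, decision in table:
        key = tuple(conds[a] for a in attrs)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def minimal_reducts(table, n_attrs):
    """Smallest attribute subsets preserving the classification
    (exhaustive search, shown for illustration only)."""
    for size in range(1, n_attrs + 1):
        found = [set(c) for c in combinations(range(n_attrs), size)
                 if is_superreduct(table, c)]
        if found:
            return found
    return []

# Toy decision table: 3 condition attributes, decision last.
toy = [
    (0, 1, 0, 1),
    (1, 1, 0, 2),
    (0, 0, 1, 1),
    (1, 0, 1, 2),
]
print(minimal_reducts(toy, 3))  # [{0}] -- attribute 0 alone decides
```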

4. Conclusion

Our main objective is to find the best method for classifying unseen objects in the weather data. Results of our experiments have been presented and corroborated for the four and ten decision class cases. Using rough set methods, the results of recent studies indicate that classification based on the GR-OOR and OOR methods performs significantly better with four decision classes than all other methods, and the DT method has been demonstrated to be slightly better than the other methods mentioned in this study with ten decision classes. Moreover, the applicability of the rough set approach to classifying new objects from volumetric storm cell data appears somewhat better than the approaches presented in (Ramirez et al., 2000; Pawlak et al., 2002a,b). The results reported in this paper are promising. It will be helpful to corroborate these findings by applying rough set methods to additional weather data. In addition, it will be essential to develop methodologies for the classification of unseen objects using information fusion and attribute relevance measurements made possible with the rough integral (Pawlak et al., 2001, 2002a,b; Peters et al., 2001a,b) and the simulated annealing method (Borkowski, 2000).

Acknowledgements

The authors of this article would like to thank Prof. A. Skowron and Prof. J. Komorowski for making RSES and Rosetta accessible for this research. The authors also extend their thanks to Maciej Borkowski for his help in preparing this article for publication. The research of James Peters and Sheela Ramanna has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) research grants 185986 and 194376, respectively. The research of Zbigniew Suraj is supported in part by NSERC and partially by the National Committee for Scientific Research in Poland under grant #8T11C02519. The research of Songqing Shan was supported by NSERC and the National Research Council (NRC) Laboratory of Winnipeg.

References

Alexiuk, M., Pizzi, N., Pedrycz, W., 1999. Classification of volumetric storm cell patterns. In: Proc. of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering, Edmonton.
Alexiuk, M.D., 1999. Pattern recognition techniques as applied to the classification of convective storm cells. M.Sc. Thesis, University of Manitoba.


Bazan, J., 1998. A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: Polkowski, L., Skowron, A. (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications. Physica-Verlag, Heidelberg, pp. 321–365.
Borkowski, M., 2000. Konstruowanie systemów decyzyjnych ze zmienną przestrzenią atrybutów. M.Sc. Thesis, Supervisor: A. Skowron, Institute of Mathematics, Warsaw University (in Polish).
Denoux, T., Rizand, P., 1995. Analysis of radar images for rainfall forecasting using neural networks. Neural Comput. Appl. 3, 50–61.
Dietrich, J., 2000. Report on Project EC-NRC.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., 1996. The KDD process for extracting useful knowledge from volumes of data. Commun. ACM 39 (11), 27–34.
Grzymala-Busse, J.W., 1997. A new version of the rule induction system LERS. Fundam. Inform. 31, 27–39.
Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A., 1998. A rough set perspective on data and knowledge. In: Polkowski, L., Skowron, A. (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications. Physica-Verlag, Heidelberg.
Lakshmanan, A., Witt, A., 1997. A fuzzy logic approach to detecting severe updrafts. AI Appl. 11, 1–12.
Li, P.C., Pizzi, N., Pedrycz, W., 1999. Classification of hail and tornado storm cells using neural networks. Project EC-NRC Research Reports.
Li, P.C., Pizzi, N., Pedrycz, W., Westmore, D., Vivanco, R., 2000. Severe storm cell classification using derived products optimized by genetic algorithm. In: Proc. of the 2000 IEEE Canadian Conference on Electrical and Computer Engineering, Halifax.
Nguyen, S.H., 1999. Data regularity analysis and applications in data mining. Ph.D. Thesis, Faculty of Mathematics, Computer Science and Mechanics, Warsaw University, Warsaw.
Ohrn, A., 1999. Discernibility and rough sets in medicine: tools and applications. Ph.D. Thesis, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
Ohrn, A., Komorowski, J., Skowron, A., Synak, P., 1998. The design and implementation of a knowledge toolkit based on rough sets: the Rosetta system. In: Polkowski, L., Skowron, A. (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications. Physica-Verlag, Heidelberg, pp. 376–399.
Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht.
Pawlak, Z., Peters, J.F., Skowron, A., Suraj, Z., Ramanna, S., Borkowski, M., 2001. Rough measures: theory and applications. In: Hirano, S., Inuiguchi, M., Tsumoto, S. (Eds.), Bulletin of the International Rough Set Society, Vol. 5 (1/2), pp. 177–184.
Pawlak, Z., Peters, J.F., Skowron, A., Suraj, Z., Ramanna, S., Borkowski, M., 2002a. Rough measures and integrals. In: Hirano, S., Inuiguchi, M., Tsumoto, S. (Eds.), Lecture Notes in Computer Science, in press.
Pawlak, Z., Peters, J.F., Skowron, A., Suraj, Z., Ramanna, S., Borkowski, M., 2002b. Rough measures, rough integrals, and sensor fusion. In: Hirano, S., Inuiguchi, M., Tsumoto, S. (Eds.), Rough Set Theory and Granular Computing. Physica-Verlag, Berlin, in press.
Peters, J.F., Ramanna, S., Skowron, A., Borkowski, M., 2001a. Fusion of remote sensors in a web agent: a rough measure approach. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (Eds.), Web Intelligence: Research and Development. Lecture Notes in Artificial Intelligence No. 2198, pp. 413–422.
Peters, J.F., Ramanna, S., Skowron, A., Stepaniuk, J., Suraj, Z., Borkowski, M., 2001b. Sensor fusion: a rough granular approach. In: Proc. Joint 9th International Fuzzy Systems Association (IFSA) World Congress and 20th North American Fuzzy Information Processing Society (NAFIPS) Int. Conf., Vancouver, British Columbia, Canada, pp. 1367–1372.
Ramirez, L., Pedrycz, W., Pizzi, N., 2000. Storm cell classification with the use of support vector machines (draft version).
Rosetta, 2001. The ROSETTA WWW homepage.
RSES, 2001. The RSES WWW homepage.
Shan, S., 2001. Classification of weather data: a rough set approach. M.Sc. Thesis, Supervisor: Peters, J.F., Department of Electrical and Computer Engineering, University of Manitoba.
Suraj, Z., Peters, J.F., Rzasa, W., 2001. A comparison of different decision algorithms used in volumetric storm cells classification. In: Czaja, L. (Ed.), Proc. of the Concurrency Specification & Programming Workshop, Warsaw, October, pp. 269–279.
Suraj, Z., Rzasa, W., 2001. Volumetric storm cell classification with the use of rough set methods. Zeszyty Naukowe WSIiZ, Nr 1/2001, in print.
Westmore, D., 1999. Radar Decision Support System: User Manual. InfoMagnetics Technologies Corporation Technical Document.
Wroblewski, J., 1995. Finding minimal reducts using genetic algorithms. In: Wang, P.P. (Ed.), Proc. of the International Workshop on Rough Sets and Soft Computing at Second Annual Joint Conference on Information Sciences (JCIS'95), Wrightsville Beach, NC, pp. 186–189.