Pattern Recognition 37 (2004) 2039 – 2047 www.elsevier.com/locate/patcog

Development of neural network committee machines for automatic forest fire detection using lidar

Armando M. Fernandes (a,*), Andrei B. Utkin (b), Alexander V. Lavrov (b,1), Rui M. Vilar (a)

(a) Departamento de Engenharia de Materiais, Instituto Superior Técnico, Av. Rovisco Pais 1, Lisbon 1049-001, Portugal
(b) INOV - Inesc Inovação, Rua Alves Redol 9, 1000-029 Lisbon, Portugal

Received 30 May 2003; accepted 7 April 2004

Abstract

Lidar has considerable potential as an early forest fire detection technique. Compared with the passive detection methods based on infrared cameras currently in common use, it offers higher sensitivity, the ability to locate the fire accurately, and no need for a line of sight to the flames. The method has recently been demonstrated by the authors, but its automation requires a rapid signal analysis technique so that alarms can be issued promptly whenever required. In the present paper a novel method of classifying lidar signals using committee machines composed of neural networks is proposed. A new method based on ROC curves and the Neyman–Pearson criterion is used to choose the optimal number of training epochs for each neural network in order to avoid overfitting. The best committee machine was selected as the one giving the lowest percentage of false alarms for a true detection percentage of 90% on a test set created by adding random noise to experimentally obtained patterns. It was composed of three single-layer perceptrons and presented a true detection efficiency of 94.4% with 0.553% of false alarms in the validation set.

© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Lidar; Forest fire; Automatic detection; Committee machine; Neural network; Backpropagation; Early stopping; Hold-out

* Corresponding author. Fax: +351-21-841-81-20. E-mail addresses: [email protected] (A.M. Fernandes), [email protected] (R.M. Vilar).
1 On leave from Russian Science Center "Applied Chemistry", St. Petersburg 197198, Russia.

1. Introduction

Light detection and ranging (LIDAR) has been widely used in atmospheric and environmental studies [1–3]. A lidar is composed of a radiation emitter, receiver optics (usually a telescope), a photo-detector, and signal acquisition and processing hardware and software. Frequently, the emitter is a laser that sends short, intense radiation pulses through the atmosphere. Part of the radiation that is backscattered by the molecules and particles present in the atmosphere is collected by the receiver and measured as a function of time. Obstacles in the beam path appear in the lidar signal as peaks. The distance to an obstacle, $R$, may be calculated using the equation $R = c\tau/2$, where $\tau$ represents the time delay between the laser pulse emission and the reception of the backscattered radiation, and $c$ stands for the velocity of light.

Early forest fire detection at distances of up to 6.5 km using a single-wavelength lidar technique was recently demonstrated by the authors [4–6]. Full automation of the detection process requires prompt identification of the peaks corresponding to smoke plumes in the lidar return signals, but these signals often present other peaks, resulting from atmospheric perturbations, electric cables and other obstacles, which complicate automatic recognition of smoke plumes. The stochastic character of both atmospheric phenomena and smoke-plume evolution in natural conditions, as well as the high variability of the parameters that affect smoke plume signatures in lidar return signals (weather conditions, fire burning rate, aerosol distribution, etc.), make neural methods particularly suitable for smoke signature recognition, as previously demonstrated by the authors [7,8]. In forest fire detection, the small number of smoke plume signatures

0031-3203/$30.00 © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2004.04.002


A.M. Fernandes et al. / Pattern Recognition 37 (2004) 2039 – 2047

available for training requires the use of neural networks with a small number of weights, such as single-layer perceptrons, but these produce an excessive level of false alarms. To overcome this limitation while keeping training simple, committee machines composed of neural networks [9] were applied in the present work to the recognition of smoke plume signatures in lidar signals resulting from experiments with small bonfires. Committee machines are associations of neural networks, each one specialising in solving part of a whole recognition problem. By associating several neural networks, their capabilities are brought together in order to solve complex problems. Two types of neural networks were used in the committee machines: single-layer perceptrons containing only one processing neuron and several input nodes, and neural networks with two processing neurons in a cascade architecture [10], in which one neuron is connected to the input nodes and the other is connected to the previous neuron as well as to the input nodes.

In the present paper a new method, based on Relative Operating Characteristic (ROC) curves [11,12], is proposed for choosing the optimal number of training epochs for the neural network. In order to avoid overfitting, the number of training epochs should be the one that gives the best ROC curve under a given Neyman–Pearson criterion [13,14], that is, the ROC curve with the maximum true detection percentage for a prescribed maximum percentage of false alarms. Neural network training was performed by minimising the sum-of-squares cost function using a previously described algorithm [7] based on backpropagation and called Polynomial Approximation with Periodically Restarted Conjugate Gradient (PPRCG). This algorithm uses a conjugate gradient to find the descent direction and polynomial interpolation to calculate the optimal learning rate for each training epoch.
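The Neyman–Pearson selection rule stated above (maximum true detection percentage for a prescribed maximum percentage of false alarms) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the representation of a ROC curve as a list of (false alarm %, true detection %) points, and the numbers in `example` are all invented here and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# One ROC point: (false alarm percentage, true detection percentage).
RocPoint = Tuple[float, float]


def best_point_under_cap(roc: List[RocPoint], max_false_alarm: float) -> Optional[RocPoint]:
    """Return the point with the highest true detection percentage among
    those whose false alarm percentage does not exceed the cap."""
    allowed = [p for p in roc if p[0] <= max_false_alarm]
    if not allowed:
        return None
    # Highest detection wins; ties are broken by the lower false alarm rate.
    return max(allowed, key=lambda p: (p[1], -p[0]))


def select_epochs(
    roc_by_epoch: Dict[int, List[RocPoint]], max_false_alarm: float
) -> Optional[Tuple[int, RocPoint]]:
    """Pick the epoch count whose validation ROC curve is best under the
    Neyman-Pearson constraint; the earliest epoch count wins exact ties."""
    best: Optional[Tuple[int, RocPoint]] = None
    for epochs in sorted(roc_by_epoch):
        point = best_point_under_cap(roc_by_epoch[epochs], max_false_alarm)
        if point is None:
            continue
        if best is None or (point[1], -point[0]) > (best[1][1], -best[1][0]):
            best = (epochs, point)
    return best


# Invented validation ROC points for three epoch counts (illustrative only).
example = {
    50: [(5.0, 70.0), (15.0, 85.0), (30.0, 92.0)],
    200: [(5.0, 80.0), (15.0, 95.0), (30.0, 97.0)],
    800: [(5.0, 78.0), (15.0, 90.0), (30.0, 98.0)],
}
print(select_epochs(example, max_false_alarm=15.0))
```

Because each ROC point corresponds to one detection threshold, the selected point fixes the threshold and the number of epochs together, which matches the way the criterion is described in the text.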
The neural networks have one input node with a fixed input equal to +1 for bias representation, and the neurons use a hyperbolic tangent activation function, with output between −1 and 1.

The efficiency of the committee machines was evaluated using a type of cross-validation commonly designated as hold-out [15], based on a validation set and a test set. Cross-validation is important for testing the ability to correctly classify patterns not included in the training set, and hence for choosing the committee machines with the best generalisation ability. To assess the influence of random noise amplitude on the classification efficiency of the committee machines, a test set with patterns created by adding random noise of different amplitudes to the experimentally obtained data was used. The patterns with artificially added noise were not included in the training and validation sets, since this could lead to biased neural networks [16].

2. Pattern generation and characterisation

Lidar curves containing smoke plume signatures were experimentally obtained from bonfires with a burning rate of 0.02 kg/s. Each lidar curve (see Fig. 1) was composed of 2000 backscattered power measurements, corresponding to a spatial resolution of 6 m, and resulted from the accumulation of 32 or more lidar returns. To create the patterns for automatic classification, a moving window with a width of 21 or 41 points was passed over the lidar signal, collecting experimental values whenever a local maximum coincided with the window centre. These patterns, which correspond to regions in the lidar signal where a smoke plume signature might occur, are then analysed by the committee machines. The width of the moving window was selected to be considerably larger than the typical smoke plume peak width (approximately 7 points), in order to capture information about the shape of the peak and the background noise in its vicinity. Using two window widths (21 and 41 points) made it possible to show the influence of this width on the classification results.

Fig. 1. Segment of a lidar curve with smoke plume signature at approximately 6.1 km.

This region-of-interest (ROI) approach is better than using the complete lidar return curve for two reasons: it enables the exact distance to the fire to be promptly determined in the event of an alarm, and it requires neural networks with only a small number of input nodes and weights and, consequently, a small number of training patterns. This segmentation of the lidar signal is possible because the smoke signature shape is independent of distance.

Since the lidar curves present a time-dependent background originating from the electronics and atmospheric noise, the patterns were normalised so that their minimum and maximum values were equal to −0.9 and 0.9, respectively, which makes them background- and scale-independent. This normalisation avoided obtaining weights that are, in modulus, much larger or much smaller than unity, which could result in neuron saturation and slower training.

In order to characterise the noisiness of the patterns, a parameter called peak-to-noise ratio (PNR) was defined.
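Before the PNR is defined, the pattern-generation steps of this section (a moving window centred on local maxima, followed by normalisation to the range −0.9 to 0.9) can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, and the text does not say how a local maximum is detected or how the edges of the signal are handled, so both choices below are assumptions.

```python
from typing import List, Tuple


def normalise(pattern: List[float], low: float = -0.9, high: float = 0.9) -> List[float]:
    """Linearly rescale a pattern so its minimum maps to `low` and its maximum to `high`."""
    p_min, p_max = min(pattern), max(pattern)
    if p_max == p_min:
        raise ValueError("a flat pattern cannot be normalised")
    scale = (high - low) / (p_max - p_min)
    return [low + (value - p_min) * scale for value in pattern]


def extract_patterns(signal: List[float], width: int = 21) -> List[Tuple[int, List[float]]]:
    """Return (centre_index, normalised_pattern) for each window whose centre is a local maximum."""
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd number of at least 3 points")
    half = width // 2
    patterns = []
    # Assumption: windows that would extend past either end of the signal are skipped.
    for centre in range(half, len(signal) - half):
        # Assumption: a local maximum is strictly higher than both immediate neighbours.
        if signal[centre] > signal[centre - 1] and signal[centre] > signal[centre + 1]:
            window = signal[centre - half:centre + half + 1]
            patterns.append((centre, normalise(window)))
    return patterns


# Toy signal: flat background with a single peak at sample 15.
toy_signal = [0.0] * 30
toy_signal[15] = 1.0
for centre, pattern in extract_patterns(toy_signal, width=21):
    print(centre, len(pattern), min(pattern), max(pattern))
```

If sample 0 is assumed to lie at zero range, the centre index multiplied by the 6 m spatial resolution gives an approximate distance to the peak; that offset convention is an assumption made here, not something stated in the text.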
The concept of PNR is an extension of the definition of signal-to-noise ratio (SNR) to patterns where the central peak itself results from atmospheric noise. It is defined as the ratio between the central peak amplitude and the standard deviation of the pattern points, excluding the seven central ones, which correspond to the approximate width of a peak from a smoke plume. Both values are calculated with respect to the background, defined by a root-mean-square interpolation of the pattern points, also excluding the seven central points. The PNR histograms of 141 smoke signatures and 8334 atmospheric noise patterns with 41 points used in the training and validation sets are shown in Fig. 2. The two histograms overlap, making it impossible to distinguish the smoke signatures from atmospheric noise patterns based only on PNR values, thus justifying the use of neural networks to solve this problem.

Fig. 2. PNR of experimentally obtained patterns.

3. Structure of the committee machines

In the present investigation, the neural networks in the committee machines were connected sequentially, each one eliminating the patterns that it classifies as atmospheric noise and passing on the patterns that it considers to be smoke signatures. An alarm is generated when all the neural networks classify a pattern as a smoke signature. Since the neural networks are connected sequentially, the ability to solve non-linearly separable problems is maintained even when single-layer perceptrons are used [17,18]. Each neural network in the committee machine was trained with the same smoke signatures and with the noise patterns that caused false alarms in the previous neural network. In this way, the number of atmospheric noise patterns passing from one neural network to the next decreases as the number of neural networks in the committee machine increases [17]. Therefore, the maximum number of neural networks in a committee machine is limited by the minimum number of patterns required for training and validation of its last neural network.
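The sequential structure described above can be sketched as a chain of classifiers in which a pattern raises an alarm only if every stage accepts it. The Python below is an illustration under assumptions: it supposes that each network decides by comparing its tanh output with a per-network threshold, and the names `make_perceptron`, `committee_alarm` and the toy weights are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A stage is (network output function, detection threshold).
Stage = Tuple[Callable[[Sequence[float]], float], float]


def make_perceptron(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Single-layer perceptron with tanh activation.

    weights[0] multiplies the bias input fixed at +1; the remaining
    weights multiply the pattern points.
    """
    def output(pattern: Sequence[float]) -> float:
        if len(pattern) != len(weights) - 1:
            raise ValueError("pattern length does not match the number of weights")
        total = weights[0] + sum(w * x for w, x in zip(weights[1:], pattern))
        return math.tanh(total)
    return output


def committee_alarm(stages: List[Stage], pattern: Sequence[float]) -> bool:
    """Pass the pattern through the networks in order; alarm only if all accept it."""
    for network, threshold in stages:
        if network(pattern) <= threshold:
            return False  # classified as atmospheric noise and discarded here
    return True


# Toy two-stage committee on 2-point patterns (weights invented for illustration).
toy_stages = [
    (make_perceptron([0.0, 1.0, 0.0]), 0.0),  # accepts patterns with first point > 0
    (make_perceptron([0.0, 0.0, 1.0]), 0.0),  # accepts patterns with second point > 0
]
print(committee_alarm(toy_stages, [0.5, 0.5]))
print(committee_alarm(toy_stages, [0.5, -0.5]))
```

In this toy case the accepted region is the intersection of two half-planes, which no single perceptron could reproduce; this is one way to read the remark that chaining single-layer perceptrons preserves the ability to handle problems that are not linearly separable.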


The purpose of each neural network in the committee machine is to filter out the largest possible number of atmospheric noise patterns with the smallest possible percentage of misdetections (calculated by subtracting the true detection percentage from 100%), with the aim of obtaining a system with a high probability of true detection and a low probability of emitting false alarms. In this context, the true detection percentage of a neural network or committee machine is the ratio between the number of detected smoke signatures and the total number of smoke signatures, while the percentage of false alarms is the ratio between the number of atmospheric noise patterns wrongly considered as smoke signatures and the total number of atmospheric noise patterns fed into that neural network or committee machine. The misdetection percentage of a committee machine is the sum of the misdetection percentages of its neural networks, because misdetected smoke signatures are discarded and do not progress to the next neural network, whereas the false alarm percentage of the committee machine is the product of the false alarm percentages of its neural networks, because atmospheric noise patterns classified as smoke pass from one neural network to the next.

4. Learning procedure

In order to prevent overfitting, early stopping [19] was used in the training of the neural networks, which means stopping the training before the classification error in the validation set starts to increase [20]. Several criteria have been described in the literature to perform early stopping automatically [19], but these methods do not provide the information required to establish a trade-off between misdetections and false alarms. In order to achieve this objective, ROC curves, which are plots of the true detection percentage as a function of the percentage of false alarms, were used. These curves change with the number of training epochs, making them useful in detecting overfitting.

The ROC curves presented in Fig. 3 were obtained for the validation set by varying the detection threshold between −0.8 and 0.8 in steps of 0.1. The detection threshold is the output value of the neural network above which a pattern is classified as a smoke signature. In the figures, l, m, n, j and k represent numbers of training epochs, with l < m < n. In this case, l training epochs do not enable the highest true detection percentage to be achieved, because the training is not sufficient to extract all the possible information from the training set. Conversely, after n training epochs the true detection percentage has started to fall, because the neural network is learning the particular characteristics of the training set. The optimum number of training epochs leads to the highest true detection percentage for each false alarm percentage, which corresponds to m training epochs in Fig. 3.

Fig. 3. ROC curves of a real neural network, for several numbers of training epochs with l < m < n.

Fig. 4 presents a situation where the previous criterion does not allow a clear decision because the ROC curves intersect. When this occurs, the optimum number of training epochs must be chosen using the Neyman–Pearson criterion. According to this criterion, the detection threshold and the optimal number of training epochs should maximise the true detection percentage, subject to the constraint that the percentage of false alarms does not exceed a prescribed value. The choice of this value results from a trade-off between true detection and false alarms: a very low level of false alarms means a rigorous choice of patterns, which results in a decrease in the true detection percentage.

Fig. 4. ROC curves of a real neural network, for several numbers of training epochs showing intersections between curves, in contrast to the example in Fig. 3.

5. Testing generalisation ability with a small number of patterns

The ability of the neural network to classify new patterns must be checked using a validation set. Since the committee machines with the best performance on the validation set may end up overfitting the validation set itself, an independent set of patterns, called a test set, must be used to check this propensity for overfitting (the hold-out cross-validation method, Haykin [15]). In the present case it is difficult to gather sufficient experimental patterns for constructing the training, validation and test sets. To overcome this difficulty, new patterns are created by adding noise to the experimental patterns [21–24]. Using these patterns for training has harmful effects, because the added noise changes the cost function used for training, increasing the probability of producing biased neural networks [16]. For this reason, only the experimental patterns were used in the training and validation sets, while the test set was created by adding random noise of different amplitudes to the experimental patterns. This test set enabled the influence of noise on the committee machines' classification efficiency to be studied. A second test set generated in the same way was used for creating ROC curves to compare the committee machines' efficiency.

6. Numerical results

6.1. Study of different committee machine structures

Committee machines composed of 2, 3 or 4 single-layer perceptrons or cascade architecture neural networks were built. The training and validation sets were composed of 141 smoke signature patterns and 17174 atmospheric noise patterns with 21 points or 8334 atmospheric noise patterns with 41 points. The neural network training sets contained between 62 and 70 smoke signatures and a similar number of atmospheric noise patterns, while the validation set contained a number of atmospheric noise patterns similar to or higher than the number of smoke signature patterns. The neural networks composing the committee machines were selected from a total of 100, which were trained with the same training set and different initial weights generated in the interval [−0.5, 0.5]. The results of the recognition tests performed with several committee machines on the validation set patterns are shown in Table 1. These results were compared on the basis of the ROC curves, as described in Section 6.3.

Table 1. Committee machine results in the recognition of patterns from validation sets

Committee  Structure of each neural net    Number of weights  False alarms of each  Committee's false  Committee's
machine                                                       neural net (%)        alarms (%)         misdetections (%)
1          22-1, 22-1, 22-1                22, 22, 22         9.7, 15, 48           0.707 (120/16971)  14.1 (10/71)
2          22-1-1, 22-1-1, 22-1-1          45, 45, 45         14, 15, 39            0.813 (138/16968)  14.1 (10/71)
3          22-1, 22-1, 42-1                22, 22, 42         20, 15, 18            0.553 (45/8131)    5.63 (4/71)
4          22-1-1, 22-1-1, 42-1-1          45, 45, 85         28, 15, 15            0.664 (54/8128)    5.63 (4/71)
5          22-1, 22-1, 22-1, 42-1          22, 22, 22, 42     20, 15, 30, 19        0.173 (14/8069)    14.1 (10/71)
6          22-1-1, 22-1-1, 22-1-1, 42-1-1  45, 45, 45, 85     29, 15, 22, 15        0.149 (12/8065)    15.5 (11/71)
7          22-1, 42-1                      22, 42             12, 7.8               0.915 (75/8195)    0 (0/71)
8          22-1-1, 42-1-1                  45, 85             28, 3.2               0.903 (74/8196)    0 (0/72)

6.2. Committee machine efficiency versus noise amplitude

As previously mentioned, the test set was composed of experimental patterns to which random noise was artificially added after pattern normalisation. Ten patterns were created from each experimental pattern, resulting in 1410 patterns containing smoke signatures, 171740 atmospheric noise patterns with 21 points and 83340 atmospheric noise patterns with 41 points for each value of the added noise amplitude. Fig. 5 shows the variation of the misdetection and false alarm percentages as a function of added noise amplitude for a committee machine composed of one, two and three neural networks. Increasing the number of neural networks in the committee machine leads to a fall in the false alarm percentage and an increase in the misdetection percentage, because the pattern classification becomes more rigorous.

A comparison between the committee machines that led to the results presented in Table 1 is illustrated in Fig. 6. Despite the differences in committee machine structures, the degradation of the true detection percentage with increasing noise amplitude was similar for all committee machines composed of neural networks with 42 input points. By contrast, committee machines 1 and 2, which are based on neural networks with 22 input points, presented slower degradation of true detection. The main reason for this behaviour is that, when the pattern space has more dimensions, the probability of creating patterns that present unusual features increases, leading to a higher misdetection rate. The false alarm percentage decreases by a factor of 1.5–3 when the peak-to-peak amplitude of the added random noise increases from 2% to 34%, because the features causing false alarms are destroyed.

The committee machines based on cascade neural networks and those composed of single-layer perceptrons present similar generalisation abilities in terms of both misdetections and false alarms. This result is unexpected, because neural networks with cascade architecture can solve non-linearly separable problems while single-layer perceptrons cannot; it seems that, in the present case, two processing neurons do not enable a more realistic decision region to be obtained than just one. In general, increasing the amplitude of the random noise leads to a performance degradation in the classification of smoke signature patterns and to lower false alarm percentages, in apparent contradiction of previous publications [7]. In fact, the contradiction does not exist, because atmospheric noise is different from random noise.
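The per-network false alarm figures in Table 1 can be cross-checked against the composition rule given in Section 3 (false alarm percentages multiply, misdetection percentages add). The Python sketch below does this for four of the committee machines. The function names are hypothetical, and the comparison is only approximate because the per-network values in Table 1 are rounded; the misdetection rule is taken to hold when each network's misdetection percentage is measured against the total number of smoke signatures, which is an assumption about how those percentages are defined.

```python
from math import prod  # available from Python 3.8
from typing import Sequence


def committee_false_alarm_percent(per_network_percent: Sequence[float]) -> float:
    """Product rule: a noise pattern causes an alarm only if every network passes it."""
    return 100.0 * prod(p / 100.0 for p in per_network_percent)


def committee_misdetection_percent(per_network_percent: Sequence[float]) -> float:
    """Sum rule: a smoke signature rejected by any network is lost for good."""
    return sum(per_network_percent)


# (per-network false alarms in %, committee false alarms in % reported in Table 1)
table_1_false_alarms = {
    1: ([9.7, 15, 48], 0.707),
    3: ([20, 15, 18], 0.553),
    5: ([20, 15, 30, 19], 0.173),
    7: ([12, 7.8], 0.915),
}

for machine, (per_network, reported) in table_1_false_alarms.items():
    predicted = committee_false_alarm_percent(per_network)
    print(f"machine {machine}: product rule {predicted:.3f}% vs reported {reported}%")
```

The product rule gives roughly 0.70%, 0.54%, 0.17% and 0.94% for machines 1, 3, 5 and 7, close to the reported 0.707%, 0.553%, 0.173% and 0.915%. Table 1 does not list per-network misdetections, so the sum rule cannot be checked in the same way from the table alone.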


Fig. 5. Misdetections and false alarms as a function of the added noise for a committee machine with several neural networks.

Fig. 6. Comparison of misdetections and false alarms for several committee machines.

6.3. Committee machine comparison by ROC curve analysis

The ROC curves for the committee machines were obtained using a test set created by adding random noise to the experimental patterns. As before, ten patterns were created from each experimental pattern by adding random noise with a maximum peak-to-peak amplitude of 4% of the pattern amplitude after normalisation. Fig. 7 shows the region of the ROC curves corresponding to true detection percentages greater than 80% and false alarm percentages lower than 1%; the curves were obtained by changing the detection threshold of the last neural network in each committee machine. Committee machines 3, 4, 7 and 8 present true detection percentages higher than 90% for false alarm percentages higher than 0.8%, while committee machines 1, 2, 5 and 6 present true detection percentages lower than 88% for all the false alarm percentages shown. Committee machines 1 and 2 led to the ROC curves with the lowest true detection percentages, because they are composed of neural networks with only 22 input points, which contain less information than committee machines with 42 input point neural networks. The four neural networks of committee machines 5 and 6 lead to low false alarm percentages, but could not achieve true detection percentages higher than 88%.

Committee machine 3, made up of 3 single-layer perceptrons, presented the lowest percentage of false alarms for a true detection percentage of 90%, while committee machine 7, composed of 2 single-layer perceptrons, presented the highest true detection percentage for false alarm percentages above 0.6%. This committee machine presents higher true detection percentages than committee machines 4 and 8, composed of 3 and 2 cascade architecture neural networks, respectively. These results suggest that, although neural networks with cascade architecture have higher representational capability than single-layer perceptrons due to their ability to build non-linear decision regions, the increase from one to two processing neurons was not sufficient to improve the classification results.
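The test set behind these ROC curves was built by adding random noise to each normalised experimental pattern, ten noisy copies per pattern. A minimal Python sketch of that construction follows. The noise distribution is not stated in the text, so zero-mean uniform noise is an assumption made here, as are the function names, the fixed seed, and the decision not to renormalise the patterns after the noise is added.

```python
import random
from typing import List


def add_noise(pattern: List[float], peak_to_peak_fraction: float, rng: random.Random) -> List[float]:
    """Add noise whose maximum peak-to-peak amplitude is the given fraction
    of the pattern amplitude (maximum minus minimum)."""
    amplitude = max(pattern) - min(pattern)
    half_range = 0.5 * peak_to_peak_fraction * amplitude
    # Assumption: uniform noise centred on zero.
    return [value + rng.uniform(-half_range, half_range) for value in pattern]


def make_test_set(
    patterns: List[List[float]],
    peak_to_peak_fraction: float,
    copies: int = 10,
    seed: int = 0,
) -> List[List[float]]:
    """Create `copies` noisy versions of every normalised pattern."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        add_noise(pattern, peak_to_peak_fraction, rng)
        for pattern in patterns
        for _ in range(copies)
    ]


# 141 identical toy patterns stand in for the 141 smoke signatures.
toy_patterns = [[-0.9, 0.0, 0.9] for _ in range(141)]
noisy = make_test_set(toy_patterns, peak_to_peak_fraction=0.04)
print(len(noisy))
```

With 141 input patterns and ten copies each, the sketch yields 1410 noisy patterns, the same count quoted for the smoke signatures in Section 6.2.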

Fig. 7. ROC curves, constructed using a test set, for the committee machines described in Table 1.

7. Conclusions

The possibility of applying committee machines to the automatic detection of forest fires using lidar signals was demonstrated. Committee machines composed of neural networks connected sequentially were trained using a small number of smoke signatures and several thousand noise patterns, proving that it is possible to obtain a high true detection percentage and a low false alarm percentage. A new method based on ROC curves and the Neyman–Pearson criterion was shown to produce neural networks that do not overfit the training data. It can be concluded that committee machines using neural networks with 42 input points show better classification efficiencies than those using neural networks with 22 input points, because increasing the number of input points is equivalent to providing more information to the neural network; however, committee machines with a higher number of input points are more sensitive to random noise addition. This study also revealed that committee machines composed of single-layer perceptrons and those composed of neural networks with two processing neurons in a cascade architecture present comparable results, because using two neurons instead of one does not improve the neural network's representational ability enough to significantly improve pattern classification. Finally, a committee machine using one single-layer perceptron with 42 input nodes and two single-layer perceptrons with 22 input nodes was created, demonstrating that it is possible to achieve a true detection percentage of 94.4% with 0.553% of false alarms for the validation set, and 0.477% of false alarms for a true detection percentage of 90% when using a set of artificially created patterns.

8. Summary

Lidar has considerable potential as an early forest fire detection technique. Compared with the passive detection methods based on infrared cameras currently in common use, it offers higher sensitivity, the ability to locate the fire accurately, and no need for a line of sight to the flames. The method has recently been demonstrated by the authors, but its automation requires a rapid signal analysis technique so that alarms can be issued promptly whenever required. The stochastic character of both atmospheric phenomena and smoke-plume evolution in natural conditions, as well as the high variability of the parameters that affect smoke plume signatures in lidar return signals (weather conditions, fire burning rate, aerosol distribution, etc.), make neural methods particularly suitable for smoke signature recognition. In forest fire detection, the small number of smoke plume signatures available for training requires the use of neural networks with a small number of weights, such as single-layer perceptrons, but these produce an excessive level of false alarms. To overcome this limitation while keeping training simple, committee machines composed of neural networks were applied in the present work to the recognition of smoke plume signatures in lidar signals resulting from experiments with small bonfires. Committee machines are associations of neural networks, each one specialising in solving part of a whole recognition problem. By associating several neural networks, their capabilities are brought together in order to solve complex problems. Two types of neural networks were used in the committee machines: single-layer perceptrons containing only one processing neuron and several input nodes, and neural networks with two processing neurons in a cascade architecture, in which one neuron is connected to the input nodes and the other is connected to the previous neuron as well as to the input nodes.

Neural network training was performed by minimising the sum-of-squares cost function using an algorithm based on backpropagation, which uses a conjugate gradient to find the descent direction and polynomial interpolation to calculate the optimal learning rate for each training epoch. In the present paper a new method, based on ROC curves, is proposed for choosing the optimal number of training epochs for the neural network. In order to avoid overfitting, the number of training epochs should be the one that gives the best ROC curve under a given Neyman–Pearson criterion, that is, the ROC curve with the maximum true detection percentage for a prescribed maximum percentage of false alarms. The best committee machine was selected as the one giving the lowest percentage of false alarms for a true detection percentage of 90% on a test set created by adding random noise to experimentally obtained patterns. It was composed of three single-layer perceptrons and presented a true detection efficiency of 94.4% with 0.553% of false alarms in the validation set.

Acknowledgements

A.M. Fernandes gratefully acknowledges Ph.D. grant SFRH/BD/2943/2000 from Fundação para a Ciência e a Tecnologia. A.B. Utkin gratefully acknowledges financial support from Instituto de Ciência e Engenharia de Materiais e Superfícies. A.V. Lavrov wishes to thank INTAS for a fellowship grant. This research was partially supported by INTAS grant N.99-1634. The authors are grateful to the Portuguese Air Force for help in organising field experiments.

References

[1] R.M. Measures, Laser Remote Sensing, Wiley, New York, 1984.
[2] A.V. Jelalian, Laser Radar Systems, Artech House, Boston, 1992.
[3] J. Bösenberg, D. Brassington, P.C. Simon (Eds.), Instrument Development for Atmospheric Research and Monitoring, Springer, Berlin, 1997.
[4] A.B. Utkin, A.V. Lavrov, L. Costa, F. Simões, R. Vilar, Detection of small forest fires by LIDAR, Appl. Phys. B 74 (1) (2002) 77–83.
[5] R. Vilar, A. Lavrov, Application of lidar at 1.54 micron for forest fire detection, Proc. SPIE 3868 (1999) 473–477.
[6] R. Vilar, A. Lavrov, Estimation of required parameters for detection of small smoke plumes by lidar at 1.54 μm, Appl. Phys. B 71 (2000) 225–228.
[7] A. Fernandes, A.B. Utkin, R. Vilar, A. Lavrov, Recognition of smoke signatures in lidar signal with a perceptron, Proceedings of the Sixth World Multiconference on Systemics, Cybernetics and Informatics, Orlando, Florida, USA, July 14–18, 2002, Image Acoustic Speech Signal Process. II IX (2002) 504.
[8] A.B. Utkin, A. Fernandes, F. Simões, R. Vilar, A. Lavrov, Forest-fire detection by means of lidar, in: D.X. Viegas (Ed.), Proceedings of IV International Conference on Forest Fire Research, Luso, Portugal, November 18–23, 2002, p. 58.
[9] S. Haykin, Neural Networks, 2nd Edition, Prentice-Hall, London, 1999, pp. 351–391.
[10] S.E. Fahlman, C. Lebiere, The cascade-correlation learning architecture, Technical Report CMU-CS-90-100, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1990.
[11] J. Principe, M. Kim, J. Fisher, Target discrimination in synthetic aperture radar using artificial neural networks, IEEE Trans. Image Process. 7 (8) (1998) 1136–1149.
[12] A. Freking, M. Biehl, C. Braun, W. Kinzel, M. Meesmann, Receiver Operating Characteristics of Perceptrons: Influence of Sample Size and Prevalence, Technical Report WUE-ITP-98-049, Institut für Theoretische Physik, Universität Würzburg, Germany, 1998.
[13] M. Scott, M. Niranjan, R. Prager, Parcel: feature subset selection in variable cost domains, Technical Report CUED/F-INFENG/TR.323, Engineering Department, Cambridge University, UK, 1998.
[14] S. Haykin, Neural Networks, 2nd Edition, Prentice-Hall, London, 1999, 28pp.
[15] S. Haykin, Neural Networks, 2nd Edition, Prentice-Hall, London, 1999, pp. 213–218.
[16] J. Van Gorp, On Generalisation by Adding Noise, Ph.D. Thesis, Vrije Universiteit Brussel, Department of ELEC, 2000, pp. 177–196 (Chapter 7).
[17] S. Knerr, L. Personnaz, G. Dreyfus, Single-layer learning revisited: a stepwise procedure for building and training a neural network, NATO Workshop on Neurocomputing, Les Arcs, France, February 1989, in: F. Fogelman, J. Hérault (Eds.), Neurocomputing, Springer, Berlin, 1990.
[18] S. Knerr, L. Personnaz, G. Dreyfus, Handwritten digit recognition by neural networks with single-layer training, IEEE Trans. Neural Networks 3 (1992) 962.
[19] L. Prechelt, Automatic early stopping using cross validation: quantifying the criteria, Neural Networks 11 (1998) 761–767.
[20] S. Haykin, Neural Networks, 2nd Edition, Prentice-Hall, London, 1999, pp. 192, 216.
[21] C. Bishop, Training with noise is equivalent to Tikhonov regularization, Technical Report NCRG/4290, Neural Computing Research Group, Department of Computer Science, Aston University, Birmingham, UK, 1994.
[22] L. Holmström, P. Koistinen, Using additive noise in backpropagation training, IEEE Trans. Neural Networks 3 (1) (1992) 24–38.
[23] S. Chakrabarti, N. Bindal, K. Theagharajan, Robust radar target classifier using artificial neural networks, IEEE Trans. Neural Networks 6 (3) (1995) 760–766.
[24] R. Slaviero, The effect of noise on the generalisation and classification capabilities of simple image recognition artificial neural networks, Fourth International Symposium on Signal Processing and its Applications (ISSPA 96), Vols. 1 and 2, August 25–30, 1996, pp. 563–564.

About the Author - ARMANDO M. FERNANDES was born in Barreiro, Portugal in 1974. He received his degree in Physics Engineering from Instituto Superior Técnico, Lisbon, Portugal in 1997. He worked as a product technology engineer at Siemens Semiconductors in Portugal and Germany from 1997 to 1998, and as an equipment maintenance engineer at Philips Medical Systems from 1998 to 2000, completing a total of 2 months of intensive training in the Netherlands and in Germany. He is presently doing a Ph.D. in Physics entitled "Forests Fire Detection By Analysis of Backscattered Laser Radiation" at Instituto Superior Técnico.

About the Author - ANDREI B. UTKIN was born in St. Petersburg, Russia in 1959 and received a diploma in Physics from St. Petersburg State University in 1983. He obtained a Ph.D. in physics and mathematics from St. Petersburg State University in 1986 and a diploma in software engineering from the Scientific and Educational Centre of Leningrad Industrial Corporation on Computer Engineering and Informatics in 1989. From 1986 to 1999 he was a researcher at the Institute for Laser Physics, St. Petersburg, and from 1999 to 2002 an invited scientist at Instituto Superior Técnico, Lisbon, Portugal. Since 2003 he has been working as an invited scientist at INOV - Inesc Inovação, Lisbon, Portugal.

About the Author - ALEXANDER V. LAVROV was born in Leningrad, USSR in 1947 and received a diploma in quantum mechanics from the University of Leningrad in 1971. He obtained a Ph.D. in mechanics of fluids and plasma from the Leningrad Polytechnic Institute in 1978. At the Laboratory of Thermodynamics of the State Institute for Applied Chemistry (Leningrad, later St. Petersburg) he was scientist from 1971 to 1980, senior scientist from 1980 to 1994, deputy laboratory chief from 1994 to 1995 and leading scientist from 1995 to 1998. From 1998 to 2000 he was a scientist at the Department of Material Sciences, Instituto Superior Técnico, Lisbon, Portugal. Since 2003 he has been a scientist in the lidar group of INOV, Lisbon, Portugal.

About the Author - RUI M. VILAR received his degree in Metallurgical Engineering from Porto University in 1973. He completed his Master's degree at the National Institute of Nuclear Science and Technology, Paris, in 1978 and his Ph.D. (Thèse de Doctorat d'État) at Paris-Orsay University in 1983. He is presently Associate Professor in the Department of Materials Engineering, Technical University of Lisbon. His present scientific interests are mainly in the area of laser materials processing and laser applications. He has authored more than 250 publications in these fields.