March 24, 2004 16:26 WSPC/157-IJCIA
00115
International Journal of Computational Intelligence and Applications, Vol. 4, No. 1 (2004) 77–108 © Imperial College Press
NEURAL NETWORKS AND GENETIC ALGORITHMS FOR DOMAIN INDEPENDENT MULTICLASS OBJECT DETECTION
MENGJIE ZHANG
School of Mathematical and Computing Sciences, Victoria University of Wellington, P. O. Box 600, Wellington, New Zealand
[email protected]

VICTOR CIESIELSKI
School of Computer Science and Information Technology, RMIT University, GPO Box 2476v, Melbourne, Victoria, Australia
[email protected]

Received 14 April 2003
Revised 3 December 2003
This paper describes a domain independent approach to multiple class rotation invariant 2D object detection problems. The approach avoids preprocessing, segmentation and specific feature extraction. Instead, raw image pixel values are used as inputs to the learning systems. Five object detection methods have been developed and tested: the basic method and four variations which are expected to improve its accuracy. In the basic method, cutouts of the objects of interest are used to train multilayer feed forward networks using back propagation. The trained network is then used as a template to sweep the full image and find the objects of interest. The variations are (1) use of a centred weight initialization method in network training, (2) use of a genetic algorithm to train the network, (3) use of a genetic algorithm, with fitness based on detection rate and false alarm rate, to refine the weights found in the basic approach, and (4) use of the same genetic algorithm to refine the weights found by method 2. These methods have been tested on three detection problems of increasing difficulty: an easy database of circles and squares, a medium difficulty database of coins and a very difficult database of retinal pathologies. For detecting the objects in all classes of interest in the easy and the medium difficulty problems, a 100% detection rate with no false alarms was achieved. However, the results on the retinal pathologies were unsatisfactory. The centred weight initialization algorithm improved the detection performance over the basic approach on all three databases. In addition, refinement of weights with a genetic algorithm significantly improved detection performance on the three databases. The goal of domain independent object recognition was achieved for the detection of relatively small regular objects in larger images with relatively uncluttered backgrounds.
Detection performance on irregular objects in complex, highly cluttered backgrounds such as the retina pictures, however, has not been achieved to an acceptable level.

Keywords: Network training; network refinement; network sweeping; evolutionary process; domain independent; object recognition; target recognition; target detection.
1. Introduction

As more and more images are captured in electronic form, the need for programs which can find objects of interest in a database of images is increasing. For example, it may be necessary to find all tumors in a database of X-ray images, all cyclones in a database of satellite images or a particular face in a database of photographs. The common characteristic of such problems can be phrased as "Given subpicture1, subpicture2, ..., subpicturen, which are examples of the objects of interest, find all pictures which contain this object and the locations of all of the objects of interest". Examples of this kind include the target detection problem,1,2 where the task is to find all tanks, trucks or helicopters in a picture. Unlike most of the current work in the object recognition area, where the task is to detect only objects of a single class,1,3,4 the aim of the work presented in this paper is to detect multiple objects of a number of different classes in a database of large pictures in one pass.

The object recognition task using traditional image processing and computer vision methods5,6 usually involves the following subtasks: preprocessing, segmentation, feature extraction and classification. The main goal of preprocessing is to remove noise or enhance edges. Segmentation aims to divide an image into coherent regions. Feature extraction is concerned with finding transformations to map patterns to lower-dimensional spaces for pattern representation and to enhance class separability. The output of feature extraction, which is often a vector of feature values, is then passed to the classification step. Using these vectors, the classifier determines the distinguishing features of each object class, such that new vectors are placed in the correct class. To obtain good performance, a number of "important" specific features need to be manually determined (selected and extracted) and the classifier has to be chosen for the specific domain.
In contrast, this paper focuses on the development of a domain independent method, without preprocessing and segmentation, for multiple class object detection. In recent years, neural networks and genetic algorithms have attracted attention as very promising methods of solving automatic target recognition and detection problems.7–9 A wide variety of problem domains have been shown to be amenable to these learning and adaptive techniques due to the enormous flexibility of the representations they afford. In terms of input patterns for neural networks for object detection and recognition, two main approaches have previously been used — feature based and pixel based. In feature based approaches, various features such as brightness, colour, size and perimeter are extracted from the subimages of the objects of interest and used as inputs.3,10,11 These features are usually different and specific for different problem domains. In pixel based approaches,8,12,13 the pixel values are used directly as inputs. To avoid the disadvantages of handcrafting feature extraction programs, the approach described in this paper uses raw pixel data.
1.1. Goals

The overall goal of this paper is to determine whether domain independent object detection systems which use raw pixel data and neural and genetic algorithm learning can be built. We require that the systems find objects of different types in one pass. Furthermore, we would like to characterize the kinds of problems for which such systems are likely to be successful. In particular, we will examine the following approaches:
(1) Train a feed forward neural network on cutouts of the objects of interest and use the trained network as a template for sweeping the large images to detect the objects of interest. We refer to this as the basic approach [BP-train].
(2) Use an alternative weight initialization procedure, centred weight initialization, which is designed to focus the learning on the objects of interest.
(3) Use a genetic algorithm with mean squared error as the fitness function, instead of back propagation, for network training [GA-train].
(4) Use a two stage process in which the first stage is to train the network on the cutouts using a genetic algorithm and the second stage is to refine the trained network with a genetic algorithm that uses a fitness function based on detection rate and false alarm rate [GA-train+GA-refine].
(5) As in (4), but using back propagation in stage 1 [BP-train+GA-refine].
2. Background

In this section we define the terminology that we use and relate it to the literature. We also examine related work in object classification and detection.

2.1. Object detection versus object classification

The term object classification here refers to the task of discriminating between images of different kinds of objects. Each image contains only one of the objects of interest. The term object detection here refers to the detection of small objects in large pictures. This includes both object classification, as described above, and object localization, which gives the positions of all objects of interest in the large pictures.

2.2. Multiclass object detection

In automatic object detection systems, in most cases, all the objects of interest are considered as one class of interest.8,14 These systems generally focus on finding the locations of the objects of interest; all other locations are considered non-object or background. In contrast, the multiple class detection problem refers to the case where there is more than one class of object and both their classes and locations must be determined. In general, multiclass object detection problems
are much harder than single class detection problems, and multiclass detection using a single trained program, such as a neural network, is an even more difficult problem.

2.3. Performance evaluation

Performance in object detection is measured by the detection rate (DR) and the false alarm rate (FAR). The detection rate is the number of objects correctly reported as a percentage of the total number of real objects, and the false alarm rate is the number of objects incorrectly reported as a percentage of the total number of real objects. For example, there are 18 grey squares in Fig. 1 (left). A detection system looking for grey squares may report that there are 25. If 9 of these are correct, the detection rate will be (9/18) × 100 = 50% and the false alarm rate will be (16/18) × 100 = 88.9%.

It is important to note that detecting objects in pictures with very cluttered backgrounds is an extremely difficult problem and that false alarm rates of 200–2,000% (that is, the detection system suggests that there are up to 20 times as many objects as there really are) are common.4,8 Also note that most research which has been done in this area so far only presents the results of the classification stage and assumes that all other stages have been properly done. In contrast, the results presented in this paper are the performance for the whole detection problem (both localization and classification) in a single pass.

2.4. Related work

2.4.1. Neural networks for object classification and detection

Since the late 1980s, the use of neural networks in object classification and detection has been investigated in a variety of application domains.
These domains include military applications,15,16 human face recognition,17 agricultural product classification,18 handwritten character recognition19 and medical image analysis.20 The types of neural networks used include multilayer feed forward networks,21 self-organizing maps,22 higher order networks23 and ART networks.24,25 Most of the object detection work reported so far is for one-class object detection problems, where the objects of a single class in large pictures need to be detected. Work in multiple class object detection based on a single network, or in one stage, has not been reported so far. In this paper, we will investigate pixel based neural network approaches for multiclass object detection problems.

2.4.2. Genetic algorithms for evolving neural networks

In addition to the back propagation algorithm, genetic algorithms can also be used to train (evolve) neural networks. Related work in this area can be grouped into two approaches.
Evolving weights in fixed networks26,27–30: In this approach, the neural network architecture and learning parameters are pre-defined. The genetic algorithm is used to train the given network by evolving the weights and biases. In the genetic algorithm, the network weights and biases are encoded into a chromosome. Each chromosome is an individual member of a population, and a population often has several hundred chromosomes. During the evolutionary process, the genetic operators of selection, crossover and mutation are applied to these individuals. After each generation of the process the chromosomes are applied to the network, that is, the weights and biases are set from the values which the chromosomes represent. The error rate on the training patterns is then computed and used as the fitness function for the genetic algorithm.

Evolving network architectures31,32: In this approach, genetic algorithms are used to evolve not only the network weights and biases, but also the network architecture or topology. References 33 and 34 present overviews and classifications of the research in the area of evolutionary design of neural network architectures.

The major advantage claimed for evolutionary approaches is that the local minima problem to which gradient descent algorithms are prone can be avoided. The networks evolved by genetic algorithms presented in the literature are relatively small and the process runs quite well. For the object detection problems investigated in this paper, however, the networks are very large due to the use of pixels as inputs. We will investigate whether genetic algorithms can be used to train such large networks.
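The first approach can be sketched as follows. This is a minimal illustration of evolving the weights of a fixed three layer network with a genetic algorithm, not the implementation used in this paper; the flat encoding, one-point crossover, Gaussian mutation and all parameter values are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(chrom, n_in, n_hid, n_out):
    # Unpack a flat chromosome into weight matrices and bias vectors.
    i = 0
    w1 = chrom[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = chrom[i:i + n_hid]; i += n_hid
    w2 = chrom[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    b2 = chrom[i:i + n_out]
    return w1, b1, w2, b2

def forward(chrom, x, n_in, n_hid, n_out):
    w1, b1, w2, b2 = decode(chrom, n_in, n_hid, n_out)
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

def fitness(chrom, x, y, shape):
    # Error rate on the training patterns (lower is better), as in the text.
    preds = forward(chrom, x, *shape).argmax(axis=1)
    return float(np.mean(preds != y))

def evolve(x, y, shape, pop_size=100, gens=200, p_mut=0.05):
    n_in, n_hid, n_out = shape
    n_genes = n_in * n_hid + n_hid + n_hid * n_out + n_out
    pop = rng.normal(0, 0.5, (pop_size, n_genes))
    for _ in range(gens):
        errs = np.array([fitness(c, x, y, shape) for c in pop])
        pop = pop[np.argsort(errs)]          # selection: rank the population
        children = []
        while len(children) < pop_size // 2:
            p1, p2 = pop[rng.integers(0, pop_size // 2, 2)]  # breed from better half
            cut = rng.integers(1, n_genes)                   # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            mask = rng.random(n_genes) < p_mut               # Gaussian mutation
            child[mask] += rng.normal(0, 0.3, mask.sum())
            children.append(child)
        pop = np.vstack([pop[:pop_size - len(children)], children])  # elitism
    errs = np.array([fitness(c, x, y, shape) for c in pop])
    return pop[errs.argmin()], errs.min()
```

For the networks in this paper the chromosome length would be substantial (e.g. 196 inputs × 4 hidden nodes already gives a chromosome of several hundred genes), which is exactly the scaling question raised above.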
3. Image Data

One of our goals is to characterize the kinds of image detection problems for which our techniques are likely to be successful. To this end we have investigated three object detection problems of increasing difficulty. Example pictures and key characteristics are given in Fig. 1.

3.1. Easy pictures

The first database consists of several synthetic pictures (the easy pictures), which were generated to give well defined objects against a uniform background. The pixels of the objects were generated by using a Gaussian generator based on the normal distribution, with different means and variances for each class. All the objects in each class have the same size, but are located at different positions. There are three classes of objects of interest in this database: black circles (class1), grey squares (class2) and white circles (class3), against a uniform grey background (class other). The three kinds of objects were generated with different intensities: the means and standard deviations were 10 and 5, 180 and 25, 230 and 20, and 140 and 0 for class1, class2, class3 and other, respectively.
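Pictures of this kind could be generated along the following lines. This is an illustrative sketch, not the generator used for the database; the object count, object size and square-only shapes are simplifying assumptions, and only the class means and standard deviations come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

# Mean and standard deviation of pixel intensity for each class, as
# stated in the text (class "other" is the uniform grey background).
CLASS_STATS = {"class1": (10, 5), "class2": (180, 25),
               "class3": (230, 20), "other": (140, 0)}

def make_easy_picture(size=700, n_objects=8, obj_size=12):
    """Generate one synthetic picture: Gaussian-pixel objects on a
    uniform background. Shapes are simplified to squares here; the
    real database used circles and squares."""
    mean, sd = CLASS_STATS["other"]
    img = np.full((size, size), float(mean))
    labels = []
    for _ in range(n_objects):
        cls = rng.choice(["class1", "class2", "class3"])
        m, s = CLASS_STATS[cls]
        x = int(rng.integers(0, size - obj_size))
        y = int(rng.integers(0, size - obj_size))
        img[y:y + obj_size, x:x + obj_size] = rng.normal(m, s, (obj_size, obj_size))
        labels.append((cls, x + obj_size // 2, y + obj_size // 2))  # class + centre
    return np.clip(img, 0, 255), labels
```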
Fig. 1. Object detection problems of increasing difficulty.
Easy (circles and squares): 10 images, 4 classes, input field 14×14, 240 objects, picture size 700×700.
Medium difficulty (coins): 20 images, 5 classes, input field 24×24, 200 objects, picture size 640×480.
Very difficult (retinas): 20 images, 5 classes, input field 16×16, 164 objects, picture size 1024×1024.
3.2. Coin pictures

The second database (the coin pictures) was intended to be somewhat harder than the easy pictures. The pictures were taken with a CCD camera over a number of days with relatively similar illumination. In these pictures the background varies slightly in different areas of the image and between the images. The brightness of objects also varies in a similar way. The objects to be detected are more complex than those in the easy pictures, but still regular. All the objects in each class have a similar size. They are located at arbitrary positions and with different rotations.

Each of the pictures contains four object classes of interest, that is, the head side of 5 cent (Australian) coins (class head005), the head side of 20 cent coins (class head020), the tail side of 5 cent coins (class tail005) and the tail side of 20 cent coins (class tail020). The background (class other) is relatively uniform — not totally uniform because of the different lighting conditions and camera positions.

3.3. Retina pictures

The retina pictures (database 3) were taken by a professional photographer with special apparatus at a clinic. Compared with the previous two databases, and with other databases used in automatic target detection problems such as the recognition of regular, man-made small objects, the detection problems in this database are very difficult. The images contain very irregular and complex objects of varying sizes, in several classes, against a highly cluttered background.

There are two object classes of interest: haemorrhages (class haem) and micro-aneurisms (class micro). To give a clearer view of representative samples of the target objects in the retina pictures, one sample piece of these pictures is presented
Fig. 2. An enlarged view of part of the retina pictures.
in Fig. 2. In this figure, haemorrhage and micro-aneurism examples are labeled using white surrounding squares. These objects are not only located in different places, but the sizes of the objects in each class are different as well, particularly for the haemorrhages. In addition, there are also other objects of different classes, such as veins (class vein) with different shapes and the retina "edges" (class edge). The backgrounds (class other) are varied: some parts are quite black, some parts are very bright, and some parts are highly cluttered.

3.4. Training and testing subsets

To avoid confusion, we define a number of terms related to the image data. A set of images in a database constitutes an image data set for a particular problem domain. In this paper, it is randomly split into two parts: a detection training set, which is used to learn a detector, and a detection test set, which is used for measuring detection performance. Cutouts refer to subimages which are cut out from a detection training set. Some of these subimages contain examples of the objects of interest and some contain background. These cutouts form a classification data set, which is randomly split into two parts: a classification training set used for network training, and a classification test set for network testing in object classification. An input field refers to a square in the large images. This is used as a moving window for the network sweeping (detection) process. The size of the input field is the same
as that of the cutouts for network training. The relationships between the various data sets are shown in Fig. 3.
Fig. 3. Relationships between classification and detection data sets. (The image data set (entire images) is randomly split into a detection training set and a detection test set; cutouts generated from the detection training set form the classification data set, which is randomly split into a classification training set and a classification test set.)
4. The Basic Approach

4.1. Overview

An overview of the basic approach35 is presented in Fig. 4. It consists of the following main steps:
(1) Assemble a database of pictures in which the locations and classes of all the objects of interest are manually determined. Divide these images into two sets: a detection training set and a detection test set.
(2) Determine an appropriate size (n) of a square which will cover all objects of interest and form the input field of the network.
(3) Generate a classification data set by cutting out squares of size n × n from the detection training set. Include examples which are centred on the objects of interest and examples of background.
(4) If rotation invariance is required, generate new rotated examples from the cutouts. Randomly split the cutouts into a classification training set and a
Fig. 4. An overview of the basic approach (BP-train). (Cutouts (classification training set) → network training → trained network; entire images (detection test set) + trained network → object detection → detection results.)
classification test set.
(5) Determine the network architecture. A three layer feed forward neural network is used in this approach. The n × n pixel values form the inputs of a training pattern and the classification is the output. The number of hidden nodes is empirically determined.
(6) Train the network by the backward error propagation algorithm on the classification training data. Test on the classification test set to measure the object classification performance.
(7) Using search, find a threshold which results in all objects in the detection training set being found with the smallest false alarm rate.
(8) Use the trained network as a moving window template5 to detect the objects of interest in the detection test set. If the output of the network for a class exceeds the threshold then report an object of that type at the current location.
(9) Evaluate the object detection performance of the network by comparing the classes and locations detected with the known classes and locations in the detection test set and calculating the detection rate and the false alarm rate.
In the remainder of this section, we describe the network training, testing and sweeping procedures and present the object classification and object detection results.
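The moving window sweep of step (8) might look like the following. This is an illustrative sketch only; `classify` is a hypothetical stand-in for the trained network, mapping a flattened n × n window to a vector of per-class activations.

```python
import numpy as np

def sweep(image, classify, n, threshold):
    """Slide an n-by-n window over every pixel position and report
    (class, x, y) wherever the winning activation exceeds the threshold.
    `classify` stands in for the trained network."""
    h, w = image.shape
    detections = []
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            window = image[y:y + n, x:x + n].ravel()
            acts = classify(window)
            cls = int(np.argmax(acts))
            if acts[cls] > threshold:
                # Report the centre of the window.
                detections.append((cls, x + n // 2, y + n // 2))
    return detections
```

In practice the per-position activations would be kept as per-class sweeping maps (Sec. 4.3.1) rather than thresholded on the fly.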
4.2. Object classification

4.2.1. Network training

We use the backward error propagation algorithm21 with online learning and the fan-in factor36 to train the networks. In online learning (also called the stochastic gradient procedure), weight changes are applied to the network after each training pattern. The fan-in is the number of elements that either excite or inhibit a given node of the network. Before network training, the weights are divided by the number of inputs of the node to which the connection belongs, and the size of the weight change of a node is updated in a similar way. Training is terminated when the classification accuracy on the classification training set reaches a pre-defined percentage (100% for the easy and coin pictures and 65% or 75% for the retina pictures). When training is terminated, the trained network weights and biases are saved for use in network testing or subsequent resumption of training.

4.2.2. Network testing

The trained network is then applied to the classification test set. If the test performance is reasonable, then the trained network is ready to be used for object detection. Otherwise, the network architecture and/or the learning parameters need to be changed and the network re-trained, either from the beginning or from a previously saved, partially trained network. During network training and testing, the classification is regarded as correct if the output node with the largest activation value corresponds to the desired class of a pattern.

4.3. Object detection

After network training is successfully done, the trained network is used to detect the classes and locations of the objects of interest in the detection test set, which is not used in any way for network training. Classification and localization are performed by three procedures: network sweeping, finding object centres and object matching.
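The fan-in weight scaling described in Sec. 4.2.1 can be sketched as follows. This is an illustrative guess at the exact form; the paper only states that weights (and weight changes) are divided by the number of inputs of the node, so the uniform initial distribution and range are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def fan_in_init(n_in, n_out, scale=1.0):
    """Initialize a layer's weight matrix and divide each weight by the
    fan-in, i.e. the number of inputs to the node the connection feeds
    into. The learning-rate scaling during training would be analogous."""
    w = rng.uniform(-scale, scale, (n_in, n_out))
    return w / n_in  # every node in this layer has fan-in n_in
```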
4.3.1. Network sweeping

During network sweeping, the trained neural network is used as a template matcher, and is applied, in a moving window fashion, over the large pictures to detect the objects of interest. The template is swept across and down these large pictures, pixel by pixel over every possible location. After the sweeping is finished, an object sweeping map for each object class of interest will be produced. An object sweeping map contains the outputs of the network for each pixel position in the large image and can be visualized as a grey level image. Sample object sweeping maps for class1, class2 and class3, together with the original image for the easy detection problem, are shown in Fig. 5.

Fig. 5. Sample object sweeping maps in object detection (original picture; class1, class2 and class3 sweeping maps).

During the sweeping process, if there is no match between a square in a detecting picture and the template, then the neural network output is 0, which corresponds to black in the sweeping maps; a partial match corresponds to grey at the centre of the object, and the best match is close to white. The object sweeping map can be used to get a qualitative indication of how accurate the detection step is likely to be. Figure 5 reveals that class1 and class3 objects will be detected very accurately, but there will probably be errors in class2.

4.3.2. Finding object centres

We developed a centre-finding algorithm to find the centres of all objects detected by the trained network. For each class of interest, this algorithm is used to find the centres of the objects based on the corresponding object sweeping map. The centre-finding algorithm is shown in Fig. 6.

For each object sweeping map:
1 Set a threshold for the class (see Sec. 4.3.4).
2 Set all of the values in the sweeping map to zero if they are less than the threshold.
3 Search for the largest value, save the corresponding position (x, y), and label this position as an object centre.
4 Set all values in the square input field of the labeled centre (x, y) to zero.
5 Repeat step 3 and step 4 until all values in the object sweeping map are zero.

Fig. 6. Centre-finding algorithm.

If two or more object centres for different classes are found at the same position, the decision is made according to the network activations at this position. For example, if the centre-finding algorithm finds one object centre for class2 and one for class3 at position (260, 340), and the activations for the three classes of interest and the background at this position are (0.27, 0.57, 0.93, 0.23), then class3 will be considered the detected object at this position, since its activation is the largest.
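The steps of Fig. 6 can be sketched directly. This is an illustrative NumPy implementation, not the authors' code; clearing a square of side `field_size` centred on each found maximum is our reading of step 4.

```python
import numpy as np

def find_centres(sweep_map, threshold, field_size):
    """Centre-finding algorithm of Fig. 6 for one object sweeping map.
    Returns the (x, y) centres of the detected objects."""
    m = np.array(sweep_map, dtype=float, copy=True)
    m[m < threshold] = 0.0                            # step 2: threshold the map
    centres = []
    while m.max() > 0.0:                              # step 5: repeat until all zero
        y, x = np.unravel_index(m.argmax(), m.shape)  # step 3: largest value
        centres.append((int(x), int(y)))
        half = field_size // 2                        # step 4: clear the input field
        m[max(0, y - half):y + half + 1, max(0, x - half):x + half + 1] = 0.0
    return centres
```

Clearing the input field around each centre prevents the same object from being reported more than once.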
4.3.3. Object matching

Object matching compares all the object centres reported by the centre-finding algorithm with all the desired known object centres and reports the number of objects correctly detected. Here, we allow a location error of TOLERANCE pixels in the x- and y-directions. We have used a value of 4 for TOLERANCE. For example, if the coordinates of a known object are (21, 19) and the coordinates of a detected object are (22, 21), we consider that the object has been correctly located.
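A sketch of object matching with the TOLERANCE test, together with the detection rate and false alarm rate of Sec. 2.3, is given below. The helper names are hypothetical, and matching each known object at most once is our assumption; the paper does not specify this detail.

```python
def match_objects(detected, known, tolerance=4):
    """Count detected centres lying within `tolerance` pixels of a known
    object centre in both the x and y directions. Each known object is
    matched at most once (an assumption)."""
    remaining = list(known)
    correct = 0
    for dx, dy in detected:
        for k in remaining:
            if abs(dx - k[0]) <= tolerance and abs(dy - k[1]) <= tolerance:
                correct += 1
                remaining.remove(k)
                break
    return correct

def detection_rate(num_correct, num_real):
    # Objects correctly reported, as a percentage of the real objects.
    return num_correct / num_real * 100.0

def false_alarm_rate(num_reported, num_correct, num_real):
    # Objects incorrectly reported, as a percentage of the REAL objects,
    # so this rate can exceed 100% on cluttered backgrounds (Sec. 2.3).
    return (num_reported - num_correct) / num_real * 100.0
```

The example from the text, a known object at (21, 19) and a detection at (22, 21), matches because both coordinate differences are within 4 pixels.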
4.3.4. Choice of thresholds

During the object detection process, different thresholds result in different detection results. The higher the threshold, the fewer the objects that can be detected by the trained network, which results in a lower detection rate but also a lower false alarm rate. Similarly, the lower the threshold selected, the higher the detection rate and the higher the false alarm rate. Thus there is a trade-off between the detection rate and the corresponding false alarm rate. Ideally there will be a threshold which gives 100% detection with 0% false alarms. If such a threshold cannot be found, we use the one that gives 100% detection with the fewest false alarms as the best detection result. The threshold could be found by exhaustive search; we use the following heuristic procedure to speed up the search:
(1) Initialize a threshold (T) to 0.7, apply the centre-finding algorithm and the object matching process to the detection training set, and calculate the detection rate (DR) and the corresponding false alarm rate (FAR).
(2) If the DR is less than 100%, decrease T and calculate the new DR and FAR. Repeat this to obtain all the possible DRs and FARs until a new DR reaches 100% or a new T is less than or equal to 0.40.
(3) Starting again from the point obtained in step 1, if the FAR is not zero, increase T in order to obtain a new point (a DR with its corresponding FAR). Repeat this until either the FAR is zero, or the DR is zero, or T is greater than or equal to 0.999.
The constants 0.7, 0.40 and 0.999 were empirically determined and worked well for all image databases. To illustrate the relationship between the detection rate/false alarm rate and the threshold selection, the detection results of one trained network for class2 in the easy pictures are presented in Table 1.
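The heuristic can be sketched as follows. This is a simplified illustration; the step size and the `evaluate` callback, which would run centre finding and object matching on the detection training set for a given threshold, are assumptions.

```python
def find_best_threshold(evaluate, t_init=0.7, step=0.01):
    """Heuristic threshold search of Sec. 4.3.4. `evaluate(t)` returns
    (detection_rate, false_alarm_rate) on the detection training set.
    Returns the threshold giving 100% detection with the fewest false
    alarms, falling back to the best detection rate seen."""
    points = []
    # Step (2): lower the threshold until DR reaches 100% (or T <= 0.40).
    t = t_init
    while t > 0.40:
        dr, far = evaluate(t)
        points.append((t, dr, far))
        if dr >= 100.0:
            break
        t -= step
    # Step (3): raise the threshold from the start point until FAR or DR hits zero.
    t = t_init + step
    while t < 0.999:
        dr, far = evaluate(t)
        points.append((t, dr, far))
        if far == 0.0 or dr == 0.0:
            break
        t += step
    full = [(t, far) for (t, dr, far) in points if dr >= 100.0]
    if full:
        return min(full, key=lambda p: p[1])[0]   # 100% DR, fewest false alarms
    return max(points, key=lambda p: (p[1], -p[2]))[0]
```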
4.4. Results

We first give the results of network training on the cutouts, and then the object detection performance.
Table 1. Object detection results for class2 in the easy pictures with different thresholds (basic method).

Threshold               0.54    0.56    0.57    0.595   0.625   0.650   0.673   0.700
Detection Rate (%)      100     96.67   93.33   90.00   86.67   83.33   80.00   76.67
False Alarm Rate (%)    90.5    35.1    13.8    11.9    11.2    8.1     5.6     3.7

Threshold               0.725   0.747   0.755   0.800   0.835   0.865   ···     0.940
Detection Rate (%)      73.33   70.00   66.67   63.33   60.00   56.67   ···     0
False Alarm Rate (%)    3.1     2.5     2.3     1.0     0.25    0       0       0
Table 2. Results of network training and testing on cutouts, 15 runs (SD: standard deviation).

                          Easy    Coins   Retinas
Size of input field       14      24      16
Training set size         60      100     100
Test set size             180     100     61
No. of input nodes        196     576     256
No. of output nodes       4       5       5
No. of hidden nodes       4       3       4
Learning rate             0.5     0.5     1.5
Momentum                  0       0       0
Training epochs mean      199     234     475
Training epochs SD        18      65      132
Training accuracy %       100     100     75
Test accuracy % mean      100     100     71
Test accuracy % SD        0       0       3
4.4.1. Object classification results

The results of network training on the cutouts are shown in Table 2. From the table it can be seen that there is large variation in the number of training epochs needed, and that the number of epochs increases with the complexity of the objects, as would be expected. The number of hidden nodes was determined empirically by finding the smallest number which gave successful training; however, performance was quite robust for up to 10 hidden nodes. The easy and coin objects can be classified without error, but not the retina objects. For the easy and coin objects, training was terminated when all of the training examples were correctly classified. For the retina objects this could not be achieved, and training was terminated when 75% of the training examples were correctly classified.
Table 3. Detection results on detection test set, 15 runs.

                        Easy                   Coins            Retinas
Size of input field     14                     24               16
Network architecture    196-4-4                576-3-5          256-5-4
DR for class1 (%)       (black circles) 100    (head005) 100    (haem) 74
FAR for class1 (%)      0                      0                2859
DR for class2 (%)       (grey squares) 100     (tail005) 100    (micro) 100
FAR for class2 (%)      91.2                   0                10104
DR for class3 (%)       (white circles) 100    (head020) 100
FAR for class3 (%)      0                      182
DR for class4 (%)                              (tail020) 100
FAR for class4 (%)                             37.5
4.4.2. Object detection results

This section describes the detection performance of the basic approach on the three detection problems. The 15 networks from the runs described in the previous section were used in the object detection step of the methodology. The results are shown in Table 3. These are averages over 15 training and detection runs.

On the easy pictures the basic approach always achieved a 100% detection rate and a corresponding zero false alarm rate for class1 (black circles) and class3 (white circles). However, this could not be achieved for class2 (grey squares): at a detection rate of 100% the false alarm rate was 91.2%.

On the coin pictures, in each run it was always possible to find a threshold for the network output for classes head005 and tail005 which resulted in detecting all of the objects of these classes with no false alarms. However, detecting classes head020 and tail020 was a relatively difficult problem with this method. Although all the objects in these two classes were correctly detected (100% detection rate), the neural networks produced some false alarms. The average false alarm rates for the two classes at a 100% detection rate were 182% and 37.5% respectively.

Compared with the performance on the easy and coin pictures, the results on the very difficult retina pictures are disappointing. The best detection rate for class haem was 73.91% with a corresponding false alarm rate of 2,859%. Even at a detection rate of 50%, the false alarm rate was still quite high (about 1,800%). All the objects of class micro were correctly detected (a detection rate of 100%) but with a false alarm rate of 10,104%. By adjusting the thresholds the false alarm rate could be reduced, but only at the cost of a decrease in the detection rate.

4.5. Discussion

The experimental results showed that this approach performed very well for detecting a number of simple and regular objects against a relatively uniform background.
It performed poorly on the detection of class haem and class micro objects in the retina pictures. As expected, the performance degrades when the approach is applied to detection problems of increasing difficulty. The remainder of the paper examines ways of improving detection performance, in particular how to lower the false alarm rate when a 100% detection rate can be achieved, as for the grey squares and head020 and tail020. Three lines of investigation will be pursued:

(1) Some networks perform better than others. This indicates that the starting point, that is, the initial weights, affects detection performance. Is there a way to initialize the weights that can give better performance?

(2) In the basic approach, the network is trained by the backward error propagation algorithm. Can improved detection performance be obtained if a genetic algorithm replaces the backward error propagation algorithm for network training? The genetic algorithm is expected to search different areas of weight space and perhaps give better networks.

(3) In the basic approach, the network is trained on the cutouts of the objects and the trained network is directly applied to the entire images in the detection test set. Is there any way to improve the trained network by using detection performance on the full training images?
5. Centred Weight Initialization

In this section, we introduce a centred weight initialization method, which is expected to improve both object classification and object detection performance. The intuition behind this idea is twofold:

• In general neural approaches to pattern classification, the inputs are usually independent. Accordingly, the weights are initialized randomly, which usually results in reasonable network training speed and accuracy. However, in image data, adjacent pixels are clearly not independent. Pixels that are adjacent in an object are very likely to have similar intensities or colours. Perhaps the weights should be initialized to reflect this.

• The pixels in the centre of the input field are more important in classifying the object than the ones at the periphery. Perhaps the weights should be initialized to reflect this.

5.1. The centred weight initialization method

In the basic method, three layer feed forward neural networks are used and the weights in a network are initialized with small random floating point numbers between −0.5 and +0.5. This results in a pattern of weights between the input layer and a hidden node, an input-hidden weight matrix, as shown in the Hinton diagram in Fig. 7(a). In this figure the filled squares represent positive weights and the outline squares represent negative weights, while the size of the square is proportional to the magnitude of the weight. To facilitate visualization the weights are shown
as a parallel array to the input field. Thus weight(i, j) in Fig. 7(a) corresponds to pixel(i, j) in the input field. Figure 7(b) shows a matrix with the centred initial weights.

Fig. 7. Sample initial input-hidden weights. (a) Random initial weights; (b) Centred initial weights.

The centred weight initialization method is described as follows:

(1) Set the parameter max_weight for the central weight. Obtain the gap between two neighbouring weights (weight_gap) according to Eq. (1):

    weight_gap = 2 × max_weight / size_square .    (1)

For each input-hidden weight matrix, initialize all the weights according to steps 2 to 4.

(2) Calculate the magnitude of the four central weights corresponding to the four central pixels of the object according to Eq. (2). Because the width and the height (both equal to size_square) of a training square are an even number of pixels, there are four central pixels in the training square:

    cen_weight = max_weight + ε ,    (2)

where ε is a small random number generated from a normal distribution as in Eq. (3):

    ε = normalized_gaussian(µ, σ) .    (3)

In order to make ε very small, we set µ to zero and σ to a very small number, weight_gap/30.

(3) For each weight (weight(i, j)) other than the central ones, calculate the distance of the corresponding pixel (pixel(i, j)) from the nearest central pixel in the object according to Eq. (4):

    distance(i, j) = √((i − x)² + (j − y)²) .    (4)
Here, (x, y) denotes the coordinates of the nearest central pixel and (i, j) is the position of the pixel in the training square.

(4) Calculate the magnitude of each weight according to Eq. (5):

    weight(i, j) = max_weight − distance(i, j) × weight_gap + ε .    (5)
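Combining steps (1) to (4) (step (5) below simply initializes the remaining weights randomly), the construction of one input-hidden weight matrix can be sketched as follows; the names follow the equations above:

```python
import numpy as np

def centred_weights(size_square, max_weight, rng=None):
    """Build one input-hidden weight matrix using the centred scheme.

    size_square is the (even) width/height of the square input field and
    max_weight is the magnitude given to the four central weights.
    """
    rng = rng or np.random.default_rng()
    # Step 1: gap between two neighbouring weights, Eq. (1).
    weight_gap = 2.0 * max_weight / size_square
    sigma = weight_gap / 30.0            # sigma of the small noise term eps
    c0, c1 = size_square // 2 - 1, size_square // 2   # the central pixels

    w = np.empty((size_square, size_square))
    for i in range(size_square):
        for j in range(size_square):
            # Step 3: distance to the nearest central pixel, Eq. (4).
            x = c0 if abs(i - c0) <= abs(i - c1) else c1
            y = c0 if abs(j - c0) <= abs(j - c1) else c1
            distance = np.hypot(i - x, j - y)
            # Steps 2 and 4: linear falloff plus Gaussian noise,
            # Eqs. (2) and (5), clamped at zero as the method requires.
            eps = rng.normal(0.0, sigma)
            w[i, j] = max(max_weight - distance * weight_gap + eps, 0.0)
    return w
```

With size_square = 14 this reproduces the centre-heavy pattern of Fig. 7(b): largest weights at the four central pixels, decreasing uniformly to zero at the perimeter.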
If weight(i, j) < 0, then weight(i, j) = 0. In this way, all corresponding weights in the different input-hidden weight matrices are slightly different due to the small random number ε.

(5) For initialization of the hidden-output weights and all the biases, the standard random weight initialization method is used.

The algorithm ensures that weights are largest at the centre and decrease uniformly to the perimeter, as shown in Fig. 7(b). Due to the use of the random number ε, the initial weights will be different for each run.

5.2. Results

5.2.1. Object classification results

The centred weight initialization method gave reduced training epochs and reduced mean squared error over a large range of values for max_weight. In addition, the variation in the number of training epochs over 15 runs was considerably smaller than for the basic method. These results are shown in Table 4. On the coins database, for example, a network of 576-3-5 shows improved performance for 0.024 < max_weight < 0.09. There is an average decrease of 27.15% in training epochs and of 19.23% in test mean squared error. There does not appear to be a relationship between problem difficulty and the amount of improvement.

Unfortunately there does not appear to be a reliable way of choosing the best value for max_weight. However, as suggested earlier, the major problem in these kinds of object detection problems is a very high number of false alarms. If the centred weight method can lower this number significantly, then a short search for a good max_weight is a small price to pay. The next subsection compares the detection performance of the two weight initialization methods.

5.2.2. Object detection results

The object detection results for networks initialized with the centred weights method are given in Table 5. The detection rates are the same in each case, but the false alarm rate has dropped significantly in each case.
In particular, the class tail020 can now be detected without any false alarms, and the false alarm rate for class micro has dropped from 10,104% to 2,903%.

5.3. Analysis of weights

This section investigates the networks' internal behaviour through visual analysis of the weights in trained networks. For presentation convenience, we use two trained
Table 4. Summary of the improvement in training time and test performance of the centred weight initialization method over the random weight initialization.

Database                     Network Arch.   Range of Centred Initial Parameter (max_wei/wei_gap)   Improvement of Training Speed (µ/σ)   Improvement of Test MSE (µ/σ)
Circles and Squares (Easy)   196-4-4         0.024/0.002 and 0.030/0.0025 and ...                   ...                                   ...
Wd > 1 means that detection rate is more important than false alarm rate, and Wd < 1 means that false alarm rate is more important than detection rate. Under this design, the smaller the fitness, the better the detection performance. The best case is zero fitness, when the detection rates for all classes are 100% and the false alarm rates are zero, that is, all the objects of interest are correctly detected by the network without any false alarms.

6.3. Results

We first give the classification and detection results for the GA-train method and compare them with the back propagation results. We then give the detection results after the refinement step has been applied to networks resulting from both training methods.
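A fitness with the stated properties can be illustrated with a small sketch; the weighted-sum form below is an assumption, chosen only because it is smaller for better detection and reaches zero exactly when every class has a 100% detection rate with no false alarms:

```python
# Illustrative only: a fitness that is smaller for better detection and is
# zero exactly when every class reaches DR = 100% with FAR = 0%. The exact
# formula may differ; Wd weights detection rate against false alarms.
def detection_fitness(per_class_rates, wd=1.0):
    """per_class_rates: iterable of (detection_rate, false_alarm_rate) in %."""
    return sum(wd * (100.0 - dr) + far for dr, far in per_class_rates)
```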
6.3.1. Object classification results for GA-train

This subsection presents the object classification performance of the GA-train method on the cutouts of the three databases. The number of evaluations and the test mean squared error are compared with those obtained by the BP-train algorithm in the basic approach.

For the purpose of comparison with the BP-train algorithm, a single evaluation of an individual network in the population of the GA-train algorithm is considered to be computationally equivalent to a single forward or backward pass of the BP-train algorithm. A generation requires evaluation of each member of the population on the entire classification training set, so in terms of the BP-train algorithm a generation is typically computationally equivalent to some number of epochs; how many depends on the number of individuals in the population. For example, for the GA-train algorithm with a population of 100 networks which takes 20 generations to train, each network will have been evaluated 20 times for any training pattern, so there will be 20 × 100 = 2,000 network evaluations per training pattern. For the BP-train algorithm, a training period of 500 epochs will have resulted in 500 × 2 = 1,000 network evaluations per training pattern; the factor of 2 accounts for the forward and backward passes of the backward propagation technique. For ease of comparison, the number of trials for the GA-train algorithm and the number of epochs for the BP-train algorithm are converted into the number of network evaluations per training pattern.

The main parameters used in the GA-train algorithm for the three problems are shown in Table 6. Table 7 shows a comparison of the network training and testing results for the two algorithms on the three databases. For the easy and coins databases the same accuracy was achieved by GA-train as by the basic approach, but with considerably more computation. For the retina database the same accuracy could not be achieved; to reach a training accuracy of 65%, 31,350 evaluations were required. More evaluations did not lead to a significantly higher accuracy. In contrast, BP-train achieved a training accuracy of 75% after 950 evaluations. However, the real goal is object detection, so we persevere with GA-train.
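The evaluation-count conversion used for this comparison can be made concrete with two trivial helpers (the function names are ours, not part of the method):

```python
# Convert GA generations and BP epochs into network evaluations per
# training pattern, as in the comparison above (helper names are ours).
def ga_evaluations(generations, population_size):
    return generations * population_size

def bp_evaluations(epochs):
    return epochs * 2   # each epoch = one forward plus one backward pass
```

So 20 generations with a population of 100 gives 2,000 evaluations per pattern, while 500 epochs of back propagation gives 1,000.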
6.3.2. Object detection results for GA-train The object detection results for GA-train on the three databases are shown in Table 8. The detection results are disappointing. While the detection rate for the easy and coins pictures remained at 100% there are now a significant number of false alarms for each class. The detection and false alarm rates for the retina pictures are considerably worse. The alternative training method is not as good as back propagation. The expectation that the GA will search a bigger portion of weight space and find a better network has not been realized.
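As a rough sketch of GA-based weight training, the loop below evolves flat weight vectors; the encoding, binary tournament selection, one-point crossover, elitism, and the use of the two delta ranges as initialization and mutation step sizes are all illustrative assumptions, since only the population size, crossover rate, mutation rate, generation limit and delta ranges are specified here:

```python
# Hedged sketch of GA-based weight training (not the paper's exact operators).
import random

def ga_train(num_weights, fitness, population=200, generations=100,
             crossover_rate=0.95, mutation_rate=0.05,
             delta1=0.06, delta2=0.02):
    """Minimise `fitness` (e.g. classification error) over weight vectors."""
    pop = [[random.uniform(-delta1, delta1) for _ in range(num_weights)]
           for _ in range(population)]

    def select(scored):
        a, b = random.sample(scored, 2)        # binary tournament
        return (a if a[0] < b[0] else b)[1]

    best = None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        scored.sort(key=lambda s: s[0])
        if best is None or scored[0][0] < best[0]:
            best = (scored[0][0], scored[0][1][:])
        nxt = [scored[0][1][:]]                # elitism: keep the best network
        while len(nxt) < population:
            p1, p2 = select(scored), select(scored)
            if random.random() < crossover_rate:
                cut = random.randrange(1, num_weights)
                child = p1[:cut] + p2[cut:]    # one-point crossover
            else:
                child = p1[:]
            child = [w + random.uniform(-delta2, delta2)
                     if random.random() < mutation_rate else w
                     for w in child]
            nxt.append(child)
        pop = nxt
    return best[1]
```

The same loop structure serves for the refinement step, with the fitness computed from detection and false alarm rates on the full training images instead of classification error on the cutouts.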
Table 6. The main parameters used for the network training in the GA-train algorithm for the three detection problems.

Parameters         Easy Pictures   Coin Pictures   Retina Pictures
Population size    200             200             300
Crossover rate     95%             95%             90%
Mutation rate      5%              5%              10%
Generations        100             100             150
Delta1 range       ±0.06           ±0.01           ±0.004
Delta2 range       ±0.02           ±0.004          ±0.002
Net architecture   196-4-4         576-3-5         256-4-5
Table 7. GA-train compared to BP-train on cutouts, 15 runs.

                              Easy    Coins   Retinas
Size of input field           14      24      16
Training set size             60      100     100
Test set size                 180     100     61
No. of input nodes            196     576     256
No. of output nodes           4       5       5
No. of hidden nodes           4       3       4
Learning rate                 0.5     0.5     1.5
Momentum                      0       0       0

BP-train
Training evaluations mean     398     469     950
Training evaluations SD       36      132     264
Training accuracy %           100     100     75
Test accuracy % mean          100     100     71
Test accuracy % SD            0       0       3

GA-train
Training evaluations mean     5146    9379    31350
Training evaluations SD       794     1031    4021
Training accuracy %           100     100     65
Test accuracy % mean          100     100     63
Test accuracy % SD            0       0       3
6.3.3. Object detection results with GA-refine

We now look at the effect of the refinement step. As mentioned earlier, this involves taking the trained network and using a genetic algorithm to maximize the detection rate and minimize the false alarm rate by further training on the full training images. We have applied the refinement step to networks trained by both GA-train and BP-train. The GA parameters and the detection results for GA-train+GA-refine are given in Table 9, and those for BP-train+GA-refine in Table 10. The parameter values were carefully selected through empirical search. For both cases, the evolutionary process for network refinement was terminated when either the problem was solved or the number of generations reached 50.

It is evident from Tables 8 and 9 that the refinement step has improved detection performance over GA-train. For example, the false alarm rate for class1 in the easy
Table 8. Detection results for GA-train on detection test set, 15 runs.

                        Easy                   Coins            Retinas
Size of input field     14                     24               16
Network architecture    196-4-4                576-3-5          256-5-4
Population size         200                    200              500
Crossover rate          95%                    95%              90%
Mutation rate           5%                     5%               10%
Max generations         50                     50               50
Delta1 range            ±0.08                  ±0.04            ±0.02
Delta2 range            ±0.05                  ±0.02            ±0.005
DR for class1 (%)       (black circles) 100    (head005) 100    (haem) 50
FAR for class1 (%)      80                     357              4000
DR for class2 (%)       (grey squares) 100     (tail005) 100    (micro) 90
FAR for class2 (%)      839                    19               5606
DR for class3 (%)       (white circles) 100    (head020) 100
FAR for class3 (%)      273                    333
DR for class4 (%)                              (tail020) 100
FAR for class4 (%)                             778
Table 9. Detection results using GA-train+GA-refine on detection test set, 15 runs.

                        Easy                   Coins            Retinas
Size of input field     14                     24               16
Network architecture    196-4-4                576-3-5          256-5-4
Population size         200                    200              500
Crossover rate          95%                    95%              90%
Mutation rate           5%                     5%               10%
Max generations         50                     50               50
Delta1 range            ±0.08                  ±0.04            ±0.02
Delta2 range            ±0.05                  ±0.02            ±0.005
Wd                      0.4                    1.5              0.67
DR for class1 (%)       (black circles) 100    (head005) 100    (haem) 82
FAR for class1 (%)      0                      125              2298
DR for class2 (%)       (grey squares) 100     (tail005) 100    (micro) 100
FAR for class2 (%)      663                    0                5055
DR for class3 (%)       (white circles) 100    (head020) 100
FAR for class3 (%)      144                    37
DR for class4 (%)                              (tail020) 100
FAR for class4 (%)                             215
pictures has fallen from 80% to 0. The detection rates for haem and micro have both risen while the false alarm rates have fallen. It appears that the refinement step is effective in both raising the detection rate and lowering the false alarm rate. A similar improvement was evident when the refinement procedure was applied to networks generated by BP-train. This can be seen from Tables 3 and 10. In fact this combination achieves the best overall detection performance. All the classes in the easy and coin databases are detected without error and the best detection and false alarm rates are achieved for the retina pictures.
Table 10. Detection results using BP-train+GA-refine on detection test set, 15 runs.

                        Easy                   Coins            Retinas
Size of input field     14                     24               16
Network architecture    196-4-4                576-3-5          256-5-4
Population size         200                    200              500
Crossover rate          95%                    95%              90%
Mutation rate           5%                     5%               10%
Max generations         50                     50               50
Delta1 range            ±0.08                  ±0.04            ±0.02
Delta2 range            ±0.05                  ±0.02            ±0.005
Wd                      0.4                    1.5              0.67
DR for class1 (%)       (black circles) 100    (head005) 100    (haem) 82
FAR for class1 (%)      0                      0                2156
DR for class2 (%)       (grey squares) 100     (tail005) 100    (micro) 100
FAR for class2 (%)      0                      0                2706
DR for class3 (%)       (white circles) 100    (head020) 100
FAR for class3 (%)      0                      0
DR for class4 (%)                              (tail020) 100
FAR for class4 (%)                             0
Fig. 12. Comparison of the results for class head020 and class tail020 in the coin pictures using the four detection methods. (a) ROC curve for head020 in the coin pictures; (b) ROC curve for tail020 in the coin pictures.
From the results presented so far, detecting heads and tails of 5 cent coins turned out to be relatively straightforward, while detecting the heads and tails of 20 cent coins was a more difficult problem. To give a clearer view of the comparison of the four methods, we present extended ROC curves for detecting the heads and tails of 20 cent coins in Fig. 12. As can be seen from this figure, at all levels of
Fig. 13. Comparison of the results for detecting class haem and class micro in the retina pictures using the four detection methods. (a) ROC curve for haem in the retina pictures; (b) ROC curve for micro in the retina pictures.
detection rate, the BP-train+GA-refine method did not produce any false alarms, the GA-train+GA-refine method always resulted in fewer false alarms than the GA-train method, and GA-train always resulted in more false alarms than BP-train.
Figure 13 shows the results as ROC curves for classes haem and micro, which show the same pattern as the other two databases, except that the GA-train method produced fewer false alarms than the BP-train method at most detection rates for detecting class micro. However, it could not achieve a 100% detection rate.

6.4. Discussion

This section investigated the use of genetic algorithms for network training and network refinement for the object classification and detection problems. The three methods introduced in the two phase approach and the basic approach were compared on the three detection problems. The methods which incorporated the refinement genetic algorithm always resulted in better detection performance than the corresponding methods without the refinement in all three databases. Of the four methods, the combination of the backward error propagation algorithm and the refinement genetic algorithm, the BP-train+GA-refine method, always produced the best detection performance.
To achieve the goal of domain independence, we used the raw pixel values directly as inputs to neural networks. One question that remains is whether all pixel values are needed or whether a smaller set of domain independent features of some kind could be used. We have explored one approach using features. We have used features computed from regions in the input field of the networks as shown in Fig. 14. These features favour the kinds of small objects that exist in our three databases. An investigation which used the basic approach as described in Sec. 4, but with features based on means and standard deviations of the regions shown in Fig. 14, found that the same detection rates could be achieved, but that the false alarm rates were much higher.37 This suggests that some of the pixels are necessary for the reduction of false alarms.
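The refinement step discussed above can be sketched in code. The following is an illustrative reconstruction, not the authors' implementation: the population size, elitism scheme, mutation scale and the toy fitness function are assumptions. In the paper, fitness is based on the detection rate and the false alarm rate obtained by sweeping the trained network over the images; here a simple surrogate fitness stands in for that evaluation.

```python
import random

def refine_weights(weights, fitness, generations=50, pop_size=20,
                   sigma=0.1, seed=0):
    """Refine trained network weights with a simple genetic algorithm.

    weights: initial weight vector (e.g. found by back propagation).
    fitness: callable mapping a weight vector to a score to maximise,
             e.g. detection_rate - alpha * false_alarm_rate.
    """
    rng = random.Random(seed)
    # Seed the population with the trained weights plus mutated copies.
    pop = [list(weights)]
    for _ in range(pop_size - 1):
        pop.append([w + rng.gauss(0.0, sigma) for w in weights])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]  # keep the better half (elitism)
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0.0, sigma)
                             for pair in zip(a, b)])
        pop = elite + children
    return max(pop, key=fitness)

# Toy surrogate fitness: prefer weights close to a hypothetical optimum.
target = [0.5, -1.0, 2.0]
def fit(w):
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

best = refine_weights([0.0, 0.0, 0.0], fit)
```

Because the elite half of the population is carried over unchanged, the best weight vector found so far is never lost, so the refined weights can only match or improve on the back-propagation starting point under the chosen fitness.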
[Fig. 14 diagram: an n × n input field with centre O; the corners A1, B1, C1, D1, the inner n/2 × n/2 square A2, B2, C2, D2 and the axis endpoints E1/F1, G1/H1, E2/F2, G2/H2 define the regions listed in the table.]
Region or axis of interest                  Mean   Std. dev.
Big square A1-B1-C1-D1                      F1     F2
Small central square A2-B2-C2-D2            F3     F4
Upper left square A1-E1-O-G1                F5     F6
Upper right square E1-O-H1-B1               F7     F8
Lower left square G1-O-F1-D1                F9     F10
Lower right square O-H1-C1-F1               F11    F12
Central row of the big square G1-H1         F13    F14
Central column of the big square E1-F1      F15    F16
Central row of the small square G2-H2       F17    F18
Central column of the small square E2-F2    F19    F20
Fig. 14. Features used instead of pixels.
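The feature table of Fig. 14 can be computed directly from a cutout. The sketch below is illustrative, not the authors' implementation: it assumes a square greyscale cutout supplied as a nested list with an even side length n, and it approximates the central row/column axes by the single pixel row or column at index n/2.

```python
import statistics

def region_features(cutout):
    """Compute the 20 features of Fig. 14: the mean and population
    standard deviation of ten regions/axes of an n x n cutout."""
    n = len(cutout)
    h = n // 2   # half side: index of the (approximate) central axes
    q = n // 4   # quarter side: offset of the small central square
    def stats(pixels):
        return (statistics.mean(pixels), statistics.pstdev(pixels))
    big = [p for row in cutout for p in row]
    small = [cutout[r][c] for r in range(q, n - q) for c in range(q, n - q)]
    upper_left = [cutout[r][c] for r in range(h) for c in range(h)]
    upper_right = [cutout[r][c] for r in range(h) for c in range(h, n)]
    lower_left = [cutout[r][c] for r in range(h, n) for c in range(h)]
    lower_right = [cutout[r][c] for r in range(h, n) for c in range(h, n)]
    big_row = cutout[h]                     # central row G1-H1 (approx.)
    big_col = [row[h] for row in cutout]    # central column E1-F1 (approx.)
    small_row = cutout[h][q:n - q]          # central row of small square
    small_col = [cutout[r][h] for r in range(q, n - q)]
    features = []
    for region in (big, small, upper_left, upper_right, lower_left,
                   lower_right, big_row, big_col, small_row, small_col):
        features.extend(stats(region))
    return features  # [F1, F2, ..., F20]
```

For example, a 4 × 4 cutout of ones with a 2 × 2 block of twos in the centre yields F3 = 2 and F4 = 0 (mean and standard deviation of the small central square).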
7. Conclusions

The goal of the work presented in this paper was to develop and evaluate a domain independent approach to multiple class rotation invariant object detection problems and to characterize the kinds of problems for which it is likely to be successful. We have been successful in developing a method based on learning a neural network classifier trained on cutouts of the objects of interest and then refining the network weights using a genetic algorithm with fitness based on maximizing the detection rate and minimizing the false alarm rate. We have determined that the technique is likely to be successful if (1) the objects of interest are smaller than about 24 × 24 pixels, (2) the objects are on a relatively uniform background, (3) the objects in each class are roughly the same size and (4) the objects are regular but can have complex internal detail.
Domain independence has been achieved by using raw pixel values as inputs to the network, thus avoiding the problems of hand-crafting feature selection programs. Rotation invariance is achieved by having the objects in the training pictures at random orientations and/or by rotating the cutouts prior to network training. Also, the method finds objects of different classes in a single pass, unlike most current work in this area, which uses different programs in multiple independent stages to achieve localization and classification.
There are two negative aspects of the method: there are a number of new parameters whose values must be determined by the user, and the run times for the genetic algorithms can be quite long, although this could be ameliorated by the use
of parallel hardware.
The method achieved a 100% detection rate and a 0% false alarm rate for simple object detection against a uniform background in the easy pictures and medium complexity object detection against a relatively uniform background in the coin pictures. However, on a very difficult problem involving retinal pathologies on a highly cluttered background the detection performance was disappointing. However, it was competitive with alternative techniques on similar problems.
The success of the method on the easy and medium pictures is due to success in learning templates for the objects of interest. As shown by the weight analysis, each template is actually a composite of a number of more primitive templates. Whether such a composite template can be learned for the retina pathologies by some alternative technique remains an open question.
The method that gave the best detection performance was training using back propagation followed by weight refinement using a genetic algorithm. Also, weight refinement always improved detection performance no matter what training method was used.

Acknowledgement

We would like to thank Dr. James Thom and Dr. Zhi-Qiang Liu for a number of useful discussions on image processing and retrieval, and Chris Kamusinski for providing and labeling the retina pictures.

References

1. P. D. Gader, J. R. Miramonti, Y. Won and P. Coffield, Segmentation free shared weight neural networks for automatic vehicle detection, Neural Netw. 8(9) (1995) 1457–1473.
2. A. M. Waxman, M. C. Seibert, A. Gove, D. A. Fay, A. M. Bernandon, C. Lazott, W. R. Steele and R. K. Cunningham, Neural processing of targets in visible, multispectral IR and SAR imagery, Neural Netw. 8(7/8) (1995) 1029–1051.
3. H. L. Roitblat, W. W. L. Au, P. E. Nachtigall, R. Shizumura and G. Moons, Sonar recognition of targets embedded in sediment, Neural Netw. 8(7/8) (1995) 1263–1273.
4. M. W. Roth, Survey of neural network technology for automatic target recognition, IEEE Trans. Neural Netw. 1(1) (1990) 28–43.
5. D. H. Ballard and C. M. Brown, Computer Vision (Prentice-Hall, Englewood Cliffs, NJ, 1982).
6. R. Brunelli and T. Poggio, Face recognition through geometrical features, in Proc. ECCV '92, ed. S. M. Ligure, 1992, pp. 792–800.
7. R. Brunelli and T. Poggio, Face recognition: Features versus templates, IEEE Trans. PAMI 15(10) (1993) 1042–1052.
8. M. V. Shirvaikar and M. M. Trivedi, A network filter to detect small targets in high clutter backgrounds, IEEE Trans. Neural Netw. 6(1) (1995) 252–257.
9. L. Spirkovska and M. B. Reid, Higher-order neural networks applied to 2D and 3D object recognition, Mach. Learning 15(2) (1994) 169–199.
10. D. P. Casasent and L. M. Neiberg, Classifier and shift-invariant automatic target recognition neural networks, Neural Netw. 8(7/8) (1995) 1117–1129.
11. P. Winter, S. Sokhansanj, H. C. Wood and W. Crerar, Quality assessment and grading of lentils using machine vision, Agricultural Inst. Can. Ann. Conf., Saskatoon, SK, Canada, July 1996, Canadian Society of Agricultural Engineering, Paper No. 96-310.
12. V. Ciesielski and J. Zhu, A very reliable method for detecting bacterial growths using neural networks, in Proc. Int. Joint Conf. Neural Networks, Beijing, November 1992, pp. 62–67.
13. J. S. N. Jean and J. Wang, Weight smoothing to improve network generalisation, IEEE Trans. Neural Netw. 5(5) (1994) 752–763.
14. S.-H. Lin, S.-Y. Kung and L.-J. Lin, Face recognition/detection by probabilistic decision-based neural network, IEEE Trans. Neural Netw. 8(1) (1997) 114–132.
15. A. Howard, C. Padgett and C. C. Liebe, A multi-stage neural network for automatic target detection, in 1998 IEEE World Cong. Comput. Intell. — IJCNN'98, Anchorage, Alaska, 1998, pp. 231–236.
16. Y. C. Wong and M. K. Sundareshan, Data fusion and tracking of complex target maneuvers with a simplex-trained neural network-based architecture, in 1998 IEEE World Cong. Comput. Intell. — IJCNN'98, Anchorage, Alaska, May 1998, pp. 1024–1029.
17. D. Valentin, H. Abdi and O'Toole, Categorization and identification of human face images by neural networks: A review of linear auto-associator and principal component approaches, J. Biol. Syst. 2(3) (1994) 413–429.
18. P. Winter, W. Yang, S. Sokhansanj and H. Wood, Discrimination of hard-to-pop popcorn kernels by machine vision and neural network, ASAE/CSAE Meeting, Saskatoon, Canada, September 1996, Paper No. MANSASK 96-107.
19. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, in Intelligent Signal Processing (IEEE Press, 2001), pp. 306–351.
20. B. Verma, A neural network based technique to locate and classify microcalcifications in digital mammograms, in 1998 IEEE World Cong. Comput. Intell. — IJCNN'98, Anchorage, Alaska, 1998, pp. 1790–1793.
21. D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, eds. D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Chap. 8 (The MIT Press, Cambridge, MA, 1986).
22. T. Kohonen, Self-Organization and Associative Memory, 3rd edn. (Springer, Berlin, 1988).
23. G. L. Giles and T. Maxwell, Learning, invariances, and generalisation in high-order neural networks, Appl. Opt. 26(23) (1987) 4972–4978.
24. G. A. Carpenter and S. Grossberg, A massively parallel architecture for a self-organising neural pattern recognition machine, Comput. Vision Graphics Image Process. 37 (1987) 54–115.
25. G. A. Carpenter and S. Grossberg, Stable self-organisation of pattern recognition codes for analog input patterns, Appl. Opt. 26 (1987) 4919–4930.
26. P. G. Korning, Training neural networks by means of genetic algorithms working on very long chromosomes, Int. J. Neural Syst. 6(3) (1995) 299–316.
27. D. J. Montana and L. Davis, Training feedforward networks using genetic algorithms, in Proc. 11th Int. Joint Conf. Artificial Intelligence (Morgan Kaufmann, San Mateo, CA, 1989), pp. 762–767.
28. D. Whitley and T. Hanson, Optimising neural networks using faster, more accurate genetic search, in Proc. 3rd Int. Conf. Genetic Algorithms and their Applications (Morgan Kaufmann, 1989), pp. 391–396.
29. V. Ciesielski and J. Riley, An evolutionary approach to training feed forward and recurrent neural networks, in Proc. 2nd Int. Conf. Knowledge-Based Intelligent Electronic Systems, eds. L. C. Jain and R. K. Jain, Adelaide, April 1998, pp. 596–602. http://www.cs.rmit.edu.au/~vc/papers/kes98.pdf.
30. R. Krishnan and V. Ciesielski, 2DELTA-GANN: A new approach to using genetic algorithms to train neural networks, in Proc. 5th Australian Neural Networks Conf., ed. A. C. Tsoi, University of Queensland, Brisbane, February 1994, pp. 38–41. http://www.cs.rmit.edu.au/~vc/papers/acnn94-2delta.pdf.
31. M. A. Potter, A genetic cascade-correlation learning algorithm, in Proc. Int. Workshop on Combinations of Genetic Algorithms and Neural Networks, eds. Schaffer and Whitley (Morgan Kaufmann, July 1992), pp. 366–372.
32. X. Yao and Y. Liu, A new evolutionary system for evolving artificial neural networks, IEEE Trans. Neural Netw. 8(3) (1997) 694–713.
33. X. Yao, A review of evolutionary artificial neural networks, Int. J. Intell. Syst. 8(4) (1993) 539–567.
34. X. Yao, Evolving artificial neural networks, Proc. IEEE 87(9) (1999) 1423.
35. M. Zhang, A domain independent approach to 2D object detection based on the neural and genetic paradigms, PhD thesis, Department of Computer Science, RMIT University, Melbourne, Australia, August 2001.
36. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Comput. 1 (1989) 541–551.
37. N. Rai, Pixel statistics in neural networks for domain independent object detection, Minor thesis, Department of Computer Science, RMIT University (2000).