A Hybrid Learning System for Image Deblurring
Min Su and Mitra Basu
Department of Electrical Engineering
The City College & The Graduate Center of the City University of New York
New York, NY 10031
[email protected] [email protected]

Abstract: In this paper we propose a 3-stage hybrid learning system with unsupervised learning to cluster data in the first stage, supervised learning in the middle stage to determine network parameters, and finally a decision-making stage using a voting mechanism. We take this opportunity to study the role of various supervised learning systems that constitute the middle stage. Specifically, we focus on a one-hidden-layer neural network with a sigmoidal activation function, a radial basis function network with a Gaussian activation function, and a projection pursuit learning network with a Hermite polynomial as the activation function. These learning systems rank in increasing order of complexity. We train and test each system with identical data sets. Experimental results show that the learning ability of a system is controlled by the shape of the activation function when other parameters remain fixed. We observe that clustering in the input space leads to better system performance. Experimental results provide compelling evidence in favor of the use of the hybrid learning system and of committee machines with a gating network.

Keywords: Committee machine, Gating network, Image deblurring, Radial basis function, Projection pursuit learning, Unsupervised learning, Supervised learning
This work was supported by a PSC-CUNY grant from the City University of New York.
1 Introduction

The goal of this work is to explore the effectiveness of mixed learning in a committee-machine environment. The proposed system is tested on blurred images. To evaluate performance, we use the degree of successful deblurring of corrupted images attained by the system as the scale of measurement. The problem of image deblurring is particularly suited for learning for a number of reasons:
There is little information available on the source of blurring. Usually blurring is the result of a combination of events, which makes it too complex to be described mathematically.
Sufficient data is available, and it is conceivable that the data captures the fundamental principle at work. There are large, complex, information-rich data sets available in image restoration problems.
We observe that there is currently a paradigm shift from classical modeling based on first principles to developing models from data. The ability of a system to extract useful knowledge from these data and to act on that knowledge is the key to its success. The learning systems that we explore in this paper are data-driven. There are two common types of learning: supervised and unsupervised. In supervised learning, the data set contains the input vectors and the corresponding output vectors (also known as the desired responses). There are two stages in the operation of a supervised learning system: (1) learning/estimation of the mapping function from the given data set (also known as the training samples), and (2) prediction of the output, using the mapping learned in (1), for an unknown input sample. In the learning stage the network parameters are adjusted iteratively under the combined influence of the input vectors and the error signal (usually, the error signal is defined as the Euclidean distance between the desired response and the actual response of the network at that instance). In unsupervised learning, there are no labeled examples (desired outputs) available. In this paper all three learning systems that we explore employ a supervised learning strategy. We reformulate the problem in the framework of learning as follows:
Training Phase: Given an original image I and its blurred version I_b, train a learning network with I_b as input and I as output.
Testing Phase: Use a blurred image J_b as input to the trained network to recover the original image J.
We consider three learning systems with identical architecture. Each has a single hidden layer and the same number of neurons in each layer:
A back-propagation neural network
A radial-basis function network
A projection pursuit learning network
The only distinguishing characteristic here is the form of the activation function. This constitutes the middle stage of the proposed system. The image data is clustered by using unsupervised learning before it goes into a network in the middle stage.
The clustering provides the basis for the use of a committee of networks. In the last stage, we make use of a gating network to produce the final output. This paper is organized in the following manner. The next section describes the problem of blurring. The learning systems are introduced in Section 3. We devote Section 4 to the issues involved in the implementation of the PPLN. The technique used for clustering is described in Section 5. In the same section, we provide the architecture for the committee machine along with the gating network. Experimental results are presented in Section 6. The last section contains a brief discussion.
2 Problem of Blurring

Image blurring is the most common type of degradation found in images. Blurring is hard to avoid in any image acquisition system, and can be caused by many sources such as: (1) atmospheric turbulence, (2) an out-of-focus optical system (which results in the point spread function of the imaging system not being an impulse function), and (3) aberrations in the imaging system. Obviously, all these causes cannot be simultaneously captured in a single model. However, the traditional approach has been to approximate the net result of the complex interplay among these independent random sources by a Gaussian blur [12]. Before defining any solution, we must know how the image was formed, i.e., we must have a mathematical model of the image formation system. For this work, the linear shift-invariant model will be used. The linear model gives the following relation:

g(m, n) = h(m, n) * f(m, n)    (1)

where g, h and f are the blurred image, the point-spread function (PSF) of the blurring system, and the original image, respectively. The aim of image restoration is to bring the image back to what it would have been without degradation, i.e., to recover f(m, n) from the information in g(m, n). In order to extract the original image f(m, n) from the output image g(m, n), a deconvolution is required. In the frequency domain, this can be expressed with an inverse filter:

G(u, v) = H(u, v) F(u, v)    (2)

The inverse or restoring filter, Q(u, v), may be easily found from H(u, v):

Q(u, v) = 1 / H(u, v)    (3)

However, it is very hard to determine and implement the inverse filter. For this reason an alternative to finding an inverse operator for image restoration is of interest [11].
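To make equations (1)-(3) concrete, the following NumPy sketch (not part of the original paper; the function names and the regularization constant are illustrative assumptions) blurs an image with a Gaussian PSF in the frequency domain and then applies a regularized version of the inverse filter Q(u, v) = 1/H(u, v). The small eps term is needed precisely because direct inversion is ill-conditioned, as the text notes.

```python
import numpy as np

def gaussian_psf(size, sigma):
    """Normalized 2-D Gaussian point-spread function."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return h / h.sum()

def blur(f, h):
    """g = h * f, implemented as a product in the frequency domain (eqs. 1-2)."""
    H = np.fft.fft2(h, s=f.shape)
    return np.real(np.fft.ifft2(H * np.fft.fft2(f)))

def naive_inverse_filter(g, h, eps=1e-3):
    """Q = 1/H (eq. 3); eps regularizes the division near the zeros of H,
    which is exactly why direct inversion is unstable."""
    H = np.fft.fft2(h, s=g.shape)
    Q = np.conj(H) / (np.abs(H)**2 + eps)   # regularized 1/H
    return np.real(np.fft.ifft2(Q * np.fft.fft2(g)))

# toy usage
f = np.random.rand(64, 64)          # stand-in for an original image
h = gaussian_psf(15, sigma=2.0)
g = blur(f, h)                      # blurred observation
f_hat = naive_inverse_filter(g, h)  # unstable without the eps term
```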
The significance of polynomials in deblurring images has been explored by several researchers. One possible way of approaching the problem was considered by Hummel et al. [3, 9]. It was noted that the process of reversing blur is unstable and cannot in general be represented as a convolution filter in the spatial domain. They posed the deblurring problem in a variational framework, and constrained the spaces of the original and blurred signals, in an attempt to convert it to a well-conditioned problem. They demonstrated that it is possible to invert the Gaussian blur in a stable manner if the class of input functions is restricted to polynomials of fixed finite degree. They assume a convolution form for the inverse operator so that the inversion may be realized as a filter operation. The aim was to find an inverse filter which, when convolved with the blurred function, restores it to the original. The inverse (or deblur) filter D_N(x), which involves a polynomial of degree N, restores the blurred function

f(x) = g(x) * D_N(x)    (4)

Martens [9] presented an algorithm for deblurring and interpolating digital images. He assumed that a signal can be locally described by polynomial coefficients, and that these coefficients can be estimated from the sampled signal S(kT) by means of digital filters H_n, for n = 0, \ldots, N. These estimated coefficients can then be used to make a deblurred estimate of the original signal. The resulting signal estimate is given by

L(x) = \sum_{n=0}^{N} \sum_{j} S(jT) \, I_n(x - jT)    (5)

where

I_n(x) = \sum_{k} H_n(k) \, P_n(x - kT)    (6)

is the n-th order deblurring filter, and P_n are pattern functions. The overall deblurring function

I(x) = \sum_{n=0}^{N} I_n(x)    (7)

can be controlled by the order N.
3 The Learning Networks

We consider three learning networks. We limit our discussion of the back-propagation and radial basis function networks since a number of standard textbooks contain detailed information. A description of the projection pursuit learning network (PPLN) is usually not found in such textbooks, so we attempt to make the subsection covering the PPLN self-contained. A survey of work on the application of PPLN to problems in pattern recognition can be found in [2]. In [1, 2] the application of PPLN to image deblurring is discussed in detail.
Figure 1: Back-propagation Neural Network
3.1 Back-propagation neural networks
We use a single hidden layer feedforward structure (see Figure 1). The activation function for the hidden layer neurons is a sigmoidal function. The output of the l-th neuron in the hidden layer is given by

h_l = \frac{1}{1 + \exp\left(-\sum_{j=1}^{P} w_{lj} x_j\right)}

where P denotes the number of neurons in the input layer, x_j is the j-th component of the current input vector x, and w_{lj} is the weight between the l-th neuron in the hidden layer and the j-th neuron in the input layer. The notation s_i inside the partially shaded circles in the figure indicates the use of a sigmoidal function for these neurons. The output unit is linear; that is, the output is computed as the weighted sum of the signals from the hidden neurons.
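A minimal sketch of this hidden-layer computation (illustrative only; the array names are assumptions, and the paper's actual networks were built with the MATLAB toolbox):

```python
import numpy as np

def sigmoid_hidden_layer(x, W_h):
    """Hidden-layer response of the one-hidden-layer BP network.
    x   : input vector of length P (a 9x1 patch in this paper)
    W_h : weight matrix of shape (L, P); row l holds the weights w_lj
    Returns h with h_l = 1 / (1 + exp(-sum_j w_lj * x_j))."""
    return 1.0 / (1.0 + np.exp(-W_h @ x))

def bp_network_output(x, W_h, w_o):
    """Linear output unit: weighted sum of the hidden-layer signals."""
    return w_o @ sigmoid_hidden_layer(x, W_h)
```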
3.2 Radial basis function networks
The radial basis function (RBF) network is a feedforward structure with a modified activation function in the hidden layer. The activation function is derived from a special class of functions known as radial functions. The characteristic feature is that the response decreases monotonically with distance from a central point. The center, the distance scale, and the precise shape of the radial function are the parameters of the model. RBF networks have traditionally been associated with radial functions in a single hidden layer network (see Figure 2). Here, we focus on single hidden layer networks with activation functions (denoted by g_i inside the partially shaded circles in the figure). We use a Gaussian activation function in the hidden layer.

Figure 2: Radial Basis Function Network

The output h_{ji} of the i-th hidden unit, when the j-th input vector x_j is presented to the network, is given by

h_{ji} = \exp\left(-\|x_j - t_i\|^2 / 2\sigma_i^2\right)

Here t_i denotes the center where the Gaussian activation function corresponding to the i-th neuron in the hidden layer is centered. This function's maximum response is concentrated in the neighborhood of input vectors that are similar to t_i, falling off exponentially with the square of the distance. The variance of the Gaussian function, \sigma_i^2, is another parameter that is used to adjust its width. In our experiment, both parameters, i.e., the mean and the variance, are learnable. The output units are usually linear units; that is, the output of the k-th neuron is

o_k = \sum_{i=1}^{m} w_{ki} h_{ji}

where m denotes the number of neurons in the hidden layer and w_{ki} is the weight associated with the k-th output neuron and the i-th hidden neuron.
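The corresponding forward computation for the RBF network can be sketched as follows (again illustrative; the centers and widths stand for the learnable t_i and sigma_i):

```python
import numpy as np

def rbf_hidden_layer(x, centers, sigmas):
    """Gaussian hidden units: h_i = exp(-||x - t_i||^2 / (2 * sigma_i^2)).
    centers : (m, P) array of learnable means t_i
    sigmas  : (m,) array of learnable widths sigma_i"""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigmas ** 2))

def rbf_output(x, centers, sigmas, W_o):
    """Linear output units: o_k = sum_i w_ki * h_i."""
    return W_o @ rbf_hidden_layer(x, centers, sigmas)
```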
3.3 Projection pursuit learning networks
A projection pursuit learning network (PPLN) possesses a structure very similar to a one-hidden-layer neural network (with sigmoidal nonlinearity). In the case of the PPLN, the sigmoidal functions are replaced by unknown functions to be learned from the data. Since the sigmoidal functions are replaced by more general functions, the PPLN can be viewed as a generalization of a one-hidden-layer sigmoidal feedforward neural network. The nonparametric regression problem can be stated as follows: given n vector pairs in a p-dimensional space

(y_l, x_l) = (y_{l1}, y_{l2}, \ldots, y_{lq}, x_{l1}, x_{l2}, \ldots, x_{lp}), \quad l = 1, 2, \ldots, n    (8)
that have been generated from unknown models

y_{li} = g_i(x_l) + \epsilon_{li}, \quad l = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, q    (9)

the aim of regression is to construct the estimators \hat{g}_1, \hat{g}_2, \ldots, \hat{g}_q, and to use these estimates to predict a new y given a new x. The PPLN for regression [5, 13] is mathematically modeled as a one-hidden-layer feedforward network (see Figure 3) and it approximates a function using

\hat{y}_i = \bar{y}_i + \sum_{k=1}^{m} \beta_{ik} f_k\left(\sum_{j=1}^{p} \alpha_{kj} x_j\right)    (10)

where \beta_{ik} are the projection strengths, f_k are the unknown smooth activation functions of a particular form (e.g., Hermite polynomials), and \alpha_{kj} are the projection directions. These three parameters are determined by training the network to minimize the mean squared error loss function

L_2 = \sum_{i=1}^{q} W_i \, E\left[(y_i - \hat{y}_i)^2\right]    (11)

where E is the expectation operator defined as

E(y_i) = \frac{1}{n} \sum_{l=1}^{n} y_{li} = \bar{y}_i    (12)

and the weights W_i indicate the relative contribution of each mean squared output error to the total L_2 loss.
The traditional training algorithm for a PPLN trains the hidden units one at a time, as opposed to all at once as is the case in a back-propagation neural network. The algorithm can be described as follows for the k-th hidden layer neuron:
1. Make initial guesses for \alpha_k, f_k and \beta_k.
2. Estimate the projection direction \hat{\alpha}_k by updating \alpha_k with an iterative optimization method.
3. Given \hat{\alpha}_k, estimate \hat{f}_k as the smooth curve which best fits the scatterplot [z_{kl}, f_k(z_{kl})], where z_{kl} = \hat{\alpha}_k^T x_l.
4. Repeat steps 2-3 for several iterations.
5. Use the most recent values of \hat{f}_k and \hat{\alpha}_k to evaluate \beta_{ik} (\beta_{ik} can be computed by setting the derivatives of the loss function L_2 with respect to \beta_{ik} equal to zero).
6. Repeat steps 2-5 until the loss function is minimized with respect to all \beta_{ik}, \alpha_k and f_k associated with the k-th neuron.
This procedure is then repeated for the (k+1)-th hidden layer neuron. When Hwang et al. [4] compared the PPLN and the batch Gauss-Newton BPNN, they found that both networks have comparable training speed and accuracy on independent test data. However, PPLNs are significantly more parsimonious as they require fewer neurons.
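The following simplified sketch illustrates the hidden-unit-at-a-time idea for a single output (q = 1); it is not the paper's implementation: an ordinary polynomial fit (np.polyfit) stands in for the Hermite smoother, and a plain gradient step stands in for the iterative optimization of the projection direction.

```python
import numpy as np

def fit_ppln_unit(X, r, n_iter=10, degree=9, lr=0.01):
    """Fit one projection pursuit unit to the residual r (single output):
    r ~ beta * f(alpha^T x)."""
    n, p = X.shape
    alpha = np.random.randn(p)
    alpha /= np.linalg.norm(alpha)
    for _ in range(n_iter):
        z = X @ alpha                              # projections z_l = alpha^T x_l
        coef = np.polyfit(z, r, degree)            # step 3: smooth fit of the scatterplot
        f = np.polyval(coef, z)
        beta = (f @ r) / (f @ f + 1e-12)           # step 5: beta from d(L2)/d(beta) = 0
        # step 2 (simplified): nudge alpha down the gradient of the squared error
        df = np.polyval(np.polyder(coef), z)       # f'(z)
        grad = -2.0 * ((r - beta * f) * beta * df) @ X / n
        alpha -= lr * grad
        alpha /= np.linalg.norm(alpha)
    return alpha, coef, beta

def ppln_predict(X, units, y_bar):
    """Equation (10): y_hat = y_bar + sum_k beta_k * f_k(alpha_k^T x)."""
    y_hat = np.full(X.shape[0], y_bar)
    for alpha, coef, beta in units:
        y_hat += beta * np.polyval(coef, X @ alpha)
    return y_hat
```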
Figure 3: A standard PPLN

The role of the activation functions in deblurring an image can be better understood from the frequency plots of their magnitude. In blurred images, the high frequency content is attenuated. Thus, the major task of any deblurring method is to recover the lost frequency information. Let us examine the individual frequency plots shown in Figure 4. The sigmoidal function has a narrow width and thus can recover only a small amount of high frequency content. The Gaussian function can recover more high frequency information than the sigmoidal function. The Hermite function includes the Gaussian function. In addition, it contains the derivatives of the Gaussian function. This function, with its variable-width window, can recover most of the lost frequency information. Note that both the Gaussian and the Hermite functions can adjust their window width through the variance parameter. However, only the Hermite function can control the shape of the window using the coefficients and the order.
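As an illustration of the Hermite activation and the spectra compared in Figure 4, the sketch below builds a 9th-order Hermite-type activation (a Gaussian window modulated by Hermite polynomials, with arbitrary coefficients standing in for the learned c_i) and takes the magnitude spectra of the three activation shapes. This is illustrative only, not the paper's code.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_activation(z, c):
    """9th-order Hermite activation: f(z) = sum_{r=0}^{9} c_r H_r(z) exp(-z^2/2).
    The Gaussian and its derivatives are all representable by such sums."""
    return hermval(z, c) * np.exp(-z**2 / 2.0)

# compare magnitude spectra of the three activation shapes (cf. Figure 4)
z = np.linspace(-8, 8, 512)
sigmoid  = 1.0 / (1.0 + np.exp(-z))
gaussian = np.exp(-z**2 / 2.0)
hermite  = hermite_activation(z, c=np.ones(10))     # arbitrary coefficients
spectra = {name: np.abs(np.fft.rfft(s)) for name, s in
           [("sigmoid", sigmoid), ("gaussian", gaussian), ("hermite", hermite)]}
```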
4 Implementation

We have used the MATLAB Neural Network Toolbox to implement the back-propagation and RBF networks. The programming for the PPLN is more involved. We discuss some of the implementation issues later in this section. The fundamental idea in this work is that of a learning system as a filter. In developing a solution, the flexibility of the learning system should lead to a method which is robust in the sense that it may be applied in situations where little or no a priori information about the degradation is available. For the network to undo the degradation under this condition, however, it is necessary to know something about the image. Therefore, we train the network to develop a deblurring filter on a small region of a blurred image. Provided the blur is shift invariant, the same inverse filter may be applied to the entire image, as well as to different images [8].
Figure 4: Frequency plots for the activation functions: sigmoid, Gaussian, and Hermite (9th order)

This application of a learning system to image deblurring is an example of regression in high-dimensional spaces. In the simulations we considered gray-scale images of size 512 x 512 pixels. An image is convolved with a Gaussian filter of spread \sigma, and from the resulting blurred image we select a 128 x 128 sub-image to be used for training. During training, each pixel and its eight surrounding neighbors in a 3 x 3 region of the sub-image are presented as a 9 x 1 input vector to the learning system. The appropriate output value for each input vector is obtained from the intensity value in the original image corresponding to the location of the center pixel in the 3 x 3 region. For uniformity, all intensity values are normalized. We then apply all three networks obtained at convergence to other images, each with a different level of blurring.
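The training-set construction described above can be sketched as follows (illustrative; the sub-image coordinates and the assumption of pre-normalized intensities are not from the paper):

```python
import numpy as np

def make_training_set(blurred, original, top, left, size=128):
    """Build (input, target) pairs from a size x size sub-image:
    each 3x3 neighbourhood of the blurred sub-image becomes a 9x1 input,
    and the corresponding centre pixel of the original image is the target.
    Intensities are assumed to be pre-normalized to [0, 1]."""
    X, y = [], []
    for r in range(top + 1, top + size - 1):
        for c in range(left + 1, left + size - 1):
            patch = blurred[r - 1:r + 2, c - 1:c + 2]
            X.append(patch.reshape(9))
            y.append(original[r, c])
    return np.array(X), np.array(y)
```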
4.1 Model selection strategy for PPLNs
The construction of the PPLN consists of two steps: a forward growing procedure and a backward pruning procedure. During the forward growing procedure, hidden layer neurons are added and optimized one at a time using the training algorithm described in Section 3. After the parameters associated with the neuron under consideration have been estimated, a backfitting technique is used to update the parameters of the previously installed neurons. Once the network has grown to M neurons, a backward pruning procedure is applied to remove overfitting neurons one at a time. The PPLN again uses the backfitting procedure to fit models of decreasing size, \tilde{m} = M - 1, M - 2, \ldots, m, to the data, where M and m are specified by the user. The most important \tilde{m} out of (\tilde{m} + 1) hidden neurons are kept at each step. The importance is measured by
I_k = \sum_{j=1}^{q} W_j \, |\hat{\beta}_{jk}|, \qquad k = 1, \ldots, \tilde{m} + 1    (13)
where \hat{\beta}_{jk} are the estimates for the (\tilde{m}+1)-hidden-neuron model. Intuitively, a low value of I_k indicates that the weights \hat{\beta}_{jk} for the k-th neuron do not make a significant contribution to the output and should be ignored. More important neurons are the ones that produce a large value of I_k. The parameters W_j allow the user to specify the relative contribution of each \hat{\beta}_{jk}. Typically, one chooses W_j = 1/var(y_j). These variances are either known or estimated from the data.

Figure 5: Committee of Neural Networks

To determine an appropriate number of neurons for the PPLN model, we applied a strategy used in [10] for nonparametric regression. First, we run the PPLN with m = 1 and set M at a value large enough for the problem at hand. For a relatively small number of variables p (p \le 4), we choose M \ge p. For large p, we choose M < p, hoping for a parsimonious representation. For each \tilde{m}, 1 \le \tilde{m} \le M, the PPLN evaluates the fraction of unexplained variance

e^2(\tilde{m}) = \frac{\sum_{i=1}^{n} W_i \left[ y_i - \bar{y}_i - \sum_{k=1}^{\tilde{m}} \hat{\beta}_k \hat{f}_k(\hat{\alpha}_k^T x_i) \right]^2}{\sum_{i=1}^{n} W_i \left[ y_i - \bar{y}_i \right]^2}    (14)

The idea is to minimize the fraction of unexplained variance (the weighted sum of squared residuals) through the choice of \beta, f and \alpha. In a plot of e^2(\tilde{m}) versus \tilde{m}, which is decreasing, e^2(\tilde{m}) often decreases rapidly when \tilde{m} is smaller than a good number of neurons m_0, and then tends to decrease more slowly for \tilde{m} larger than m_0. As a result, we must run the PPLN twice for \tilde{m} = M, \ldots, m, using two different values of m. The first time m = 1 is used to find a good number of neurons m_0. The second time m = m_0 is used to obtain the output of the simulations.
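A small sketch of this model selection step (illustrative; the knee tolerance is an assumption, not a value from the paper): compute e^2(m~) for each candidate size and pick the size m0 after which the curve stops dropping quickly.

```python
import numpy as np

def unexplained_variance(y, y_hat, W=1.0):
    """Fraction of unexplained variance e^2(m~) of eq. (14) for one output."""
    y_bar = y.mean()
    num = np.sum(W * (y - y_hat) ** 2)
    den = np.sum(W * (y - y_bar) ** 2)
    return num / den

def pick_m0(e2_values, tol=0.01):
    """Choose the smallest size after which e^2 improves by less than `tol`,
    i.e. the knee of the e^2(m~) versus m~ curve.
    e2_values[m-1] holds e^2 for a model with m hidden neurons."""
    for m in range(1, len(e2_values)):
        if e2_values[m - 1] - e2_values[m] < tol:
            return m
    return len(e2_values)
```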
5 Committee Machines & Gating Network

The idea of committee machines is based on a simple engineering principle, namely divide and conquer. A complex computational task is divided into a set of less complex tasks. The solutions of these are combined at a later stage to produce the solution to the original problem. Jacobs et al. [6] proposed that the training data be separated into groups and each group be used to train one network. They used their model on the multi-speaker vowel recognition problem and achieved shorter training time and improved recognition accuracy. Later it was shown that this architecture provides the flexibility of general nonlinear regression while retaining a strong flavor of parametric statistics [7]. We postpone the discussion of the methods to combine the outputs of several networks until later. Our objective is to divide the input space into a number of sub-spaces S_n, described by directional unit vectors v_n, that correspond to some useful information. This creates a certain clustering effect on the input vectors, since a vector will lie in the sub-space S_n represented by the v_n that is most similar to this vector with respect to its information content. For each such cluster we train a network. In this manner, we create specialized networks: the Committee of Neural Networks. By following this procedure we have eliminated the problem of forcing one network to learn input vectors that are distant from each other. A general structure for a network with a processed data set is shown in Figure 5. Of course, the choice of the number and directions of the vectors remains dependent on the problem at hand. A general procedure is outlined below.
1. Decide the number of sub-spaces N_s.
2. Select the sub-space direction vector v_n, (n = 1, 2, \ldots, N_s), that will best represent the sub-space S_n.
3. Normalize the direction vector v_n.
4. Calculate the correlation between each input vector and v_n to assign it to the correct sub-space.
The Committee of Neural Networks consists of N_s networks. Each network is specially trained to recognize a vector from the corresponding input sub-space. Note that each sub-space direction vector is a function of the distribution of the input values. This is a crucial point, since we are attempting to discover structure in the data. Consider the following example with N_s = 3 and the size of the input vector as 5. We create three directional vectors v_h, v_c, v_l in the following manner:
v_h is obtained by rearranging (any) 5 integers in descending order.
v_c is obtained by rearranging (any) 5 integers in a triangular fashion where the highest value occurs in the middle and values on either side are in descending order.
v_l is obtained by rearranging (any) 5 integers in ascending order.
Example with a 5 x 1 vector:
v_h = (1/\sqrt{55})[5 4 3 2 1]
v_l = (1/\sqrt{55})[1 2 3 4 5]
v_c = (1/\sqrt{55})[2 4 5 3 1]
The introduction of K directional vectors divides the multi-dimensional input space into K distinct sub-regions. Since the input is normalized, the focus is on the direction of the vector. This requires that we train K networks, net_i, i = 1, 2, \ldots, K (each network is either a neural net with sigmoidal activation function, an RBF network, or a PPLN; see Figure 5). In the training phase, an input vector is compared with all K directional vectors for the best match. This selects the network that receives this input vector for training purposes. Note that the mean of each sub-region is the directional vector that represents it. However, after the training is completed the mean of each sub-region may shift according to the distribution of the input vectors assigned to that sub-region. Thus, with each sub-region N_i we associate two directional vectors, namely the original vector v_i and the shifted vector \hat{v}_i produced by the training process. In the testing phase, the test vector is sent to all K networks. This procedure is repeated for each test vector.
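A minimal sketch of the assignment step, using the three 5-dimensional directional vectors from the example above (the normalization guard is an implementation assumption):

```python
import numpy as np

# the three 5-dimensional directional vectors from the example above
v_h = np.array([5, 4, 3, 2, 1]) / np.sqrt(55)   # descending
v_l = np.array([1, 2, 3, 4, 5]) / np.sqrt(55)   # ascending
v_c = np.array([2, 4, 5, 3, 1]) / np.sqrt(55)   # peak in the middle
V = np.stack([v_h, v_c, v_l])                    # (K, p) matrix of unit vectors

def assign_to_subspace(x, V):
    """Return the index of the sub-space whose directional vector is most
    correlated with x; the index selects the network trained on that cluster."""
    x = x / (np.linalg.norm(x) + 1e-12)          # inputs are normalized
    return int(np.argmax(V @ x))
```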
5.1 The Gating Network
In this part we focus on the methods that can be used to combine the outputs from several networks. These can be broadly divided into two categories.
Static structure: The responses of several networks are combined by means of mechanisms that do not involve the initial input data.
Dynamic structure: The initial input data is used to integrate the outputs of the individual networks into an overall output.
In this paper, we use a dynamic structure for our gating network. The novel method of involving the input in the final decision process shows that the role of the input is not over after the training process. Rather, it has a definite say in preserving its information content in the overall output. For every input vector, we obtain K vectors produced by the K networks. At this point a decision must be made to produce one and only one final output vector. The gating network parameter g_{ij} controls the output of the j-th network when the input is the vector x_i. The overall output vector y_i is computed as follows:
y_i = \sum_{j=1}^{K} g_{ij} f_j
The output of the j-th network is denoted by f_j. Note that the gating network has K parameters. Computation of g_{ij} consists of two steps. The first step involves the computation of an intermediate parameter t_{ij}, where the input x_i and the directional vectors v_j (or the shifted directional vectors \hat{v}_j) play a direct role. For example, one may compute t_{ij} = x_i^T v_j, which measures the similarity between the input and the directional vectors. In the second step, t_{ij} is used to compute g_{ij}. We describe a number of ways to do so.
1. Normalize the gating network parameter by the sum: g_{ij} = t_{ij} / \sum_{j=1}^{K} t_{ij}.
2. Normalize the gating network parameter by the distance (the Euclidean norm): g_{ij} = t_{ij} / \|t_i\|, where \|t_i\| = \sqrt{t_{i1}^2 + \ldots + t_{iK}^2}.
3. Use the average method to compute the gating network parameter: g_{ij} = t_{ij} / K.
4. Normalize the gating network parameter by the maximum t_{ij}: g_{ij} = t_{ij} / \max_j(t_{ij}).
5. Normalize the gating network parameter by an exponential mapping function: g_{ij} = \exp(t_{ij}) / \sum_{j=1}^{K} \exp(t_{ij}).
Here i = 1, \ldots, N, where N is the total number of input vectors and K is the number of expert networks. Note that the gating network parameters g_{ij} do not have fixed values, are not learned, and decide the weights to be assigned to the individual network outputs in a dynamic manner.
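The dynamic gating computation can be sketched as follows (illustrative; only methods 1 and 5 are shown, and the function and argument names are assumptions):

```python
import numpy as np

def gate_and_combine(z, V_shifted, expert_outputs, method="softmax"):
    """Dynamic gating: weight the K expert outputs by the similarity between
    the test vector z and the (shifted) directional vectors."""
    t = V_shifted @ z                        # t_j = z^T v_j for j = 1..K
    if method == "sum":
        g = t / np.sum(t)                    # method 1: normalize by the sum
    else:
        g = np.exp(t) / np.sum(np.exp(t))    # method 5: exponential mapping
    return g @ expert_outputs                # y = sum_j g_j * f_j

# usage: three experts each predict one pixel value for the 9x1 test vector z
# y = gate_and_combine(z, V_shifted, np.array([f_h, f_c, f_l]))
```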
6 Experimental Results

We carried out three sets of experiments.
Experiment 1: In this experiment neither the clustering nor the gating network is used.
Experiment 2: Here we use clustering to separate the input data. Gating with a static structure produces the overall output.
Experiment 3: Both clustering and the gating network are used in this experiment. However, the gating employs a dynamic structure.
6.1 Design of Experiments
The data set consists of input vectors of size 9 x 1 prepared from the 3 x 3 regions of the blurred image. The training set consists of such vectors as input and the central pixel values from the corresponding regions of the original image as the output. Let us denote this training set as T_Regular. We use three networks to study the role of the activation function. All three networks have the same number of neurons in the input, hidden and output layers (9-9-1). Each is a fully connected network with weights in the hidden and the output layers. A linear activation function is used for the output neuron of all networks.
NN - This is a back-propagation network that uses a sigmoidal function as its activation function.
RBF - This is a radial basis function network with a Gaussian as the activation function.
PPLN - This is a projection pursuit learning network with a 9th-order Hermite function as its activation function.
Table 1: Parameters for each network

Network   Activation Function   Learnable Parameters
BP        Sigmoid               W_h, W_o
RBF       Gaussian              W_h, W_o, m_i, \sigma_i
PPLN      Hermite Polynomial    W_h, W_o, c_i
Here the weights between the input and the hidden layer are denoted by W_h. Similarly, the weights between the hidden and the output layer are denoted by W_o. The symbols m_i and \sigma_i respectively represent the mean and the variance of the activation function for the RBF. The parameters c_i are the coefficients associated with the terms of the Hermite polynomial. Specifically, we have 9 such coefficients since we use a Hermite polynomial of order 9.
6.2 Experiments
For training, we use a portion of the blurred Lena image (see the boxed region in Figure 6). The testing is done on two blurred images: (a) the Madonna image (see Figures 7(a) and 8(a) for the original and the blurred images respectively) and (b) the Clock image (see Figures 7(b) and 8(b) for the original and the blurred images respectively). The original test images are provided solely for comparison purposes. They have not been used in the training or testing phases. Experiment 1 used T_Regular for training all three networks. Each network is tested with the corrupted images shown in Figure 8. We show the restored outputs from NN, RBF and PPLN in Figures 11 and 12 (see the images in the column with Experiment 1 as the heading). Both Experiment 2 and Experiment 3 require clustering of the data set T_Regular. This is described in the following subsection.
6.3 Data Preparation for Experiments 2 and 3
We use three directional vectors to characterize three types of edges in the image. Our choice is dictated by the problem of deblurring. Here are a few issues that we consider:
Edges are important image characteristics.
Blurring causes loss of edge information from images.
The process of deblurring may produce a more useful image if enhancement is also achieved along with deblurring.
Here, N_s = K = 3 and the size of the input vector is 9. We create three directional vectors v_h, v_c, v_l in the following manner:
v_h (v_l) is obtained by rearranging (any) 9 integers in descending (ascending) order (a slowly varying edge).
vc is obtained by rearranging (any) 9 integers in a triangular fashion where the
highest value occurs in the middle and values on either side are in descending order (a wedge-shaped edge). The introduction of the directional vectors divides the 9-dimensional input space into three distinct regions. As opposed to training one network as in the first experiment, we need to train three networks (net_h, net_l and net_c) in these experiments for each type of activation function (see Figure 5).
Training Phase for Experiments 2 and 3:
Step 1. Choose an input vector x_i.
Step 2. Compute t_{ij} = x_i^T v_j, where j = h, l, c, to find the best match. Say t_{ic} has the largest value, indicating that the input x_i is most similar to the directional vector v_c.
Step 3. Use x_i as the input vector and the center pixel value y_i of the corresponding window from the original image to train the appropriate network (net_c will be trained when t_{ic} has the largest value).
Step 4. Repeat Steps 1 through 3 until all input vectors have been used.
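A short sketch of this routing of the training set T_Regular among the three expert networks (illustrative; the actual training of each expert is omitted):

```python
import numpy as np

def route_training_set(X, y, V):
    """Split (X, y) = T_Regular among the K expert networks: each 9x1 input
    vector x_i (and its target pixel y_i) goes to the network whose
    directional vector v_j gives the largest t_ij = x_i^T v_j."""
    K = V.shape[0]
    subsets = {j: ([], []) for j in range(K)}
    for x_i, y_i in zip(X, y):
        j = int(np.argmax(V @ x_i))       # best-matching sub-space
        subsets[j][0].append(x_i)
        subsets[j][1].append(y_i)
    return {j: (np.asarray(xs), np.asarray(ys)) for j, (xs, ys) in subsets.items()}
```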
Testing Phase for Experiment 2:
Step 1. Choose a vector z_i from the test image and send it to all three networks. This produces three output values, namely f_h, f_l and f_c.
Step 2. The overall output y_i is computed by taking the average of the three network outputs.
Step 3. Repeat Steps 1 and 2 until all test vectors have been considered.
Testing Phase for Experiment 3: Replace Step 2 in the above description by the following.
Step 2. Compute t_{ij} = z_i^T v_j. Use one of the methods described above to compute the g_{ij}. The overall output is y_i = \sum_j g_{ij} f_j.
The results are arranged in the following manner. The restored images for each test image, by all three networks and all three experiments, are shown on one page for easy comparison (see Figures 11 and 12). On a given page, rows 1, 2 and 3 show the results achieved by NN, RBF and PPLN respectively, whereas columns 1, 2 and 3 show the results from Experiment 1, Experiment 2 and Experiment 3 respectively. To get better insight, we analyze the frequency plots. The high frequency content is attenuated in a blurred image. The frequency plots of the deblurred images will display the high frequency content that has been restored. The frequency plots are arranged in the same way as the images. We compare these with the frequency plots of the blurred and original images. See the frequency plots in Figure 9 for the original test images, Figure 10 for the blurred test images, Figure 13 for the restored Clock image and Figure 14 for the restored Madonna image. Let us consider the restored images. We notice that better restoration occurs as we move across a row and move down a column. Movement along a row takes us from the use of the regular data set, to clustering with a static gating structure, to clustering with a dynamic gating structure for the same type of learning system (row 1 is NN, etc.). On
the other hand, movement down a column takes us from one type of learning system to another while keeping the network structure fixed. For example, column 2 shows the results for clustering with a static gating structure for NN, RBF and PPLN. We observe that the PPLN even recovers image information that is not so clear in the original image. Compare the area with the necklace in the original and the PPLN-restored Madonna image. It is an interesting finding. The restored images show that a committee of neural networks with the PPLN as the learning system and gating with a dynamic structure is the best hybrid system. In addition to the images, we consider the frequency plots for a better understanding of the restoration process. The plots show the recovery of frequency components more clearly. Notice the front quarter of the frequency plots. If we fix the experiment, i.e., focus on a column, and consider the various learning systems, we notice that the output from the PPLN shows the most perceptible recovery of frequency components, followed by the outputs from the RBF and the NN. On the other hand, for a given learning system, the committee machine with dynamic gating performs the best. We arrive at the same conclusion as in the previous paragraph from the analysis of the frequency data.
7 Conclusions

From the images and the corresponding frequency plots (the frequency plots possibly show the difference more clearly), we observe the role of the activation function and of the committee of neural networks with static and dynamic gating structures. We keep all other parameters of the three learning networks the same while the form of the activation function is made increasingly more complex. We can rank the three learning networks by their performance on a large set of images:
1. Projection Pursuit Learning Network (best performance, most complex activation function)
2. Radial-Basis-Function Network (intermediate performance, activation function with moderate complexity)
3. Back-Propagation Network (worst performance, simplest activation function)
Our second observation is on clustering and the use of the gating structure. A proper grouping of the data during the training phase has an important effect on the network weights. Such networks show improved performance over the corresponding networks which do not receive clustered data. Furthermore, when the role of the input is extended beyond the initial training process and it is allowed to participate in producing the overall output of the network, we observe a marked improvement in system performance. This paper clearly demonstrates the superior performance of the committee of neural networks with a dynamic gating structure in the deblurring and enhancement of images. We conclude with a comment on the overall performance of the system. The resulting images are beautifully restored and exceptionally clear. We attribute this to a number of reasons. Note that the blur function used for training is different from the blur function
used in testing. We believe that the following aspects of our design play a key role in producing quality restored images.
A committee of machines casts votes to generate part of the output.
We use the input in reshaping the output of the committee machines. This reduces the effects of some of the overly restored high frequency components.
The generalization capability of each individual network has been achieved through proper training.
References
[1] M. Basu and M. Su, "Deblurring image using the projection pursuit learning network," Proc. of the International Joint Conference on Neural Networks, July 10-16, Washington, D.C., 1999.
[2] L. M. Kennedy and M. Basu, "Application of projection pursuit learning to boundary detection and deblurring in images," Pattern Recognition, Vol. 33, No. 10, 2000.
[3] R. A. Hummel, B. B. Kimia and S. W. Zucker, "Deblurring the Gaussian blur," Comput. Vision, Graphics & Image Process., Vol. 38, pp. 66-80, 1987.
[4] J-N. Hwang, S-R. Lay, M. Maechler, R. D. Martin and J. Schimert, "Regression modeling in back-propagation and projection pursuit learning," IEEE Transactions on Neural Networks, Vol. 5, No. 3, pp. 342-353, 1994.
[5] J-N. Hwang, S-S. You, S-R. Lay and I-C. Jou, "The cascade-correlation learning: A projection pursuit learning perspective," IEEE Trans. on Neural Networks, Vol. 7, No. 2, pp. 278-289, 1996.
[6] R. A. Jacobs and M. I. Jordan, "Adaptive mixtures of local experts," Neural Computation, Vol. 3, pp. 79-87, 1991.
[7] M. I. Jordan and L. Xu, "Convergence results for the EM approach to mixtures of expert architectures," Neural Networks, Vol. 8, pp. 1409-1431, 1995.
[8] C. M. Jubien and M. E. Jernigan, "A neural network for deblurring an image," IEEE Pacific Rim Conf. Comm., Comput. and Signal Proc., pp. 457-460, 1991.
[9] J. B. Martens, "Deblurring digital images by means of polynomial transforms," CVGIP, Vol. 50, pp. 157-176, 1990.
[10] Statistical Science Inc., S-Plus Guide to Statistical and Mathematical Analysis, Version 3.2, Seattle, WA.
[11] H. Tang and L. W. Cahill, "A new approach for the restoration of noisy blurred images," Proc. IEEE Intl. Symp. Circuits and Systems, Vol. 1, pp. 520-523, 1991.
[12] M. Vairy and Y. V. Venkatesh, "Deblurring Gaussian blur using a wavelet array transform," Pattern Recognition, Vol. 28, No. 7, pp. 965-976, 1995.
[13] Y. Zhao and C. G. Atkeson, "Implementing projection pursuit learning," IEEE Trans. Neural Networks, Vol. 7, No. 2, pp. 362-373, 1996.
Figure 6: Image used for training
Figure 7: Original test images. (a) Madonna. (b) Clock.
Figure 8: Images used for testing. (a) Madonna blurred. (b) Clock blurred.
Figure 9: Frequency plots for Fig. 7. (a) Madonna. (b) Clock.
Figure 10: Frequency plots for Fig. 8. (a) Madonna. (b) Clock.
Figure 11: Restored clock image. Rows: (i) NN, (ii) RBF, (iii) PPLN. Columns: (a) Experiment 1, (b) Experiment 2, (c) Experiment 3.
Figure 12: Restored Madonna image. Rows: (i) NN, (ii) RBF, (iii) PPLN. Columns: (a) Experiment 1, (b) Experiment 2, (c) Experiment 3.
Figure 13: Restored clock image (frequency plots). Rows: (i) NN, (ii) RBF, (iii) PPLN. Columns: (a) Experiment 1, (b) Experiment 2, (c) Experiment 3.
Figure 14: Restored Madonna image (frequency plots). Rows: (i) NN, (ii) RBF, (iii) PPLN. Columns: (a) Experiment 1, (b) Experiment 2, (c) Experiment 3.
Min Su received her M.E. degree in electrical engineering from the City College of the City University of New York in 1997. She is a Ph.D. candidate in the same department, City University of New York. Her current research interests include neural networks, learning systems, pattern recognition, data classification, biocomputation and image quality measurement.
Mitra Basu is an Associate Professor in the Electrical Engineering Department at the City College of the City University of New York. Her current research interests include pattern recognition, learning systems and biocomputation.