REMOVAL OF IMPULSE NOISE BY SELECTIVE FILTERING

Ralph Sucher
Institut für Nachrichtentechnik und Hochfrequenztechnik, Technische Universität Wien
Gusshausstrasse 25/389, A-1040 Vienna, Austria
Email: [email protected]

ABSTRACT
In this paper we present a novel structure for removal of impulse noise which is based on a combination of impulse detection and nonlinear filtering. The impulse detector is realized as a neural network which will be shown to possess a very simple structure. This means that, compared to traditional approaches, a significant gain in performance can be achieved with considerably smaller complexity. Further, we show that the basic system is not restricted to the removal of impulse noise only, and present a generalization for arbitrary noise distributions.
1. INTRODUCTION

The suppression of noise is one of the most important tasks in signal processing. Frequently, linear techniques are used, because linear filters are easy to implement and design. Furthermore, they are optimal among the class of all filtering operations when the noise is additive and Gaussian. However, if these assumptions are not satisfied, e.g. in the presence of impulse noise, the performance of linear filters deteriorates severely. To overcome these shortcomings, a large number of nonlinear methods have been presented in the literature. The most popular nonlinear filter is the median filter; it is computationally efficient and has proved extremely successful for removing noise of impulsive nature. However, it suffers from the fact that with increasing window length the signal details become blurred. Therefore, many generalizations have been developed, e.g. order statistic filters and filters based on threshold decomposition (see [1] for a list of references). All these filters have in common that their design can be matched to the signal and noise statistics. Thereby, they gain performance over the median filter, but the complexity increases considerably. However, if we restrict ourselves to one type of noise, we can improve the performance with much smaller complexity.
2. SELECTIVE FILTERING

One of the main properties of the filters mentioned above is that all input samples x_i are unconditionally affected by the filtering process. In the presence of impulse noise, this approach is not optimal since, in contrast to continuous noise distributions, only certain samples of the original signal s_i are corrupted and the others remain unchanged:

x_i = s_i + n_i = \begin{cases} s_i + h & \text{with probability } p \\ s_i & \text{otherwise.} \end{cases}  (1)

The noise n_i is characterized by the height h of the impulses and their probability of occurrence p (for the moment, we assume only one type of impulses; this restriction will be dropped later). Since p ≪ 1 in most cases, it is useful to filter an input sample only if it is corrupted; as a result, we reduce blurring of the signal. Thus, the filtering process consists of two parts [2]:

1. Decide whether the input sample considered (the sample in the middle of the observation window) is corrupted by an impulse.

2. If so, replace it by a value calculated from the other samples inside the window; otherwise pass it to the output unprocessed.

This principle is shown in figure 1. It is important to note that our impulse detector makes a decision only about one sample in the observation window, and that this sample is processed immediately afterwards. This leads to a small complexity, but since all samples inside the window are used for interpolation, a nonlinear filter should be applied in order to suppress other possibly existing impulses (see section 4).

Figure 1: System setup for selective filtering of signals corrupted by impulse noise
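As an illustration of this two-step principle, the following minimal Python sketch (not part of the original paper) runs a selective filter over a one-dimensional signal. The detector is passed in as a placeholder function; a crude threshold rule stands in here for the neural detector of section 3, and the window median serves as the reconstruction filter:

    import numpy as np

    def selective_filter(x, is_corrupted, window=3):
        # Selective filtering: a sample is replaced by the median of its
        # observation window only if the detector flags it as corrupted.
        half = window // 2
        x = np.asarray(x, dtype=float)
        out = x.copy()
        for i in range(half, len(x) - half):
            w = x[i - half:i + half + 1]
            if is_corrupted(w):            # decision about the centre sample only
                out[i] = np.median(w)      # nonlinear reconstruction from the window
        return out

    # Toy stand-in detector: centre sample deviates strongly from the window median.
    detect = lambda w: abs(w[len(w) // 2] - np.median(w)) > 50
    noisy = np.array([10., 12., 110., 14., 15., 13., 115., 12.])
    print(selective_filter(noisy, detect))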
3. IMPULSE DETECTION

The task of impulse detection is a simple classification problem and can therefore be solved using neural networks (the use of median filters [3] or generalized mean filters [4] for impulse detection turned out to be impractical due to the sensitivity of the threshold). For convenience, we will briefly present the basic principles leading to a simple and efficient network structure.
3.1. Detector Structure
By construction, an impulse can only be detected if the total number of impulses inside an observation window is less than half of the window length. To visualize the working method of a neural impulse detector, let us first assume a window length of n = 3. Then there exist 8 types of pattern vectors x_i = (x_1, x_2, x_3)^T, depending on the number and location of the impulses inside the window. These 8 cases result in 7 different clusters in the pattern space (x_1, x_2, x_3), as shown in figure 2.

Figure 2: Pattern space (x_1, x_2, x_3); the clusters are labeled according to which of the samples x_1, x_2, x_3 are corrupted

Here the original signal s_i consists of 64 horizontal lines of 64 pixels each of a typical 8-bit gray-scale image, and the impulses possess a height h = 100 and a probability of occurrence p = 0.1. Since x_2 is the sample we want to make a decision about, the patterns with corrupted x_2
are marked as `x' and the other ones as `o'. As can be seen, most of the `x' and `o' can be geometrically separated. For this purpose we have to put a plane between the clusters of `x' and `o',

w_1 x_1 + w_2 x_2 + w_3 x_3 + \theta = 0,  (2)

and then we declare x_2 as corrupted if x_i lies on the right-hand side of the plane and as correct otherwise. This is the basic input/output relationship of a single neuron (see figure 3).

Figure 3: Single neuron as impulse detector (output y = sign(z); y = +1: x_2 correct, y = -1: x_2 corrupted)

Clearly, in order to reduce the number of wrong decisions, more planes can be used to separate the clusters containing corrupted and uncorrupted samples, leading to multilayer neural networks. One important result of this paper is that, in general, this is not necessary, which is especially surprising for larger window sizes. Specifically, this means that the cluster containing uncorrupted x_{(n+1)/2} and all clusters containing less than (n-1)/2 impulses can in principle (i.e. if x ≪ h) be separated by one (n-1)-dimensional hyperplane realized with one single neuron. Basically, this is achieved by strongly weighting x_{(n+1)/2} compared to the other samples: if it is corrupted, at least (n-1)/2 other samples also have to be corrupted in order that the decision is wrong (no impulse detected); if, on the other hand, x_{(n+1)/2} is correct, at least (n+1)/2 other samples have to be corrupted in order that an impulse is (wrongly) detected. Thus, only if there are more types of impulses (e.g. salt-and-pepper noise) are more layers required.
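To illustrate the decision rule of equation (2), a minimal sketch for n = 3 is given below. The weight and threshold values are invented for the example (they are not the trained parameters of the paper) but follow the idea of strongly weighting the centre sample:

    import numpy as np

    def neuron_detect(window, w, theta):
        # Single-neuron impulse detector: y = sign(w^T x + theta),
        # with y = +1 meaning "centre sample ok" and y = -1 meaning "corrupted"
        # (sign convention of figure 3). Returns True if corrupted.
        z = np.dot(w, window) + theta
        return z < 0

    # Illustrative (hypothetical) parameters for n = 3: the centre sample x2 is
    # weighted strongly, so that a single impulse of height h = 100 flips the sign.
    w = np.array([0.5, -1.0, 0.5])
    theta = 60.0
    print(neuron_detect(np.array([20., 120., 25.]), w, theta))   # impulse in x2 -> True
    print(neuron_detect(np.array([20.,  22., 25.]), w, theta))   # clean window  -> False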
3.2. Training
The computation of the parameters of the neural network (the weights w = (w_1, ..., w_n)^T and the threshold \theta) is a nonlinear optimization problem that can be solved recursively. To a first order of approximation, the location of the boundary plane can be determined using simple geometric calculations. To optimize the parameters with respect to the statistics of the signals, we need to train the network accordingly. Two standard algorithms have been used (supervised learning):
- Perceptron rule with f_a(z) = sign(z) as activation function:

  w_{i+1} = w_i + \mu e_i x_i  (3)
  \theta_{i+1} = \theta_i + \mu e_i  (4)

- Delta rule with f_a(z) = tanh(z) as activation function:

  w_{i+1} = w_i + \mu e_i f_a'(z_i) x_i  (5)
  \theta_{i+1} = \theta_i + \mu e_i f_a'(z_i)  (6)

where

  z_i = w_i^T x_i + \theta_i  (7)
  e_i = d_i - f_a(z_i).  (8)

The step size \mu determines the convergence speed and the final error of the algorithm. Due to the `soft' nonlinearity of the activation function in the second case, the learning behaviour can be improved by applying a gradient algorithm. It is important to note that the network is trained to minimize the number of wrong decisions (i.e. d_i = ±1) and not to minimize the mean squared error of the reconstructed signal \hat{s}_i. The latter would result in a different solution, since an undetected impulse causes a much greater error than a wrongly detected one. Practically, what we want is to remove as many impulses as possible without blurring the signal, for which the first approach is better suited.

To monitor the learning process of the network, it is useful to observe the error rate r_i (number of wrong decisions). As an estimate we use

  r_i = \lambda \tilde{e}_i + (1 - \lambda) r_{i-1}, with r_0 = 0,  (9)

where \tilde{e}_i = 1 if a decision is wrong and zero otherwise. Figure 4 shows a typical sequence of error rates for \lambda = 0.002. In this simulation, the input signal was a part (100 × 100 pixels) of the corrupted image in figure 6b and the window size was 3 × 3. The step size was chosen as \mu = 0.05 to guarantee stability and to achieve a good compromise between convergence speed and the final error. With the weights and the threshold obtained from this simulation, the previous theoretical considerations could be verified.

Figure 4: Error rate of the impulse detector
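As a sketch of the training procedure, the following code (illustrative only, not the paper's implementation) applies the perceptron rule (3)-(4) together with the error-rate estimate (9); the windows and labels are assumed to come from a training segment for which the clean signal is known:

    import numpy as np

    def train_perceptron(windows, labels, mu=0.05, lam=0.002):
        # Supervised training of the single-neuron detector with the perceptron
        # rule, eqs. (3)-(4), and the running error-rate estimate of eq. (9).
        # windows: (N, n) array of noisy observation windows,
        # labels:  desired outputs d_i = +1 (centre sample ok) or -1 (corrupted).
        n = windows.shape[1]
        w = np.zeros(n)
        theta = 0.0
        r = 0.0                                        # error-rate estimate r_i
        for x, d in zip(windows, labels):
            z = np.dot(w, x) + theta                   # eq. (7)
            y = 1.0 if z >= 0 else -1.0                # f_a(z) = sign(z)
            e = d - y                                  # eq. (8)
            w = w + mu * e * x                         # eq. (3)
            theta = theta + mu * e                     # eq. (4)
            r = lam * float(e != 0) + (1 - lam) * r    # eq. (9)
        return w, theta, r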
4. NONLINEAR FILTERING

If an impulse is detected, the corresponding sample must be replaced by a value calculated from the other samples inside the observation window. Since there could be more than one impulse inside the window, the filter should be nonlinear to discard these impulses from the reconstruction process. For this purpose, the simplest nonlinear filter is again the median filter, but several of the other filters mentioned above have been tested as well. In general, filters with greater complexity result in a lower reconstruction error, but since only corrupted samples are replaced, the filter has only a minor effect on the total error produced by this system. This usually holds as long as the impulse probability is small, say p ≤ 0.1; then the total error is primarily determined by the impulse detector, as can be seen from the simulation results given in section 6. Clearly, the reconstruction could be accomplished by a neural network as well [5], but this network would be much more complex than the combination presented here.
5. GENERALIZATION

In this section we will briefly show how our system can be generalized for removing arbitrary kinds of noise. For this purpose, we write the input/output relationship of our system in the form

\hat{s}_i = y_i + b (x_i - y_i) = \begin{cases} y_i, & b = 0 \\ x_i, & b = 1 \end{cases}  (10)

where y_i is the output signal of the nonlinear filter and b is the parameter controlled by the impulse detector (b = 0 if an impulse is detected and b = 1 otherwise). If the parameter b is matched to the characteristics of signal and noise, we get a two-component filter (see figure 5) which has already been reported in the literature [3] [6]. The choice and analysis of the adaptation algorithm is a task currently under investigation and results will be presented elsewhere.

Figure 5: Two-component filtering
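The combination rule (10) itself is a one-liner; the following sketch (illustrative only) shows that a hard decision b in {0, 1} reproduces the selective filter, while an intermediate b gives a weighted mix of input and filter output:

    def combine(x_i, y_i, b):
        # Two-component output of eq. (10): s_hat = y + b*(x - y).
        # b = 0 -> take the filter output, b = 1 -> pass the input through,
        # 0 < b < 1 -> weighted combination of both.
        return y_i + b * (x_i - y_i)

    print(combine(110.0, 14.0, 0.0))   # impulse detected: filter output 14.0
    print(combine(15.0, 14.0, 1.0))    # clean sample: input 15.0 passed through
    print(combine(15.0, 14.0, 0.5))    # adaptive case: weighted value 14.5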
6. SIMULATION RESULTS

In order to visualize the difference in performance between standard nonlinear filtering and selective filtering, we simulate the restoration of images corrupted with impulse noise. Figures 6 (a) and (b) show the original and the corrupted image (resolution: 256 × 256 pixels, 8 bit). The probability of impulses is 10% and their height was set to h = 100. As usual, the neural network was trained using a part of the original and the noisy image (100 × 100 pixels). The parameters obtained from the training were then used to filter the noisy image. Figures 6 (c) and (d) show the images restored by the 3 × 3 standard and selective median filter, respectively. From table 1 one can also see the superior performance of the selective filter. For comparison, the performance of an Ll-filter is shown as well.

Method                     MSE       MAE
Median filter              77.6538   5.0468
Ll-filter                  50.5308   4.6956
Selective median filter    37.6511   2.5408
Selective Ll-filter        37.2591   3.0196

Table 1: Results of image restoration example (image resolution: 256 × 256 pixels, 8 bit; window size: 3 × 3; impulse height: h = 100; probability of impulses: p = 0.1)
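For reference, the error measures reported in table 1 can be computed as follows (a minimal sketch, not from the paper, assuming the original and restored images are available as 2-D numpy arrays):

    import numpy as np

    def mse(s, s_hat):
        # Mean squared error between original and restored image.
        d = np.asarray(s, dtype=float) - np.asarray(s_hat, dtype=float)
        return np.mean(d ** 2)

    def mae(s, s_hat):
        # Mean absolute error between original and restored image.
        d = np.asarray(s, dtype=float) - np.asarray(s_hat, dtype=float)
        return np.mean(np.abs(d))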
7. CONCLUSIONS

A novel structure for removal of impulse noise has been proposed. The input signal is filtered depending on the decision of an impulse detector which is realized as a neural network. The network has been shown to possess a simple structure and is trained in supervised mode. Simulations indicate that the performance gain over standard methods is relatively insensitive to the choice of the network parameters and of the nonlinear filter.
8. REFERENCES

[1] I. Pitas and A.N. Venetsanopoulos. Nonlinear Digital Filters. Kluwer, 1990.

[2] R.E. Graham. Snow removal - a noise-stripping process for picture signals. IRE Transactions on Information Theory, IT-8(2):129-144, Feb. 1962.

[3] R. Bernstein. Adaptive nonlinear filters for simultaneous removal of different kinds of noise in images. IEEE Transactions on Circuits and Systems, CAS-34(11):1275-1291, Nov. 1987.

[4] A. Kundu, S.K. Mitra, and P.P. Vaidyanathan. Application of two-dimensional generalized mean filtering for removal of impulse noises from images. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-32(3):600-609, June 1984.

[5] Y.T. Zhou, R. Chellappa, A. Vaid, and B.K. Jenkins. Image restoration using a neural network. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-36(7):1141-1151, July 1988.

[6] X.Z. Sun and A.N. Venetsanopoulos. Adaptive schemes for noise filtering and edge detection by use of local statistics. IEEE Transactions on Circuits and Systems, CAS-35(1):57-69, Jan. 1988.
Figure 6: Removal of impulses from the Lenna image: (a) Original image, 256 × 256 pixels, 8 bit. (b) Image corrupted with 10% impulses of height h = 100. (c) Median-filtered image from (b) with 3 × 3 square window. (d) Selective median-filtered image from (b) with 3 × 3 square window.