Combining Neural Networks, Fuzzy Sets, and Evidence Theory Based Approaches for Analysing Colour Images

Antanas Verikas(1,2), Kerstin Malmqvist(1), Marija Bacauskiene(2)

(1) Centre for Imaging Science and Technologies, Halmstad University, Box 823, S-301 18 Halmstad, Sweden. E-mail: [email protected]
(2) Department of Applied Electronics, Kaunas University of Technology, Studentu 50, 3031, Kaunas, Lithuania
Abstract. This paper presents an approach to determining the colours of specks in an image taken from a pulp sample. The task is solved through colour classification by an artificial neural network. The network is trained using possibilistic target values. The problem of post-processing a pixelwise-classified image is addressed from the point of view of the Dempster-Shafer theory of evidence. Each neighbour of a pixel being analysed is considered as an item of evidence supporting particular hypotheses regarding the class label of that pixel. The experiments performed have shown that the colour classification results correspond well with the human perception of the colours of the specks.
1. Introduction
This paper concerns the application of colour image processing and artificial neural network based techniques in the papermaking industry for monitoring the de-inking process. The aim of the work is to estimate the amount of specks of different colours in an image of a pulp being recycled. Fig. 1 (middle) displays an example of a grey-scale representation of such an image. The need for such an analysis is twofold. Firstly, in the "black-white" analysis used today, the amount of specks in the pulp may be underestimated. Secondly, different colours may have different bleachability with different types of bleaching chemicals. Therefore, optimisation of the de-inking process may be possible knowing the chromatic content of the pulp. The estimate is performed by classifying the colours of the pixels of an image taken from a pulp sample.

When solving a classification problem there are several choices of strategy, such as statistical, neural network, or fuzzy set based approaches. A combination of neural networks, fuzzy sets, and evidence theory based approaches has been adopted in this work. Artificial neural networks have proved themselves capable of representing complex classification or mapping functions, discovering the representations through powerful learning algorithms. A single hidden layer perceptron is the classification network used in this study.

A problem of training data labelling must be solved when designing the colour classification neural network. It is not a trivial task to assign a label to each pixel of a pulp image; besides, manual labelling is a very tedious procedure. In this work, we consider the colour classes as fuzzy sets and regard the membership degrees of pixels in the sets as soft target values. The automatically determined soft target values are then used to train the classification neural network.
The Dempster-Shafer theory of evidence has been used by several authors in different applications as a tool for representing and combining items of evidence. Applications can be found in the fusion of outputs of several classifiers [6], analysis of medical images [1], generalisation of the k-NN classifier [3], and object detection [2]. We use the evidence theory to post-process the pixelwise-classified image obtained from the colour classification neural network. We consider each neighbour of a pixel being analysed as an item of evidence supporting particular hypotheses regarding the class label of that pixel. The items of evidence provided by several neighbours are combined through the rule of orthogonal sum.
2. Tools and Data
The equipment we use consists of a three-chip CCD colour camera, an X-Y board for scanning a pulp sample, a frame-grabber, a PC, and software. The resolution used was such that an image consisting of 512x512 pixels was recorded from an area of approximately 1.0x1.0 mm². A vast amount of data can be collected in our application. Computation time, however, severely limits the amount of data that can be used for designing the classification network. In order to collect more consistent training sets, we perform condensation of the training data by using a clustering technique. A set of pixels obtained from a colour image is condensed into a predefined number of clusters. The centres of the clusters thus obtained, the "generalised pixels", make up the training and validation sets [8].
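The condensation into "generalised pixels" can be sketched with a basic k-means-style loop. The clustering technique actually used is described in [8]; the initialisation, iteration budget, and cluster count below are illustrative assumptions only.

```python
import numpy as np

def condense_pixels(pixels, n_clusters, n_iter=20, seed=0):
    """Condense a set of colour pixels into 'generalised pixels'
    (cluster centres) with a plain k-means loop (a sketch, not the
    exact procedure of [8]).

    pixels: (N, K) array of colour vectors.
    Returns an (n_clusters, K) array of cluster centres.
    """
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # initialise centres with randomly chosen pixels
    centres = pixels[rng.choice(len(pixels), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # assign each pixel to the nearest centre (Euclidean distance)
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned pixels
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    return centres
```

The centres returned by such a loop would then replace the raw pixels as training and validation samples.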
0-7695-0619-4/00 $10.00 © 2000 IEEE
3. Colour Space
We use the L*u*v* colour space [11] in all procedures that involve distance calculations. The Euclidean distance measure can be used to measure the distance (ΔE) between two points representing colours in the colour space: ΔE*_uv = [(ΔL*)² + (Δu*)² + (Δv*)²]^(1/2). However, to speed up the classification when classifying colours by the neural network, we use the colour space given by the colour difference signals f1, f2, and f3. The variables f1, f2, and f3 are obtained from R, G, and B by performing a linear transform of the (R, G, B) vector [8, 9]. The variables are given by: f1 = R + G + B, f2 = R − B, and f3 = R − 2G + B.
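The linear transform above is trivial to implement; a minimal sketch:

```python
def colour_difference_signals(r, g, b):
    """Linear transform of (R, G, B) into the colour difference
    signals f1, f2, f3 used for fast classification [8, 9]."""
    f1 = r + g + b        # intensity-like component
    f2 = r - b            # red-blue opponent signal
    f3 = r - 2 * g + b    # magenta-green opponent signal
    return f1, f2, f3
```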
4. The Dempster-Shafer Theory of Evidence
Let Θ be a finite set of mutually exclusive and exhaustive atomic hypotheses about some problem domain. The set Θ = {θ1, θ2, ..., θQ} is called the frame of discernment [7]. Let 2^Θ denote the power set of Θ. A function m: 2^Θ → [0,1] is called a basic probability assignment if m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1.

Whereas probability theory assigns probabilities to atomic hypotheses θi, a basic probability number m(A) represents one's belief in a not necessarily atomic hypothesis A. For a compound hypothesis A ≠ θi, m(A) reflects our ignorance, since it is a measure of the belief that we are willing to commit exactly to A. This belief cannot be further subdivided amongst the subsets of A and is assigned to A at the expense of the support m(θi) of the atomic hypotheses θi. A support committed to a compound hypothesis A should also be committed to any hypothesis it implies. Therefore, to obtain the total belief in A, we must add to m(A) the basic probability numbers m(B) for all subsets B of A. If m is a basic probability assignment, then the function Bel: 2^Θ → [0,1] given by Bel(A) = Σ_{B⊆A} m(B) is called a belief function.

The subsets B of Θ for which m(B) > 0 are called the focal elements of the belief function, and the union of the focal elements is called the core of the belief function. Belief functions having only one focal element in addition to Θ are called simple support functions. Bel is a simple support function if there exists a focal element F ⊆ Θ such that Bel(Θ) = 1 and

Bel(A) = s, if F ⊆ A and A ≠ Θ; 0, otherwise,

where s is the degree of support of Bel. The simple support functions are very good at representing evidence. Two basic probability assignments m1 and m2, associated with Bel1 and Bel2 induced by two different sources of information over the same frame of discernment Θ, can be combined into a single belief function if their cores are not disjoint. Dempster's rule of combination, or orthogonal sum, is a convenient way of performing such a combination. The orthogonal sum m = m1 ⊕ m2, m: 2^Θ → [0,1], is defined as:

m(A) = K⁻¹ Σ_{B∩C=A} m1(B) m2(C), where K = 1 − Σ_{B∩C=∅} m1(B) m2(C).

The function m is a basic probability assignment, and the core of the Bel given by m equals the intersection of the cores of Bel1 and Bel2. Let F be a focal element of two simple support functions Bel1 and Bel2 with degrees of support s1 and s2, respectively. If Bel = Bel1 ⊕ Bel2 and m is associated with Bel, then m(F) = 1 − (1 − s1)(1 − s2), m(Θ) = (1 − s1)(1 − s2), and m(A) = 0, ∀A ∈ 2^Θ \ {F, Θ}. We use simple support functions in our application.
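For two simple support functions sharing the same focal element F, the orthogonal sum above reduces to a one-line computation:

```python
def combine_simple_support(s1, s2):
    """Orthogonal sum of two simple support functions with a common
    focal element F: m(F) = 1 - (1 - s1)(1 - s2), m(Theta) = (1 - s1)(1 - s2).
    Returns the pair (m(F), m(Theta))."""
    m_theta = (1.0 - s1) * (1.0 - s2)   # residual mass on the whole frame
    return 1.0 - m_theta, m_theta
```

Note that combining two items of evidence can only increase the support committed to F, which is what makes the repeated combination over neighbours (Section 7) well behaved.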
5. Training Data Labelling
A common approach to determining the teaching signals is to assign a crisp target value to each training sample in the learning set X^L = {(x¹, c¹), (x², c²), ..., (x^N, c^N)}, where x^n ∈ R^K is the nth data sample, c^n ∈ I = {1, 2, ..., Q} is the class label, and Q is the number of classes. The target values t¹, ..., t^N are encoded according to the 1-of-Q scheme, i.e. t_k^n = 1 if c^n = k, and t_k^n = 0 otherwise.

The approach adopted in this paper assumes that the target values are encoded according to the Q-of-Q scheme, i.e. the membership of each pattern in every class is considered. The membership degrees are used as the target values to train the network.
5.1. Procedure for determining target values
The target values are determined through the following seven-step procedure [10].
1. Evolve a globally ordered two-dimensional self-organising map [4] using the u* and v* components of the generalised pixels.
2. Divide the map into Q−1 regions, where Q is the number of colour classes of the specks. The same achromatic region represents the "Black" and "White" colour classes.
3. Determine membership functions for each of the Q colour classes by analysing the regions of the 2-D map.
4. Evolve a globally ordered one-dimensional map for the L* component of the generalised pixels mapped onto the achromatic region of the 2-D map.
5. For each of the Q colour classes, determine membership functions on the axis of the variable L* using information from the 2-D and 1-D maps.
6. Aggregate the membership functions obtained in steps 3 and 5.
7. Compute the membership degrees of each generalised pixel in every class using the aggregated membership functions. Normalise the membership degrees and increase the contrast within the set of membership degrees. Use the obtained membership degrees as target values for training the network.
5.2. Membership functions on the 2-D map
Let us assume that the jth class is represented by N_j reference patterns (weight vectors). Let d(x^n, w_j^i) be the distance between the nth input pixel x^n and the ith weight vector of the jth class. Suppose that w_j^k is the closest to the pixel x^n amongst all the weight vectors representing the class C_j:

k = arg min_{i=1,2,...,N_j} d(x^n, w_j^i),  j = 1, 2, ..., Q.

We consider the colour classes as fuzzy sets. The membership degree of the pixel x^n in the fuzzy set C_j is then given by the so-called π-function. The π-function, taking values in the range [0,1], is defined to be:

π_{C_j}(x^n) = π(x^n, w_j^k, γ_j^k) =
    1 − 2(‖x^n − w_j^k‖ / γ_j^k)²,   for 0 ≤ ‖x^n − w_j^k‖ ≤ γ_j^k/2
    2(1 − ‖x^n − w_j^k‖ / γ_j^k)²,   for γ_j^k/2 ≤ ‖x^n − w_j^k‖ ≤ γ_j^k      (6)
    0,                               otherwise

with γ_j^k > 0 being the radius of the π-function, ‖x^n − w_j^k‖ the Euclidean norm, and w_j^k the central point, at which π(w_j^k, w_j^k, γ_j^k) = 1. The radii of the π-functions are found by analysing the 2-D map [10].
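The π-function of Eq. (6) can be sketched directly; the only inputs are the pixel, the closest weight vector of the class, and the radius γ:

```python
import numpy as np

def pi_membership(x, w, gamma):
    """Zadeh-style pi-function of Eq. (6), centred at the weight
    vector w with radius gamma: equals 1 at w, falls smoothly to 0
    at distance gamma, and is 0 beyond it."""
    d = np.linalg.norm(np.asarray(x, float) - np.asarray(w, float))
    if d <= gamma / 2:
        return 1.0 - 2.0 * (d / gamma) ** 2
    if d <= gamma:
        return 2.0 * (1.0 - d / gamma) ** 2
    return 0.0
```

Both branches agree at d = γ/2 (value 0.5), so the function is continuous across the whole range.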
5.3. Membership functions on the axis of the variable L*
Membership functions on the axis of the CIE lightness variable L* are defined as follows [10]:

λ_{C_j}(x^n) = 1 / (1 + (|L*^n − L*_j| / (q σ_j))^p),  j ∈ Chromatic      (7)

where the positive constants p and q are the fuzzy generators controlling the amount of fuzziness in the set, L*^n is the lightness value of the pixel x^n, L*_j is the reference value of the lightness variable L* for the jth class, σ_j is the standard deviation of the lightness variable in the jth class, and Chromatic is the set of chromatic colour classes. Membership functions for the "Black" and "White" colour classes are defined in a similar manner, using information from the 1-D map of the variable L* [10].
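A sketch of Eq. (7) follows. Note that the exact placement of the fuzzy generators p and q relative to |L*^n − L*_j| is one plausible reading of the formula, and the default values of p and q here are illustrative assumptions, not values taken from the paper:

```python
def lightness_membership(L, L_ref, sigma, p=2.0, q=1.0):
    """Membership of a pixel with lightness L in a chromatic class with
    reference lightness L_ref and lightness standard deviation sigma.
    One plausible reading of Eq. (7); p and q are fuzzy generators."""
    return 1.0 / (1.0 + (abs(L - L_ref) / (q * sigma)) ** p)
```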
5.4. Aggregating the membership functions
The aggregated membership functions ψ_{C_j}(x^n) are obtained by applying the T-norm operator to the functions π_{C_j}(x^n) and λ_{C_j}(x^n), i.e. ψ_{C_j}(x^n) = T(π_{C_j}(x^n), λ_{C_j}(x^n)), where T is the T-norm operator. The algebraic product has been adopted as the T-norm operator in this application. Next, the membership values are normalised to occupy the unit interval [0,1]:

μ_{C_j}(x^n) = (ψ_{C_j}(x^n) − ψ_min) / (ψ_max − ψ_min),  j = 1, 2, ..., Q      (8)

where ψ_min = min_{j=1,2,...,Q} ψ_{C_j}(x^n) and ψ_max = max_{j=1,2,...,Q} ψ_{C_j}(x^n). This implies that the membership assignment is possibilistic. Finally, in the last step, the contrast within the set of membership values μ_{C_j}(x^n) is increased, i.e. the ambiguity in making a decision about the targets is decreased:

η_{C_j}(x^n) = 2[μ_{C_j}(x^n)]²,              for 0 ≤ μ_{C_j}(x^n) ≤ 0.5
η_{C_j}(x^n) = 1 − 2[1 − μ_{C_j}(x^n)]²,      otherwise,  j = 1, 2, ..., Q      (9)

The η_{C_j}(x^n) are the target values used to train the network.
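The whole target-construction pipeline for one pixel (product T-norm, normalisation of Eq. (8), contrast intensification of Eq. (9)) can be sketched as:

```python
import numpy as np

def soft_targets(pi_vals, lam_vals):
    """Given the pi- and lambda-memberships of one pixel in all Q
    classes, aggregate with the algebraic product (the chosen T-norm),
    normalise to [0, 1] across classes (Eq. (8)), then apply the
    contrast intensification of Eq. (9)."""
    psi = np.asarray(pi_vals, float) * np.asarray(lam_vals, float)   # T-norm
    mu = (psi - psi.min()) / (psi.max() - psi.min())                 # Eq. (8)
    eta = np.where(mu <= 0.5, 2 * mu**2, 1 - 2 * (1 - mu) ** 2)      # Eq. (9)
    return eta
```

The intensification pushes values below 0.5 towards 0 and values above 0.5 towards 1, while leaving 0, 0.5, and 1 fixed.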
6. Error Function
We use the error back-propagation algorithm to train the network. The network is trained by minimising the weighted sum-squared error augmented with an additional regularisation term:

E = Σ_{n=1}^N v^n Σ_{k=1}^Q (η_k^n − o_k^n)² + β Σ_{i=1}^{N_w} w_i²      (10)

where Q is the number of classes, N is the number of training samples, N_w is the number of weights in the network, w_i is the ith weight of the network, β is the regularisation coefficient, o_k^n is the output of the kth output neuron for the nth input pattern, η_k^n is the corresponding target value, and v^n is the error weight for the nth input pattern. The error weight v^n is given by

v^n = N / (Q card{C_k})      (11)

k = arg max_{i=1,...,Q} {η_i^n}      (12)

where card{C_k} is the cardinality of the fuzzy set C_k, card{C_k} = Σ_{n=1}^N η_k^n.

The following motivation can be given for the use of the error weight v^n. We assume that the occurrence of each class in operation is equally likely, but the cardinalities of the fuzzy sets representing the training data from the different classes may differ significantly. Networks trained with a 1-of-Q target coding are strongly biased in favour of the classes with the largest membership. To reduce this biasing, the error weight v^n was included in the error function.
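The error weights of Eqs. (11)-(12) depend only on the target matrix; a minimal sketch:

```python
import numpy as np

def error_weights(targets):
    """Per-sample error weights v^n = N / (Q * card(C_k)), where k is
    the class with the largest target membership for sample n (Eq. (12))
    and card(C_k) is the fuzzy cardinality, i.e. the sum of the kth
    target over all samples."""
    t = np.asarray(targets, float)      # shape (N, Q)
    N, Q = t.shape
    card = t.sum(axis=0)                # fuzzy cardinalities card(C_k)
    k = t.argmax(axis=1)                # dominant class of each sample
    return N / (Q * card[k])
```

Samples whose dominant class is under-represented in the training set receive a larger weight, which counteracts the bias towards the classes with the largest total membership.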
7. The Evidence Theory Based Post-Processing
7.1. Defining a Basic Probability Assignment
Let C = {C1, C2, ..., CQ} be the set of Q colour classes, the frame of discernment of the application. Assume that x^v ∈ R^K is the pixel of interest and X = {x¹, x², ..., x^N} is the set of the N nearest neighbours of the pixel. The post-processing window defines the members of the set. Associated with each member x^j of the set are a class label L^j ∈ {1, 2, ..., Q} and the degree of certainty μ_j ∈ [0,1] with which the label was assigned to the member during the classification process, based on the colour vector representing the member. We calculate μ in the following way:

μ = o_k − max_{j=1,...,Q, j≠k} o_j,  k = arg max_{j=1,...,Q} o_j      (13)

where o_j is the output signal of the jth neuron in the output layer of the colour classification neural network.

For any x^j ∈ X, including x^v itself, the knowledge that L^j = q can be considered as a support of the hypothesis that x^v belongs to C_q. However, this item of evidence contains some degree of uncertainty. We find it reasonable to assume that the degree of uncertainty increases with a decrease of μ_j or an increase of the distance d(v, j) between pixel v and pixel j. Since we use simple support functions for representing evidence, we distribute the evidence obtained from the knowledge between C_q and the frame of discernment C. Let us assume that L^v = p. The following basic probability assignment m^{vj} is then used for representing the evidence:

m^{vj}(C_q) = β_j,  m^{vj}(C) = 1 − β_j      (14)
m^{vj}(A) = 0,  ∀A ∈ 2^C \ {C_q, C}

β_j = β_{pq} T(max(μ_j, ε), φ_p(d(v, j))),  0 < β_{pq} < 1      (15)

φ_p(d(v, j)) = exp(−α_p d(v, j)),  α_p > 0      (16)

where T stands for the T-norm operator and ε is a small positive constant. From (15) and (16) it follows that 0 ≤ β_j < 1. The coefficient β_{pq} expresses our initial certainty that a pixel assigned to class q in the initial classification process has value for post-processing a pixel assigned to class p. The use of an α_p specific for each class indicates that the influence of the distance d(v, j) may depend on the class of the pixel being analysed. For each of the N neighbours of the pixel x^v from the post-processing window, a basic probability assignment is defined. Next, the basic probability assignments are combined using Dempster's rule of combination.
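One neighbour's contribution of Eq. (14) can be sketched as below. The exponential distance attenuation and the product T-norm are plausible readings consistent with the surrounding text, and the default parameter values are illustrative assumptions:

```python
import math

def neighbour_bpa(mu_j, dist_vj, beta_pq=0.9, alpha_p=1.0, eps=1e-3):
    """Simple support function contributed by one neighbour labelled
    C_q (Eq. (14)): mass beta_j on C_q and the remainder on the whole
    frame C. The algebraic product is used as the T-norm and an
    exponential distance attenuation is assumed."""
    beta_j = beta_pq * max(mu_j, eps) * math.exp(-alpha_p * dist_vj)
    return {"C_q": beta_j, "C": 1.0 - beta_j}
```

With beta_pq < 1 the mass on C_q stays strictly below 1, so some ignorance is always retained, as the text requires.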
7.2. Combining Evidence
Let us denote by N_q^v the number of neighbours of x^v belonging to the class C_q. Assuming that N_q^v ≠ 0, the result of the combination of the N_q^v basic probability assignments is given by

m_q^v(C_q) = 1 − Π_{j=1}^{N_q^v} (1 − β_j),  m_q^v(C) = Π_{j=1}^{N_q^v} (1 − β_j)      (17)

If N_q^v = 0, then m_q^v(C_q) = 0 and m_q^v(C) = 1. Combining the basic probability assignments for all the classes, we obtain:

m^v(C_q) = m_q^v(C_q) Π_{r≠q} m_r^v(C) / K,  q = 1, ..., Q      (18)

m^v(C) = Π_{q=1}^Q m_q^v(C) / K      (19)

where the normalising factor K is given by

K = Σ_{q=1}^Q m_q^v(C_q) Π_{r≠q} m_r^v(C) + Π_{q=1}^Q m_q^v(C)      (20)
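The per-class combination of the neighbours' simple support functions and the final Dempster normalisation described above can be sketched as follows; the only input is, for each class, the list of β_j values of the neighbours carrying that label:

```python
def combine_neighbourhood(betas_per_class):
    """Combine the evidence of all neighbours of a pixel.

    betas_per_class: list of Q lists; entry q holds the beta_j values
    of the neighbours labelled C_q (possibly empty).
    Returns (list of m(C_q) for q = 1..Q, m(C)) after normalisation.
    """
    Q = len(betas_per_class)
    m_q_frame = []   # per-class mass left on the whole frame, m_q(C)
    m_q_focal = []   # per-class mass on the class itself, m_q(C_q)
    for betas in betas_per_class:
        prod = 1.0
        for b in betas:
            prod *= (1.0 - b)          # product of (1 - beta_j)
        m_q_frame.append(prod)         # m_q(C)   (1.0 if no neighbour)
        m_q_focal.append(1.0 - prod)   # m_q(C_q) (0.0 if no neighbour)
    # unnormalised combined masses: m(C_q) = m_q(C_q) * prod_{r != q} m_r(C)
    frame_prod = 1.0
    for p in m_q_frame:
        frame_prod *= p
    m_classes = [m_q_focal[q] * frame_prod / m_q_frame[q] for q in range(Q)]
    K = sum(m_classes) + frame_prod    # normalising factor
    return [m / K for m in m_classes], frame_prod / K
```

The division by m_q_frame[q] implements the product over the other classes; it is safe here because each β_j < 1 keeps every m_q(C) strictly positive.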