Learning Associative Memories by Error Backpropagation


Pengsheng Zheng, Jianxiong Zhang, and Wansheng Tang

Abstract— In this paper, a method for the design of Hopfield networks, bidirectional and multidirectional associative memories with asymmetric connections is proposed. The given patterns can be assigned as locally asymptotically stable equilibria of the network by training a single-layer feedforward network. It is shown that the robustness of the constructed networks with respect to acceptable input noise is enhanced as the memory dimension increases and weakened as the number of stored patterns grows. More importantly, the remembered patterns need not be binary. Neural associative memories for storing gray-level images are constructed based on the proposed method. Numerical simulations show that the proposed method is efficient for the design of Hopfield-type recurrent neural networks.

Index Terms— Bidirectional associative memory, error backpropagation, gray-level images, Hopfield network, multidirectional associative memory.

I. INTRODUCTION

Recurrent neural networks are important artificial neural network models. The dynamical behaviors and applications of recurrently connected neural networks have been extensively studied over the past decades. For example, primitive neural models of concept formation and characteristics of the learning rules were studied in [1]. Hopfield combined the concepts of energy function and associative memory, and found that recurrent networks could be regarded as energy-decreasing systems [2]. The bidirectional associative memory (BAM) model and its retrieval properties were studied in [3]. Graves et al. studied the unconstrained handwriting recognition problem based on a novel type of recurrent neural network [4]. Retrieval properties of second-order and higher order BAMs were discussed in [5]. Feng and Plamondon studied the dynamic behaviors of BAM networks with time delays by using a Lyapunov function method [6]. Recently, some significant results on the stability analysis of recurrent neural networks were presented in [7]–[11]. Xu et al. obtained sufficient conditions for the stability of discrete-time Hopfield neural networks (HNNs) and BAMs in [12] and [13], respectively. An important application of recurrent neural networks is associative memory.

Manuscript received November 27, 2009; revised September 26, 2010; accepted December 4, 2010. Date of publication December 23, 2010; date of current version March 2, 2011. This work was supported in part by the National Natural Science Foundation of China under Grant 61004015. The authors are with the Institute of Systems Engineering, Tianjin University, Tianjin 300072, China (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TNN.2010.2099239

The design of recurrent networks has always been an interesting problem for researchers. Symmetric HNNs can be constructed by using the Hebb rule or the pseudoinverse rule. However, asymmetric networks appear to be more biologically plausible. For asymmetric networks, a synthesis procedure for designing nonsymmetric cellular neural networks was proposed in [14]. Recently, some pioneering works on the design of discrete-time asymmetric networks have been introduced in [15]–[18]. However, they all focused on memories of bipolar form. As we know, bipolar patterns constitute only a part of the memories that the brain can store; many of the memories related to our everyday life are color images. Costantini et al. proposed a design procedure for neural associative memories storing grayscale images using multilayer neural networks [19]. Vazquez and Sossa proposed three color image associative memories by using a heteroassociative memory, a morphological associative memory, and the dynamical synapses method in [20]–[22], respectively. Oh and Zak proposed an image recall system using the generalized Brain-State-in-a-Box network [23]. Color image retrieval in a class of sparsely connected auto-associative morphological memories was studied in [24].

In this paper, we show that HNNs can correctly remember and retrieve color images. Here, we focus on the design of continuous-time HNNs, BAMs, and multidirectional associative memories (MAMs) with asymmetric connections. Based on the local stability result proposed in [25] and the synthesis approach introduced in [26] and [27], a single-layer feedforward network (SFN) method is developed, by which a given set of patterns can be assigned as locally asymptotically stable equilibria by training the SFN. Statistical analysis shows that the robustness of the constructed HNNs with respect to acceptable input noise is weakened as the number of stored patterns increases and enhanced as the memory dimension grows. More importantly, the remembered patterns need not be binary. By virtue of the proposed method, HNNs are constructed for storing and retrieving 256 gray-level images. Numerical simulations show that the proposed method is efficient for the design of Hopfield-type recurrent neural networks.

II. ASYMMETRIC HNNS

A. Model Description and System Design

The Hopfield network, which is abstracted from brain dynamics, is an important tool for memory retrieval. Let us consider an asymmetric Hopfield network of the form


Ẋ = −AX + WF(X) + Θ    (1)


where X = [x1, x2, . . . , xn]T ∈ Rn is the neuron state vector, Θ = [θ1, θ2, . . . , θn]T ∈ Rn is a real constant vector, A = diag[a1, a2, . . . , an] ∈ Rn×n is a positive definite diagonal matrix, W ∈ Rn×n is a nonzero asymmetric matrix denoting the synaptic weights, and F: Rn → Rn, with F(0) = 0 and F(X) = [f1(x1), f2(x2), . . . , fn(xn)]T, is a continuous function denoting the neuron transfer functions, where each fj(·) is differentiable and satisfies

0 < (fj(x) − fj(y))/(x − y) ≤ δ,  ∀x, y ∈ R, x ≠ y, j = 1, 2, . . . , n.    (2)
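As a concrete illustration of the dynamics in (1), the following minimal NumPy sketch integrates the network with a forward-Euler step. It is not part of the paper's method; the step size, iteration count, and all parameter values are illustrative assumptions, and ϕ is the transfer function later used in (5).

    import numpy as np

    def simulate_hopfield(W, A_diag, theta, x0, f, dt=0.01, steps=5000):
        """Forward-Euler integration of dX/dt = -A X + W F(X) + Theta, i.e., network (1)."""
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(steps):
            dx = -A_diag * x + W @ f(x) + theta   # A is diagonal, so it acts elementwise
            x += dt * dx
        return x

    # Illustrative 3-neuron example (all values assumed, not from the paper)
    rng = np.random.default_rng(0)
    n = 3
    W = rng.uniform(-1, 1, (n, n))                            # asymmetric weight matrix
    A_diag = np.ones(n)                                       # diagonal entries of A
    theta = np.zeros(n)
    phi = lambda x: 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0      # the transfer function of (5)
    print(simulate_hopfield(W, A_diag, theta, rng.normal(size=n), phi))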


Let X(i), i = 1, 2, . . . , m, be the patterns that need to be remembered. To endow the network with retrieval properties, the X(i) need to be assigned as equilibria of the network. Moreover, the network is expected to converge to an equilibrium point if the initial state is sufficiently close to it, i.e., the equilibria should be locally asymptotically stable. A sufficient condition for the local stability of the equilibria, given in [25], is as follows.

Theorem 1 [25]: X(i) is locally asymptotically stable if

λ1[H − AΦ(X(i))] < 0    (3)

where Φ(X) = diag[1/ḟ1(x1), . . . , 1/ḟn(xn)], H = (W + WT)/2, and λ1(·) denotes the maximum eigenvalue of a matrix.

An interesting synthesis approach based on the perceptron training algorithm has been proposed in [26] and [27], in which the synthesis problem of neural networks was formulated as a set of linear inequalities that were solved by training a set of perceptrons. Inspired by this synthesis approach, an SFN method is developed to assign the given patterns as equilibria of the network. Consider the SFN shown in Fig. 1: the inputs are fed directly to the outputs through a set of weights. α = [α1, α2, . . . , αm1]T and β = [β1, β2, . . . , βm2]T denote an input and output pair of the SFN, ω = [ωij] ∈ Rm2×m1 is the input weight matrix, and B = [b1, b2, . . . , bm2]T denotes the bias of the neurons in the output layer. The transfer function of the output layer is g(x) = x, ∀x ∈ R. As illustrated in Fig. 1, the SFN has a very simple structure, and its training can easily be accomplished by the error backpropagation algorithm.

The training dataset of the SFN is made up of m input–output examples: TH = {(Y(i), AX(i))}, i = 1, . . . , m, where Y(i) = F(X(i)) is the input vector of the ith example and AX(i) is its target response, so that m1 = m2 = n. Let W = ω and Θ = B. If the actual output of the SFN due to Y(i) is close enough to AX(i), then

AX(i) = ωY(i) + B = WF(X(i)) + Θ    (4)

namely, X(i), i = 1, 2, . . . , m, are equilibria of network (1). That is, if the training dataset TH can be approximated by the SFN, then the given patterns can be assigned as equilibria of network (1). A remaining question is how to guarantee the local stability of the remembered patterns. By Theorem 1, one needs to keep λ1[H − AΦ(X(i))] < 0 to make X(i) locally stable.

Fig. 1. Structure of the SFN (input layer connected directly to the output layer; output transfer function g(x) = x).

It follows from the result of [28] that

λ1[H − AΦ(X(i))] ≤ λ1[H] + λ1[−AΦ(X(i))]

which means that the upper bound of λ1[H − AΦ(X(i))] is determined by λ1[H] and λ1[−AΦ(X(i))]. The matrix H is formed during the training of the SFN. As we know, the eigenvalues of a matrix are strongly related to its diagonal components. Hence, the diagonal components ωjj of ω are initially set to ωjj < 0. In addition, it is easy to assign the eigenvalues of −AΦ(X(i)), and condition (3) can be satisfied by adjusting the parameters A and F(·). Hence, the system design procedure is as follows.

Step 1: Set A and F(·).
Step 2: Initialize the SFN.
Step 3: Train the SFN using the backpropagation algorithm.
Step 4: Set W = ω and Θ = B.

In Step 3, there are a number of methods to update the weights (ω) and biases (B) of the SFN, for example, the adaptive learning rate method, gradient descent with momentum, and the scaled conjugate gradient (SCG) algorithm [29]. Unless otherwise stated, we use the SCG algorithm to train the SFN. The performance function of the SFN is the squared error (SE), defined as

SE = Σ_{l=1}^{m} Σ_{i=1}^{m2} (xil − x̂il)²

where m2 denotes the number of output-layer nodes, xil is the actual output of the ith output node when the lth input pattern is presented to the network, and x̂il is the corresponding desired output. The error goal is set to SE = 1.0 × 10⁻⁸, and the training of the SFN stops when SE reaches or drops below the error goal. The initial values of ω and B are randomly generated within [−1, 1]. Steps 2–4 are executed p̂ times, so that p̂ different well-trained SFNs are obtained. Let λ = max{λ1[H − AΦ(X(i))], i = 1, 2, . . . , m}. Among the constructed networks, the one with the minimum λ is selected as the finally constructed network. It should be mentioned that, every time Step 2 of the system design procedure is re-executed, the parameters ω (except ωjj) and B are randomly generated within [−1, 1], which makes the finally designed networks different.
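The design procedure of Steps 1–4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear SFN is trained by plain gradient descent rather than the SCG algorithm, and the helper names (train_sfn, max_lambda), learning rate, iteration count, and example patterns are assumed.

    import numpy as np

    phi  = lambda x: np.tanh(x)             # equals 2/(1 + exp(-2x)) - 1, the transfer function (5)
    dphi = lambda x: 1.0 - np.tanh(x) ** 2  # its derivative, used in Phi(X)

    def train_sfn(Y, T, omega0, lr=0.05, iters=20000):
        """Step 3: minimize SE = sum_l sum_i (omega Y + B - T)_{il}^2 by plain gradient descent."""
        omega = omega0.copy()
        B = np.random.uniform(-1, 1, (omega0.shape[0], 1))
        for _ in range(iters):
            E = omega @ Y + B - T                       # residuals for all m examples at once
            omega -= lr * (E @ Y.T)
            B -= lr * E.sum(axis=1, keepdims=True)
        return omega, B

    def max_lambda(W, A_diag, patterns):
        """lambda = max_i lambda_1[ H - A Phi(X^(i)) ] with H = (W + W^T)/2 (Theorem 1)."""
        H = (W + W.T) / 2.0
        A = np.diag(A_diag)
        return max(np.linalg.eigvalsh(H - A @ np.diag(1.0 / dphi(x))).max() for x in patterns)

    # Illustrative use with two assumed 4-D patterns (Steps 1-4)
    np.random.seed(0)
    X = np.array([[1.5, -1.3], [-1.8, 1.5], [1.3, 1.8], [-1.2, 0.9]])   # columns X^(1), X^(2)
    A_diag = np.ones(4)                                                  # Step 1: set A (and F = phi)
    omega0 = np.random.uniform(-1, 1, (4, 4))
    np.fill_diagonal(omega0, -1.0)                                       # Step 2: omega_jj < 0 initially
    W, B = train_sfn(phi(X), A_diag[:, None] * X, omega0)                # Step 3: fit omega Y + B = A X
    Theta = B.ravel()                                                    # Step 4: W = omega, Theta = B
    print("lambda =", max_lambda(W, A_diag, [X[:, 0], X[:, 1]]))         # Theorem 1 requires lambda < 0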

Fig. 2. Phase portrait of the constructed network with initial state X(0).

One can stop building networks as soon as λ < 0. However, it is encouraged to construct more than two well-trained networks and select the one with the minimum λ as the finally designed network; thereby, one can obtain better solutions for ω and B. Moreover, during the training of the network, one may fail to construct a network with λ < 0. In such a case, it is better to re-initialize the SFN, increasing the value of aj or decreasing ωjj, which is an effective way to obtain local stability of the equilibria.

Remark 1: It is worth mentioning that the SFN and the error backpropagation algorithm are introduced only to assign the given patterns as equilibria of network (1); the dynamics of the constructed HNNs (including BAMs and MAMs) are clearly different from those of recurrent backpropagation networks [30].

B. Numerical Simulations

In the following, four numerical simulations are presented to illustrate the effectiveness of the proposed method.

Example 1: The use of the system design procedure is illustrated in this example. The nonbinary patterns that need to be stored are

X(1) = [1.5, −1.3, −1.8, 1.5, 1.3, 1.8, −1.2, 0.9, 1.7]T
X(2) = [−1.7, 1.6, −2, −1.5, 1.5, −1.8, 1.3, −1.7, −1.9]T
X(3) = [−2.0, 1.5, 1.4, 1.9, −1.5, −1.9, 1.3, −1.3, 1.6]T
X(4) = [1.6, 1.5, −1.8, 1.4, −1.6, 1.9, −1.8, 1.4, 1.8]T.

We set A = diag[1.2, 0.9, 0.5, 1.4, 1.0, 1.5, 2.0, 1.8, 2.0] and f1(·) = f2(·) = · · · = f9(·) = ϕ(·) with

ϕ(x) = 2/(1 + e^(−2x)) − 1,  x ∈ R    (5)

and initially set ωjj = −1 (j = 1, 2, 3, 4). The training dataset (Y(i), AX(i)) is obtained with Y(i) = F(X(i)) for i = 1, 2, 3, 4. The coefficients ω and B of the SFN are then adjusted using the SCG algorithm. After only five iterations, SE reaches 2.67 × 10⁻⁹, and the training of the SFN takes less than 3 s. Steps 2–4 are executed two times (i.e., p̂ = 2). Hence, two different asymmetric Hopfield networks are constructed, with λ = −1.76 and λ = −1.24, respectively. The one with λ = −1.76 is selected as the finally designed network. Concretely,
W =
[ −0.222 −0.465 −0.576  0.728 −0.358 −0.093 −0.786  1.434 −0.755
  −1.054  0.470  0.192 −0.892 −0.909  0.576 −0.755 −0.114 −0.234
  −0.220 −0.058  0.402  0.578 −0.086 −0.109 −0.224 −0.345 −0.137
   0.205 −0.267  0.152  1.082 −0.224  1.122 −0.220 −0.448 −0.660
   0.281 −0.420 −0.857  0.250  1.221 −1.307 −0.032  0.157  0.164
   0.000  0.008  0.176 −1.077  0.076  1.452 −1.267  0.492  0.876
  −0.629 −0.468  0.104  0.203  0.014 −0.541  0.612 −1.523 −0.247
   0.403 −0.312  0.210  0.509 −0.640  0.169 −1.353  1.010 −0.980
  −0.343 −0.520  0.163  2.101 −0.630  0.697  0.223  0.305  0.858 ]

Θ = [−0.183, 1.087, −0.528, 0.300, −0.461, 0.182, 0.014, 0.180, 0.562]T.

It can be verified that X(1), X(2), X(3), and X(4) have been assigned as equilibria of the constructed network. Moreover, it can be calculated that

λ1[H − AΦ(X(1))] = −1.76,  λ1[H − AΦ(X(2))] = −3.81
λ1[H − AΦ(X(3))] = −1.78,  λ1[H − AΦ(X(4))] = −3.69.

By Theorem 1, X(1), X(2), X(3), and X(4) are locally asymptotically stable. Given an initial state X(0) = [4.4, 5.1, −3.9, 6.2, 5.7, 2.4, −3.9, −4.9, −3.0]T, Fig. 2 shows the network trajectory as the network dynamics evolve with time. As shown in Fig. 2, the constructed network finally converges to X(1).

Spurious memories may exist in the designed network and harm the network performance. It is hard to compute every spurious memory analytically since network (1) is nonlinearly coupled. Note that the network can converge to a spurious memory if an initial state is sufficiently close to it. Moreover, it follows from the results of Appendix 2 in [25] that all the equilibrium points (including stored patterns and spurious memories) of the constructed network lie inside Bs = {X : −4 ≤ xj ≤ 4, j = 1, 2, . . . , 9}. Hence, one may find all the spurious memories by giving the network sufficiently many different initial states. As initial states, 7214 vectors uniformly distributed within Bs are presented to the finally constructed network. It is believed that all the spurious memories have been found because a sufficient number of initial states were presented. However, only two spurious memories are found, namely

Xs1 = [1.59, 3.70, −3.45, −1.18, −2.37, 2.29, −1.87, 1.78, −0.64]T
Xs2 = [−1.97, −1.31, 1.30, 1.94, 1.45, −1.94, 1.81, −1.73, 1.48]T.



Fig. 3. Initial states and retrieved patterns. (a) Chinese characters that need to be stored. (b) Initial states. (c) Retrieved patterns.


In addition, numerical simulations show that 7139 of these vectors can be correctly recalled. The percent of correct recall, denoted P, is 0.99. The simulation results show that the constructed network can correctly remember and retrieve the given nonbinary patterns.

Example 2: Consider the Chinese characters introduced in [31]. As shown in Fig. 3(a), each character is composed of 9 × 9 small boxes, which can be interpreted as a 9 × 9 data matrix. The matrices need to be transformed into vectors that can be remembered. Here and later, the transformation of matrix to vector and its inverse are defined as

Γ = [γ1; γ2; . . . ; γnγ]  ⇄  [γ1, γ2, . . . , γnγ]T    (6)

where the forward direction denotes matrix to vector and the reverse direction vector to matrix, Γ ∈ R^(nγ×mγ), and γi denotes the ith row of Γ.
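Since each γi is a row of Γ, the transformation in (6) amounts to a row-major flattening; a minimal sketch (the 9 × 9 size is the one used in this example, and the helper is not from the paper):

    import numpy as np

    Gamma = np.sign(np.random.randn(9, 9)) * 2.0   # an illustrative 9x9 pattern of +/-2 entries
    vec = Gamma.reshape(-1)                        # matrix to vector: rows stacked into an 81-D vector
    Gamma_back = vec.reshape(9, 9)                 # vector to matrix: the inverse transformation
    assert np.array_equal(Gamma, Gamma_back)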

Then each character can be transformed into an 81-dimensional vector. Conventionally, the given characters would be translated into strings of ±1. However, in order to reduce the maximum eigenvalue of −AΦ(X(i)), strings of ±2 are adopted to represent the characters, with −2 (black) and 2 (white). For simplicity, we set A = diag[1, 1, . . . , 1], f1(·) = f2(·) = · · · = f81(·) = ϕ(·), and initially set ωjj = −10 (j = 1, 2, . . . , 81). The training dataset TH is obtained with Y(i) = F(X(i)) for i = 1, 2, . . . , 7. Performing Steps 2–4 three times then yields three different well-trained asymmetric Hopfield networks. The one with the minimum λ is selected as the finally designed network. Blurred patterns with salt and pepper noise of density 0.25, as shown in Fig. 3(b), are given to the constructed network as initial states. Fig. 3(c) shows the retrieved patterns of the corrupted ones in the same column. As shown in Fig. 3, the network retrieves the previously stored pattern that most closely resembles the blurred one. This illustrates the effectiveness of the proposed method.

Example 3: Consider a network of n neurons that needs to store m binary patterns (random strings of ±2). The system design parameters are set to A = diag[1, 1, . . . , 1], ωjj = −10 (j = 1, 2, . . . , n), and f1(·) = f2(·) = · · · = fn(·) = ϕ(·).

Fig. 4. Percent of correct recall of the networks constructed with different m and n (d = 0.15).

Then networks with different values of m and n can be constructed. To evaluate the network performance, we randomly collect 5n corrupted patterns (salt and pepper noise of density d) for every designed network and test how many of them can be correctly recalled. For d = 0.15, Fig. 4 shows the percent of correct recall (P) of the constructed networks for different numbers of stored patterns (m) and memory dimensions (n). As shown in Fig. 4, for all the constructed networks (n = 50, 75, 100, 125, 150), P = 1 if the number of stored patterns satisfies m ≤ 20, which implies that the constructed networks can retrieve all the corrupted patterns precisely. Taking the curve n = 50 as an example, P starts to decrease at approximately m = 20; we call this a critical point, and it characterizes the storage capability of the n-dimensional network under salt and pepper noise of density d = 0.15. For n = 50 and m > 20, P decreases as m grows. For m = 25, the networks with n ≥ 75 perform much better than the one with n = 50. As shown in Fig. 4, for n = 75, n = 100, n = 125, and n = 150, the critical point moves to m = 30, m = 42, m = 51, and m = 60, respectively. It is obvious that the critical point grows as the memory dimension n increases, which implies that, for any given m, the robustness of the constructed networks with respect to acceptable input noise is enhanced as the memory dimension n grows.

Fig. 5 illustrates P as a function of m under different noise densities (d = 0.1, 0.15, 0.2) with n = 100. It is seen that the critical point decreases as d grows. For m ≤ 30 and d ≤ 0.2, P = 1 and the networks have perfect retrieval performance. However, for 30 < m < 40, the designed networks exhibit perfect retrieval reliability for noise densities d < 0.15, but are incapable of retrieving all of the corrupted patterns with d ≥ 0.2. It is obvious from Figs. 4 and 5 that the robustness of the constructed networks with respect to acceptable input noise is weakened as the number of stored patterns m grows.
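The evaluation protocol of this example can be sketched roughly as follows; this is an assumed reconstruction, not the authors' code. The salt-and-pepper model (a fraction d of the ±2 entries overwritten with random extremes), the Euler settling loop, and the convergence tolerance are all assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    phi = np.tanh                                   # transfer function, as in (5)

    def salt_and_pepper(x, d):
        # Overwrite a fraction d of the +/-2 entries with random extremes (assumed noise model).
        y = x.copy()
        hit = rng.random(x.size) < d
        y[hit] = rng.choice([-2.0, 2.0], size=hit.sum())
        return y

    def settle(W, A_diag, theta, x0, dt=0.01, steps=5000):
        # Forward-Euler integration of network (1), as in the earlier sketch.
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(steps):
            x += dt * (-A_diag * x + W @ phi(x) + theta)
        return x

    def percent_correct_recall(W, A_diag, theta, patterns, d=0.15, probes_per_pattern=5):
        # Fraction of corrupted probes that settle back onto their stored pattern.
        correct = total = 0
        for x_star in patterns:
            for _ in range(probes_per_pattern):
                x = settle(W, A_diag, theta, salt_and_pepper(x_star, d))
                correct += np.allclose(x, x_star, atol=0.1)
                total += 1
        return correct / total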


Fig. 5. Percent of correct recall of the constructed networks under different noise densities (n = 100).

Here, the measure "bits per synapse" [32] is adopted to describe the storage capacity of the network. Let Mc be the number of patterns that can be stored and recalled when there is only one bit in error. Then the storage capacity of the network can be calculated as

η = Mc log2(n) / n²  bits per synapse.

Numerical simulations show that the network can store patterns with a capacity of about 0.1 bit per synapse.

Example 4: Consider the 256 gray-level face images shown in Fig. 6 (from [33]). Each image is 40 pixels in height and 33 pixels in width. Conventionally, the images can be interpreted as 40 × 33 data matrices (pixel value matrices) of class uint8, with pixel values ranging from 0 to 255. Here, for convenience of the system design, the pixel values are transformed by

φn = φo − 200  if φo ≤ 125,
φn = φo − 55   if φo > 125,

where φo denotes the original pixel value and φn is the new pixel value. Then the matrices can be transformed into 1320-dimensional vectors by (6). The system design procedure is performed with n = 1320, m = 8, A = diag[10, 10, . . . , 10], ωjj initially set to −10 (j = 1, 2, . . . , 1320), and fj(·) selected as

fj(x) = 200 tanh(x/100),  x ∈ R.

Then, the training dataset (Y(i), AX(i)) of the SFN is obtained with Y(i) = F(X(i)), where X(i) denotes the transformed pixel value vectors. We design five different networks separately and select the one with the minimum λ as the finally constructed network. To examine the performance of the constructed network, blurred images with Gaussian white noise of mean 0 and variance 0.05, as shown in Fig. 7(a), are presented to the constructed network as initial states. As the network dynamics evolve with time, the network finally converges to stored patterns. To measure the quality of image retrieval, we adopt the NMSE function of [24]

NMSE = Σ_{j=1}^{n} (xj(i) − oji)² / Σ_{j=1}^{n} (xj(i))²

where Oi = [o1i, o2i, . . . , oni]T is the final output and X(i) = [x1(i), x2(i), . . . , xn(i)]T denotes the corresponding desired output. It can be calculated that the average NMSE is 2.1 × 10⁻⁵. The network outputs are rounded to the nearest integers and transformed into matrices by (6). Then, each data matrix can be interpreted as a color image by using its associated color map. It should be mentioned that NMSE = 0 after the network outputs are rounded to the nearest integers. Fig. 7(b) shows the retrieved images of the blurred ones in the same column. As illustrated in Fig. 7(b), the network retrieves the previously stored image that most closely resembles the corrupted one. The retrieved images are identical to the stored images shown in Fig. 6, which shows perfect retrieval reliability. In addition, another eight blurred images, with salt and pepper noise of density 0.4 as shown in Fig. 8(a), are presented to the constructed network. During the retrieval phase, the network dynamics evolve toward the corresponding stored patterns. The images in Fig. 8(b) show the retrieved images of the corrupted ones in the same column. It can be calculated that NMSE = 0 after the network outputs are rounded to the nearest integers, which implies that the recalled images are identical to the stored ones. As shown in Figs. 7(a) and 8(a), the given images are heavily blurred; even a human can hardly recognize these corrupted images. However, the constructed network retrieves these heavily blurred images precisely. It is shown that the constructed network can correctly remember and retrieve the 256 gray-level images.

Remark 2: For true color images, the color of each pixel is determined by the combination of the red, green, and blue intensities in each color plane at the location of the pixel. One can design three networks to remember and recall the red, green, and blue intensities of each image, respectively. By this method, the constructed networks can also act as efficient associative memories for storing true color images.

Remark 3: The number of spurious memories of the networks constructed in this paper is much smaller than that of the networks designed with the Hebb rule in [34], especially when the number of stored patterns (m) is sufficiently small. Furthermore, we redo Example 2 using the Hamming shell minimum overlap algorithm (HSMOA) [16] and the pseudo-inverse technique (PIT) [35]. One hundred initial states, each at a Hamming distance of 10 from the corresponding remembered pattern, are presented to the constructed network. In simulations, the network performances of the HSMOA and PIT are P = 0.75 and P = 0.89, respectively. The HNN constructed in this paper clearly has better retrieval reliability.
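A minimal sketch of the pixel shift and the NMSE computation used in this example; the helper names are assumed, and the rounding step mirrors the description above.

    import numpy as np

    def shift_pixels(img_uint8):
        """phi_n = phi_o - 200 if phi_o <= 125, else phi_o - 55 (the pixel shift of Example 4)."""
        p = img_uint8.astype(float)
        return np.where(p <= 125, p - 200.0, p - 55.0)

    def unshift_pixels(x):
        """Round the network output and map it back to 0..255; -75 (= 125 - 200) is the boundary
        between the two shifted ranges."""
        x = np.rint(np.asarray(x, dtype=float))
        return np.where(x <= -75, x + 200.0, x + 55.0)

    def nmse(x_desired, o_final):
        """NMSE = sum_j (x_j - o_j)^2 / sum_j x_j^2."""
        x = np.asarray(x_desired, dtype=float)
        o = np.asarray(o_final, dtype=float)
        return np.sum((x - o) ** 2) / np.sum(x ** 2)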

Fig. 6. 256 gray-level images that need to be remembered.

Fig. 7. Blurred images with Gaussian white noise and the corresponding retrieved images. (a) Blurred images (Gaussian white noise of mean 0 and variance 0.05). (b) Retrieved images of the blurred ones in the same column.

Fig. 8. Blurred images with salt and pepper noise and the corresponding retrieved images. (a) Blurred images (salt and pepper noise of density 0.4). (b) Retrieved images of the blurred ones in the same column.

More importantly, the proposed SFN method is available for designing neural associative memories storing nonbinary patterns.

III. BAM

In the following, we show that BAMs can perform as efficient associative memories by training two SFNs.

A. Model Description and System Design

Consider a continuous-time asymmetric BAM as follows:

Ẋ1 = −A1X1 + W1F2(X2) + Θ1
Ẋ2 = −A2X2 + W2F1(X1) + Θ2    (7)

where X1, Θ1 ∈ Rn1, X2, Θ2 ∈ Rn2, W1 ∈ Rn1×n2, W2 ∈ Rn2×n1, and F1: Rn1 → Rn1 and F2: Rn2 → Rn2 are continuous functions that satisfy condition (2). Let X = [X1; X2], Θ = [Θ1; Θ2], F(X) = [F1(X1); F2(X2)] (stacked column vectors), A = diag(A1, A2), and

W = [ 0   W1
      W2  0  ].

Then network (7) can be transformed into

Ẋ = −AX + WF(X) + Θ

which is identical in form to network (1). Hence, the asymmetric BAM can be considered a special case of the asymmetric HNN, and Theorem 1 also holds for asymmetric BAMs.

Let (X1(i), X2(i)) (i = 1, 2, . . . , m) denote the patterns that need to be stored. In the following, two SFNs are introduced to assign the given patterns as equilibria of network (7). The training dataset of the first SFN is defined as T1 = {(Y2(i), A1X1(i))}, i = 1, . . . , m, where Y2(i) = F2(X2(i)), with m1 = n2 and m2 = n1.

Remark 4: Let W1 = ω and Θ1 = B. If the actual output of the first SFN due to Y2(i) is close enough to A1X1(i), then A1X1(i) = W1F2(X2(i)) + Θ1.

The training dataset of the second SFN is T2 = {(Y1(i), A2X2(i))}, i = 1, . . . , m, where Y1(i) = F1(X1(i)), with m1 = n1 and m2 = n2.

Remark 5: Let W2 = ω and Θ2 = B. If the actual output of the second SFN due to Y1(i) is close enough to A2X2(i), then A2X2(i) = W2F1(X1(i)) + Θ2.

It is obvious that the given patterns can be assigned as equilibria of network (7) by training the two SFNs.
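The reduction of the BAM (7) to the Hopfield form (1) is a simple block assembly; a minimal sketch (helper name and shapes assumed):

    import numpy as np

    def bam_as_hopfield(W1, W2, A1_diag, A2_diag, theta1, theta2):
        """Stack the BAM (7) into the Hopfield form (1): block-diagonal A, anti-diagonal W."""
        n1, n2 = W1.shape                     # W1 is n1 x n2, W2 is n2 x n1
        W = np.block([[np.zeros((n1, n1)), W1],
                      [W2, np.zeros((n2, n2))]])
        A_diag = np.concatenate([A1_diag, A2_diag])
        theta = np.concatenate([theta1, theta2])
        return W, A_diag, theta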


Fig. 9. Patterns that need to be stored by the asymmetric BAM.

Fig. 10. Blurred (columns 1, 3, and 5) and corresponding retrieved (columns 2, 4, and 6) patterns.

Similar to the design of asymmetric HNNs in Section II, one can keep the equilibria locally asymptotically stable by adjusting the parameters A and F(·). Note that the ωjj are not the diagonal components of W, so the ωjj are set to random numbers within [−1, 1]. In the following section, we illustrate the effectiveness of the system design method by numerical simulation.

B. Numerical Simulations

Patterns that need to be stored by the asymmetric BAM are shown in Fig. 9. In this example, strings of ±2 are adopted to represent the given patterns, with −2 (black) and 2 (white). Then X1(i) and X2(i) can be translated into 81- and 56-dimensional vectors, respectively. Hence, n1 = 81, n2 = 56, and m = 5. We set the ωjj to random numbers within [−1, 1], A = diag(A1, A2) = diag[1, 1, . . . , 1], and F(X) = [F1(X1); F2(X2)] = [ϕ(·), ϕ(·), . . . , ϕ(·)]T. The training datasets T1 and T2 are then obtained with Y2(i) = F2(X2(i)) and Y1(i) = F1(X1(i)) for i = 1, 2, 3, 4, 5. For each SFN, Steps 2–4 are executed four times. Thus, 16 different asymmetric BAMs can be constructed by setting

W = [ 0   W1
      W2  0  ],   Θ = [Θ1; Θ2].

The one with the minimum λ is selected as the finally designed network. To test the performance of the constructed asymmetric BAM, 100 blurred patterns with salt and pepper noise of density 0.25 (as shown in columns 1, 3, and 5 of Fig. 10) are presented to the finally designed network. Here, we use the asymmetric BAM as a heteroassociative memory. Let X1⁰ denote a given initial state (blurred pattern) of X1; then the initial state of X2 is calculated as A2⁻¹[W2F1(X1⁰) + Θ2], where A2⁻¹ denotes the inverse of A2. Similarly, if X2⁰ denotes a given initial state (blurred pattern) of X2, then the initial state of X1 is A1⁻¹[W1F2(X2⁰) + Θ1]. As the network dynamics evolve with time, all the given blurred patterns are correctly retrieved. In addition, we redo this numerical example using the Hebb-rule-designed BAM model [3], but only 91% of the patterns can be correctly recalled. Fig. 10 shows six examples of the blurred and retrieved patterns. It is shown that all the blurred patterns can be correctly recalled by the constructed asymmetric BAM.
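The heteroassociative initialization described above can be sketched as follows (helper names assumed; A1 and A2 are diagonal, so their inverses act elementwise):

    import numpy as np

    def init_x2_from_x1(x1_probe, W2, A2_diag, theta2, F1):
        """X2(0) = A2^{-1} [ W2 F1(X1(0)) + Theta2 ]."""
        return (W2 @ F1(x1_probe) + theta2) / A2_diag

    def init_x1_from_x2(x2_probe, W1, A1_diag, theta1, F2):
        """X1(0) = A1^{-1} [ W1 F2(X2(0)) + Theta1 ]."""
        return (W1 @ F2(x2_probe) + theta1) / A1_diag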

IV. MAM

In this section, the above results are extended to the case of asymmetric MAMs. Consider a continuous-time asymmetric MAM of the form

Ẋ1 = −A1X1 + W1F2(X2) + Θ1
Ẋ2 = −A2X2 + W2F3(X3) + Θ2
. . .
Ẋk = −AkXk + WkF1(X1) + Θk    (8)

where Xi, Θi ∈ Rni, ni ∈ R+ for i = 1, 2, . . . , k, Wi ∈ Rni×n(i+1) for i = 1, 2, . . . , k − 1, Wk ∈ Rnk×n1, and each continuous function Fi: Rni → Rni satisfies condition (2). It is obvious that if k = 2, then network (8) reduces to network (7). Let X = [X1; X2; . . . ; Xk], F(X) = [F1(X1); F2(X2); . . . ; Fk(Xk)], Θ = [Θ1; Θ2; . . . ; Θk] (stacked column vectors), A = diag(A1, A2, . . . , Ak), and

W = [ 0   W1  0   · · ·  0
      0   0   W2  · · ·  0
      ·   ·   ·   · · ·  ·
      Wk  0   0   · · ·  0 ].

Then network (8) can be transformed into

Ẋ = −AX + WF(X) + Θ

which is identical in form to network (1). Let (X1(i), X2(i), . . . , Xk(i)) (i = 1, 2, . . . , m) denote the patterns that need to be stored. The given patterns can be assigned as equilibria of network (8) by training k SFNs. The training datasets of the first k − 1 SFNs are defined as Tj = {(Yj+1(i), AjXj(i))}, i = 1, . . . , m, where Yj+1(i) = Fj+1(Xj+1(i)) for j = 1, 2, . . . , k − 1, and Tk = {(Y1(i), AkXk(i))}, i = 1, . . . , m, with Y1(i) = F1(X1(i)), is the training dataset of the kth SFN. As with the asymmetric HNNs and BAMs, the given patterns can be assigned as locally asymptotically stable equilibria by adjusting the parameters A and F(·). By this means, MAMs can be endowed with retrieval properties.
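Assembling the MAM (8) into the form (1) places W1, . . . , Wk−1 on the block superdiagonal and Wk in the bottom-left block; a minimal sketch (helper name assumed):

    import numpy as np

    def mam_block_matrices(W_list, A_diag_list, theta_list):
        """Stack network (8) into the form (1). W_list = [W1, ..., Wk]; Wi maps layer i+1
        (layer 1 for Wk) into layer i, so Wi has shape (n_i, n_{i+1})."""
        k = len(W_list)
        sizes = [W.shape[0] for W in W_list]              # n_1, ..., n_k
        offsets = np.concatenate([[0], np.cumsum(sizes)])
        n = offsets[-1]
        W = np.zeros((n, n))
        for i in range(k):
            j = (i + 1) % k                               # the layer that feeds layer i
            W[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = W_list[i]
        A_diag = np.concatenate(A_diag_list)
        theta = np.concatenate(theta_list)
        return W, A_diag, theta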


V. CONCLUSION

In conclusion, a method for the design of HNNs, BAMs, and MAMs with asymmetric connections has been proposed. A given set of patterns, which need not be binary, can be assigned as locally asymptotically stable equilibria of the network by training an SFN. Several numerical simulations are presented to test the effectiveness of the proposed method. It is shown that blurred patterns can be correctly retrieved, and that the proposed method is efficient for the design of Hopfield-type recurrent neural networks.

ACKNOWLEDGMENT

The authors would like to thank the associate editor and the anonymous reviewers for their helpful comments and suggestions.

REFERENCES

[1] S.-I. Amari, "Neural theory of association and concept-formation," Biol. Cybern., vol. 26, no. 3, pp. 175–185, 1977.
[2] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci., vol. 79, no. 8, pp. 2554–2558, 1982.
[3] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst. Man Cybern., vol. 18, no. 1, pp. 49–60, Jan.–Feb. 1988.
[4] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 5, pp. 855–868, May 2009.
[5] C.-S. Leung, L.-W. Chan, and E. M. K. Lai, "Stability and statistical properties of second-order bidirectional associative memory," IEEE Trans. Neural Netw., vol. 8, no. 2, pp. 267–277, Mar. 1997.
[6] C. Feng and R. Plamondon, "Stability analysis of bidirectional associative memory networks with time delays," IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 1560–1565, Nov. 2003.
[7] T. P. Chen and S. I. Amari, "Stability of asymmetric Hopfield networks," IEEE Trans. Neural Netw., vol. 12, no. 1, pp. 159–163, Jan. 2001.
[8] K. Matsuoka, "Stability conditions for nonlinear continuous neural networks with asymmetric connection weights," Neural Netw., vol. 5, no. 3, pp. 495–500, 1992.
[9] A. N. Michel and D. L. Gray, "Analysis and synthesis of neural networks with lower block triangular interconnecting structure," IEEE Trans. Circuits Syst., vol. 37, no. 10, pp. 1267–1283, Oct. 1990.
[10] S. Mou, H. Gao, J. Lam, and W. Qiang, "A new criterion of delay-dependent asymptotic stability for Hopfield neural networks with time delay," IEEE Trans. Neural Netw., vol. 19, no. 3, pp. 532–535, Mar. 2008.
[11] Z. Zuo, C. Yang, and Y. Wang, "A new method for stability analysis of recurrent neural networks with interval time-varying delay," IEEE Trans. Neural Netw., vol. 21, no. 2, pp. 339–344, Feb. 2010.
[12] Z.-B. Xu, G.-Q. Hu, and C.-P. Kwong, "Asymmetric Hopfield-type networks: Theory and applications," Neural Netw., vol. 9, no. 3, pp. 483–501, Apr. 1996.
[13] Z.-B. Xu, Y. Leung, and X.-W. He, "Asymmetric bidirectional associative memories," IEEE Trans. Syst., Man Cybern., vol. 24, no. 10, pp. 1558–1564, Oct. 1994.
[14] D. Liu and A. N. Michel, "Cellular neural networks for associative memories," IEEE Trans. Circuits Syst., vol. 40, no. 2, pp. 119–121, Feb. 1993.
[15] W. Krauth and M. Mezard, "Learning algorithms with optimal stability in neural networks," J. Phys. A, vol. 20, no. 11, pp. L745–L752, 1987.
[16] D.-L. Lee and T. C. Chuang, "Designing asymmetric Hopfield-type associative memory with higher order Hamming stability," IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1464–1476, Nov. 2005.
[17] H. Zhao, "Designing asymmetric neural networks with associative memory," Phys. Rev. E, vol. 70, no. 6, pp. 066137-1–066137-4, Dec. 2004.

[18] X. Zhuang, Y. Huang, and F. A. Yu, "Design of Hopfield content-addressable memories," IEEE Trans. Signal Process., vol. 42, no. 2, pp. 492–495, Feb. 1994.
[19] G. Costantini, D. Casali, and R. Perfetti, "Associative memory design for 256 gray-level images using a multilayer neural network," IEEE Trans. Neural Netw., vol. 17, no. 2, pp. 519–522, Mar. 2006.
[20] R. A. Monteros and J. H. Sossa, "A bidirectional hetero-associative memory for true-color patterns," Neural Process. Lett., vol. 28, no. 3, pp. 131–153, Dec. 2008.
[21] R. A. Vázquez and H. Sossa, "Behavior of morphological associative memories with true-color image patterns," Neurocomputing, vol. 73, nos. 1–3, pp. 225–244, Dec. 2009.
[22] R. A. V. E. Monteros and J. H. S. Azuela, "A new associative model with dynamical synapses," Neural Process. Lett., vol. 28, no. 3, pp. 189–207, Dec. 2008.
[23] C. Oh and S. H. Żak, "Image recall using a large scale generalized brain-state-in-a-box neural network," Int. J. Appl. Math. Comput. Sci., vol. 15, no. 1, pp. 99–114, 2005.
[24] M. E. Valle, "A class of sparsely connected autoassociative morphological memories for large color images," IEEE Trans. Neural Netw., vol. 20, no. 6, pp. 1045–1050, Jun. 2009.
[25] P. Zheng, W. Tang, and J. Zhang, "Efficient continuous-time asymmetric Hopfield networks for memory retrieval," Neural Comput., vol. 22, no. 6, pp. 1597–1614, Jun. 2010.
[26] D. Liu and Z. Lu, "A new synthesis approach for feedback neural networks based on the perceptron training algorithm," IEEE Trans. Neural Netw., vol. 8, no. 6, pp. 1468–1482, Nov. 1997.
[27] I. Salih, S. H. Smith, and D. Liu, "Synthesis approach for bidirectional associative memories based on the perceptron training algorithm," Neurocomputing, vol. 35, no. 1, pp. 137–148, Nov. 2000.
[28] H. Weyl, "Inequalities between the two kinds of eigenvalues of a linear transformation," Proc. Natl. Acad. Sci., vol. 35, no. 7, pp. 408–411, Jul. 1949.
[29] M. F. Møller, "A scaled conjugate gradient algorithm for fast supervised learning," Neural Netw., vol. 6, no. 4, pp. 525–533, 1993.
[30] F. J. Pineda, "Generalization of back-propagation to recurrent neural networks," Phys. Rev. Lett., vol. 59, no. 19, pp. 2229–2232, Nov. 1987.
[31] J.-H. Li, A. N. Michel, and W. Porod, "Analysis and synthesis of a class of neural networks: Linear systems operating on a closed hypercube," IEEE Trans. Circuits Syst., vol. 36, no. 11, pp. 1405–1422, Nov. 1989.
[32] B. Graham and D. Willshaw, "Capacity and information efficiency of the associative net," Network, vol. 8, no. 1, pp. 35–54, 1997.
[33] AT&T Laboratories, Cambridge, U.K. The ORL Database of Faces [Online]. Available: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[34] J. Bruck and V. P. Roychowdhury, "On the number of spurious memories in the Hopfield model," IEEE Trans. Inf. Theory, vol. 36, no. 2, pp. 393–397, Mar. 1990.
[35] T. Kohonen, E. Oja, and P. Lehtio, "Storage and processing of information in distributed associative memory systems," in Parallel Models of Associative Memory, G. Hinton and J. A. Anderson, Eds. Hillsdale, NJ: Erlbaum, 1981.

Pengsheng Zheng received the B.S. degree in electrical engineering and automation from Shandong University of Technology, Zibo, China, in 2004, and the M.A. degree in management science from the Heilongjiang Institute of Science and Technology, Harbin, China, in 2007. He received the Ph.D. degree in management science and engineering from the Institute of Systems Engineering, Tianjin University, Tianjin, China, in 2010. His current research interests include mathematical modeling and dynamic analysis of neural systems, and algorithm design for pattern recognition.


Jianxiong Zhang received the B.S. degree in mechanical engineering and the M.S. and Ph.D. degrees in systems engineering from Tianjin University, Tianjin, China, in 2002, 2004, and 2006, respectively. He is currently an Associate Professor at the Institute of Systems Engineering, Tianjin University. His current research interests include hybrid dynamical systems, modeling and control for complex systems, and machine learning.


Wansheng Tang received the B.S. and M.S. degrees in operations research and control theory from Nankai University, Tianjin, China, in 1985 and 1988, respectively, and the Ph.D. degree in systems engineering from Tianjin University, Tianjin, in 1993. He is currently a Professor at the Institute of Systems Engineering, Tianjin University. His current research interests include modeling and control for complex systems, pattern analysis, and intelligent algorithms.