

NDRAM: Nonlinear Dynamic Recurrent Associative Memory for Learning Bipolar and Nonbipolar Correlated Patterns

Sylvain Chartier, Member, IEEE, and Robert Proulx, Senior Member, IEEE

Abstract—This paper presents a new unsupervised attractor neural network which, contrary to optimal linear associative memory models, is able to develop nonbipolar attractors as well as bipolar attractors. Moreover, the model develops fewer spurious attractors and shows better recall performance under random noise than any other Hopfield-type neural network. This performance is obtained by a simple Hebbian/anti-Hebbian online learning rule that directly incorporates feedback from a specific nonlinear transmission rule. Several computer simulations show the model's distinguishing properties.

Index Terms—Associative memory, dynamic model, neural network, unsupervised learning.

Manuscript received January 10, 2004; revised July 5, 2004. S. Chartier was with the Université du Québec à Montréal, Montréal, QC H3C 3P8, Canada. He is now with the Department of Psychology, Université du Québec en Outaouais, Gatineau, QC J8X 3X7, Canada (e-mail: [email protected]). R. Proulx is with the Faculté des Sciences Humaines, Université du Québec à Montréal, Montréal, QC H3C 3P8, Canada (e-mail: [email protected]). Digital Object Identifier 10.1109/TNN.2005.852861

I. INTRODUCTION

Attractor neural networks (e.g., [1] and [2]) define a class of formal models that are usually used as autoassociative memories. The key mechanism common to all attractor neural networks is the presence of a feedback loop. Feedback enables a given network to shift progressively from an initial pattern toward an invariable state, namely, an attractor. If the model is properly trained, the attractors should correspond to the learned patterns. Thus, the fundamental question is: how well can a given network develop good attractors through the learning of proper patterns?

Learning in attractor neural networks is usually carried out by a Hebbian-type algorithm. At first, simple Hebbian algorithms were proposed [1], [2]. However, such networks suffered from poor storage capacity, unconstrained weight-matrix growth, and difficulty in learning correlated prototypes [3]. To overcome these difficulties, different learning algorithms have been proposed. The most popular solution uses a weight matrix that converges to an optimal linear associative memory (OLAM) based on the pseudoinverse. The pseudoinverse of a matrix [4] was proposed as a neural network learning algorithm by Kohonen [5] and was applied to a Hopfield network by Personnaz and Guyon [6] and by Kanter and Sompolinsky [3]. However, the pseudoinverse algorithm is neither local nor iterative. Many authors have therefore proposed locally implemented learning rules that converge toward an optimal linear associative memory [7]–[9], or to it within a scaling parameter [10], [11] (see [12] for a comparative analysis of optimal projection models in Hopfield-type networks).

Although those various models perform better than the simple Hebbian algorithm, they nevertheless suffer from spurious attractors and lack the capacity to develop nonbipolar attractors. In all previous models, the output is bounded in a hypercube that limits the units' values to −1 or 1. Moreover, Vidyasagar [13] demonstrated that Hopfield-type networks using a step function can only develop stable attractors at the hypercube corners. Consequently, those models typically develop extreme behavior that restricts their value as cognitive explanations. A more powerful model would be able to develop attractors anywhere within a hypercube quadrant instead of only at its extremities. To accomplish this, researchers have used a different type of transmission rule based on a multiple-limit output function [14], [15]. Although this solution yields good results, it does so at the cost of increased learning rule complexity. Moreover, the resulting model is more sensitive to noise than its binary counterpart.

In this paper, we introduce a new attractor neural network model that greatly reduces the number of spurious attractors and therefore performs better than the models mentioned previously. In addition, this new model is able to learn and recall nonbipolar attractors without any special coding or an increase in learning rule complexity. Moreover, the model is able to develop real-valued attractors without using a multiple-threshold transmission rule. We will start by presenting the model and its properties, followed by simulation results.

II. PROPERTIES OF NDRAM

As with any artificial neural network model, nonlinear dynamic recurrent associative memory (NDRAM) is entirely described by its architecture, its transmission rule, and its learning rule. The network architecture is illustrated in Fig. 1. As we can see, the model is autoassociative and recurrent, like general Hopfield models. Learning in this model is simply based on Hebbian learning with an added correction term, also named anti-Hebbian [16]. The following equation describes the learning rule:

W[k+1] = W[k] + η(x[0] x[0]^T − x[t] x[t]^T)   (1)

where x[0] represents the initial bipolar input vector, W the weight matrix, x[t] the value of the state vector after t iterations, and η the general learning parameter. The learning rule is a simplification of previous iterative optimal projection learning rules, in which a combination of Hebbian and corrector factors makes the weight matrix converge.


Fig. 1. Architecture of the model.

However, this new learning rule directly incorporates feedback, making the rule online. Also, the learning rule has only one free parameter and does not need any input normalization, contrary to other models (e.g., [11]). The principle of the learning algorithm is very simple: it decreases the difference between the correlation of the initial input stimuli and the correlation of the output at time t. Consequently, the final weight matrix will depend on the value of t, namely, the number of transmission rule iterations. The transmission rule obeys the following expression:

x_i[t+1] = 1,                      if a_i > 1
x_i[t+1] = −1,                     if a_i < −1
x_i[t+1] = (δ + 1)a_i − δ(a_i)³,   otherwise   (2)

where a = Wx[t] is the usual activation and δ is the transmission parameter. Fig. 2 illustrates the transmission rule for a value of δ = 0.4.

Fig. 2. Illustration of the transmission rule for a value of δ = 0.4.

The overall aspect of this function resembles a sigmoid function. However, contrary to that function, the transmission rule reaches the value of 1 or −1 in a finite number of iterations, whereas the sigmoid function is asymptotic at those values. In addition, autoassociative models usually update their weight matrix based on the linear activation and use a nonlinear transmission rule only during recall. NDRAM uses the same nonlinear transmission rule during recall and learning. The transmission rule has a positive and a negative part (saturating at +1 and −1, respectively). This double-action rule, when coupled with the learning rule, enables the network to create attractors elsewhere than in the corners of the hypercube. Thus, a real-valued state x will be a fixed-point attractor if it satisfies f(Wx) = x, where f denotes the transmission rule (2).
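As a rough sketch of how (1) and (2) could be implemented, under the piecewise-cubic reading of the transmission rule given above; the function names and the NumPy formulation are ours, not the paper's:

```python
import numpy as np

def transmit(W, x, delta=0.1):
    """One pass of the transmission rule (2), reconstructed form:
    saturate at +/-1, otherwise apply the cubic (delta+1)*a - delta*a**3."""
    a = W @ x
    return np.where(np.abs(a) > 1.0, np.sign(a), (delta + 1.0) * a - delta * a**3)

def learn_step(W, x0, eta=0.01, delta=0.1, t=1):
    """One application of the learning rule (1):
    W <- W + eta * (x0 x0^T - x[t] x[t]^T),
    where x[t] is obtained by t iterations of the transmission rule."""
    xt = np.array(x0, dtype=float)
    for _ in range(t):
        xt = transmit(W, xt, delta)
    return W + eta * (np.outer(x0, x0) - np.outer(xt, xt))
```

With t = 1, as in the simulations reported below, x[t] is simply a single pass of the transmission rule through the current weights.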

A. Convergence of the Learning Algorithm

To demonstrate the convergence of the learning algorithm, we study the network under the constraint of learning a one-dimensional (1-D) pattern in a 1-D network. Also, to simplify the problem, we assume that the number of iterations before the weight update is one. As a consequence, the growth of the weight matrix is associated with the growth of its corresponding eigenvalue, as expressed by the following equation:

Δλ = η(1 − ((δ + 1)λ − δλ³)²)   (3)

This equation has six roots, but only one root is relevant, for the simple reason that the initial values of the weight matrix are zero and its weight update is additive. Consequently, the network behavior occurs between an eigenvalue of zero and the first positive root encountered. Fig. 3 illustrates this situation for a value of δ = 0.4 and η = 1.

Fig. 3. Convergence of the eigenvalues (δ = 0.4 and η = 1).

This figure clearly shows that the variation of the eigenvalue is at its maximum initially (Δλ = η at λ = 0) and that it decreases progressively until it is equal to zero at a value of λ = 1. Thus, the learning rule converges by itself.
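A quick numerical check of this behavior, assuming (3) has the form reconstructed above; with δ = 0.4 and η = 1 the variation starts at its maximum η and vanishes at the first positive root λ = 1:

```python
import numpy as np

delta, eta = 0.4, 1.0   # values used for Fig. 3

def eig_variation(lam):
    """Reconstructed right-hand side of (3): eigenvalue variation for a
    1-D bipolar pattern with one transmission iteration per update."""
    f = (delta + 1.0) * lam - delta * lam**3
    return eta * (1.0 - f**2)

for lam in np.linspace(0.0, 1.0, 11):
    print(f"lambda = {lam:3.1f}   delta_lambda = {eig_variation(lam):+.3f}")
# The variation decreases monotonically from eta (at lambda = 0) to 0 (at lambda = 1),
# the first positive root, so the weight update vanishes once lambda reaches 1.
```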

B. Transmission and Learning Parameter

The learning rule is online. Consequently, the type of transmission rule will influence the convergence of the learning rule. Depending on the values of the free parameters, the network will develop different types of attractors. To guarantee that the network always converges directly to a fixed point, we can compute a bifurcation diagram and the Lyapunov exponent as a function of the transmission parameter δ. Fig. 4 illustrates this result. We can see that the network will converge directly to a fixed point if the value of the transmission parameter is less than 0.5.


Fig. 5. Attractors from the learning of three prototypes of three dimensions with an optimal Hopfield network based on the pseudoinverse.

Fig. 4. Bifurcation diagram and Lyapunov exponent as a function of the value of the transmission parameter.

Therefore, we have to set the transmission parameter according to the following constraint:

δ < 0.5   (4)

This solution can also be found by taking the derivative of the transmission rule, requiring the slope to be greater than zero, and solving for δ, as shown by the following expression:

df(a)/da = (δ + 1) − 3δa² = 1 − 2δ > 0,   if a has converged (a = ±1)   (5)

This last equation gives the same result as (4). Moreover, because the transmission rule is bound to the learning rule, the maximum value that the learning parameter can have depends on the value of δ. To find this value, we have to take the derivative of the learning equation when the slope is positive and then solve it for η

(6)

The solution found when the state vector has converged is thus expressed for an N-dimensional network as

(7)
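As a small illustration of constraint (4), here is a 1-D experiment on the cubic branch of (2); the ±1 saturation is deliberately omitted so the raw slope behavior is visible, and the script is ours, not the paper's. For δ < 0.5 the recall iterations approach the bipolar attractor monotonically, while for δ > 0.5 they overshoot and oscillate, in line with the slope 1 − 2δ found in (5):

```python
def cubic(a, delta):
    """Middle branch of the transmission rule (2), reconstructed form;
    the +/-1 saturation is omitted here to expose the slope behavior."""
    return (delta + 1.0) * a - delta * a**3

for delta in (0.1, 0.4, 0.7):
    slope = 1.0 - 2.0 * delta            # derivative of the cubic at a = 1, cf. (5)
    x, path = 0.8, []
    for _ in range(6):                    # iterate 1-D recall from x = 0.8 toward +1
        x = cubic(x, delta)
        path.append(round(x, 4))
    print(f"delta = {delta}: slope at +1 = {slope:+.2f}, trajectory = {path}")
# delta < 0.5: the trajectory rises monotonically toward +1 (direct convergence);
# delta = 0.7: it overshoots +1 and then oscillates around the fixed point.
```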

Fig. 6. NDRAM’s attractors from the learning of three prototypes of three dimensions.

Therefore, if both parameter constraints are satisfied, it is guaranteed that the network will learn the appropriate patterns as fixed-point attractors.

As stated before, optimal linear associative memory models produce spurious attractors. To show this, we used a Hopfield network based on the pseudoinverse [6] that learns three three-dimensional patterns. In this case, there are eight attractors (three for the patterns, three for their complements, and two for the linear combinations). Fig. 5 illustrates the attractors obtained from 500 recall trials in which the starting point of the state vector is determined at random. We clearly see that every corner of the cube is an attractor in the network; thus, the radii of attraction are divided into eight equally stable attractors. If we do the same simulation with the NDRAM model, we do not get as many attractors. Naturally, the new model develops the appropriate attractors for the prototypes and their complements, but it does not develop attractors for the linear combinations, as can be seen in Fig. 6. Even if a state vector is close to a linear combination of a given pattern, it will not be attracted by it. As a consequence, the network will always output a previously learned pattern. Thus, the reduction of spurious attractors improves the recall performance of NDRAM over the other models.

To illustrate the network's properties and to compare the new model's performance with other popular recurrent associative memories, we performed several computer simulations that test recall under noise, development of spurious attractors, and learning of nonbipolar patterns.
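The random-start probe behind Figs. 5 and 6 can be sketched as follows; transmit() re-implements the reconstructed transmission rule, and the trained weight matrix W, the rounding tolerance, and the helper names are our own assumptions:

```python
import numpy as np

def transmit(W, x, delta=0.1):
    """Reconstructed transmission rule (2), as in the Section II sketch."""
    a = W @ x
    return np.where(np.abs(a) > 1.0, np.sign(a), (delta + 1.0) * a - delta * a**3)

def recall(W, x, delta=0.1, max_iter=500, tol=1e-6):
    """Iterate the transmission rule until the state vector stabilizes."""
    for _ in range(max_iter):
        nxt = transmit(W, x, delta)
        if np.max(np.abs(nxt - x)) < tol:
            return nxt
        x = nxt
    return x

def count_attractors(W, n_dim, n_trials=500, delta=0.1, seed=0):
    """Tally the distinct final states reached from random starting points
    (W is assumed to be an already-trained weight matrix)."""
    rng = np.random.default_rng(seed)
    finals = set()
    for _ in range(n_trials):
        x0 = rng.uniform(-1.0, 1.0, n_dim)
        finals.add(tuple(np.round(recall(W, x0, delta), 2)))   # merge near-identical states
    return finals
```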


Fig. 7. Graphic illustrating the input patterns used for the simulations.

III. COMPUTER SIMULATIONS I: LEARNING AND RECALLING BIPOLAR PATTERNS

A. Methodology

The network task was to learn 12 correlated patterns placed on a 7 × 5 grid. The patterns were chosen because they present a good variety of correlations with each other; the correlation between the patterns varied from 0.029 to 0.83. Fig. 7 illustrates the chosen patterns. We converted each pattern into a vector of 35 components, where a white pixel was given a value of −1 and a black pixel was given a value of 1. It is noted that these 12 patterns represent a memory load of 34%, which is greater than the Hopfield network capacity (around 15% [3]). The transmission parameter was set to 0.1 and the learning parameter was set to 0.01. The first parameter respects the constraint defined in (4) (0.1 < 0.5) and the second respects the constraint defined in (7) (0.01 < 0.014). To limit the simulation time, we set the number of transmission iterations before each weight matrix update to one. The learning was carried out according to the following procedure:

1) random selection of a prototype;
2) computation of x[t] according to the transmission rule (2);
3) computation of the weight matrix update according to the learning rule (1);
4) repetition of steps 1) to 3) until the weight matrix converges (around 2000 learning trials).

A sketch of this procedure is given below. When the learning was accomplished, we tested the network's performance on two different recall tasks. The first task consisted of recalling noisy inputs. A noisy input was obtained by adding to a given learned prototype a random vector, normally distributed with a mean of zero and a standard deviation equal to the desired proportion of noise. For the simulation, the proportion of noise varied from 0.1 to 2.0. Fig. 8 shows an example of a noisy version of the letter "S." The second recall task consisted of testing the network with random pixel-flip noise. For this task, we flipped a fixed number of randomly chosen pixels and let the network self-stabilize. Fig. 9 illustrates noisy versions of the letter "A" as a function of the number of pixel flips. From the number of pixel flips in a letter, we compute the Hamming distance. Flipping pixels is a more drastic change than adding normally distributed noise to a pattern.
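A sketch of learning steps 1)–4) above, reusing the transmit() helper from the Section II sketch; the convergence test on the weight matrix is our assumption, since the text only reports that roughly 2000 learning trials were needed:

```python
import numpy as np

def train(prototypes, delta=0.1, eta=0.01, t=1, max_trials=10000, tol=1e-6, seed=0):
    """Steps 1)-4): pick a prototype at random, compute x[t] with the
    transmission rule, apply the learning rule (1), and repeat until the
    weight matrix stops changing (the tolerance is an assumption)."""
    rng = np.random.default_rng(seed)
    n = prototypes.shape[1]
    W = np.zeros((n, n))                     # weights start at zero (Section II-A)
    for trial in range(1, max_trials + 1):
        x0 = prototypes[rng.integers(len(prototypes))]
        xt = np.array(x0, dtype=float)
        for _ in range(t):
            xt = transmit(W, xt, delta)      # helper from the Section II sketch
        W_new = W + eta * (np.outer(x0, x0) - np.outer(xt, xt))   # learning rule (1)
        if np.max(np.abs(W_new - W)) < tol:
            return W_new, trial
        W = W_new
    return W, max_trials
```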

Fig. 8. Density graphics illustrating different proportions of noise for the letter "S."

Fig. 9. Density graphics showing examples of noisy versions of the letter "A" as a function of the number of pixel flips (parentheses indicate the Hamming distance).
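The two noise schemes just described (Figs. 8 and 9) amount to the following; the helper names are illustrative and the prototype is assumed to be a bipolar vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noisy(prototype, proportion):
    """Additive noise: a zero-mean normal vector whose standard deviation
    equals the desired noise proportion (0.1 to 2.0 in the simulations)."""
    return prototype + rng.normal(0.0, proportion, size=prototype.shape)

def pixel_flip_noisy(prototype, n_flips):
    """Flip a fixed number of randomly chosen pixels of a bipolar pattern."""
    noisy = np.array(prototype, dtype=float)
    idx = rng.choice(noisy.size, size=n_flips, replace=False)
    noisy[idx] *= -1.0
    return noisy

def hamming(a, b):
    """Hamming distance between two bipolar vectors (cf. Fig. 9)."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))
```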

Each recall trial was accomplished according to the following procedure:

1) selection of a pattern and modification of it by the addition of normally distributed noise or by random pixel flips;
2) computation of x[t] according to the transmission rule (2);
3) repetition of 2) until stabilization of the state vector in an attractor;
4) repetition of 1) to 3) for a different pattern.

To get an idea of the performance of the network, we presented 200 noisy input patterns of a given letter for each desired noise proportion. Finally, we also determined the proportion of spurious attractors by calculating the number of random vectors that stabilize in a spurious state relative to the total number of random vectors. To obtain an accurate percentage of spurious memories, we generated 1000 random vectors. We consider an attractor spurious if it is neither a learned prototype nor its complement. We compared the network's performance with other recurrent autoassociative networks: the models used by Kanter and Sompolinsky [3], Storkey and Valabregue [10], Diederich and Opper [8], and Bégin and Proulx [11]. From those manipulations, we proceeded to the different simulations.

IV. RESULTS

A. Learning

As expected, Fig. 10 shows that the model was able to self-stabilize its weight matrix and learn the 12 prototypes. The first difference between the NDRAM model and the OLAM models is in the eigenvalue spectrum. In models based on the pseudoinverse (except [3]), all the eigenvalues converge to the same value even when learning correlated prototypes. However, this is not the case for the proposed model.


Fig. 10. Eigenvalue spectrum of optimal linear associative memory models and NDRAM (the first 20 eigenvalues are shown).

Fig. 11. Performance percentage as a function of the random noise proportion.

Fig. 12. Correct categorization performance as a function of the number of pixel flips.

The eigenvalues of the weight matrix converge to an unequal value spectrum, thus indicating that unequal radii of attraction were developed. This property is also obtained by an optimal linear associative memory model without self-connections [3]. However, that model can show cyclic behavior, whereas this is not the case with the proposed model if the constraint given by (4) is respected.

B. Recall

If we look at Fig. 11, we see that the performance of NDRAM is far better than that of any other model when the recall is accomplished from normally distributed random noise. For example, even under a noise proportion of 2.0, NDRAM still has a performance of about 58%, compared with only 17% for the next best model.

If we look at the performance on the random pixel-flip task, we see that again the new model performs better than any other model (Fig. 12). To illustrate, at a noise proportion of 31%, NDRAM still reaches 74% correct recall, compared with 47% for the next best model. This indicates that the radii of attraction are greater in NDRAM than in the other models.

Finally, we tested the network on totally random stimuli to identify the presence of spurious attractors in the categorization space. Fig. 13 shows examples of final state attractors obtained with the optimal linear associative model. We clearly see that only on rare occasions does a random stimulus converge toward a previously learned prototype. On the other hand, Fig. 14 shows a very different situation: in the majority of trials, a random stimulus ends in one of the stored prototypes (or its complement). Fig. 15 shows the ratio of spurious attractors compared with the other models.


Fig. 13. Random samples of final attractors obtained by OLAM.

Fig. 14. Random samples of final attractors obtained by NDRAM.

We can see that the new model has a spurious memory proportion of 13%, whereas the next best model has a spurious memory proportion of 96%. This indicates that the radii of spurious attractors are smaller in NDRAM than in the other OLAM models.

Fig. 15. Spurious memory attractors.

C. Discussion

As the results show, by reducing the attraction radii of the spurious attractors, we increase the recall performance at the same time. Moreover, in the new model, the shape and range of the basins of attraction depend on the memory loading, as is the case in all Hopfield-like attractor neural networks; the greater the memory loading, the smaller the radii of attraction. Also, an attractor's radius depends on the correlation between the patterns. In a learning task in which the correlations between the patterns vary, the shape and range of attraction will be different from one attractor to another, and consequently they cannot be expressed in a formal way. The network is able to reduce the radii of spurious attractors by using a learning rule that directly incorporates the nonlinear transmission rule. It is noted that this performance is achieved using the nonlinear transmission rule with the ±1 limit. Thus, similar functions, such as the sine function, should give similar results. However, contrary to the proposed transmission function, the sine function will not be able to learn nonbipolar patterns, for it does not have a dual stabilizing mechanism.

V. COMPUTER SIMULATIONS II: LEARNING AND RECALLING NONBIPOLAR PATTERNS

Simulations were conducted to show the network's capacity to develop real-valued attractors. This was done to show the model's unique ability to learn and recall nonbipolar patterns.

A. Methodology

The network task was to learn the four gray-level icons illustrated in Fig. 16. These images have a size of 16 × 16 pixels. Each pixel is encoded on an 8-bit grayscale (0 to 255). We converted each image into a vector of 256 components and rescaled the values of the images to the range [−1, 1]. The transmission parameter was set to 0.1 and the learning parameter was set to 0.001, which are in agreement with the constraints expressed in (4) (0.1 < 0.5) and (7) (0.001 < 0.00197). The recall task was set to show the network's capacity to correctly recall noisy versions of the stimuli and its ability to complete partial stimuli.
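A sketch of this preprocessing step, assuming the icons are available as 16 × 16 arrays of 8-bit gray levels; the linear rescaling maps 0 to −1 and 255 to +1:

```python
import numpy as np

def image_to_vector(img_8bit):
    """Flatten a 16 x 16 array of 8-bit gray levels (0-255) into a
    256-component vector rescaled linearly to the range [-1, 1]."""
    v = np.asarray(img_8bit, dtype=float).reshape(-1)
    return 2.0 * v / 255.0 - 1.0

# Stand-in example (the actual icons are those of Fig. 16):
icon = np.random.default_rng(0).integers(0, 256, size=(16, 16))
x = image_to_vector(icon)
print(x.shape, float(x.min()), float(x.max()))   # (256,) with values in [-1, 1]
```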


Fig. 16. Gray-level input patterns used for the simulations.

B. Results

The first result, illustrated by Fig. 17, indicates that the network developed 5 eigenvalues out of a maximum of 256. Once again, we can see that the eigenvalue spectrum is unequal, indicating unequal basins of attraction. If we look at Fig. 18, we see that the gray-level images formed stable attractors, as is the case with bipolar patterns. We can see that these attractors behave in the same fashion as hypercube-corner attractors, for they are able to clean the noise and complete missing parts.

Fig. 17. Eigenvalue spectrum of the first 21 eigenvalues.

Fig. 18. Examples of recall.

C. Discussion

The various results have shown that it is possible for a network to develop nonbipolar attractors as well as bipolar attractors. This property has never been observed in unsupervised correlational attractor neural networks without using arbitrary multivalue thresholds. Usually, when we want a network to learn gray-level images, we have to code each pixel with several units. For example, if we want a Hopfield network to learn the 8-bit icons illustrated in Fig. 16, we need to recode each pixel with eight units. Consequently, the 256-unit model is increased by a factor of eight, giving a network of 2048 units. Thus, it would take a 2048-unit Hopfield network to do the same task as the present model composed of 256 units.

VI. CONCLUSION

We have demonstrated that the NDRAM model can achieve a better performance than OLAM models. We have also shown that the NDRAM model uses a nonlinear transmission rule during both learning and recall, which enables the model to learn online. In addition, the dual-action transmission rule enables the network to learn gray-level images directly, without preprocessing. Finally, the learning in this model, as in any unsupervised autoassociative neural network, is made solely from the noise-free patterns. A more natural implementation would make the network learn only from examples of the patterns and then verify whether the network develops the proper attractors by itself. Further research should focus on accomplishing this.

It is thus concluded that a simple combination of Hebbian and anti-Hebbian learning with a specific nonlinear transmission rule during learning enables both bipolar and nonbipolar correlated prototypes to be learned. The newly proposed model represents a significant increase in performance over conventional correlational autoassociative models.

ACKNOWLEDGMENT

The authors are grateful to J.-F. Ferry for his help in reviewing this paper.

REFERENCES

[1] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," in Proc. Nat. Acad. Sci., vol. 79, 1982, pp. 2554–2558.
[2] J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, "Distinctive features, categorical perception, and probability learning: Some applications of a neural model," Psych. Rev., vol. 84, pp. 413–451, 1977.
[3] I. Kanter and H. Sompolinsky, "Associative recall of memory without errors," Phys. Rev. A, vol. 35, pp. 380–392, 1987.
[4] R. Penrose, "On best approximate solutions of linear matrix equations," in Proc. Cambridge Philos. Soc., vol. 52, 1956, pp. 17–19.
[5] T. Kohonen, Associative Memory—A System-Theoretical Approach. Berlin, Germany: Springer-Verlag, 1977.
[6] L. Personnaz, I. Guyon, and G. Dreyfus, "Collective computational properties of neural networks: New learning mechanisms," Phys. Rev. A, vol. 34, pp. 4217–4228, 1986.
[7] J. A. Anderson and G. L. Murphy, "Psychological concepts in a parallel system," in Evolution, Games and Learning, D. Farmer, A. Lapedes, N. Packard, and B. Wendroff, Eds. Amsterdam, The Netherlands: North-Holland, 1986, pp. 318–336.
[8] S. Diederich and M. Opper, "Learning of correlated patterns in spin-glass networks by local learning rules," Phys. Rev. Lett., vol. 58, pp. 949–952, 1987.
[9] S. Chartier and R. Proulx, "A new online unsupervised learning rule for the BSB model," in Proc. Int. Joint Conf. Neural Networks (IJCNN'01), 2001, pp. 448–453.
[10] A. J. Storkey and R. Valabregue, "The basins of attraction of a new Hopfield learning rule," Neural Netw., vol. 12, pp. 869–876, 1999.
[11] J. Bégin and R. Proulx, "Categorization in unsupervised neural networks: The Eidos model," IEEE Trans. Neural Netw., vol. 7, pp. 147–154, 1996.
[12] N. Davey and S. P. Hunt, "A comparative analysis of high performance associative memory models," in Proc. 2nd Int. ICSC Symp. Neural Computation (NC'2000), 2000, pp. 55–61.
[13] M. Vidyasagar, "Discrete optimization using analog neural networks with discontinuous dynamics," in Proc. Int. Conf. Automation, Robotics, Computer Vision, Singapore, 1994.
[14] J. M. Zurada, I. Cloete, and E. van der Poel, "Generalized Hopfield networks for associative memories with multi-valued stable states," Neurocomputing, vol. 13, pp. 135–149, 1996.
[15] S. Mertens, H. M. Koehler, and S. Bos, "Learning grey-toned patterns in neural networks," J. Phys. A: Math. Gen., vol. 25, pp. 5039–5045, 1992.
[16] S. Haykin, Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice-Hall, 1999.


Sylvain Chartier (M’05) received the B.A. degree from the Université d’Ottawa, Ottawa, ON, Canada, in 1993 and the B.Sc. and Ph.D. degrees from the Université du Québec à Montréal, Montréal, PQ, Canada, in 1996 and 2004, respectively, all in psychology. His research interests are in the development of recurrent autoassociative memories as well as bidirectional associative memories. He is currently pursuing a Postdoctoral Fellowship with the Institut Philippe Pinel de Montréal and at the Université du Québec en Outaouais, Gatineau, Canada, where he is developing neural network applications for oculomotor data.


Robert Proulx (M’94–SM’96) has been Dean of the Faculty of Human Science, Université du Québec à Montréal, Montréal, PQ, Canada, since 1998. He became a Professor at the Université du Québec à Montréal in 1978. He was Chair of the Department of Psychology there from 1994 to 1998 and currently is Director of the Natural and Artificial Intelligence Studies Laboratory. His research interests include signal processing and neural networks, as well as perception, pattern recognition, statistics, and artificial intelligence.