Neural Networks, Vol. 1, pp. 277-288, 1988. Printed in the USA. All rights reserved.
0893-6080/88 $3.00 + .00. Copyright © 1988 Pergamon Press plc

ORIGINAL CONTRIBUTION

Central Pattern Generating and Recognizing in Olfactory Bulb: A Correlation Learning Rule

WALTER J. FREEMAN, YONG YAO, AND BRIAN BURKE
University of California

(Received February 1988; revised and accepted April 1988)

Abstract--A learning rule called an input correlation rule, which can simplify exploration of the behavior of learning, generating, and classifying patterns in the vertebrate olfactory system, is proposed. We apply this correlation rule to a set of fully interconnected coupled oscillators that comprises a dynamical model of the olfactory bulb, so as to form "templates" of oscillators with strengthened interconnections in respect to inputs classed as "learned." We obtain a content addressable memory in which phase coherent oscillation provides for central pattern generation and recognition. We use this analog model neural network to simulate dynamic features of the olfactory bulb in detail by numerical integration and multivariate analysis. The model classifies 100% correctly for incomplete inputs, testing inputs about their training centroids, and distortion by noise that is defined as input to nontemplate elements. The model also allows substantially overlapping templates, which implies that it possesses a large information capacity. For multiple inputs the model gives correct output of the forms A and B, A and not B, B and not A, or neither. The initial conditions of the model at the time of onset of input play no role in classification. Classification is achieved within 20 to 50 ms of simulated run time, even though convergence to a limit cycle requires up to 10 cycles (200 ms). The repetition rate of convergence from one pattern to the next in the model exceeds 10 patterns/s.

Keywords--Associative memory, Classification, Input correlation, Learning, Olfactory bulb, Pattern generation, Pattern recognition, Template.

1. INTRODUCTION

Most perceptual and high-level information processing in the brain may be considered as pattern recognition, for example, vision, olfaction, audition, and speech. Those familiar with digital computers know that they function in a manner very different from the way biological brains do. The brain has billions of units (neurons) which behave collectively and each of which has both computing power and memory. These cooperative properties make nervous systems more powerful than computers in high-level information processing. Many researchers have sought novel design principles that underlie the superior performance of biological systems (Baird, 1986; Freeman, 1975; Freeman, Eisenberg, & Burke, 1987; Graf & DeVegrar, 1987; Hecht-Nielsen, 1986; Koch & Poggio, 1987; Mead, 1987; Moopenn, Langenbacher, Thakoor, & Khanna, 1987; Skarda & Freeman, 1987; Yao, 1987). Two topics which we consider are collective behavior and associative memory.

Collective behavior is very common in nature, for example, in fluids (turbulence), crystals, and lasers in atomic and molecular systems. In the brain, for instance, although single neurons give rise to discrete spikes, spike density over a population is a continuous variable due to dense synaptic interaction in local neighborhoods (Freeman, 1975). The brain is thought to integrate in an analog mode in the dendritic trees of its neurons. These sums are then transformed into spikes at the somas of individual cells concomitantly over all neurons in a set and are transmitted to thousands of other neurons. This massively parallel form of cooperation is another important characteristic of neuronal collectives. New neuromorphic architectures may stem from understanding how such collective dynamics occurs among neurons in brain systems. An associative memory can be considered as a dynamical system. One reason why interest in neural networks was renewed is the realization that a spatial pattern of activity in a dynamical system may be considered as the minimum of a cost function, or a stored

This work is supported by grant MH06686 from the National Institute of Mental Health and by grant 87NH129 from the AFOSR. The authors thank Frank Eeckman and Joe Eisenberg for their valuable discussion. Requests for reprints should be sent to the Department of Physiology-Anatomy, University of California, Berkeley, CA 94720.


pattern in a memory. The convergence towards an identifiable pattern from its neighborhood implements the computation in an optimization problem, or the retrieval of a pattern from its partial content in the memory. Classification and selection are automatic in such a memory, without resort to a "teacher," template matching, error computation, or back propagation, which do not seem likely in brains. This paper is organized as follows: Section 2 presents the olfactory model (Freeman, 1987) as a neural network; in Section 3 a modified Hebb rule, the "input correlation learning rule," is presented, and computer simulation of the model with the rule is shown; in Section 4, discussion is addressed to simplifying and testing the olfactory model; and in Section 5, the significance of the olfactory model and its learning processes is discussed.

2. MODEL

The dynamics of each node is governed by a linear second-order ordinary differential equation,

F(t) = d^2V(t)/dt^2 + (a + b) dV(t)/dt + ab V(t),

where V(t) is the membrane voltage at time t, and a and b are structural parameters that are determined by physiological measurement. For rabbits, a = 220/s, b = 720/s. The equations for the nonlinear relation between the input variable V and the output variable Q are

Q(V) = Qm{1 - exp[-(e^V - 1)/Qm]}   if V > -u0,
Q(V) = -1                           if V <= -u0,

where -u0 is the voltage at which Q reaches its lower asymptote -1.
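A minimal numerical sketch of one such second-order unit with the asymmetric sigmoid may clarify the dynamics. This is a hedged illustration, not the paper's full KII simulation: the Euler integration, the unity DC gain of the drive, and the coupling gains kei and kie are assumptions of this sketch.

```python
import numpy as np

A, B = 220.0, 720.0   # rate constants for rabbit, per second (a = 220/s, b = 720/s)
QM = 5.0              # upper asymptote Qm of the sigmoid
# voltage at which Q reaches its lower asymptote -1 (keeps the two branches continuous)
V0 = np.log(1.0 - QM * np.log(1.0 + 1.0 / QM))

def Q(v):
    """Asymmetric sigmoid: pulse density as a function of wave amplitude v."""
    if v <= V0:
        return -1.0
    return QM * (1.0 - np.exp(-(np.exp(v) - 1.0) / QM))

def step(state, drive, dt):
    """One Euler step of F(t) = V'' + (a+b)V' + ab*V, rewritten as
    V'' = ab*(F - V) - (a+b)*V', so a constant drive F settles at V = F."""
    v, u = state  # u = dV/dt
    return (v + dt * u, u + dt * (A * B * (drive - v) - (A + B) * u))

def simulate_pair(p_input, t_end=0.2, dt=1e-5, kei=1.0, kie=1.5):
    """A minimal excitatory-inhibitory pair: e excites i, i inhibits e.
    The gains kei and kie are illustrative choices, not fitted values."""
    e, i = (0.0, 0.0), (0.0, 0.0)
    trace = []
    for _ in range(round(t_end / dt)):
        e, i = step(e, p_input - kie * Q(i[0]), dt), step(i, kei * Q(e[0]), dt)
        trace.append(e[0])
    return np.array(trace)

v = simulate_pair(p_input=1.0)
print(len(v), float(v.min()), float(v.max()))
```

The negative feedback loop between the two second-order units is what permits the sustained oscillation that the coupled-oscillator network exploits.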

FIGURE 3. Pattern outputs for inputs (a) T2 and (b) T5, where T1, T2, T3, T4, and T5 are stored in the system. (c) Pattern retrieval when input to the 14th channel in Figure 3b is omitted. Duration of each trace is 400 ms; the frequency is about 50 Hz.


TABLE 1
Peak-to-Peak Amplitudes of Output Waveforms in the 20-Channel Case

(a) Input T2:
0.561 0.561 0.561 0.561 0.453 0.369 2.532 0.369 2.532 2.532 0.453 0.668 2.532 0.668 0.668 2.532 0.453 0.668 0.265 0.668

(b) Input T5:
0.561 0.561 0.561 0.561 0.453 0.369 0.668 0.369 0.668 0.668 0.453 2.532 0.668 2.532 2.532 0.668 0.453 2.532 0.265 2.532

(c) Input T5-:
0.382 0.328 0.382 0.382 0.323 0.273 0.438 0.273 0.438 0.438 0.323 2.425 0.438 2.425 0.323 2.425 1.778 0.202 0.438 2.425

Note to Table 1: Five patterns are stored in the system according to the correlation learning rule, and its synapses remain unchanged unless a new pattern needs to be learned. Two examples are shown in (a) and (b). (c) shows the error-correcting capability of the system, where T5- corresponds to T5 in Table 1b when input channel 14 is omitted.

We increase the input amplitude from 0 to study the relationship between input amplitude and output waveform. Figure 4 shows the simulation result. From Figure 4, we draw the following conclusions: (a) although the "on" channels and the "off" channels start to oscillate at different latencies, once established they remain phase coherent; (b) all the "on" channels start to oscillate at the same time, as do all of the "off" channels; and (c) the nonlinear system saturates; that is, near some critical value the amplitude of the output increases quickly as the amplitude of the input increases, and once the input amplitude exceeds a certain value the output is almost unchanged. This implies that if the amplitude of the input pattern deviates from binary ("on" or 1 and "off" or 0) over a wide range, it does not affect the recognition processing in the model. These characteristics are very robust and hold when the structural parameters of the system are altered over a wide range, for example, Ke(low) = 0.6, Ke(high) = 0.8, Kii = 0.3, and Ke(low) = 0.7, Ke(high) = 1.3, Kii = 1.0, respectively.

Next, let us observe the evolution of stored patterns as new patterns are added. Table 2c is the case when T4 is omitted from Table 1. In other words, T5 in Table 1b may be considered the evolution of Table 2c when T4 is added. The evolution does not change the pattern essentially. For instance, the amplitude of the "on" channels in Table 2c is 2.425, which is close to 2.532 in Table 1b.

Finally, Table 3 shows the output correlations between the outputs of channel 1 and channels 1 to 20, channel 2 and channels 1 to 20, and so on. Since channel 1 is "on," the correlations between channel 1 and other "on" channels are one, but the rest are less than one. Channel 2 is off, so the correlations between channel 2 and other "off" channels are one, but the rest are less than one. Channels 3, 4, and 5 are the same as channel 2. In addition, channels 6 to 10, channels 11 to 15, and channels 16 to 20 duplicate channels 1 to 5. Because the input correlation learning rule reflects the cooperative response of the receptors to an input signal, we conclude from Table 3 that this learning rule can be applied to modify connections using either the input or the output correlations.

D. Multivariate Analysis for Pattern Classification

Multivariate analysis is generally considered to include those statistical procedures concerned with analyzing multiple measurements that have been made on a number N of individuals (Sammon, 1969). For pattern classification, nonlinear mapping provides a method to indicate to which class an individual belongs. Given N individuals on which m measurements have been made, each individual can be represented as a point in the m-dimensional space. The m-dimensional vectors can be point-mapped into 2-dimensional space while preserving the inherent data structure. Clustering of similar outputs in the data set can then be easily observed. In this context, we use this analysis to test our model and learning rule. Given the following three nonoverlapping templates of 5 elements each,

U1 = (00110001000100000010)
U2 = (00000000110001000101)
U3 = (10000100001010010000)

we store these templates in a 20-channel network (see Figure 2) according to our learning rule.
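One plausible reading of template storage under the input correlation rule is sketched below: mutual connections among channels that are "on" together in a learned pattern are strengthened, while all other connections keep a low baseline. The gain values k_high and k_low are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

def learn_templates(patterns, n_channels, k_high=2.0, k_low=0.1):
    """Input correlation rule (as read from the text): connections between
    channel pairs whose inputs are correlated (both "on" in a learned pattern)
    are strengthened; all other connections keep a low baseline."""
    K = np.full((n_channels, n_channels), k_low)
    for p in patterns:
        on = np.flatnonzero(p)
        for i in on:
            for j in on:
                if i != j:
                    K[i, j] = k_high   # template interconnections strengthened
    np.fill_diagonal(K, 0.0)           # no self-connection
    return K

u1 = np.zeros(20)
u1[[2, 3, 7, 11, 18]] = 1              # channels 3, 4, 8, 12, 19 (zero-based here)
K = learn_templates([u1], n_channels=20)
print(K[2, 3], K[2, 5])                # strengthened pair vs. baseline pair
```

Because the rule depends only on correlations among input lines, the same update can be driven by output correlations instead, as the text notes.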

TABLE 2
(a), (b): The case when the templates overlap, where T1 and T2, and T2 and T5, overlap respectively. Entries are peak-to-peak amplitudes of the 20 output channels for inputs T2 and T5; "on" channels reach about 2.5, while "off" channels remain below about 1.4.
(c): The output pattern evolution due to removing a previously stored pattern: T4 in Table 1b is removed, that is, only T1, T2, T3, and T5 are stored in the system.


Dynamic Pattern Classification

FIGURE 4. The relationship between input magnitude and output pattern amplitudes, and oscillation latencies for a ramp input. Trace: 200 ms. A time sequence of 20-50 ms (3 cycles from t = 0) suffices for classification.

By means of multivariate analysis, we obtain Figure 5, in which "-," "+," and "*" with circles represent the patterns U1, U2, and U3, respectively. All of the other "-" points indicate the positions of outputs when the input of each U is presented with one, two, three, or four channels off (totaling 31 possibilities). Figure 5 shows the pattern clustering ability of the olfactory model under the input correlation learning rule. One hundred percent of the initial input patterns are correctly classified. This result is comparable to that obtained by physiological experiments in rabbits (Freeman & Grajski, 1987). In that experiment, on the average, 75% of EEG bursts were correctly classified.
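The nonlinear point-mapping used here (Sammon, 1969) can be sketched as follows. The cluster data are synthetic stand-ins, and the plain gradient-descent step and learning rate are assumptions of this sketch (Sammon's original algorithm uses a diagonal Newton step):

```python
import numpy as np

def sammon(X, n_iter=500, lr=0.3, seed=0):
    """Sammon's nonlinear point-mapping: place N points in 2-D so that pairwise
    distances approximate those in the original m-dimensional space."""
    rng = np.random.default_rng(seed)
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) + np.eye(N)
    Y = rng.normal(scale=0.1, size=(N, 2))
    c = D.sum()
    for _ in range(n_iter):
        d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1) + np.eye(N)
        g = (D - d) / (D * d)          # weight of each pair in the stress gradient
        np.fill_diagonal(g, 0.0)
        # gradient of E = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij with respect to Y
        grad = (-2.0 / c) * (g[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y -= lr * grad
    return Y

# three tight clusters in 20-D, standing in for outputs near three templates
rng = np.random.default_rng(1)
centers = 3.0 * rng.normal(size=(3, 20))
X = np.vstack([c + 0.05 * rng.normal(size=(10, 20)) for c in centers])
Y2 = sammon(X)
print(Y2.shape)
```

Plotting the rows of the returned 2-D array exposes the clustering of outputs around each template, which is what Figures 5 and 6 display.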

TABLE 3
Output Correlations Between Channel i and All 20 Channels (i = 1, 2, 3, 4, 5) (see text for details)

Channel 1 ("on"): 1.000 with itself and the other "on" channels (4 values); 0.710 with the 16 "off" channels
Channel 2 ("off"): 0.710 with the 4 "on" channels; 1.000 with itself and the other "off" channels (16 values)
Channels 3, 4, 5: the same as channel 2

Figure 5 gives further information: for an individual input, the olfactory model can distinguish not only to which template the input belongs, but also how many channels are omitted. For example, "-" with a circle indicates the original pattern, while from near to far, the groups of "-" represent one, two, three, and four channels omitted. In Figure 5, there are four points which are not well grouped. When the size of the system is larger, the results are better (see Figure 6 as an example).

4. PROPERTIES OF A FULLY INTERCONNECTED REDUCED KII SET

Now we simplify our model, because with N increased from 20 to 64 the contributions of e2 and i2 become negligible. We find by computer simulation that, by proper selection of parameter values, the dynamic properties of a reduced KII subset can be made to conform to the full KII set while omitting e2 and i2 as well as their connections with e1 and i1 in a full KII subset (see Figure 2). Table 4 shows that under certain parameter values both full and reduced KII subsets have suitable dynamic ranges; that is, both subsets oscillate, and their oscillation amplitudes increase when inputs are increased. Accordingly, it may be expected that when each full KII subset is replaced by a reduced one in Figure 2, the resulting network has similar pattern recognition capability under certain well-defined parameters (see Table 4). Figure 6 gives the simulation results for the following three patterns. For the reduced fully interconnected KII set with 64 channels, we obtain 100% clustering as well as perfect grouping.

FIGURE 5. Output pattern clustering in a fully interconnected full KII set with 20 channels.

FIGURE 6. Output pattern clustering in a fully interconnected reduced KII set with 64 channels.


TABLE 4
The Suitable Dynamic Range of (a) a Full KII Subset (Kee = 5.0, Kei = 0.2, Kie = 0.25, Kii = 0.25) and (b) a Reduced KII Subset (Kee = 2.5, Kei = 1.0, Kie = 0.25, Kii = 0.25) as the Building Block of the Whole Network

(a) Full KII subset (output amplitude vs. input amplitude)
Input:     0.3    0.5    0.7    1.0    1.5
Qm = 5.0:  0.301  1.630  2.285  2.735  2.739
Qm = 3.5:  0.405  1.386  1.802  1.909  *

(b) Reduced KII subset (output amplitude vs. input amplitude)
Input:     0.7   1.0   1.5   2.0   3.0
Qm = 5.0:  4.09  5.90  7.24  7.63  7.54
Qm = 3.5:  3.61  4.73  5.36  5.36  *

The robustness of this KII set is tested under conditions of incomplete input, overlap, and noise. Three of the 5-element patterns used in the studies below, displayed as 8 x 8 binary arrays over the 64 channels:

00000000 10000000 10000000 00000000 00001000 00100000 10000000 00000000
00000010 00000001 00001000 00000000 00000100 00000000 00100000 00000000
10100000 00000000 00000000 00000000 00010000 10000000 01000000 00000000

A. Incompleteness

To evaluate pattern completion on incomplete input, the above three 5-element patterns are stored in a 64-channel reduced KII set. Every possible case, from one with all five inputs to five cases each with a single input, is generated for each template. This results in 31 cases (5 on = 1, 4 on = 5, 3 on = 10, 2 on = 10, 1 on = 5) for each pattern, for a total of 93 cases for the three templates. Each case consists of 64 output waveforms 200 ms in duration. The RMS (root mean square) amplitude of each waveform is calculated over the final 150 ms (7 cycles after stabilization), yielding 64 RMS amplitudes per case. Classification is made using the RMS values. The centroids for each pattern are calculated in 64-RMS space by two methods: (a) the absolute centroid (the location in 64-space with all 5 elements on), and (b) the average location in 64-space of the 31 cases for each pattern. Then the Euclidean distances to the three centroids are calculated for each case. The case is classified as belonging to the pattern for which the Euclidean distance is a minimum. Simulation results confirm that both methods give rise to 100% correct classification.

B. Incompleteness and Noise

The same templates and procedures as in study A are used to study extraneous input, or noise, except that in each of the 93 cases random nontemplate elements (nontemplate meaning not in any template) are turned on. The number of activated nontemplate elements is also random and ranges from 1 to L. Runs of all 93 cases are made with L = 5, 15, and 30. All of them give rise to 100% classification. This result at first glance is rather surprising. However, the reason is that nontemplate input is insignificant in the classification of output patterns. Since under the correlation rule the connection strengths of all nontemplate channels are identical and low in value, the RMS amplitudes of nontemplate centroid elements are small and close in size across the templates. Thus the contribution of nontemplate channels to the Euclidean distance sum is similar for all patterns, and is not a determining factor in the pattern classifications.
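The centroid-based classification procedure of studies A and B can be sketched as follows. The RMS response model here is a toy stand-in (an assumption of this sketch, since the paper computes RMS amplitudes from the simulated KII waveforms): template channels ring up strongly once any of their inputs is on, while nontemplate channels stay uniformly small.

```python
import itertools
import numpy as np

TEMPLATES = {                      # two of the nonoverlapping patterns from the text
    "p1": [9, 17, 37, 43, 49],     # 1-based element ids
    "p2": [7, 16, 21, 38, 51],
}
N = 64

def cases(template):
    """All nonempty input subsets of a 5-element template: 31 cases per template."""
    out = []
    for r in range(1, len(template) + 1):
        out.extend(itertools.combinations(template, r))
    return out

def rms_response(active, template, rng):
    """Toy stand-in for the 64 RMS amplitudes of the KII set."""
    x = 0.3 + 0.02 * rng.normal(size=N)          # small, similar "off" amplitudes
    for ch in template:
        x[ch - 1] = 2.5 + 0.05 * rng.normal()    # coherent "on" template channels
    for ch in active:
        x[ch - 1] += 0.1                         # slight boost on driven lines
    return x

rng = np.random.default_rng(0)
data = {k: [rms_response(c, t, rng) for c in cases(t)] for k, t in TEMPLATES.items()}
centroids = {k: np.mean(v, axis=0) for k, v in data.items()}   # method (b): average centroid

correct = sum(
    min(centroids, key=lambda k: np.linalg.norm(x - centroids[k])) == label
    for label, xs in data.items() for x in xs
)
print(correct, "of", sum(len(v) for v in data.values()))
```

With well-separated template channels, nearest-centroid assignment classifies all 62 cases correctly, mirroring the 100% result reported for the full simulation.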

C. Incompleteness and Overlap

Consider the following three 5-element patterns: pattern 1 = elements 9, 17, 37, 43, 50 (sharing 37, 50); pattern 2 = elements 7, 16, 21, 37, 51 (sharing 21, 37); and pattern 3 = elements 1, 21, 36, 41, 50 (sharing 21, 50). There is a 20% overlap in the patterns; that is, each pattern contains one element of each other pattern. Ninety-three cases are set up as in study A and the same classification methods are used. We obtain the following results: 96% (pattern 1), 100% (pattern 2), 93.5% (pattern 3) by method 1, and 100% (pattern 1), 96.8% (pattern 2), 93.5% (pattern 3) by method 2. Overall correct classification is 96.8%. Furthermore, consider a 40% overlap situation. The three 5-element patterns are chosen to be: pattern 1 = elements 9, 17, 37, 41, 50; pattern 2 = elements 7, 17, 21, 37, 51; pattern 3 = elements 7, 21, 36, 41, 50. The following results are obtained: 83.9% (pattern 1), 93.5% (pattern 2), 93.5% (pattern 3) by both methods. Overall correct classification is 90.3%. Examination of the individual cases shows that when a single channel is activated with 20% overlap, in 2 of 31 cases, or about 6.4%, the input is ambiguous, and the distances between the centroids and the outputs in 64-space are expected to be equal. Owing to round-off error only 3.2% are misclassified. The same result holds for 40% overlap, showing in each case that if at least one template channel in addition to the overlapped channels receives input (is "on"), then classification is determined by that input and is not impaired by the overlap. The error rate is reduced to zero if a class is defined by the centroid for the responses to the ambiguous inputs.
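The ambiguous-case count quoted above (2 of 31, about 6.4% per pattern at 20% overlap) can be checked directly by enumerating the incomplete-input cases:

```python
from itertools import combinations

# the 20%-overlap templates from study C (1-based element ids)
p1 = {9, 17, 37, 43, 50}
p2 = {7, 16, 21, 37, 51}
p3 = {1, 21, 36, 41, 50}
patterns = [p1, p2, p3]

def ambiguous_fraction(p, others):
    """Count incomplete-input cases of template p whose only active element is
    shared with another template; such inputs cannot favor either template."""
    cases = [set(c) for r in range(1, 6) for c in combinations(sorted(p), r)]
    shared = p & set().union(*others)
    ambiguous = sum(1 for c in cases if len(c) == 1 and c <= shared)
    return ambiguous, len(cases)

results = [ambiguous_fraction(p, [q for q in patterns if q is not p])
           for p in patterns]
print(results)   # (ambiguous cases, total cases) per pattern
```

Each pattern shares exactly one element with each of the other two, so exactly 2 of its 31 cases are single shared-channel inputs, giving 2/31 = 6.45%.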

D. Pure Noise

Two 5-element patterns (pattern 1 = elements 9, 17, 37, 43, 49; pattern 2 = elements 7, 16, 21, 38, 51) are stored in a 64-channel reduced KII set. There is no overlap between the patterns. Every possible case from five inputs on to a single input on is generated for each pattern, resulting in 62 cases. The elements receiving noise inputs are 5, 25, 46, 55, and 61, which does not affect the connection strengths in the KII set. All possible cases of these noise elements are given input, generating an additional 31 cases. The 93 cases are classified as above. Simulation results give 100% (pattern 1), 100% (pattern 2), and 100% (noise). This means that we can separate noise input from input to the stored patterns, and we can say that an input is either in a known class or belongs to no known class.

E. Input Includes More Than One Pattern

Two 5-element patterns are stored in a 64-element reduced KII set. There is no overlap between the patterns. The pattern templates are: pattern 1 = elements 9, 17, 37, 43, 49; pattern 2 = elements 7, 16, 21, 38, 51. Thus 62 possible cases from five inputs "on" to a single input "on" are generated. Thirty-one additional cases are generated in which elements from pattern 1 and pattern 2 are set "on" simultaneously; these cases contain from 2 to 5 "on" elements and always include elements of both patterns. The 93 cases are classified using the pattern 1, pattern 2, and "combined" centroids. Only method 2 is used for classification, since no absolute centroid exists for the "combined" case. The following result is obtained: 100% (pattern 1), 100% (pattern 2), and 67.7% ("combined"). When an input has 4 template A (or B) elements "on" and 1 template B (or A) element "on", the output is misclassified as A (or B) instead of A + B.

F. Clustering Templates Against Average Centroids

In study A, we examined classification where the average centroid is calculated from an exhaustive set of the incomplete cases. How well can we classify when the average centroid is calculated from only a small subset of the possible cases? That is, if we use a subset of cases to train the system to indicate where the average centroid is for each template, can the system classify any other disjoint subset as well as it does using the centroid calculated from the exhaustive set? Consider three 10-element templates which are stored in a 64-element reduced KII set. There is no overlap in the templates. Ninety cases, denoted as training cases, from five inputs "on" to a single input "on" are generated, 30 for each template. The average centroids are calculated for each template based on these 90 cases. Thirty additional cases for each template (denoted as testing cases) are randomly generated; the testing cases are distinct from the training cases used to calculate the centroids. The 90 testing cases are classified against the three centroids. Simulation results in 100% correct classification.

All the above classification tests are made using waveforms of the KII set in the 150 ms interval from 50 ms to 200 ms, that is, after stabilization. The same tests are made using the first 50 ms and also the first 20 ms. No significant classification difference has been found among these three time intervals. This indicates that in the KII set, pattern classifications can be achieved before complete steady-state oscillation has occurred.

G. Role of Initial Condition

How well can we classify patterns if the system is not reinitialized between inputs? Another way to ask this question is: does the initial condition of the system affect its steady-state output or its template classification? To see this, we use the same templates and the same setup procedure as in study F. The same 30 additional cases per template as in study F are generated, with these variations: (a) initial conditions are not reset between cases; (b) cases are generated in the order pattern 1 case, pattern 2 case, pattern 3 case, and so forth (this ensures the state of the system is "wrong" as each input is presented); (c) the length of each generated time series is 100 ms, and classification is made using the last 50 ms; that is, at 50 Hz there are 2.5 cycles in the transition period and 2.5 cycles in the classification period. Both classification methods (absolute centroid and average centroid) show 100% classification. The output waveforms are similar to those stemming from zero initial conditions, so the initial conditions of the system do not affect convergence to its steady state or its classification. Only the input pattern and the learned templates play a role in classification. The input patterns drive the system from one template to another, or cause it to bifurcate from the zero state to one of the learned templates

(i.e., cause a Hopf bifurcation). Simulation of the full KII set with 20 channels confirms this, too. The study also tells us that we can classify one input pattern after another at a repetition rate of 10 cases/s without reinitialization of the system. This means that the recognition rate of the model has been determined, which is not clear for other systems.

5. DISCUSSION

Prior to recognition, olfactory transduction converts an odorant to a 2-D spatial array of active receptor cells. Each odorant can potentially excite a large number of cells forming its responding set, which overlaps with other such sets. On any one sniff only a small subset within a set is excited. The selected subset varies in spatial location and number with each sniff of the same odorant within the responding set. An invariant output in respect to repeated odorant trials requires learning under reinforcement. This takes place by changes in synaptic weights in accordance with a modified Hebb rule, leading to formation of a template for each learned odorant. The template is the set of strongly interconnected cells (Freeman, 1979, 1987). Simulations of the nonlinear neural dynamics with networks of differential equations reveal how the system works and what its strengths and weaknesses are. These appear to be closely related to bulbar function in olfactory pattern recognition. The input is massively parallel, with an on-off step signal on each line from a 2-D transducer array. Each signal consists of the activation of a varying number of input lines from a set that is defined by multiple exposures during training. The input lines are not unique to the odorants, but the combinations or pairings are. The system is extraordinarily sensitive to minute levels of stimulation, such that the activity of small numbers of lines (receptors) in any selection or combination suffices for large-scale classification in a time span under 0.1 s.
The output of the bulb not only provides for very rapid preattentive identification; it also provides the information needed at higher levels for more precise classification under attention with serial inspection. It performs this task in the presence of background odorants that are often much stronger and that are continually varying. However, the size of the repertoire of odorants that can be classified preattentively is relatively small, on the order of 16 for absolute identification, and there is little evidence for graded responses by receptors in relation to concentration differences. That is, the receptors are "off" for most conditions, and are "on" for a subset of odorants over narrow ranges of their concentrations. On the one hand the odorant input in respect to receptors is a binary signal. On the other hand the bulb handles well a wide range of variation in the numbers of active lines on each input time frame,

covering 3 or 4 orders of magnitude (Freeman, 1975). There is relatively little in the way of preprocessing of olfactory input prior to the operations of classification and generalization, apart from this action of dynamic range compression. In several respects our model displays these features of olfactory dynamics. It has a minimal capacity that is defined by the number of elements in the array divided by the average size of a template. The speed of convergence to an acceptable classification is faster than the rate of convergence to a limit cycle, and in simulated time is equivalent to the processing time of the bulb for a sniff. The system can classify correctly irrespective of the fraction of elements that receives input, and it is virtually immune to background noise defined as input to nontemplate elements. Simultaneous input to two templates can be identified separately as A + B, versus A or B or none of the above. Overlapping templates lead to ambiguity of the output when input is given only to the shared elements, but this can be resolved by creating an output class for this eventuality. Hence our model is robust in respect to sensitivity for small or incomplete inputs in the presence of background noise and overlapping templates, and it is flexible for the establishment of new classes of input and output. The training process with our correlation rule in either of its two forms requires only that we know that an input is or is not present with or without noise, and that a corresponding output class is or is not desired for that input class. We need not know the input, although the testing process is facilitated if we do. The main limitation lies in the form specified for the input, namely an array of points in 2-D that recur in clusters that can be expressed ultimately as pairs (Koenderink, 1988).
In the abstract we can postulate that perceptual patterns in visual, auditory, and somesthetic systems may be reducible to arbitrary arrays of activated points on a surface, but it is not obvious how this might be done by preprocessing in these systems to achieve that form of input. There is physiological evidence that the primary visual cortex in monkey (Freeman & van Dijk, 1987) and cat (Gray & Singer, 1988) has the same or similar dynamics as the olfactory system and its model. Considering that the input is massively parallel, and that the system is capable of rapid learning of new categories by means of the correlation law in either its input or output form, we suggest that applications for real-time pattern recognition in large data bases that are continually evolving at multiple rates of change may follow from further studies of the olfactory system. Recognition of the need to change templates, and of the steps required to update the system, can be programmed into it. However, a more detailed review of this learning process, including the process of habituation needed to shape new templates for input received in the presence of background noise (Freeman, 1979), lies beyond the scope of the present report.

REFERENCES

Baird, B. (1986). Nonlinear dynamics of pattern formation and pattern recognition in the rabbit olfactory bulb. Physica D, 22, 150-175.
Freeman, W. J. (1968). Analog simulation of prepyriform cortex in the cat. Mathematical Biosciences, 2, 181-190.
Freeman, W. J. (1975). Mass action in the nervous system. New York: Academic Press.
Freeman, W. J. (1979). Nonlinear dynamics of paleocortex manifested in the olfactory EEG. Biological Cybernetics, 35, 21-37.
Freeman, W. J. (1987). Simulation of chaotic EEG patterns with a dynamic model of the olfactory system. Biological Cybernetics, 56, 139-150.
Freeman, W. J., Eisenberg, J., & Burke, B. (1987). Hardware simulation of brain dynamics in learning: The SPOCK. Proceedings of the First IEEE International Conference on Neural Networks, 3, 435-442.
Freeman, W. J., & Grajski, K. A. (1987). Relation of olfactory EEG to behavior: Factor analysis. Behavioral Neuroscience, 101(6), 766-777.
Freeman, W. J., & van Dijk, B. (1987). Spatial patterns of visual cortical fast EEG during conditioned reflex in a rhesus monkey. Brain Research, 422, 267-276.
Graf, H. P., & DeVegrar, P. (1987). A CMOS associative memory chip based on neural networks. Technical Digest of the IEEE International Solid-State Circuits Conference.
Gray, C. M., & Singer, W. (1988). Nonlinear cooperativity mediates oscillatory responses in orientation columns of cat visual cortex. Submitted to Science.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Boston: Reidel Press.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Hecht-Nielsen, R. (1986). Performance limits of optical, electro-optical, and electronic neurocomputers. In H. Szu (Ed.), Proceedings of SPIE: Hybrid and Optical Systems (pp. 277-306). Bellingham, WA: SPIE, International Society for Optical Engineering.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554-2558.
Koch, C., & Poggio, T. (1987). Biophysics of computation: Neurons, synapses, and membranes. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Synaptic function. New York: Wiley.
Koenderink, J. J. (1988). Design for a sensorium. In W. von Seelen, G. Shaw, & U. M. Leinhos (Eds.), Organization of neural networks. Weinheim, FRG: VCH Publishers.
Mead, C. A. (1987). Silicon models of neural computation. Proceedings of the IEEE First International Conference on Neural Networks, 1, 91-106.
Moopenn, A., Langenbacher, H., Thakoor, A. P., & Khanna, S. K. (1987). A programmable binary synaptic matrix chip for electronic neural networks. Submitted to IEEE Conference on Neural Information Processing Systems--Natural and Synthetic.
Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18, 401-409.
Skarda, C. A., & Freeman, W. J. (1987). How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences, 10, 161-195.
Yao, Y. (1987). A neural network model of CAAM and its application to handprinted Chinese character recognition. Proceedings of the IEEE First International Conference on Neural Networks, 3, 309-316.