A Neural Model of Human Object Recognition Development

Rosaria Grazia Domenella and Alessio Plebe
Department of Cognitive Science, University of Messina, Italy
{rdomenella,aplebe}@unime.it
Abstract. The human capability of recognizing objects visually is held here to be a function emerging from the interaction between epigenetic influences and basic neural plasticity mechanisms. The model proposed here simulates the development of the main neural processes of the visual system that give rise to the higher function of object recognition. It is a hierarchy of artificial neural maps, mainly based on the LISSOM architecture, which achieves self-organization through simulated intracortical lateral connections.
1 Introduction
Object recognition is the most astonishing capability of the human visual system, and in the last decades much research has been carried out to simulate it by means of artificial computational models. However, the majority of these attempts have addressed only the achievement of performance comparable with human vision, regardless of how that performance is achieved. The question of how the human brain may gain recognition abilities has been much less investigated, since it may appear inessential to the understanding of how the adult visual system works. In part this is still the heritage of Marr's epistemology, with its underlying principle of engineering design as the discloser of nature's evolutionary strategies in forging vision. On the contrary, it is held here that understanding how the brain areas involved in recognition gradually succeed in developing their mature functions is a major key to revealing how humans, and primates in general, recognize objects. This is the motivation for studying artificial models of vision whose main focus is reproducing basic developmental mechanisms, avoiding the explicit design of any of the processing steps of the classical algorithmic approach to artificial vision. The background assumption is that most of the processing functions involved in recognition are not genetically determined and hardwired in the neural circuits, but are the result of interactions between epigenetic influences and some very basic neural plasticity mechanisms. This view is clearly not a prerogative of visual recognition only, but extends to the most general explanation of the representational power of the neural system [20], in line with constructivism in philosophy [28] and biology [29]. Visual recognition is indeed an exemplary case,
where the idea of cortical functions as emerging organizations of neural maps is supported by particularly strong neuroscientific [12, 2, 13, 11], neurocognitive [8], and psychological [4, 23, 16] evidence.
2 Modeling Cortical Development with Self-Organization
In the neurocomputational community several computational tools have been suggested for modeling the development of functions in populations of neurons, especially in vision. One of the most attractive mathematical principles is the so-called self-organization of cortical maps, first applied to the development of visual areas in [27]. In this approach the final functions are achieved by the combination of self-reinforcing local interactions of neurons, following the Hebbian principle, and some sort of competitive constraint on the growth of synaptic connections that keeps the average cell activity constant. Using variants of this principle von der Malsburg was able to simulate visual organizations like retinotopy, ocular dominance and orientation sensitivity. His original formulation was fairly realistic in mimicking cortical computations, limited to the mentioned effects, but the resulting system of differential equations was not very manageable and therefore saw little further development. On the contrary, a later mechanism called SOM (Self-Organizing Map) [14] became quite popular because of its simplicity. The learning rule is on a winner-take-all basis: if the input data are vectors v \in \mathbb{R}^N, the SOM is made of M neurons, each associated with a vector x \in \mathbb{R}^N and a two-dimensional (in vision applications) coordinate r \in [0,1] \times [0,1] \subset \mathbb{R}^2. For an input v there will be a winner neuron w satisfying:

w = \arg\min_{i \in \{1,\dots,M\}} \| v - x_i \| .    (1)
The adaptation of the network is ruled by the following equation:

\Delta x_i = \eta \, e^{-\frac{\|r_w - r_i\|^2}{2\sigma^2}} (v - x_i) ,    (2)
where w is the winner identified by (1), \eta is the learning rate, and \sigma the amplitude of the neighborhood affected by the update. Both parameters \eta and \sigma are actually functions of the training epoch, with several possible schemes of variation. The SOM is a useful tool for modeling, in an abstract sense, brain processes emerging from input interactions and represented as topological organization, but it is clearly far from reproducing realistic cortical mechanisms. A more recent model called LISSOM (Laterally Interconnected Synergetically Self-Organizing Map) attempts to preserve the simplicity of the SOM with a more realistic simulation of the basic plasticity mechanisms of cortical areas [22, 1]. The main differences from the SOM are the inclusion of intracortical lateral connections, and the resort to plasticity as an interaction between Hebbian growth and competitive constraints. In this model each neuron is not just connected with the afferent input vector, but receives excitatory and inhibitory inputs from several neighboring neurons on the same map. The activation a_i^{(k)} of a neuron i at discrete time k is given by:

a_i^{(k)} = f\left( \gamma_X \, x_i \cdot v + \gamma_E \, e_i \cdot y_i^{(k-1)} + \gamma_H \, h_i \cdot z_i^{(k-1)} \right) ,    (3)

where the vectors y_i and z_i are the activations of all neurons in the map with a lateral connection to neuron i of, respectively, excitatory or inhibitory type. The vectors e_i and h_i comprise the connection strengths of the excitatory or inhibitory neurons projecting to i. The vectors v and x_i are the input and the neural code. The scalars \gamma_X, \gamma_E, and \gamma_H are constants modulating the contribution of the afferents. The map is characterized by the matrices X, E, H, whose columns are the vectors x, e, h of every neuron in the map. The function f is any monotonic non-linear function limited between 0 and 1. The final activation value of the neurons is assessed after a certain settling time K. The adaptation of the network is done by Hebbian learning, reinforcing connections upon a coincidence of pre-synaptic and post-synaptic activity, counterbalanced by keeping the overall amount of connections to the same neuron constant. The following rule adapts the afferent connections to a neuron i:

\Delta x_i = \frac{x_i + \eta a_i v}{\| x_i + \eta a_i v \|} - x_i .    (4)
The weights e and h are modified by similar equations.
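The settling dynamics of Eq. (3) and the normalized Hebbian update of Eq. (4) can be sketched as follows (a NumPy illustration, not the original implementation; function names, the piecewise-linear choice of f, and all parameter values are assumptions):

```python
import numpy as np

def piecewise_sigmoid(s, lo=0.1, hi=0.9):
    # a monotonic non-linear f limited between 0 and 1, as Eq. (3) requires
    return np.clip((s - lo) / (hi - lo), 0.0, 1.0)

def lissom_settle(v, X, E, H, gX=1.0, gE=0.9, gH=0.9, K=10):
    """Eq. (3): iterate the map activation for K settling steps.
    Rows of X are afferent weight vectors x_i; rows of E (H) hold the
    excitatory (inhibitory) lateral weights e_i (h_i) of each neuron."""
    afferent = gX * (X @ v)            # x_i . v for every neuron at once
    a = piecewise_sigmoid(afferent)    # initial response, no lateral input yet
    for _ in range(K):
        # excitatory lateral input adds, inhibitory lateral input subtracts
        a = piecewise_sigmoid(afferent + gE * (E @ a) - gH * (H @ a))
    return a

def lissom_adapt_afferent(X, a, v, eta=0.01):
    """Eq. (4): Hebbian growth followed by divisive normalization, keeping
    the total afferent strength of each neuron constant."""
    Xn = X + eta * np.outer(a, v)
    return Xn / np.linalg.norm(Xn, axis=1, keepdims=True)
```

The divisive normalization in the last step is what implements the competitive constraint: a connection can only strengthen at the expense of the other afferents of the same neuron.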
3 The Object Recognition Model
The model is made of several maps of artificial neurons, named in analogy with the brain areas that are the loci of the corresponding functions; the overall scheme is shown in Fig. 1. The environment of the experiments is the set of natural images in the COIL-100 benchmark library [19], a collection of 100 ordinary objects, each seen under 72 different perspectives. In the model there are two distinct pathways, one monochromatic, connected to the intensity retinal photoreceptors, and another sensitive to the green and red photoreceptors. For simplicity the short-band photoreceptors have been discarded; it is known that short waves are less important for the representation of colors in the cortex [30]. The lower maps are called LGN with reference to the biological Lateral Geniculate Nucleus; the function performed in fact also includes the contribution of the ganglion cells [5]. There are three pairs of on-center and off-center sheets, the former activated by a small central spot of light and inhibited by its surround, and conversely for the latter. One pair is for intensity; the other two collect alternately the activation or the inhibition portions from the red and the green planes, producing the red-green opponents. It is known that in the LGN too the functions performed are the result of early neural development; however, since this work is aimed at investigating functions taking place in the cortex, for simplicity this component
Fig. 1. Scheme of the model architecture.
was not left to develop naturally, but was simulated using predefined difference-of-Gaussians functions. The cortical map named V1 collects its afferents from the monochromatic sheet pair in the LGN, and is followed by the map V2, which has a lower resolution and larger receptive fields. The relationship between brain areas and maps of the model is clearly a strong simplification: the biological V1 is known to be the place of an overlap of many different organizations: retinotopy [25], ocularity [18], orientation sensitivity [26], color sensitivity [15], contrast and spatial frequency [24]. The main phenomenon reproduced by this model is the development of orientation domains, small patches of neurons especially sensitive to a specific orientation of lines and contours. Several studies suggest that the natural development of orientation sensitivity is a long process, starting as a response to spontaneous activity before eye opening and continuing with the exposure to external images [9, 3, 21]. Accordingly, the training has been done using artificial elliptical blobs for the first 10000 steps, followed by natural images for another 10000 steps. The gradual development of orientation-sensitive domains is shown in Fig. 2, where the three leftmost maps are the sequence of training using synthetic blobs only, and the rightmost map is the final result of the training using all 7200 real images.

Fig. 2. Development of orientation domains in V1. The gray scale in the maps is proportional to the orientation preference of the neurons, from black → horizontal to white → vertical.

The color path proceeds to V4, named after the biological area especially involved in color processing [30]. The main feature of the cortical color process is color constancy, the property of groups of neurons to respond to a specific hue despite changes in the physical composition of the reflected light. This property is important in recognizing objects, giving continuity to surfaces, and has also been proven to be an ability emerging gradually in infants [4]. At the beginning of the training of V4 there is an unspecific neural response, with very low sensitivity to pure hue and peaked in the middle range between red and green; at the end, the color sensitivity of the patches is uniformly distributed along the hue range. The development of color-constancy domains is shown in Fig. 3.
Fig. 3. Development of color-constancy domains in V4. The gray scale in the maps is proportional to the sensitivity of the neurons to a single specific hue.
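The predefined on-center filtering used for the LGN sheets can be sketched as a difference of Gaussians: a narrow excitatory center minus a broader inhibitory surround (the off-center kernel is its negation). Kernel size and the two widths below are illustrative choices, not the values used in the model:

```python
import numpy as np

def dog_kernel(size=7, sigma_c=1.0, sigma_s=2.0):
    """On-center difference-of-Gaussians kernel: each Gaussian is
    normalized to unit sum, so the kernel responds to a central spot
    of light and is suppressed by uniform illumination."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    g = lambda s: np.exp(-(xx**2 + yy**2) / (2.0 * s**2))
    center = g(sigma_c) / g(sigma_c).sum()
    surround = g(sigma_s) / g(sigma_s).sum()
    return center - surround
```

Convolving an image plane with this kernel (and with its negation) yields the on-center and off-center sheets; applying it to the red plane minus the green surround, and vice versa, yields the red-green opponents.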
The paths from V4 and V2 rejoin in the cortical map LOC, which has larger receptive fields and is the last area of LISSOM type. Knowledge of the non-visuotopic areas in humans is currently poor [7], and scarcely comparable with that of primates [6]. An area that has recently been suggested as strongly involved in object recognition is the so-called LOC (Lateral Occipital Complex) [17, 10]. The response properties of cells in this area seem to fulfill the requirements for an object-recognition area: sensitivity to moderately complex and complex visual stimuli, and reasonable invariance with respect to the appearance of objects. The most difficult and unconstrained variability in appearance is inherent to the physics of vision: the 2D projection on the retina of 3D objects. The model LOC achieves by unsupervised training, using all COIL-100 images in all possible views, a remarkable invariance with respect to viewpoint, as visible in the examples in Fig. 4.

Fig. 4. Invariance properties of the LOC map. The right block displays the activations of the LOC map in response to the corresponding input images in the left block. Rotations are in steps of 30°.

Table 1 summarizes the numerical results over all images, measured by the cross-correlation between the base view and the other views, both in the input images and in the LOC maps:

\rho(I_1, I_2) = \frac{\sum_{r,c} (x_{r,c} - \mu_1)(y_{r,c} - \mu_2)}{\sqrt{\sum_{r,c} (x_{r,c} - \mu_1)^2} \sqrt{\sum_{r,c} (y_{r,c} - \mu_2)^2}} ,

where x_{r,c} and y_{r,c} are the elements of I_1 and I_2, and \mu_1, \mu_2 their means.
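The normalized cross-correlation used for this comparison can be sketched as follows (a NumPy illustration; the function name is an assumption):

```python
import numpy as np

def cross_correlation(I1, I2):
    """Normalized cross-correlation between two same-sized arrays,
    as used to compare a base view with the other views, both on
    input images and on LOC activation maps. Returns a value in [-1, 1],
    with 1 for identical patterns up to scale and offset."""
    x = I1 - I1.mean()
    y = I2 - I2.mean()
    return float((x * y).sum() / np.sqrt((x**2).sum() * (y**2).sum()))
```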