Biol Cybern (2008) 98:75–85 DOI 10.1007/s00422-007-0195-8
ORIGINAL PAPER
Neural model of disinhibitory interactions in the modified Poggendorff illusion
Yingwei Yu · Yoonsuck Choe
Received: 27 August 2006 / Accepted: 17 October 2007 / Published online: 24 November 2007 © Springer-Verlag 2007
Abstract Visual illusions can be strengthened or weakened with the addition of extra visual elements. For example, in the Poggendorff illusion, with an additional bar added, the illusory skew in the perceived angle can be enlarged or reduced. In this paper, we show that a nontrivial interaction between lateral inhibitory processes in the early visual system (i.e., disinhibition) can explain such an enhancement or degradation of the illusory effect. The computational model we derived successfully predicted the perceived angle in the Poggendorff illusion task that was modified to include an extra thick bar. The concept of disinhibition employed in the model is general enough that we expect it can be further extended to account for other classes of geometric illusions.
Keywords Lateral inhibition · Disinhibition · Visual cortex · Poggendorff illusion
Y. Yu · Y. Choe
Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA
Y. Yu (B)
Seismic Micro-Technology Inc., Houston, TX, USA
e-mail: [email protected]
Y. Choe
e-mail: [email protected]

1 Introduction
Visual illusions are important phenomena because of their potential to shed light on the underlying functional organization of the visual system. For illusions with a single main effect, a simple explanation can be sufficient, but when multiple effects coexist in an illusion, the final percept can be quite complex, and simple explanations may be insufficient.

For example, our perception of an angle is usually greater than the actual angle (expansion effect), but when there are multiple lines and thus multiple angles, the expansion effect can either be enhanced or reduced. The basic expansion effect could be due to simple lateral inhibition between orientation-tuned neurons, while its enhancement or reduction may be due to a more complex interaction among such neurons. Such an interference effect can be observed in a modified Poggendorff illusion.

In the original Poggendorff illusion (see, e.g., Tolansky 1964; Morgan 1999), the top and the bottom portions of the penetrating thin line are perceived as misaligned (Fig. 1a). Figure 1b shows how such a perception of misalignment can occur (Blakemore et al. 1970; Carpenter and Blakemore 1973). The line on top forms an angle α with the horizontal bar, but the perceived angle α′ is greater than α. As a result, the line on top in Fig. 1a is perceived to be collinear with line 4 at the bottom, instead of line 3, which is physically collinear. (Note that there are other possible explanations of the Poggendorff illusion, such as the 3D-perspective explanation, which will be discussed in detail in the "Discussion".)

However, when an additional bar is added, the illusory angular expansion effect is altered: the effect is either reduced (Fig. 2a) or enhanced (Fig. 2b) depending on the orientation of the newly added bar. Understanding the functional organization and the low-level neurophysiology underlying such a nontrivial interaction is the main aim of this paper.

Neurophysiologically, in the original case where two orientations interact, lateral inhibition between orientation-tuned cells in the visual cortex can explain the exaggeration of the perceived angle. However, as we have seen in Fig. 2a and b, with an additional orientation response, lateral inhibition is not enough to explain the resulting interference effect.
Our observation is that this complex response is due to disinhibition, i.e., inhibition of another inhibitory factor resulting in effective excitation (Hartline et al. 1956; Hartline and Ratliff 1957, 1958; Stevens 1964; Brodie et al. 1978). Unlike models using simple lateral inhibition, we explicitly accounted for disinhibition in our computational model to describe the complex interactions between multiple orientation cells. The resulting model, based on the neurophysiology of the early visual system, was able to accurately predict the perceptual performance in the modified Poggendorff illusion.

The rest of the paper is organized as follows. The next section describes our experimental methods for measuring the interference effect in humans. Then, a neurophysiological motivation for our computational model is presented, followed by a detailed mathematical description of the model. Next, the results from computational experiments are presented and compared to the psychophysical data we gathered, followed by discussion and conclusion.

Fig. 1 The Poggendorff illusion. a The original Poggendorff illusion is shown. The five lines below the horizontal bar are labeled 1–5 from top to bottom. Line 3 is physically collinear with the line on top. However, line 4 is perceived to be collinear. b Angle displacement in the Poggendorff illusion is illustrated. The actual angle α (= 30°) and the perceived angle α′ (> 30°) are shown. The solid line shows the straight line penetrating the bar. The dashed line below shows the perceived direction in which the line on top seemingly extends

Fig. 2 Modified Poggendorff illusion. a The Poggendorff figure with an additional bar at 50° is shown. In this case, line 2 is perceived to be collinear (i.e., α′ < 30°). b The Poggendorff figure with an additional bar at 20° is shown. For this case, unlike in a, line 5 is perceived to be collinear (α′ > 30°). (The angle α′ in this case is slightly greater than that in the original Poggendorff figure.)

2 Methods
To quantify the interference effect in the modified Poggendorff illusion, we conducted a psychophysical experiment. Two subjects with normal vision participated in the experiment (YC and YY). A CRT display panel with a 1,600 × 1,200 resolution (37.5 × 30.0 cm) was used to display the stimuli at a distance of 30 cm. Thus, the visual angle spanned by the screen was 64.0° (horizontal) and 53.1° (vertical). The computer program displayed two thick bars and one thin line on the full screen, similar to the stimuli in Fig. 2a. The first thick bar was fixed in the center of the screen at 0°, with a width of 100 pixels. The thin line, five pixels in width, intersected the horizontal bar at a fixed angle of 30°. The second thick bar, 100 pixels in width, intersected at the same point as the other two, and its angle was varied from trial to trial. The pixels were not anti-aliased, but this did not interfere with orientation perception because the resolution of the screen was sufficiently high (a pixel at the center of the screen subtended 2′41″ of visual angle).

The stimulus display program also displayed up to 10 thin lines (all at 30°) below the horizontal bar, from which the subjects were asked to choose the one that appeared most collinear with the thin line above the bar (forced choice task). The number of these lines varied across trials, and the relative placement of these lines with respect to the "correct" answer was varied so that answers from previous trials did not affect the answer in the current trial. The subjects clicked on the line of choice to indicate the perceived continuation line, from which the perceived angle was derived and recorded by the computer program. Afterward, a new stimulus was generated. A total of 101 trials (trial 0 to trial 100) were recorded for each subject for different angles of the second thick bar. The results are reported later in the "Results" section, together with computational results.
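The viewing geometry reported in this section can be checked with a few lines of arithmetic. The following Python sketch recomputes the visual angle spanned by the screen, and by a single pixel at the screen center, from the stated screen size, resolution, and viewing distance. The function name `visual_angle_deg` is introduced here for illustration only, and the pixel size is assumed to be the horizontal pixel pitch (37.5 cm / 1,600).

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) of an extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan(0.5 * size_cm / distance_cm))


DISTANCE_CM = 30.0                      # viewing distance from the text
SCREEN_W_CM, SCREEN_H_CM = 37.5, 30.0   # screen size from the text
H_PIXELS = 1600                         # horizontal resolution from the text

horizontal = visual_angle_deg(SCREEN_W_CM, DISTANCE_CM)
vertical = visual_angle_deg(SCREEN_H_CM, DISTANCE_CM)

# Assumption: the quoted pixel size refers to the horizontal pixel pitch.
pixel_deg = visual_angle_deg(SCREEN_W_CM / H_PIXELS, DISTANCE_CM)
arcmin, arcsec = divmod(pixel_deg * 3600.0, 60.0)

print(f"screen: {horizontal:.1f} deg x {vertical:.1f} deg")
print(f"center pixel: {int(arcmin)}' {arcsec:.0f}\"")
```

Under these assumptions the sketch gives 64.0° by 53.1° for the screen and about 2′41″ for a central pixel, in agreement with the values quoted in the text.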
3 Modeling
Let us first consider how orientation columns in the visual cortex interact in response to several intersecting lines. For each line at the intersection, there are corresponding orientation columns that respond maximally, which can be approximated by a Gaussian response distribution. As multiple simple cells are activated by different lines at the intersection, the response levels will interact with each other through lateral connections. Thus, there are two issues we want to address in our model: (a) what exactly is the activation profile (or the response distribution) of the orientation-tuned cells, and (b) how these cells interact with each other through lateral connections.

3.1 Activation profile of orientation columns
Each simple cell in the primary visual cortex responds maximally to visual stimuli with a particular orientation, say θ. The response $y_\theta$ of these cells to different input orientations x can be modeled as a Gaussian function

$$y_\theta(x) = \frac{a}{\sigma\sqrt{\pi/2}}\, e^{-2\frac{(x-\theta)^2}{\sigma^2}}, \tag{1}$$

where θ is the center (or mean); σ the standard deviation; and a a scaling constant (Martinez et al. 2002). Note also that a cell tuned to a certain orientation, say α, should also respond to the opposite direction, which is α + 180°. However, experiments have shown that the peak at the position α + 180° is somewhat smaller than the peak at α, especially when the direction of motion of the oriented stimulus is considered (cf. Fig. 3; see Alonso and Martinez 1998). Further, such an asymmetric response due to stimulus movement is not only observed when the stimulus is moving perpendicular to the principal orientation, but also when it is moving along the long axis of the receptive field (Judge et al. 1980). They also showed that in the latter case, response is observed even during saccades. These motion-induced effects may arise while the subject is traversing the continuation lines back and forth in the modified Poggendorff illusion.

Fig. 3 Activation profile (x-axis: position within orientation column; y-axis: response). a The activation profile of simple cells in response to an acute angle is shown. The dashed curve is the response of the cells in an orientation column (x-axis) to a horizontal line of 0°, and the solid curve that to a 30° line. b The activation of simple cells in response to an obtuse angle is shown. The dashed curve is the response of the cells in an orientation column to a horizontal line of 0°, while the solid curve is the response to a 150° line

To accurately model this, we need two Gaussian curves to fit the response of a cell to a full range of orientations from −180° to 180°. The fitting curve can be written as follows:

$$y_\theta(x) = \frac{a}{\sigma\sqrt{\pi/2}}\, e^{-2\frac{(x-\theta)^2}{\sigma^2}} + \frac{ak}{\sigma\sqrt{\pi/2}}\, e^{-2\frac{(x-\theta-\pi)^2}{\sigma^2}}, \tag{2}$$

where k is the degree of activation for the opposite direction (k < 1). All other terms have the same definition as in Eq. 1. Such an asymmetric response enables the simple cells to be
sensitive to the direction, as well as the orientation, of the stimulus. Using Eq. 2, we can now visualize the response profile of simple cells tuned to orientations ranging from 0° to 360°. Figure 3a shows the responses of an orientation column, given inputs of two different orientations, 0° and 30°. Figure 3b shows the responses of the same orientation column to inputs of two orientations, 0° and 150°. In these two figures, we can see that for each specific orientation input, the excitation is tuned at that value with a peak in the Gaussian curve, and at the same time, the opposite direction-tuned cell shows a lower peak response. This is simply due to the two Gaussians at different heights in Eq. 2. The asymmetry in responses occurs in both the acute (Fig. 3a) and the obtuse angles (Fig. 3b). Note that even though the difference in orientation between 0° versus 30° (Fig. 3a) and 0° versus 150° (Fig. 3b) is 30° in both cases, the response profile greatly differs in the 0° versus 150° case. Next, we will investigate how response profiles in multiple orientation columns can interact.

3.2 Column-level inhibition and disinhibition
Our observation that angular enlargement sometimes seems to be weakened when there are more than two bars or lines in the Poggendorff illusion (Fig. 2) led us to hypothesize about the potential role of recurrent inhibition giving rise to disinhibition. Disinhibition is the inhibition of other inhibitory factors, resulting in a net excitatory effect on the initial target, and it includes recurrent inhibition as a particular case (see Neumann et al. 1999; Yu 2006 for other nonrecurrent forms of disinhibition). Figure 4 demonstrates the recurrent feedback network structure proposed by Carpenter and Blakemore (1973), which could account for the observed properties of angle expansion. They suggested that the horizontal neuronal connectivity between the orientation detectors is recurrent in humans, and thus it can implement disinhibition.

Fig. 4 A possible configuration of lateral inhibition between orientation detectors. The lines with unfilled arrows illustrate mutual inhibition between cells, and the lines with filled arrows excitatory synapses. Redrawn from Carpenter and Blakemore (1973)

Hartline and his colleagues did significant mathematical modeling of disinhibition in the Limulus optical cell. The Hartline–Ratliff equation describing disinhibition in the Limulus is written as follows (Hartline and Ratliff 1957, 1958; Stevens 1964):

$$r_m = \epsilon_m - k_s r_m - \sum_{n \neq m} w_{mn}\,(r_n - t_n), \tag{3}$$

where $r_m$ is the response of the mth ommatidium, $k_s$ the self-inhibition constant, $\epsilon_m$ the excitation of the mth ommatidium, $w_{mn}$ the inhibitory weight from other ommatidia, and $t_n$ their threshold. Brodie et al. (1978) extended this equation to derive a spatiotemporal filter, where the input was assumed to be a sinusoidal grating. This model predicts Limulus retina experiments very well, but only as a single spatial-frequency channel filter, which means that only a luminance input with a fixed spatial frequency is allowed (Brodie et al. 1978). Because of this limitation, their model cannot be applied to predict the disinhibition between orientation detectors, whose activations are not in the form of a sinusoidal grating with a single spatial frequency. In the following section, we will extend the Hartline–Ratliff equation and derive an equation that can be used in modeling interactions among orientation columns.

3.3 A simplified model of disinhibition
Based on the Hartline–Ratliff equation above, we derived a model of disinhibition as follows (Yu et al. 2004; Yu and Choe 2006):

$$\mathbf{r} = (\mathbf{I} - \mathbf{W})^{-1}\,\mathbf{x}, \tag{4}$$

where I is the identity matrix, r the output vector, x the input vector containing the response levels of an array of sensors (e.g., a bank of photoreceptors or orientation-tuned neurons), and W the weight matrix

$$W_{ij} = w(i, j), \tag{5}$$

where w(i, j) is the kernel function (e.g., uniform, or difference of Gaussians) defining the inhibition strength from the jth neuron to the ith neuron based on this simplified model of disinhibition. With this approach, we can now more easily derive the disinhibition effect at the orientation-column level.
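To make Eq. 4 concrete, the following Python sketch solves r = (I − W)⁻¹x for a chain of three units with nearest-neighbor inhibition. It is an illustration under stated assumptions rather than the authors' implementation: inhibitory weights are assumed to enter W as negative numbers (so that Eq. 4 is the steady state of r = x + Wr), the kernel value −0.5 is arbitrary, responses are not rectified, and `solve_linear` and `steady_state` are small helpers written for this example.

```python
def solve_linear(a, b):
    """Solve a @ r = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    r = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        s = sum(m[row][k] * r[k] for k in range(row + 1, n))
        r[row] = (m[row][n] - s) / m[row][row]
    return r


def steady_state(w, x):
    """Eq. 4: r = (I - W)^-1 x, computed by solving (I - W) r = x."""
    n = len(x)
    i_minus_w = [[(1.0 if i == j else 0.0) - w[i][j] for j in range(n)]
                 for i in range(n)]
    return solve_linear(i_minus_w, x)


# Assumption: inhibition is encoded as negative weights between neighbors.
W = [[0.0, -0.5, 0.0],
     [-0.5, 0.0, -0.5],
     [0.0, -0.5, 0.0]]

two_active = steady_state(W, [1.0, 1.0, 0.0])    # units 1 and 2 stimulated
three_active = steady_state(W, [1.0, 1.0, 1.0])  # unit 3 stimulated as well

print("unit 1, two inputs:  ", round(two_active[0], 3))
print("unit 1, three inputs:", round(three_active[0], 3))
```

In this toy network, unit 3 has no direct connection to unit 1, yet stimulating unit 3 raises the response of unit 1 from 0.5 to 1.0 because unit 3 suppresses unit 2, which in turn inhibits unit 1 less. This is the disinhibition effect that the recurrent formulation captures and that a single feedforward inhibition step would miss.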
3.4 Applying disinhibition to orientation cells
Orientation-sensitive cells in the cat visual cortex are known to inhibit each other (Blakemore and Tobin 1972; Li et al. 1992; Kolb and Nelson 1993). From this, we can postulate that a group of cells tuned to similar orientations representing different lines (e.g., intersecting lines) may compete with each other through inhibition.

Now let us consider a mathematical description of inhibition at the column level. Suppose there are n lines with orientations {θ(1), θ(2), ..., θ(n)} intersecting at one point. Let the initial responses of orientation columns to each line be {$e_1, e_2, \ldots, e_n$}, where $e_i$ is the column response vector to input line i. In the orientation column for the ith line input, let α be the position in the orientation column whose cell is tuned to the orientation α. The initial excitation $e_i(\alpha)$ can be calculated as

$$e_i(\alpha) = d_i\, y_{\theta(i)}(\alpha), \tag{6}$$

where α is the position within an orientation column, $d_i$ the width of the ith input line, and $y_{\theta(i)}(\alpha)$ the activation of the orientation column in response to the ith input line (Eq. 2). This way, we can calculate the initial excitation of the cell tuned to α, responding to the ith input line. Note that due to the term $d_i$, thicker input lines result in a stronger response than thin ones. In other words, our model assumes that simple cell response is directly proportional to the thickness of the input, and the proper function of our model depends on such a difference in thickness. The use of stimulus width $d_i$ to modulate the overall activity is justified by the fact that simple cells with low spatial-frequency tuning respond weakly to high spatial-frequency inputs, all other conditions being equal (Everson et al. 1998; Serre and Riesenhuber 2004).

By the definition of disinhibition (Eq. 3), the final response $r_i$ of orientation column i can be obtained as follows:

$$\mathbf{r}_i = \mathbf{e}_i - \mathbf{W}\,\mathbf{r}_i, \tag{7}$$

where W is a constant matrix of inhibition strengths (or weights), controlled by the inhibition rate parameter η:

$$W_{ij} = \begin{cases} \eta & \text{if } i \neq j \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$

From this, we can rearrange the terms to derive the response equation which accounts for the disinhibition effect:

$$\mathbf{r}_i = (\mathbf{I} + \mathbf{W})^{-1}\,\mathbf{e}_i, \tag{9}$$

where I is the identity matrix. By applying disinhibition, the response of the columns to the lines should shift a little depending on the strength of the response to each line. Thus, the final perceived line orientation γ can be obtained by finding the maximum response within each column after the inhibition process:

$$\gamma_i = \arg\max_{\alpha \in C}\; r_i(\alpha), \tag{10}$$

where $\gamma_i$ is the perceived orientation for the ith line, $r_i(\alpha)$ is the response of the ith orientation column's neuron tuned to orientation α, and C is the set of all the orientations within each column (from 0° to 180°). (The arg max(·) function above can be replaced with a population-code-based vector sum (Snippe 1996) for a more realistic simulation.)
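Equations 2 and 6–10 can be put together in a short numerical sketch. The Python code below is a minimal illustration, not the authors' implementation, and it rests on several assumptions that are not stated explicitly in the text: (i) W couples the n columns with each other at every position α, so that Eq. 7 reads r_i(α) = e_i(α) − η Σ_{j≠i} r_j(α); (ii) σ is expressed in radians, as suggested by the π term in Eq. 2; (iii) a = 1; and (iv) positions are sampled every 0.1° over [0°, 180°). Under assumption (i), Eq. 9 has the closed form r_i = (e_i − c Σ_j e_j)/(1 − η) with c = η/(1 + (n − 1)η), which the code uses in place of a matrix inverse. The names `tuning` and `perceived_orientations` are hypothetical, and the default parameter values are the fitted values reported in the Results section.

```python
import math


def tuning(alpha_deg, theta_deg, sigma=0.55, k=0.5, a=1.0):
    """Eq. 2: double-Gaussian response of the cell at position alpha_deg
    to a line at theta_deg (sigma assumed to be in radians)."""
    x = math.radians(alpha_deg)
    theta = math.radians(theta_deg)
    norm = a / (sigma * math.sqrt(math.pi / 2.0))
    main = math.exp(-2.0 * (x - theta) ** 2 / sigma ** 2)
    opposite = k * math.exp(-2.0 * (x - theta - math.pi) ** 2 / sigma ** 2)
    return norm * (main + opposite)


def perceived_orientations(lines, eta=0.008, sigma=0.55, k=0.5, step=0.1):
    """lines: list of (orientation in degrees, width d_i).
    Returns the arg max position (Eq. 10) for each line."""
    n = len(lines)
    c = eta / (1.0 + (n - 1) * eta)   # from the closed form of Eq. 9
    best = [(-math.inf, 0.0)] * n     # (response, position) per line
    for step_index in range(int(round(180.0 / step))):
        alpha = step_index * step
        # Eq. 6: initial excitation of each column at this position.
        e = [d * tuning(alpha, theta, sigma, k) for theta, d in lines]
        total = sum(e)
        for i in range(n):
            # Assumed column-to-column coupling at position alpha (Eqs. 7-9).
            r = (e[i] - c * total) / (1.0 - eta)
            if r > best[i][0]:
                best[i] = (r, alpha)
    return [alpha for _, alpha in best]


BAR, LINE = 40.0, 2.0  # widths in model units, ratio 20:1
two_lines = perceived_orientations([(0.0, BAR), (30.0, LINE)])
with_20_bar = perceived_orientations([(0.0, BAR), (30.0, LINE), (20.0, BAR)])
with_50_bar = perceived_orientations([(0.0, BAR), (30.0, LINE), (50.0, BAR)])

print(f"bar + line:        {two_lines[1]:.1f} deg")
print(f"with 20 deg bar:   {with_20_bar[1]:.1f} deg")
print(f"with 50 deg bar:   {with_50_bar[1]:.1f} deg")
```

With these settings the perceived orientation of the 30° line moves above 30° for the bar-and-line stimulus, further above it when a 20° bar is added, and below 30° when a 50° bar is added, which is the qualitative pattern described for Fig. 2. The exact values depend on the assumptions listed above and need not match the published curves.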
4 Results
4.1 Experiment 1: angle expansion without additional context
To test our computational model in the simplest stimulus configuration, we used stimuli consisting of one thick bar and one thin line. The thick bar was fixed at 0°, and the thin line was rotated to various orientations while the perceived angle was measured in the model. Note that in order to get the correct predicted response from our model, the reference line (horizontal) has to be thicker than the rotated line. The enlargement effect of the angle varied depending on the orientation of the thin line. As shown in Fig. 5, there are three major characteristics of this varying effect. First, for acute angles, the perceived angle is larger than the actual angle, whereas for obtuse angles, the perceived angle is smaller than the actual angle. Second, the peak is around 20° for the largest positive displacement, and around 160° for the largest negative displacement. Third, there is a clear asymmetry in the magnitude of the displacement between the acute angles and the obtuse angles: the peak at 20° is greater in magnitude than the dip at 160°. As compared in Fig. 5, these computational results are consistent with results obtained in psychophysical experiments by Blakemore et al. (1970). In Blakemore et al. (1970), two parallel thin lines were used instead of the single thick bar used in our experiments. (Our current model is not constructed to deal with stimuli exactly the same as those of Blakemore et al. (1970), with the two parallel thin lines.) The justification for this slight modification is that the two parallel lines can be considered as the outline of a thick bar.

Fig. 5 The variations of perceived angle between two intersecting lines. The x-axis corresponds to the angle AOB (inset), from 0° to 180°. The y-axis is the difference between the perceived angle and the actual angle. The solid line is the result predicted by our model, and the data points (asterisks and plus signs) are data from human subjects in experiments by Blakemore et al. (1970). The curve was generated in one iteration with the following parameters: η = 0.009, k = 0.5, and σ = 1.0

4.2 Experiment 2: modified Poggendorff illusion
The disinhibition effect is the key observation leading to our extension of the angular expansion model, which is based on lateral inhibition alone. Because of disinhibition, when more than two lines or bars intersect, the perceived angle of the thin line will deviate from the case where only two lines or bars are present. Our computational model was set up to be consistent with the human experiment outlined in Sect. 2. In the human experiment, the widths of thick bars were 20 times that of the thin line, so we kept the same ratio to set the thickness of the input lines in the model. The thick bars were assigned 40 units in width, and the thin lines 2 units. The thickness of input lines can be controlled by the constant $d_i$ in Eq. 6. Other parameters in Eq. 6 are free parameters, and the best fitting ones used in the resulting fit (as shown in Fig. 7) were as follows: η = 0.008, σ = 0.55, and k = 0.5. The resulting curve is generated by three inputs: a thin line (width = 2 units) with fixed orientation at 30°, a bar (width = 40 units) with fixed orientation at 0°, and a bar (width = 40 units) with a changing orientation from 0° to 180°.

Figure 6 shows the experiment where a 20° bar was added. Figure 6a–c shows the initial activation of orientation columns to the three lines (the first thick bar is at 0°, the thin line is at 30°, and the second thick bar is at 20°). The peaks in the orientation column responses correspond with the input orientations. Figure 6d–f shows the final response of the orientation columns after the disinhibition process. Note that the perceived thin line's orientation (the red line in Fig. 6e inset) is slightly increased compared to that of the original input (green line in Fig. 6e inset), but the perceived bars' orientations (the green line in Fig. 6d, f inset) are barely affected. This is because the bars' input responses are much stronger than that of the thin line due to their thickness. Therefore, the proportion of change in the bars relative to their initial response is significantly smaller than that of the line, so the peak positions of the bars' responses are not changed after the disinhibitory process.

Fig. 6 Initial orientation column activations (a–c) and final responses of orientation columns after disinhibition (d–f) (x-axis: orientation column position; y-axis: response). Initial excitation in response to a the fixed 0° bar, b the 30° line, and c the 20° bar are shown. The peaks correspond exactly with the input orientations. Response after disinhibition in response to d the fixed 0° bar, e the 30° line, and f the 20° bar are shown. Note that in e, the peak position shifted to 32.5°, a 2.5° increase compared to b

This experiment shows that the displacement of the peak positions before and after the disinhibition process can explain the angular perception at a neuronal level. Using the model of disinhibition applied to orientation columns, the angular displacement can be estimated mathematically. As shown in the model prediction results (blue curve, Fig. 7), the effect demonstrated in Fig. 2b is accurately predicted by the peak near 20°, and the effect in Fig. 2a by the valley near 50°. In a similar manner, the model can predict
the perceived angle when the angle between the thin line and horizontal bar is reduced. So, at least for these two cases, we can say that our disinhibition-based explanation is accurate.

However, does the explanation hold for an arbitrary orientation of the second bar? To test this, we compared the results of our psychophysical experiments (Sect. 2) to those of the model prediction. The human results are shown as a red curve with error bars in Fig. 7. The peak (near 20°) and valley (near 50°) in Fig. 7 are apparent in the experimental data, and the overall shape of the curve closely agrees with the model prediction. The results show that our model of angular interaction based on disinhibition can accurately explain the modified Poggendorff illusion, and that low-level neurophysiology can provide us with insights into the mechanisms underlying visual illusions with complex interactions.

Fig. 7 Perceived angle difference in the modified Poggendorff illusion. The results from the computational model (blue line) and human experiments (red line represents the mean, with error bars indicating the standard deviation, n ≈ 6) on the modified Poggendorff illusion (Fig. 2a) are plotted. The second thick bar was rotated while the perceived angle difference was measured. The x-axis indicates the angle of the second bar. The y-axis shows the deviation of the perceived angle of the thin 30° line from the reference angle (30°). The model prediction and the human data are in close agreement. The parameters used in the computational experiment were as follows: η = 0.008, σ = 0.55, and k = 0.5

4.3 Experiment 3: influence of η and σ
The standard deviation of the Gaussian σ and the inhibition strength η are two free parameters that can be used for the curve fitting in Fig. 7. The values of these two parameters are important in modulating the angles perceived from multiple lines, such as the sensitivity to small angles and the amount of distortion in orientation perception. Experiments with ferrets (Chapman et al. 1996) showed that the strength of orientation tuning in the cortex can change during development, and therefore mature orientation cells will be sensitive to small angles and at the same time minimize the distortion. These parameters are tuned throughout development, and we can also test similar effects in our simulations.

The two experiments shown in Fig. 8a and b tested different configurations of these two parameters in order to gain insights into how those parameters can affect orientation perception. Figure 8a shows how the inhibition strength η defines the magnitude of the curve predicted by the model. As η increases, the peak value of the perceived angle becomes higher. In the tests of inhibition strength in Fig. 8a, we held σ constant (0.56 in those trials). Note that the x-axis locations of the peak and the valley change only slightly as η is varied, and the inflection points near the valley stay constant. Figure 8b shows that the standard deviation for the orientation column's activation profile defines the shape of the curve, for example, the x-axis positions of the peak and the valley, or the direction of the tail in the curve. When σ is small, which means the Gaussian curve of the orientation column excitation profile is narrow, there are fewer interactions across orientation columns. As a consequence, the cross-column inhibition is limited to a relatively short range. If σ is larger, the cross-column inhibition can be effective over a wider range.

Fig. 8 Perceived angles under various values of η and σ (x-axis: orientation of the second thick bar, in degrees; y-axis: perceived angle, in degrees). a Perceived angles under various values of inhibition strength η (η = 0.003, 0.009, 0.015, 0.021, and 0.030; σ = 0.56 in all trials). b Perceived angles under various values of interaction width σ (σ = 0.36, 0.46, 0.56, 0.66, and 0.76; η = 0.009 in all trials). See text for details

The above observations suggest that the shape of the orientation column activation profile and the cross-column inhibition strength could be the key factors which define human angular perception. Therefore, our model predicts that the inhibition strength η can significantly affect the illusory effect. Moreover, the effect of angular disinhibition will differ depending on the part of the visual field in which the stimulus is presented, e.g., fovea versus periphery, because σ is effectively larger in the periphery than in the fovea (Westheimer 2003).
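The dependence on η described in this section can be explored with a compact parameter sweep. The Python sketch below restates the model in self-contained form and is not the authors' code. It relies on assumptions that go beyond the text: columns are assumed to inhibit each other position by position, which gives the closed form r_i = (e_i − c Σ_j e_j)/(1 − η) with c = η/(1 + (n − 1)η) for Eq. 9 with uniform off-diagonal weights η; σ is taken in radians; a = 1; positions are sampled every 0.1°; and only a single stimulus (second bar at 20°, near the peak of the curves) is evaluated rather than the full range of bar orientations. The function names are hypothetical.

```python
import math


def response(alpha_deg, theta_deg, sigma, k):
    """Double-Gaussian tuning of Eq. 2 with a = 1 (sigma in radians)."""
    x, theta = math.radians(alpha_deg), math.radians(theta_deg)
    norm = 1.0 / (sigma * math.sqrt(math.pi / 2.0))
    return norm * (math.exp(-2.0 * (x - theta) ** 2 / sigma ** 2)
                   + k * math.exp(-2.0 * (x - theta - math.pi) ** 2 / sigma ** 2))


def perceived_thin_line(eta, sigma=0.56, k=0.5, second_bar_deg=20.0, step=0.1):
    """Perceived orientation (degrees) of the 30 deg thin line."""
    # (orientation, width): fixed bar, thin line, second bar.
    lines = [(0.0, 40.0), (30.0, 2.0), (second_bar_deg, 40.0)]
    c = eta / (1.0 + (len(lines) - 1) * eta)
    best_alpha, best_r = 0.0, -math.inf
    for i in range(int(round(180.0 / step))):
        alpha = i * step
        e = [d * response(alpha, theta, sigma, k) for theta, d in lines]
        r_line = (e[1] - c * sum(e)) / (1.0 - eta)  # thin line's column
        if r_line > best_r:
            best_alpha, best_r = alpha, r_line
    return best_alpha


etas = [0.003, 0.009, 0.015, 0.021, 0.030]  # values listed for Fig. 8a
sweep = [perceived_thin_line(eta) for eta in etas]
for eta, angle in zip(etas, sweep):
    print(f"eta = {eta:.3f}: perceived angle = {angle:.1f} deg")
```

Under these assumptions the perceived angle of the thin line grows monotonically with η, in line with the trend described for Fig. 8a. The same function can be called with different values of `sigma` or `second_bar_deg` to probe the other dependencies discussed above.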
5 Discussion
The study of the Poggendorff illusion has a long history. One existing explanation of the angle expansion phenomenon in the Poggendorff illusion is that it is due to lateral inhibition between orientation cells (Blakemore et al. 1970; Blakemore and Tobin 1972). The explanation is also known as angular displacement theory (Zarándy et al. 1999). Our model is an extension of the angular displacement theory.

The angular displacement theory has been disputed (e.g., as pointed out in Robinson 1998; Howe et al. 2005) because it does not seem to explain the case where only the acute or the obtuse components are present. The Poggendorff illusion effect is apparently reduced when only acute angle components are present (Fig. 9a), but it is maintained when only obtuse angle components are shown (Fig. 9b). Our model, as presented here, is not able to replicate such an effect. However, there is an additional mechanism that can help address this issue. Zarándy et al. (1999) proposed that there is an apparent illusory shift or overestimation of the end position of the acute angles by endpoint detectors in the visual cortex (Fig. 9c), but the shift is not perceived with the obtuse angles. As shown in Fig. 9d, if we shift the expanded angle edges (the green dashed line) to the overestimated endpoints C and D (shifted as indicated by the red arrows), the line components appear collinear (as illustrated by the solid line in the middle of the two dashed lines). Note that the amount of angle expansion (the dashed lines) is exaggerated here to better illustrate the angle expansion effect. In sum, when the two mechanisms of angular displacement and endpoint shift are combined, the Poggendorff illusion is reduced, and thus the line components appear collinear. Their discovery suggested that endpoint overestimation can neutralize the effects of angle expansion under acute-angle configurations, and that angular displacement theory can still be valid under usual configurations of the Poggendorff illusion. The illusory shifting mechanism by the cortical endpoint detector cells (Zarándy et al. 1999), which may account for the reduced illusory effect of acute angle components, is not included in our current model, and is a matter of future work.

Fig. 9 The endpoint effect in the Poggendorff illusion. a The reduced Poggendorff illusion effect with acute angle components only. b The illusion effect is not reduced with obtuse angle components. c The positions of the acute angle endpoint can be overestimated. For example, the point B appears to be on the right of point A, but actually they are on the same vertical line. Redrawn from Zarándy et al. (1999). d The combination of shift in endpoint and angle expansion mechanisms may explain why the illusion is reduced in a. The overestimated endpoints are labeled as C and D, and the expanded angles are illustrated by the green dashed line

There are several other existing theories explaining the Poggendorff illusion. For example, Gillam (1971, 1980) proposed a depth processing theory, and suggested that the Poggendorff illusion was due to the bias from three-dimensional perception. Hansen (2002) recently revisited the 3D theory and argued that the illusion is due to the disparity-induced junction displacement in 3D affecting 2D perception. Morgan (1999) explained the illusion based on bias in the estimation of the orientation of virtual lines by second-stage filters. On the other hand, Fermüller and Malm (2004) proposed that noise and uncertainty in the formation and processing of images caused a bias in perception of the line orientation. Howe et al. (2005) explained the illusion based on natural scene geometry using statistics of natural images. They showed that the location of a thin line segment across a thick bar in natural environments has the highest probability at a position away from the collinear point. The bias in the geometric perception in natural scenes matched well with the shifting of
the thin line in the Poggendorff illusion. Indeed, the above theories successfully explain possible sources of bias in the Poggendorff illusion, but none of them provides an explanation of the illusion at a neurophysiological level, nor do they explicitly model the disinhibitory effect as presented here. For example, it is hard to see how a 3D stimulus analogous to Fig. 2 can be constructed to give a disinhibitory effect. The model we have presented here is based on angular inhibition that takes into account the disinhibition effect, and the theoretical extension is grounded in physiological and psychological facts. Our model was based on the visual system of Limulus (an invertebrate), so the generalizability of the results to human vision may be a concern. However, it is known that disinhibition exists in the vertebrate visual system as well, such as in the visual cortical column of cats (Hubel and Wiesel 1962; Blakemore and Tobin 1972; Li et al. 1992; Kolb and Nelson 1993), in tiger salamanders (Roska et al. 1998), and in mice (Frech et al. 2001). It is also known that the opposite directions of the same orientation evoke an asymmetric response (Alonso and Martinez 1998; Judge et al. 1980). Based on these, our model can correctly replicate disinhibition caused by more than two intersecting lines, and the results match our own experimental data obtained with the same kind of stimuli. One possible limitation of our model is that it addresses the perceptual interaction in orientation space only (i.e., it is spatially zero-dimensional). However, it has the potential to be further extended to account for position (retinotopy) or phase variations in the spatial domain. Orientation sensitivity differs between the fovea and the periphery of the visual field. In the periphery, orientation detectors are more sensitive to radially-oriented lines than to tangential ones (Westheimer 2003). A function of σ (see Eq. 
2) can be defined as a response to the location (x, y) of the angle intersection and the directions of the corresponding lines. Therefore, we can use a smaller σ in the orientation profile for radially-oriented lines, and a larger one for tangential ones, thus taking into account the position information. Moreover, we can use a bank of Gabor filters with a set of orientations and multiple scales to automatically derive the orientation profiles from the spatial domain rather than directly dealing with orientations. The output of the Gabor filters can be interpolated to get a higher resolution in orientation space, and subsequently be used as the input to our current model (the ei part in Eq. 8). Therefore, the factors of scale, phase, and orientation can be unified in such an extended model. Besides the Poggendorff illusion, our model has the potential for explaining other geometrical illusions. Fermüller and Malm (2004) showed a variation of the café-wall illusion where adding some dots in strategic places significantly reduced the perceived distortion. Such a correctional effect can potentially be explained by our model. Because the newly
introduced dots give rise to a new orientation component (as the second thick bar did in our modified Poggendorff illusion), the disinhibitory effect caused by that new orientation can reduce the perceptual distortion formed by the existing orientation components. Another example is the Wundt–Hering illusion. In this illusion, two horizontal lines (intersecting with lines radiating from the center) are seen as curved when in reality they are straight. It is known that lateral inhibition underlies this illusion (Coren 1970, 1999). Our model would predict that adding small oriented line segments at each intersection will reduce the illusory effect through disinhibition. The well-known Zöllner illusion is also thought to be due to interacting orientation-sensitive neurons (Oyama 1975), and thus it can be altered in a similar manner as described above. There are many other illusions of the same type (angular misperception) that can potentially be explained by our model. See Prinzmetal and Beck (2001) for a comprehensive discussion of these other illusions, including tilt induction. Finally, our model makes two novel predictions, regarding the influence of (1) the strength of lateral inhibition (η) and (2) the width of the orientation tuning curve (σ in our model). In the following, we discuss more specifically how these two model parameters can be modulated in psychophysical and neurophysiological experiments, and what the expected experimental results are. First, the strength of neural inhibition can significantly affect the illusory effect (Fig. 8a). The stronger the inhibition, the larger the magnitude of the curve in Fig. 8a, while the locations of the peak and the valley of the curve are not affected. This prediction may not be verifiable through purely psychological experiments; however, it may be testable if they are combined with physiological experiments. 
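The roles of η and σ can be illustrated with a minimal sketch. It is not a reproduction of the model's equations: it assumes a Hartline–Ratliff-style recurrence r = e − W r without thresholds, Gaussian tuning over π-periodic orientation, and a Gaussian lateral weight profile whose rows sum to η. The function names, the 1° channel spacing, the use of one σ for both tuning and weights, and the windowed peak decoding are all hypothetical choices made for illustration.

```python
import numpy as np

def circ_diff(a, b, period=np.pi):
    """Signed smallest difference between orientations (pi-periodic)."""
    return (a - b + period / 2) % period - period / 2

def tuning(thetas, phi, sigma):
    """Gaussian excitation of each channel by a line at orientation phi."""
    return np.exp(-circ_diff(thetas, phi) ** 2 / (2 * sigma ** 2))

def inhibition_matrix(thetas, eta, sigma):
    """Lateral weights: Gaussian in orientation difference, no self-term.

    Each row is normalised to sum to eta (an assumption of this sketch).
    """
    d = circ_diff(thetas[:, None], thetas[None, :])
    w = np.exp(-d ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return eta * w / w.sum(axis=1, keepdims=True)

def steady_state(e, w):
    """Solve the recurrence r = e - W r, i.e. (I + W) r = e."""
    return np.linalg.solve(np.eye(len(e)) + w, e)

def perceived_orientation(r, thetas, phi, window=np.deg2rad(20)):
    """Preferred orientation of the strongest channel near phi."""
    idx = np.flatnonzero(np.abs(circ_diff(thetas, phi)) <= window)
    return thetas[idx[np.argmax(r[idx])]]

def angle_shift(thin, bars, eta, sigma, n=180):
    """Perceived minus physical orientation of the thin line (radians)."""
    thetas = np.linspace(0.0, np.pi, n, endpoint=False)
    e = tuning(thetas, thin, sigma) + sum(tuning(thetas, b, sigma) for b in bars)
    r = steady_state(e, inhibition_matrix(thetas, eta, sigma))
    return circ_diff(perceived_orientation(r, thetas, thin), thin)

if __name__ == "__main__":
    thin = np.deg2rad(45)                  # thin line
    bars = [0.0, np.deg2rad(120)]          # horizontal bar plus an extra bar
    for eta in (0.2, 0.4):
        for sigma in (0.15, 0.25):
            shift = np.rad2deg(angle_shift(thin, bars, eta, sigma))
            print(f"eta={eta}, sigma={sigma}: shift = {shift:.1f} deg")
```

Because the response is obtained by solving the recurrence rather than by a single feedforward subtraction, a third line changes how strongly the second line inhibits the first, which is the sense in which the sketch includes disinhibition. Sweeping `eta` and `sigma` in such a sketch gives a quick way to explore how each parameter changes the decoded shift before committing to an experiment.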
An animal can be trained to select apparently collinear lines over misaligned ones in a non-illusory task. In the first test, the stimulus can be presented in the periphery of the animal’s visual field, where the tuning curve has a standard deviation measured as σ1. Next, record the animal’s choices for the stimulus configured as in Fig. 8a. Then, in the second test, apply drugs that act on GABA receptors in the animal’s primary visual cortex to reduce the strength of inhibition. Many drugs can influence GABA neurons, e.g., bicuculline, alcohol, and lorazepam. It is known that lorazepam can even directly enhance the tilt illusion (Gelbtuch et al. 1986). Modifying GABA receptors may also change the standard deviation of the orientation tuning curve, and therefore we may need to move the stimulus toward or away from the fovea to compensate for that change. (In the fovea, the tuning curve’s standard deviation is smaller, which will be explained below.) We can stop moving the stimulus right at the location where the receptive field has the same tuning width σ1 as in the first test. Again, record the animal’s choices for the stimulus configured as in Fig. 8a. Comparing the two results obtained in those
two tests, in which the standard deviations are the same but the inhibition strengths differ, the first test is expected to show a larger magnitude of angle expansion than the second, due to the reduced inhibition strength in the second test. Furthermore, the positions of the peak and the valley are not expected to change, as predicted in Fig. 8a. It is also possible that the inhibitory effects can be modulated by altering the illumination (the “shaped” intermittent illumination) as shown by Coren (1999). Coren showed that the use of these illumination techniques to enhance lateral inhibition made the Wundt–Hering illusion effect stronger, while other illusions that do not depend on lateral inhibition (e.g., the Delboeuf illusion tested by Coren) showed no change in the strength of illusion. Second, an illusory effect can heavily depend on the standard deviation σ of the orientation tuning curve. The value of σ can affect both the magnitude and the x-locations of the peak and valley in Fig. 8b. Further psychological experiments could be conducted to verify this observation. The perceived orientation variations can be compared under two conditions: presenting the stimulus to the fovea, and presenting it to the periphery. The fovea is assumed to have a smaller σ than the periphery. Therefore, according to our computational experiment based on the value of σ (Fig. 8b), we expect a stronger illusion (larger magnitude) in the periphery than in the fovea. We also expect that, in the periphery, the second thick bar must have a larger orientation angle than in the fovea to maximally enhance the thin line’s perceived orientation, while a smaller orientation angle of the second thick bar in the periphery maximally reduces it (i.e., the locations of the peak and valley will change).
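The Gabor-filter extension proposed earlier in this Discussion can be sketched as follows. This is an illustrative sketch only, assuming that responses are sampled at the image center alone (in keeping with the spatially zero-dimensional model), that each filter is a complex Gabor (a quadrature pair) whose magnitude is taken as orientation energy, and that the coarse profile is refined by linear periodic interpolation. The function names, carrier frequencies, envelope sizes, and the blurred-line test image are hypothetical.

```python
import numpy as np

def centred_grid(size):
    """Pixel coordinates (x right, y down) with the origin at the centre."""
    y, x = np.mgrid[:size, :size].astype(float)
    c = (size - 1) / 2.0
    return x - c, y - c

def make_line_image(size, phi, width=1.0):
    """Blurred straight line through the centre at orientation phi (radians).

    Orientation is measured in array coordinates (y points down).
    """
    x, y = centred_grid(size)
    dist = -x * np.sin(phi) + y * np.cos(phi)  # signed distance from the line
    return np.exp(-dist ** 2 / (2 * width ** 2))

def gabor_energy_profile(image, n_orient=8, freqs=(0.08, 0.12), cycles=0.8):
    """Gabor energy at the centre of a square image, summed over scales.

    freqs are carrier frequencies in cycles/pixel; each envelope has a
    standard deviation of cycles / freq pixels (hypothetical choices).
    """
    x, y = centred_grid(image.shape[0])
    thetas = np.arange(n_orient) * np.pi / n_orient
    energy = np.zeros(n_orient)
    for f in freqs:
        envelope = np.exp(-(x ** 2 + y ** 2) / (2 * (cycles / f) ** 2))
        for k, th in enumerate(thetas):
            # The carrier varies along the normal to a line of orientation th.
            normal = -x * np.sin(th) + y * np.cos(th)
            kernel = envelope * np.exp(2j * np.pi * f * normal)
            energy[k] += np.abs(np.sum(image * kernel))
    return thetas, energy

def interpolate_profile(thetas, energy, n_fine=180):
    """Resample the coarse profile on a finer, pi-periodic orientation grid."""
    fine = np.arange(n_fine) * np.pi / n_fine
    return fine, np.interp(fine, thetas, energy, period=np.pi)

if __name__ == "__main__":
    image = make_line_image(65, 3 * np.pi / 8)
    thetas, energy = gabor_energy_profile(image)
    fine, profile = interpolate_profile(thetas, energy)
    print("coarse peak (deg):", np.rad2deg(thetas[np.argmax(energy)]))
    print("fine profile length:", profile.size)
```

The interpolated profile could then stand in for the excitation term that the model takes as input. Making the envelope size or the tuning width depend on the stimulus position would be one way to bring in the radial versus tangential difference discussed above, although that step is not shown here.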
6 Conclusion
In this paper, we presented a neurophysiologically based model of disinhibition to account for a modified version of the Poggendorff illusion. The model was able to accurately predict a subtle orientation interaction effect, closely matching the psychophysical data we collected. We expect the model to be general enough to account for other kinds of geometrical illusions as well, where multiple stimulus components are nontrivially interacting.
Acknowledgments This research was supported in part by Texas A&M University, by the Texas Higher Education Coordinating Board grant ATP#000512-0217-2001, and by the National Institute of Mental Health Human Brain Project grant #1-R01-MH66991. This paper is a significantly expanded version of Yu and Choe (2004). The authors wish to thank the anonymous reviewers for helpful suggestions, including the discussion on angular illusions (Zöllner, and Wundt–Hering illusion, etc.) as well as the inhibitory effects of alcohol (see the “Discussion”).
References
Alonso J, Martinez LM (1998) Functional connectivity between simple cells and complex cells in cat striate cortex. Nat Neurosci 1:395–403
Blakemore C, Tobin EA (1972) Lateral inhibition between orientation detectors in the cat’s visual cortex. Exp Brain Res 15:439–440
Blakemore C, Carpenter RH, Georgeson MA (1970) Lateral inhibition between orientation detectors in the human visual system. Nature 228:37–39
Brodie S, Knight BW, Ratliff F (1978) The spatiotemporal transfer function of the Limulus lateral eye. J Gen Physiol 72:161–202
Carpenter RH, Blakemore C (1973) Interactions between orientations in human vision. Exp Brain Res 18:287–303
Chapman B, Stryker MP, Bonhoeffer T (1996) Development of orientation preference maps in ferret primary visual cortex. J Neurosci 16:6643–6653
Coren S (1970) Lateral inhibition and the Wundt–Hering illusion. Psychonomic Sci 18:341
Coren S (1999) Relative contribution of lateral inhibition to the Delboeuf and Wundt–Hering illusions. Percept Motor Skills 99:771–784
Everson RM, Prashanth AK, Gabbay M, Knight BW, Sirovich L, Kaplan E (1998) Representation of spatial frequency and orientation in the visual cortex. Proc Natl Acad Sci USA 95:8334–8338
Fermüller C, Malm H (2004) Uncertainty in visual processes predicts geometrical optical illusions. Vis Res 44:727–749
Frech MJ, Perez-Leon J, Wassle H, Backus KH (2001) Characterization of the spontaneous synaptic activity of amacrine cells in the mouse retina. J Neurophysiol 86:1632–1643
Gelbtuch MH, Calvert JE, Harris JP, Phillipson OT (1986) Modification of visual orientation illusions by drugs which influence dopamine and GABA neurones: differential effects on simultaneous and successive illusions. J Psychopharm 90(3):379–383
Gillam B (1971) A depth processing theory of the Poggendorff illusion. Percep Psychophys 10:211–216
Gillam B (1980) Geometric illusions. Sci Am 242:102–111
Hansen T (2002) A new theory of the Poggendorff illusion based on stereoscopic vision. In: Proceedings of 5. Tübinger Wahrnehmungskonferenz
Hartline HK, Ratliff F (1957) Inhibitory interaction of receptor units in the eye of Limulus. J Gen Physiol 40:357–376
Hartline HK, Ratliff F (1958) Spatial summation of inhibitory influences in the eye of Limulus, and the mutual interaction of receptor units. J Gen Physiol 41:1049–1066
Hartline HK, Wager H, Ratliff F (1956) Inhibition in the eye of Limulus. J Gen Physiol 39:651–673
Howe CQ, Yang Z, Purves D (2005) The Poggendorff illusion explained by natural scene geometry. Proc Natl Acad Sci USA 102:7707–7712
Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol (London) 160:106–154
Judge SJ, Wurtz RH, Richmond BJ (1980) Vision during saccadic eye movements. I. Visual interactions in striate cortex. J Neurophysiol 43:1133–1155
Kolb H, Nelson R (1993) Off-alpha and off-beta ganglion cells in the cat retina. J Comp Neurol 329:85–110
Li CY, Zhou YX, Pei X, Qiu IY, Tang CQ, Xu XZ (1992) Extensive disinhibitory region beyond the classical receptive field of cat retinal ganglion cells. Vis Res 32:219–228
Martinez LM, Alonso J, Reid RC, Hirsch JA (2002) Laminar processing of stimulus orientation in cat visual cortex. J Physiol 540.1:321–333
Morgan M (1999) The Poggendorff illusion: a bias in the estimation of the orientation of virtual lines by second-stage filters. Vis Res 39:2361–2380
Neumann H, Pessoa L, Hansen T (1999) Interaction of on and off pathways for visual contrast measurement. Biol Cybern 81:515–532
Oyama T (1975) Determinants of the Zöllner illusion. J Psychol Res 37:261–280
Prinzmetal W, Beck DM (2001) The tilt-constancy theory of visual illusions. J Exp Psychol Human Perception Perform 27:206–217
Robinson JO (1998) The psychology of visual illusion. Dover, Mineola, NY
Roska B, Nemeth E, Werblin F (1998) Response to change is facilitated by a three-neuron disinhibitory pathway in the tiger salamander retina. J Neurosci 18:3451–3459
Serre T, Riesenhuber M (2004) Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex. Technical report, CBCL paper 239/AI Memo 2004-017, Massachusetts Institute of Technology, Cambridge, MA
Snippe HP (1996) Parameter extraction from population codes: a critical assessment. Neural Comput 8(3):511–529
Stevens CF (1964) A quantitative theory of neural interactions: theoretical and experimental investigations. Ph.D. thesis, The Rockefeller Institute
Tolansky S (1964) Optical illusions. Pergamon, London
Westheimer G (2003) The distribution of preferred orientations in the peripheral visual field. Vis Res 43:53–57
Yu Y (2006) Computational role of disinhibition in brain function. Ph.D. thesis, Department of Computer Science, Texas A&M University
Yu Y, Choe Y (2004) Angular disinhibition effect in a modified Poggendorff illusion. In: Proceedings of the 26th annual conference of the Cognitive Science Society, pp 1500–1505
Yu Y, Choe Y (2006) A neural model of the scintillating grid illusion: disinhibition and self-inhibition in early vision. Neural Comput 18:521–544
Yu Y, Yamauchi T, Choe Y (2004) Explaining low level brightness–contrast visual illusion using disinhibition. In: Ijspeert AJ, Murata M, Wakamiya N (eds) Biologically inspired approaches to advanced information technology, lecture notes in computer science, vol 3141, pp 166–175
Zarándy A, Orzó L, Grawes E, Werblin F (1999) CNN-based models for color vision and visual illusions. IEEE Trans Circuits Syst I Fundam Theor Appl 46:229–238