Chapter 12 Auditory Localization

Sounds at different locations create an auditory space, which exists all around, wherever there is sound. Locating sound sources in auditory space is called auditory localization.

Comparing location information for vision and hearing:
o Vision: The bird and the cat, which are located at different places, are imaged on different places on the retina.
o Hearing: The frequencies in the sounds from the bird and cat are spread out over the cochlea, with no regard to the animals' locations.
This means that two tones with the same frequency that originate in different locations will activate the same hair cells and nerve fibers in the cochlea. The auditory system must therefore use other information to determine location. The information it uses involves location cues that are created by the way sound interacts with the listener's head and ears. There are two kinds of location cues: binaural cues, which depend on both ears, and monaural cues, which depend on just one ear. Researchers studying these cues have determined how well people can locate the position of a sound in three dimensions: the azimuth, which extends from left to right; elevation, which extends up and down; and the distance of the sound source from the listener. In this chapter, we will focus on the azimuth and elevation.

Binaural Cues for Sound Localization

Binaural cues use information reaching both ears to determine the azimuth (left-right position) of sounds. The two binaural cues are interaural time difference and interaural level difference. Both are based on a comparison of the sound signals reaching the left and right ears.

Interaural Time Difference

The interaural time difference (ITD) is the difference between when a sound reaches the left ear and when it reaches the right ear. If the source is located directly in front of the listener, at A, the distance to each ear is the same; the sound reaches the left and right ears simultaneously, so the ITD is zero. However, if a source is located off to the side, at B, the sound reaches the right ear before it reaches the left ear. The magnitude of the ITD can be used as a cue to determine a sound's location. ITD is an effective cue for localizing low-frequency sounds.
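To give a feel for the size of this cue, here is a minimal sketch using the classic spherical-head (Woodworth) approximation, ITD ≈ (r/c)(θ + sin θ). The formula, head radius, and speed of sound are common textbook-style assumptions, not values taken from this chapter.

```python
import math

def itd_spherical_head(azimuth_deg, head_radius_m=0.0875, speed_of_sound_ms=343.0):
    """Approximate ITD (seconds) for a distant source at the given azimuth,
    using the spherical-head rule of thumb ITD ~ (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_ms) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    itd_us = itd_spherical_head(azimuth) * 1e6  # convert seconds to microseconds
    print(f"azimuth {azimuth:2d} deg -> ITD ~ {itd_us:.0f} microseconds")
```

With these assumed values, the ITD is zero for a source straight ahead and grows to roughly 650 microseconds at 90 degrees, which is why the magnitude of the ITD can signal a sound's azimuth.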
Interaural Level Difference

The other binaural cue, interaural level difference (ILD), is based on the difference in the sound pressure level (or just "level") of the sound reaching the two ears. A difference in level between the two ears occurs because the head is a barrier that creates an acoustic shadow, reducing the intensity of sounds that reach the far ear. This reduction of intensity at the far ear occurs for high-frequency sounds, but not for low-frequency sounds.

Example: ripples on a pond behave like sound waves meeting an obstacle. Because the ripples are small compared to the boat, they bounce off the side of the boat and go no further. Now imagine the same ripples approaching the cattails in Figure 12.5d. Because the distance between the ripples is large compared to the stems of the cattails, the ripples are hardly disturbed and continue on their way.

Notice that at high frequencies, there is a large difference between the ILD for sounds located at 10 degrees (green curve) and 90 degrees (blue curve). At lower frequencies, however, there is a smaller difference between the ILDs for sounds coming from these two locations until, at very low frequencies, the ILD is a very poor indicator of a sound's location.
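The boat-and-cattails analogy can be made concrete by comparing the wavelength of a sound to the size of the head. This is a minimal sketch, assuming a rough head diameter and speed of sound and using "wavelength smaller than the head" as a simple rule of thumb for shadowing; none of these numbers come from the chapter.

```python
HEAD_DIAMETER_M = 0.18     # assumed rough adult head diameter, meters
SPEED_OF_SOUND_MS = 343.0  # speed of sound in air, meters per second

def wavelength_m(frequency_hz):
    """Wavelength of a sound wave in air (meters)."""
    return SPEED_OF_SOUND_MS / frequency_hz

for freq in (200, 1000, 6000):
    lam = wavelength_m(freq)
    effect = ("the head casts an acoustic shadow (large ILD)"
              if lam < HEAD_DIAMETER_M
              else "the wave bends around the head (small ILD)")
    print(f"{freq:4d} Hz: wavelength ~ {lam:.2f} m -> {effect}")
```

A 6000-Hz tone, with a wavelength of only a few centimeters, is blocked like the ripples hitting the boat, whereas a 200-Hz tone, with a wavelength of well over a meter, flows past the head like the ripples passing the cattails.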
The Cone of Confusion

Because the time and level differences can be the same at a number of different elevations, they cannot reliably indicate the elevation of the sound source. Similar ambiguous information is provided when the sound source is off to the side. These places of ambiguity are illustrated by the cone of confusion: all points on this cone have the same ILD and ITD. In other words, there are many locations in space where two sounds could result in the same ILD and ITD.

Monaural Cue for Localization

A monaural cue is a cue that depends on information from only one ear. The primary monaural cue for localization is called a spectral cue, because the information for localization is contained in differences in the distribution (or spectrum) of frequencies that reach each ear from different locations. Differences in the way the sounds bounce around within the pinna create different patterns of frequencies for the two locations (King et al., 2001). The importance of the pinna for determining elevation has been demonstrated by showing that smoothing out the nooks and crannies of the pinnae with molding compound makes it difficult to locate sounds along the elevation coordinate.

Hofman and coworkers determined how localization changes when such a mold is worn for several weeks, and then what happens when the mold is removed. After measuring initial performance, Hofman fitted his listeners with molds that altered the shape of the pinnae and therefore changed the spectral cue. Localization performance was poor for the elevation coordinate immediately after the mold was inserted, but locations along the azimuth coordinate could still be judged. Over the following weeks, elevation performance improved; apparently, the listeners had learned, over a period of weeks, to associate new spectral cues with different directions in space. Localization remained excellent immediately after removal of the ear molds.

The Physiology of Auditory Localization

The Auditory Pathway and Cortex

The auditory nerve carries the signals generated by the inner hair cells away from the cochlea and toward the auditory receiving area in the cortex. Auditory nerve fibers from the cochlea synapse in a sequence of subcortical structures (structures below the cerebral cortex). This sequence begins with the cochlear nucleus and continues to the superior olivary nucleus in the brain stem, the inferior colliculus in the midbrain, and the medial geniculate nucleus in the thalamus. From the medial geniculate nucleus, fibers continue to the primary auditory cortex (or auditory receiving area, A1) in the temporal lobe of the cortex.

A useful acronym is SONIC MG (a very fast sports car), which represents the three structures between the cochlear nucleus and the auditory cortex, as follows: SON = superior olivary nucleus; IC = inferior colliculus; MG = medial geniculate nucleus. Processing in the superior olivary nucleus is important for binaural localization because it is here that signals from the left and right ears first meet.

Auditory signals arrive at the primary auditory receiving area (A1) in the temporal lobe and then travel to other cortical auditory areas:
o the core area, which includes the primary auditory cortex (A1) and some nearby areas;
o the belt area, which surrounds the core; and
o the parabelt area, which receives signals from the belt area.
The Jeffress Neural Coincidence Model

The Jeffress model of auditory localization proposes that neurons are wired so they each receive signals from the two ears. In the circuit proposed by Jeffress, axons transmit signals from the left ear (blue) and the right ear (red) to neurons, indicated by circles.
o Sound in front: signals start in the left and right channels simultaneously. The signals meet at neuron 5, causing it to fire.
o Sound to the right: the signal starts in the right channel first, so it gets a head start. The signals meet at neuron 3, causing it to fire.
This neuron and the others in this circuit are called coincidence detectors, because they fire only when both signals coincide by arriving at the neuron simultaneously. The firing of neuron 5 indicates that ITD = 0. This has been called a "place code" because the ITD is indicated by the place (which neuron) where the activity occurs. One way to describe the properties of ITD neurons is to measure ITD tuning curves, which plot the neuron's firing rate against the ITD (graph: ITD vs. firing rate).

Broad ITD Tuning Curves in Mammals

The "range" indicator below each curve shows that the gerbil curve is much broader than the owl curve. The gerbil curve is, in fact, broader than the range of ITDs that typically occur in the environment. Recordings from a neuron in the left auditory cortex of the monkey show its responses to sounds originating at different places around the head; the records at each location show the firing of this single cortical neuron, which responds to sounds coming from a number of locations on the right. Findings like this support the idea that there are broadly tuned neurons in the right hemisphere that respond when sound is coming from the left and broadly tuned neurons in the left hemisphere that respond when sound is coming from the right.
To summarize research on the neural mechanism of binaural localization, we can conclude that it is based on sharply tuned neurons in birds and broadly tuned neurons in mammals. The code for birds is a place code, because the ITD is indicated by the firing of neurons at a specific place. The code for mammals is a distributed code, because the ITD is determined by the firing of many broadly tuned neurons working together.
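The place-code idea can be sketched as a small bank of coincidence detectors fed by opposed delay lines. This is only an illustrative toy, not the circuit from the chapter or a model fit to real data; the number of neurons and the internal delay values are arbitrary assumptions.

```python
# Minimal sketch of a Jeffress-style delay-line coincidence detector.
# Each "neuron" receives the left-ear signal through one internal delay and
# the right-ear signal through the opposite delay; the neuron whose internal
# delays best cancel the external ITD is the one whose inputs coincide.

N_NEURONS = 9
MAX_DELAY_US = 400.0  # assumed internal delay range, microseconds

# Internal delays: the left-ear delay increases across the array while the
# right-ear delay decreases (opposed delay lines).
left_delays = [MAX_DELAY_US * i / (N_NEURONS - 1) for i in range(N_NEURONS)]
right_delays = list(reversed(left_delays))

def best_coincidence(itd_us):
    """Return the index of the neuron whose two inputs arrive closest together
    for a sound with the given ITD (right ear leads when itd_us > 0).
    That index is the 'place' that codes the ITD."""
    t_left, t_right = itd_us, 0.0  # left-ear signal starts itd_us later
    mismatches = [abs((t_left + dl) - (t_right + dr))
                  for dl, dr in zip(left_delays, right_delays)]
    return min(range(N_NEURONS), key=lambda i: mismatches[i])

for itd in (0.0, 100.0, 300.0):
    print(f"ITD = {itd:5.1f} us -> coincidence at neuron {best_coincidence(itd)}")
```

The neuron whose inputs coincide shifts systematically with the ITD, so reading out the most active place codes the sound's azimuth; as noted above, mammals appear instead to rely on a distributed code across broadly tuned neurons.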
Localization in Area A1 and the Auditory Belt Area

One line of research found that destroying A1 decreased, but did not totally eliminate, ferrets' ability to localize sounds. Other work showed that deactivating A1 in cats by cooling the cortex results in poor localization. These studies also showed that destroying or deactivating areas outside A1 affected localization.

Gregg Recanzone (2000) compared the spatial tuning of neurons in A1 and neurons in the posterior area of the belt. He found that neurons in A1 respond when a sound is moved within a specific area of space and don't respond outside that area. When he then recorded from neurons in the posterior belt area, he found that these neurons respond to sound within an even smaller area of space, indicating that spatial tuning is better in the posterior belt area. Thus, neurons in the belt area provide more precise information than A1 neurons about the location of sound sources.

Moving Beyond the Temporal Lobe: Auditory Where (and What) Pathways

Two auditory pathways extend from the temporal lobe to the frontal lobe. These pathways are analogous to the what and where pathways in vision. The what pathway starts in the front (anterior) part of the core and belt and extends to the prefrontal cortex; it is responsible for identifying sounds. The where pathway starts in the rear (posterior) part of the core and belt and extends to the prefrontal cortex; this is the pathway associated with locating sounds. Thus, the posterior belt is associated with spatial tuning, and the anterior belt is associated with identifying different types of sounds. This difference between posterior and anterior areas of the belt represents the difference between the where and what auditory pathways.

Temporarily deactivating a cat's anterior auditory areas by cooling the cortex disrupts the cat's ability to tell the difference between two patterns of sounds, but does not affect the cat's ability to localize sounds. Conversely, deactivating the cat's posterior auditory areas disrupts the cat's ability to localize sounds, without affecting the cat's ability to tell the difference between different patterns of sounds.

Lesion and cooling studies indicate that A1 is important for localization. However, additional research indicates that processing information about location also occurs in the belt area and then continues farther in the where processing stream, which extends from the temporal lobe to the prefrontal area in the frontal lobe.
Hearing Inside Rooms

If you are listening to someone playing a guitar on an outdoor stage, some of the sound you hear reaches your ears after being reflected from the ground or objects like trees, but most of the sound travels directly from the sound source to your ears (Figure 12.20a). If, however, you are listening to the same guitar in an auditorium, then a large proportion of the sound bounces off the auditorium's walls, ceiling, and floor before reaching your ears. The sound reaching your ears directly, along path 1, is called direct sound; the sound reaching your ears later, along paths like 2, 3, and 4, is called indirect sound.

Perceiving Two Sounds That Reach the Ears at Different Times

The speaker on the left is the lead speaker, and the one on the right is the lag speaker. If a sound is presented in the lead speaker followed by a long delay (tenths of a second), and then a sound is presented in the lag speaker, listeners typically hear two separate sounds: one from the left (lead) followed by one from the right (lag). But when the delay between the lead and lag sounds is much shorter, something different happens. Even though the sound is coming from both speakers, listeners hear the sound as coming only from the lead speaker. This situation, in which the sound appears to originate from the lead speaker, is called the precedence effect, because we perceive the sound as coming from the source that reaches our ears first. The precedence effect governs most of our indoor listening experience: we generally perceive sound as coming from its source, rather than from many different directions at once.

Architectural Acoustics

Architectural acoustics, the study of how sounds are reflected in rooms, is largely concerned with how indirect sound changes the quality of the sounds we hear in rooms. The major factors affecting indirect sound are the size of the room and the amount of sound absorbed by the walls, ceiling, and floor. If most of the sound is absorbed, then there are few sound reflections and little indirect sound. If most of the sound is reflected, there are many sound reflections and a large amount of indirect sound. Another factor affecting indirect sound is the shape of the room, which determines how sound hits surfaces and the directions in which it is reflected.
The amount and duration of indirect sound produced by a room is expressed as reverberation time: the time it takes for the sound to decrease to 1/1000th of its original pressure (a decrease in level of 60 dB). If the reverberation time of a room is too long, sounds become muddled because the reflected sounds persist for too long. In extreme cases, such as cathedrals with stone walls, these delays are perceived as echoes, and it may be difficult to accurately localize the sound source. If the reverberation time is too short, music sounds "dead," and it becomes more difficult to produce high-intensity sounds.
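The 60-dB figure follows directly from the 1/1000 pressure ratio, and a reverberation time can be read off a steady decay rate. This is a minimal sketch assuming an idealized constant decay; the 30 dB-per-second rate is just an example value, not a measurement from the chapter.

```python
import math

def pressure_ratio_to_db(ratio):
    """Convert a sound-pressure ratio to decibels (20 * log10 of the ratio)."""
    return 20.0 * math.log10(ratio)

# A drop to 1/1000th of the original pressure is a 60 dB decrease:
print(pressure_ratio_to_db(1 / 1000))  # -60.0

def reverberation_time_s(decay_db_per_second):
    """Time for the level to fall by 60 dB, assuming a steady decay rate."""
    return 60.0 / decay_db_per_second

# Example: if a room's sound level falls at 30 dB per second, its
# reverberation time is 2.0 seconds (about the concert-hall value below).
print(reverberation_time_s(30.0))  # 2.0
```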
Acoustics in Concert Halls

Besides reverberation time, several factors are associated with how music sounds in concert halls:
o Intimacy time: the time between when sound arrives directly from the stage and when the first reflection arrives. This is related to reverberation but involves comparing just the direct sound and the first reflection, rather than the time it takes for many reflections to die down.
o Bass ratio: the ratio of low frequencies to middle frequencies that are reflected from walls and other surfaces.
o Spaciousness factor: the fraction of all of the sound received by a listener that is indirect sound.
Surveys of concert halls confirmed that the best halls had reverberation times of about 2 seconds, but 1.5 seconds was found to be better for opera houses, the shorter time being necessary to enable people to hear the singers' voices clearly. Intimacy times of about 20 ms and high bass ratios and spaciousness factors were also associated with good acoustics.

Acoustics in Lecture Halls

The ideal reverberation time for a small classroom is about 0.4 to 0.6 seconds, and for an auditorium about 1.0 to 1.5 seconds. These are less than the 2.0-second optimum for concert halls because the goal is not to create a rich musical sound, but to create an environment in which students can hear what the teacher is saying. Sounds other than the teacher's voice, called background noise, include noisy ventilation systems, students talking in class (when they aren't supposed to!), and noise from the hall and adjacent classrooms. The presence of background noise has led to the use of the signal-to-noise (S/N) ratio in designing classrooms. The S/N ratio is the level of the teacher's voice in dB minus the level of the background noise in the room. For example, a teacher's voice at 65 dB heard against 50 dB of background noise gives an S/N ratio of +15 dB. Ideally, the S/N ratio is +10 to +15 dB or more. At lower S/N ratios, students may have trouble hearing what the teacher is saying.

Auditory Organization: Scene Analysis

The Problem of Auditory Scene Analysis

The array of sound sources at different locations in the environment is called the auditory scene, and the process by which the stimuli produced by each of the sources in the scene are separated is called auditory scene analysis. Auditory scene analysis poses a difficult problem because the sounds from different sources are combined into a single acoustic signal, so it is difficult to tell which part of the signal is created by which source just by looking at the waveform of the sound stimulus. Each musician produces a sound stimulus, but these signals are combined into one signal, which enters the ear.

Separating the Sources

Location

One way to analyze an auditory scene into its separate components would be to use information about where each source is located. According to this idea, you can separate the sound of the vocalist from the sound of the guitar based on localization cues such as the ITD and ILD.
The cue of location helps us separate them perceptually. In addition, when a source moves, it typically follows a continuous path rather than jumping erratically from one place to another.

Onset Time

As mentioned above, if two sounds start at slightly different times, it is likely that they came from different sources. This occurs often in the environment, because sounds from different sources rarely start at exactly the same time. When sound components do start together, it is likely that they are being created by the same source.

Pitch and Timbre

Sounds that have the same timbre or pitch range are often produced by the same source. Composers in the Baroque period (1600–1750) knew that when a single instrument plays notes that alternate rapidly between high and low tones, the listener perceives two separate melodies, with the high notes perceived as being played by one instrument and the low notes as being played by another. This separation of different sound sources into perceptually different streams, called implied polyphony or compound melodic line by musicians, is called auditory stream segregation by psychologists.

When high-pitched tones were slowly alternated with low-pitched tones, as in Figure 12.24a, the tones were heard in one stream, one after another. But when the tones were alternated very rapidly, the high and low tones became perceptually grouped into two auditory streams; the listener perceived two separate streams of sound, one high-pitched and one low-pitched, occurring simultaneously. This demonstration shows that stream segregation depends not only on pitch but also on the rate at which tones are presented. Thus, returning to the Bach composition, the high and low streams are perceived to be separate if they are played rapidly, but not if they are played slowly.
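The rate effect is easy to hear by synthesizing the alternating-tone stimulus yourself. The sketch below is only illustrative; the frequencies, tone durations, and sample rate are arbitrary choices, not the values used in the studies described in this chapter.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second

def tone(freq_hz, dur_s):
    """A sine-wave tone of the given frequency and duration."""
    t = np.arange(int(SAMPLE_RATE * dur_s)) / SAMPLE_RATE
    return 0.5 * np.sin(2 * np.pi * freq_hz * t)

def alternating_sequence(high_hz=800.0, low_hz=400.0, tone_dur_s=0.1, n_pairs=20):
    """High and low tones in strict alternation. With long tones (slow
    presentation) listeners tend to hear a single alternating stream; with
    short tones (fast presentation) the same frequencies tend to split into
    separate high and low streams."""
    pair = np.concatenate([tone(high_hz, tone_dur_s), tone(low_hz, tone_dur_s)])
    return np.tile(pair, n_pairs)

slow = alternating_sequence(tone_dur_s=0.4)   # more likely heard as one stream
fast = alternating_sequence(tone_dur_s=0.08)  # more likely heard as two streams
```

Writing either array to a sound file (for example with scipy.io.wavfile.write) and listening makes the slow and fast versions easy to compare; how far apart the two frequencies are matters as well, which is the point of the demonstration described next.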
However, when the frequencies of the two stimuli become similar, something interesting happens. Grouping by similarity of pitch occurs, and perception changes to a back-and-forth "galloping" between the tones of the two streams. Then, as the scale continues upward so that the frequencies become more separated, the two sequences are again perceived as separate.

Another example of how similarity of pitch causes grouping is an effect called the scale illusion, or melodic channeling. Diana Deutsch (1975, 1996) demonstrated this effect by presenting two sequences of notes simultaneously through earphones, one to the right ear and one to the left. Notice that the notes presented to each ear jump up and down and do not create a scale. However, Deutsch's listeners perceived smooth sequences of notes in each ear, with the higher notes in the right ear and the lower ones in the left ear (Figure 12.26b). Even though each ear received both high and low notes, grouping by similarity of pitch caused listeners to group the higher notes in the right ear (which started with a high note) and the lower notes in the left ear (which started with a low note).
Auditory Continuity

Sounds that stay constant or that change smoothly are often produced by the same source. Sound stimuli with the same frequency or smoothly changing frequencies are perceived as continuous even when they are interrupted by another stimulus. Listeners perceived these tones as stopping during the silence, but when Warren filled in the gaps with noise, listeners perceived the tone as continuing behind the noise.

Experience

The effect of past experience on the perceptual grouping of auditory stimuli can be demonstrated by presenting the melody of a familiar song. When people first hear these notes, they find it difficult to identify the song. But once they have heard the song as it was meant to be played, they can follow the melody in the octave-jumping version. This is an example of the operation of a melody schema, a representation of a familiar melody that is stored in a person's memory. When people don't know that a melody is present, they have no access to the schema and therefore have nothing with which to compare the unknown melody. But when they know which melody is present, they compare what they hear to their stored schema and perceive the melody.

Auditory Organization: Perceiving Meter

The series of changes in notes and pauses across time in a piece of music is called its rhythmic pattern. Different singers might change the rhythmic pattern by holding some notes longer, making others shorter, or adding pauses. Thus, any song or instrumental piece has its own rhythmic pattern, which depends on how the song is written and how it is performed. The underlying beat of the music, called the metrical structure, is indicated by the red arrows below The Star-Spangled Banner in Figure 12.29. Note that the metrical structure is not the same thing as the notes or the rhythmic pattern, because you can feel a beat even if there are pauses in the sound during a song. Two common meters are duple (represented in musical notation by a 2/4 time signature, which indicates that there are two beats per measure, as in a march) and triple (represented by a 3/4 time signature, which indicates that there are three beats per measure, as in a waltz).

Metrical Structure and the Mind

Metrical structure is indicated by the time signature of the composition and, in performance, is typically achieved by accentuating some notes with a stronger attack or by playing them louder or longer. Thus, even though a metronome creates a series of identical beats with regular spacing, we can perceive the beats in duple meter (TICK-toc) or, with a small amount of effort, in triple meter (TICK-toc-toc). This ability to change metrical structure even when the physical stimulus remains the same is similar to what happens for the visual face-vase display in Figure 5.25; both are examples of ambiguous stimuli, because they can be perceived in more than one way.
Metrical Structure and Movement

In an experiment by Phillips-Silver and Trainor, infants listened to a regular repeating ambiguous rhythm that had no accents while being bounced up and down in the arms of the experimenter. These bounces occurred either in a duple pattern (a bounce on every second beat) or in a triple pattern (a bounce on every third beat). After being bounced for 2 minutes, the infants were tested to determine whether this movement caused them to hear the ambiguous pattern in groups of two or in groups of three. To do this, the infants were tested to determine whether they preferred listening to the pattern with accents that corresponded to how they had been bounced. This preference was determined by using a head-turning preference procedure. Phillips-Silver and Trainor found that infants listened to the pattern they had been bounced to for an average of 8 seconds but listened to the other pattern for an average of only 6 seconds. The infants therefore preferred the pattern they had been bounced to.

Apparently, moving is the key to influencing metrical grouping. Based on the results of these and other experiments, Phillips-Silver and Trainor concluded that the crucial factor that causes movement to influence the perception of metrical structure is stimulation of the vestibular system, the system that is responsible for balance and sensing the position of the body.

Metrical Structure and Language

The dominant stress pattern in English is short-long (unaccented-accented), but in Japanese it is long-short (accented-unaccented). Comparisons of how native English speakers and Japanese speakers perceive metrical grouping support the idea that the stress patterns in a person's language can influence the person's perception of grouping. The results indicated that English speakers were more likely to perceive the grouping as short-long and Japanese speakers were more likely to perceive the grouping as long-short.

Returning to the Coffee Shop

The first problem was the problem of auditory localization: Where is each of the sounds you can hear in the coffee shop coming from? We saw that one solution to this problem involves comparing the sounds that reach the left and right ears, and another solution involves using spectral cues. The second problem was how to deal with sound reflected from surfaces such as the walls of a room. This problem is solved by a mechanism that creates the precedence effect, which causes the auditory system to give preference to the first sound that arrives. The third problem, the problem of auditory scene analysis, occurs because all of the sounds in the environment are combined, as illustrated for the trio in Figure 12.22. This is a problem of perceptual organization, because the goal is to
separate the sound created by each source from this combined signal. The auditory system solves this problem by using a number of different cues, such as location, timing, pitch, continuity, and experience, to separate the individual sources. Finally, there is the problem of ongoing sequences of sounds in time, which create a specific beat or "time signature" that organizes the music coming from the coffee shop's speakers. The research we have described on metrical structure shows that people can shift from one meter to another mentally, and also that meter can be influenced by information provided by past experience with a particular language (as demonstrated by comparing English and Japanese speakers).
Something to Consider: Connections Between Hearing and Vision

We see people's lips move as we listen to them speak; our fingers feel the keys of a piano as we hear the music the fingers are creating; we hear a screeching sound and turn to see a car coming to a sudden stop. All of these combinations of hearing and other senses are examples of multisensory interactions.

Hearing and Vision: Perception

The ventriloquism effect, or visual capture, is an example of vision dominating audition. It occurs when sounds coming from one place (the ventriloquist's mouth) appear to come from another place (the dummy's mouth). Movement of the dummy's mouth "captures" the sound. In these examples, the sound, even if it actually originates from another location, is captured by vision. Note that because virtually all theaters now have stereophonic sound, binaural cues contribute to the match between sound position and the characters on the screen.

But vision doesn't always win out over hearing. Consider, for example, the two-flash illusion, which occurs when a single flash is accompanied by two tones and the subject perceives two flashes. In this case, hearing modifies vision. In the Sekuler et al. (1997) experiment, observers viewed successive positions of two balls that were presented so they appeared to be moving; there were two conditions:
o No-sound condition: the two balls were perceived to pass each other and continue moving in a straight-line motion.
o Click-added condition: observers were more likely to see the balls as colliding.

Hearing and Vision: Physiology

These connections between sensory areas contribute to coordinated receptive fields (RFs) like the ones shown in Figure 12.34 for a neuron in the monkey's parietal lobe that responds to both visual stimuli and sound. This neuron responds when an auditory stimulus is presented in an area that is below eye level and to the left (Figure 12.34a) and when a visual stimulus originates from about the same area (Figure 12.34b). Figure 12.34c shows that there is a great deal of overlap between these two receptive fields. The multisensory neurons that fire to both sound and vision help us form a single representation of space that
involves both auditory and visual stimuli.
Another example of cross talk between the senses occurs when the primary receiving area associated with one sense is activated by stimuli that are usually associated with another sense. Blind echolocators make a clicking sound with their tongue and mouth and listen for the echoes. Skilled echolocators can detect the positions and shapes of objects as they move through the environment. Not surprisingly, when brain activity was recorded while such click sounds were presented, the sounds activated the auditory cortex in both the blind echolocators and the sighted control subjects. However, the visual cortex was strongly activated in the echolocators but was silent in the control subjects. Thus, when sound is used to achieve spatial awareness, the visual cortex becomes involved.