Natural selection and cultural selection in the evolution of communication

Kenneth Smith
Language Evolution and Computation Research Unit
Department of Theoretical and Applied Linguistics
University of Edinburgh
[email protected]
Phone: +44 131 650 6657 Fax: +44 131 650 3962

Draft of paper to be submitted to Adaptive Behavior

Abstract

Computational simulations of natural selection for communicative success show that natural selection alone is capable of evolving optimal communication systems. Simulations of the interactions between natural selection and learning show that the biases of learners, when placed in the framework of iterated cultural transmission of communication systems, result in cultural selection of communication systems. Cultural selection may be in opposition to natural selection as communication systems which are optimal in terms of cultural selection may not be optimal in terms of natural selection. Cultural selection is the determining factor in the development of communication systems in the simulated populations, with natural selection being relegated to a secondary role. This paper suggests that the role of cultural selection in the evolution of language should not be underestimated.


1 Introduction

Language is transmitted from generation to generation within a speech community. The precise nature of the intergenerational transmission remains a contentious issue. The transmission of language down the generations involves at least some cultural transmission - under normal circumstances children learn the language of their speech community through exposure to the linguistic behaviour of that community. The most influential linguistic theories of modern times (Chomsky, 1965; Chomsky, 1980; Chomsky, 1987) assume genetic transmission of the language faculty between generations in addition to this cultural transmission - language learners come to the language acquisition task equipped with some genetically encoded Language Acquisition Device (Chomsky, 1987) which they have inherited from their parents. The research outlined in this paper represents an attempt to understand the types of interactions which may occur between cultural transmission and genetic transmission during the iterated transmission of communication systems within a communicating population.

Accounts of the emergence of communication systems, including language-like syntactically structured communication systems, which depend in part on the computational simulation of cultural and genetic processes fall into three main groups:

1. Those which suggest that genetic transmission between generations alone is capable of developing and refining innate communication systems (e.g. Werner and Dyer, 1991; Ackley and Littman, 1994; MacLennan and Burghardt, 1994; Levin, 1995; Cangelosi and Parisi, 1996; Oliphant, 1996; Bullock, 1997; de Bourcier and Wheeler, 1997; Di Paolo, 1997; Werner and Todd, 1997; Noble, 1998).
2. Those which suggest that cultural transmission between generations alone is capable of developing and refining entirely learned communication systems (e.g. Hutchins and Hazelhurst, 1995; Steels and Vogt, 1997; Batali, 1998; Livingstone and Fyfe, 1999; Batali, in press; Hurford, in press; Kirby, in prep; Kirby, in press a; Kirby, in press b; Oliphant, in press).
3. Those which suggest that positive interactions between genetic and cultural transmission are capable of developing and refining communication systems which are part innate and part learned (e.g. Batali, 1994; Briscoe, 1997; Kirby and Hurford, 1997).

This paper explores the roles of non-random genetic transmission (evolution by natural selection) and cultural transmission in the development of simple communication systems in a simulated population. The findings outlined in this paper suggest that the learning biases of individual agents result in cultural selection during the process of cultural transmission. This cultural selection, rather than natural selection, is the key factor in the development of shared communication systems in the simulated populations.

2 The communication system

A maximally simple model of communication is used, to focus inquiry on the roles of genetic and cultural transmission of such systems. The communication systems consist of a set of meaning-signal pairs, where each meaning in the set of meanings $M$ is associated with a signal from the set of signals $S$. Systems of this type clearly lack the syntactic structure characteristic of natural language - signals are discrete tokens which may not be combined with other signals to form more complex signals. The canonical example of such a communication system is the communication system used by Vervet monkeys (Cheney and Seyfarth, 1990). Optimal communication between two individuals using such a system requires that the individual wishing to communicate meaning $m_i \in M$ uses signal $s_i \in S$ and the individual receiving $s_i$ interprets it as meaning $m_i$.

Such communication systems can be classified in terms of the degree of homonymy they exhibit. Homonyms are signals which are perceptually indistinguishable, but which are associated with distinct meanings. A communication system of the type outlined above will be termed:

• Unhomonymous if every meaning is associated with a distinct signal. The Vervet communication system is an example of an unhomonymous communication system.
• Partially homonymous if some, but not all, meanings are associated with identical signals. In such a communication system, some signals will be homonyms of other signals.
• Fully homonymous if all meanings are associated with identical signals. In such a system, all signals will be homonyms of all other signals.

Homonymy therefore introduces ambiguity in the communication system - homonymous signals will be ambiguous as to the meaning they should be associated with [1]. Pressure to minimise ambiguity should therefore discourage homonymy.
3 The communicative agent

Feedforward neural networks are used to model communicative agents. The structure of the network used is shown in Figure 1. The input to this network is considered to be the meaning to be communicated and the network's output is considered to be the signal used by that agent to communicate the input meaning, with the precise nature of the meaning-signal mapping determined by the connection weights in the network.

[1] Homonymy is rife in natural languages, but utterances containing homonymous words are rarely ambiguous due to the context provided by the rest of the utterance. In English, for example, "bank" has several interpretations - it can refer to a river's edge or a financial institution or the act of tilting sideways while making a turn, to name but a few. However, the utterance "I paid money into the bank" will not normally be ambiguous due to the linguistic context the homonymous "bank" appears in.

[Figure 1 diagram: a network with an Input/Meaning layer and an Output/Signal layer.]

Figure 1: The structure of the neural network.

Communication systems therefore map from 3D meaning vectors to 3D signal vectors. Binary meaning vectors are used, giving $2^3 = 8$ possible meanings to be communicated. A subset of this set of possible meaning vectors is considered to be the universe of discourse of the simulated agents. The universe of discourse represents meanings which the agents will be required to communicate to one another. For all simulations outlined in this paper, the universe of discourse consists of the unit vectors (1 0 0), (0 1 0) and (0 0 1). The other meaning vectors not included in the universe of discourse are meanings which the agents can conceive of but are not required to communicate.

3.1 Production and reception

Producing the signal associated with a given meaning from the universe of discourse in such communicative agents is straightforward - the given meaning is used as the input to the network and activations are propagated forward through the network to give a real-valued output pattern of activation, which is thresholded at 0.5 to give the binary signal associated with the given meaning.

Reception is slightly more complex, given that the networks are not bidirectional. All meanings in the universe of discourse are propagated through a given agent's network to produce a real-numbered output pattern of activation for each meaning. Each output pattern is given a confidence rating, corresponding to how closely that pattern matches the received signal. The meaning which produces the signal closest to the received signal, according to the confidence measure, is chosen as the interpretation of the received signal. This method is based on the method used by Batali (1998) and Kirby (in prep) for producing outputs for similar networks. The confidence measure that a given real-numbered output vector, o, of length n matches a target binary vector t of length n is simply the product of the confidence scores for each individual node 1...n in the output vector, i.e.

$$C(t[1 \ldots n] \mid o[1 \ldots n]) = \prod_{i=1}^{n} C(t[i] \mid o[i])$$

where the confidence measure for node i is

$$C(t[i] \mid o[i]) = \begin{cases} o[i] & \text{if } t[i] = 1, \\ 1 - o[i] & \text{if } t[i] = 0. \end{cases}$$

(Equations adapted from Kirby (in prep))
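The production and reception procedures described in this section can be illustrated with a minimal Python sketch. Several details are assumptions made for illustration only: the logistic activation function, the hidden layer of three nodes, the weight layout (one row of bias plus incoming weights per node), and the names `produce`, `receive` and `IDENTITY_NET`. Only the thresholding at 0.5 and the confidence measure are taken from the text.

```python
import math

# The universe of discourse from Section 3: the three unit meaning vectors.
UNIVERSE = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def sigmoid(x):
    """Logistic activation (an assumption; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(rows, inputs):
    """Each row is [bias, w_1, ..., w_k] for one node (hypothetical layout)."""
    return [sigmoid(r[0] + sum(w * v for w, v in zip(r[1:], inputs)))
            for r in rows]


def forward(net, meaning):
    """Propagate a meaning through the hidden layer, then the output layer."""
    hidden_rows, output_rows = net
    return layer(output_rows, layer(hidden_rows, meaning))


def produce(net, meaning):
    """Threshold the real-valued output at 0.5 to obtain a binary signal."""
    return tuple(1 if o > 0.5 else 0 for o in forward(net, meaning))


def confidence(target, output):
    """Product over nodes of o[i] if t[i] == 1, else 1 - o[i]."""
    c = 1.0
    for t, o in zip(target, output):
        c *= o if t == 1 else 1.0 - o
    return c


def receive(net, signal):
    """Return the meaning whose output pattern best matches the signal."""
    return max(UNIVERSE, key=lambda m: confidence(signal, forward(net, m)))


# Hand-picked weights (illustrative only) that roughly copy input to output.
IDENTITY_NET = (
    [[-5, 10, 0, 0], [-5, 0, 10, 0], [-5, 0, 0, 10]],
    [[-5, 10, 0, 0], [-5, 0, 10, 0], [-5, 0, 0, 10]],
)
```

Because reception only ever compares the received signal against the outputs for the three meanings in the universe of discourse, a hearer always returns one of those meanings, even for a signal it would never produce itself.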

4 Genetic transmission of communication systems

In this section a model of genetic transmission of communication systems is outlined. Selection based on communicative success during genetic transmission results in optimal communication in the simulated populations.

4.1 The genetic algorithm

A genetic algorithm (Holland, 1975; Mitchell, 1996) is used to simulate the process of genetic transmission of communication systems. The genetic algorithm has four key components:

1. A model of population turnover.
2. A model of genotypes, phenotypes and the mapping from genotype to phenotype.
3. Selective breeding based on an evaluation of communicative ability.
4. A method of recombination of genes during breeding.

These four components are described below.

4.1.1 Population turnover

A generational population model is used - at every time step of the simulation the entire population is replaced by a new population generated by breeding interactions between the members of the old population.

4.1.2 Genotypes and phenotypes

The phenotype communicative agent used is as outlined in Section 3 - a three-layer, feedforward neural network mapping from input meanings to output communicative signals. In the simulations outlined in this section each individual's connection weights are fully specified by their genotype - each agent's genotype consists of a string of real numbers, with each locus in the genotype mapping to a particular connection in the phenotype network and the real-numbered allele at the locus in the genotype determining the weight of the associated connection in the phenotype network. This mapping from genotype to phenotype is illustrated in Figure 2.

[Figure 2 diagram: an example genotype, [-0.3,0.1,0.4,0.5,0.3,-1.0,0.4,0.2,0.2,1.0,-0.3,0.4,0.6,0.3,0.3,0.9,0.1,-0.7,0.2,0.4,1.0,0.1,-0.9,0.7], shown alongside the phenotype network whose connection weights it specifies.]

Figure 2: The mapping from genotype (a string of real numbers) to phenotype (a neural network). Bias node connection weights are shown in the associated node.

4.1.3 Selective breeding

An agent's chances of breeding are determined by their success at communicating with other members of their generation of the population. The method of evaluating fitness is given in Figure 3. The population is spatially organised on a toroidal line for the purposes of fitness evaluation - every agent in the population has two neighbours. The fittest 50% of the population breed with equal probability to produce the next generation of agents. The population is not spatially organised for breeding - any eligible agent can breed with any other eligible agent to produce offspring, and the position of their offspring in the next generation is not related in any way to their own positions. The lack of spatial organisation in the breeding process prevents the population splitting into several genetically distinct groups. Such processes of speciation are outwith the scope of this paper.

4.1.4 Recombination of genes

Breeding involves recombination of genes, via crossover, and mutation. The crossover process operates on chunks of genotype which map to connections feeding into a single node in the phenotype network, and selects each such chunk randomly from an agent's two parents to produce a new genotype. Point mutations occur on the newly formed genotype with probability $P$, where $P = 0.1 / L$ given genome length $L$. Mutation results in the value at the mutated locus being increased by a random real number $x$, where $-1 \le x \le 1$. These crossover and mutation operators are identical to those used by Montana and Davis (1989).
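The crossover and mutation operators described above might be sketched as follows. The chunk size of 4 (three incoming weights plus one bias per node, laid out contiguously in the genotype) is an assumption, as is reading the per-locus mutation probability as 0.1 divided by the genome length and drawing the mutation amount uniformly. The function names are hypothetical.

```python
import random

CHUNK = 4  # assumed: 3 incoming weights + 1 bias for each non-input node


def crossover(parent_a, parent_b, rng=random):
    """Build a child genotype by taking each per-node chunk from one parent."""
    child = []
    for start in range(0, len(parent_a), CHUNK):
        donor = rng.choice((parent_a, parent_b))  # pick a parent at random
        child.extend(donor[start:start + CHUNK])
    return child


def mutate(genotype, rng=random):
    """Point-mutate each locus with probability 0.1 / L (assumed reading)."""
    p = 0.1 / len(genotype)
    return [g + rng.uniform(-1.0, 1.0) if rng.random() < p else g
            for g in genotype]


def breed(parent_a, parent_b, rng=random):
    """Crossover followed by mutation, as in Section 4.1.4."""
    return mutate(crossover(parent_a, parent_b, rng), rng)
```

Crossing over whole per-node chunks, rather than single loci, keeps together the weights feeding into one node, so a node's learned or evolved function is inherited as a unit.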

• For each agent in the population:
  - Call that agent the speaker.
  - For each of the speaker's neighbours, do:
    * Call that neighbour the hearer.
    * For each meaning, m_s, do:
      · Generate the form, s_s, that the speaker associates with m_s (see Section 3.1).
      · Identify the meaning, m_h, that the hearer interprets s_s as conveying using the reception process (see Section 3.1).
      · Compare m_h with m_s and score the success of the communication. If m_h is identical to m_s score the communication as a success. Otherwise, the communication is a failure.
    * Return the hearer to the population.
  - Return the speaker to the population.

Figure 3: The algorithm for evaluating communicative success of agents in the population.
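The evaluation procedure of Figure 3, together with the selection of the fittest 50% described in Section 4.1.3, can be sketched in Python as below. Two points are assumptions rather than statements from the paper: each successful communication is credited to the speaker, and ties in fitness are broken by position. The lookup-table agents and the names `evaluate`, `fittest_half`, `table_produce` and `table_receive` are hypothetical stand-ins for the neural network agents.

```python
def ring_neighbours(i, n):
    """Left and right neighbours of position i on a ring of n agents (n >= 3)."""
    return [(i - 1) % n, (i + 1) % n]


def evaluate(agents, meanings, produce, receive):
    """Score every agent as speaker against each of its two neighbours."""
    n = len(agents)
    scores = [0] * n
    for i, speaker in enumerate(agents):
        for j in ring_neighbours(i, n):
            hearer = agents[j]
            for m_s in meanings:
                s_s = produce(speaker, m_s)   # form the speaker uses for m_s
                m_h = receive(hearer, s_s)    # hearer's interpretation
                if m_h == m_s:
                    scores[i] += 1            # credited to the speaker (assumed)
    return scores


def fittest_half(scores):
    """Indices of the top 50% of agents; ties broken by position (assumed)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[: len(scores) // 2]


# Toy lookup-table agents standing in for networks.
def table_produce(agent, meaning):
    return agent[meaning]


def table_receive(agent, signal):
    for meaning, own_signal in agent.items():
        if own_signal == signal:
            return meaning
    return None


good = {"a": 0, "b": 1, "c": 2}   # unhomonymous
bad = {"a": 0, "b": 0, "c": 0}    # fully homonymous
scores = evaluate([good, good, good, bad], ["a", "b", "c"],
                  table_produce, table_receive)
```

In this toy population the agent with two unhomonymous neighbours scores highest, and the fully homonymous agent scores lowest, so it would be excluded from breeding.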

4.2 Natural selection results in optimal communication

In a population of such agents, where communication systems are genetically encoded, this genetic information is transmitted between generations by breeding and breeding is determined by communicative success, optimal unhomonymous communication systems rapidly emerge from random communication systems, as suggested by previous work. The average communicative success of members of 10 populations over time is shown below in Figure 4. The communication system types used over time, averaged over these 10 populations, are shown in Figure 5. As can be seen from Figures 4 and 5 there is a clear relationship between communication system type and communicative success. A population fully converged on a fully homonymous communication system will have an average communicative success of around 33%. Similarly, a population converged on a partially homonymous communication system will have an average communicative success of around 66% and a population converged on an unhomonymous communication system will have average communicative success of 100%. When the population is not fully converged on one system type or another, or where there are different, mutually unintelligible instances of a communication system of a given type present in the population, the population's average fitness will differ from these anticipated levels.

[Figure 4 plot: % Success (0 to 100) against Generation (0 to 1000).]

Figure 4: Average communicative success of members of 10 populations over time.

[Figure 5 plot: % of Population using system (0 to 100) against Generations (0 to 1000), with series for Unhomonymous, Partially Homonymous and Fully Homonymous systems.]

Figure 5: Communication system types in use over time, averaged over 10 populations.

The convergence of these simulated populations on optimal communication systems is a result of natural selection - "the differential survival and reproduction of . . . different genotype[s] within a population" (Jones, Martin, and Pilbeam, 1992, p467). The differentiating factor in the retention of genotypes is the ability of the phenotypes they encode to successfully communicate with other members of their population. Genotypes which encode phenotypes which communicate using less homonymous communication systems are more likely to be retained in the population's gene pool as such phenotypes are more likely to be successful communicators.
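The anticipated success levels of around 33%, 66% and 100% quoted in this section can be checked with a small worked example. The sketch assumes that speaker and hearer share the same system and that the hearer resolves an ambiguous signal by always choosing the same one of the candidate meanings (here, the first in a fixed order). The dictionary representation and the name `expected_success` are illustrative assumptions.

```python
def expected_success(system, meanings):
    """Fraction of meanings communicated correctly within one shared system.

    An ambiguous signal is resolved to the first meaning (in the order of
    `meanings`) that the system maps to that signal: an assumed tie-break.
    """
    successes = 0
    for m in meanings:
        s = system[m]
        interpretation = next(x for x in meanings if system[x] == s)
        if interpretation == m:
            successes += 1
    return successes / len(meanings)


MEANINGS = ["m1", "m2", "m3"]
unhomonymous = {"m1": "s1", "m2": "s2", "m3": "s3"}
partially = {"m1": "s1", "m2": "s1", "m3": "s2"}
fully = {"m1": "s1", "m2": "s1", "m3": "s1"}
```

Under these assumptions each distinct signal is interpreted correctly for exactly one of the meanings that share it, so the success rate is the number of distinct signals divided by the number of meanings: 3/3, 2/3 and 1/3 respectively.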

5 Adding cultural transmission

In this section cultural transmission, in the form of learning, is added to the model. The addition of learning allows communication systems to be both genetically transmitted between generations, through the processes of selection and breeding outlined above, and culturally transmitted by the process of observational learning, described below. Previous work on the interactions between genetic and cultural transmission in the evolution of communication suggests that positive interactions between genetic and cultural transmission, such as the Baldwin effect (Baldwin, 1896; Hinton and Nowlan, 1987), will occur in these circumstances. This was found not to be the case - the addition of cultural transmission results in the emergence of suboptimal communication systems in the simulated populations.

5.1 The learning process

The generational genetic algorithm was altered to allow individuals at generation N to transmit their communication system to individuals at generation N + 1 culturally, as well as genetically. Cultural transmission is enabled by learning - individuals at generation N + 1 observe and learn from the communicative behaviour of generation N individuals. As previously mentioned, the population at each generation is spatially organised along a line - within a single generation, a given individual is closer to some other individuals than others. The spatial relationship is taken to hold between generations - certain generation N + 1 individuals are closer to certain generation N individuals than others. A generation N + 1 individual observes and learns from the communicative behaviour of the 3 generation N individuals closest to them, as illustrated in Figure 6.

Each individual at generation N + 1 receives 25 exposures to the communication systems of the population at generation N. These exposures are randomly distributed among the three closest members of generation N. During each exposure, the set of meaning-signal pairs of the generation N agent is used to train the generation N + 1 agent. The backpropagation algorithm was used to implement this learning process [2], with the starting point for learning being the connection weights specified in the learning agent's genotype. The learning agent's communication system will therefore be determined, at least to some extent, by the interactions between the processes of genetic transmission via selective breeding and cultural transmission via learning.

[2] A learning rate of 5 and momentum of 0 were used. This extremely high learning rate and low number of exposures made the simulations computationally tractable, but complicate matters further on.

[Figure 6 diagram: a row of agents at Generation N above a row of agents at Generation N + 1.]

Figure 6: Spatial organisation of 2 adjacent generations. Arrows indicate cultural transmission.

[Figure 7 plot: % Success (0 to 100) against Generation (0 to 1000).]

Figure 7: Average communicative success of members of 10 learning populations over time.
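The learning procedure of Section 5.1 might look roughly like the following Python sketch. The learning rate of 5, zero momentum, 25 exposures and three nearest teachers come from the text. Everything else is an assumption made for illustration: the logistic activation, the squared-error loss, the hidden layer of three nodes, the weight layout, the reading of "the 3 closest individuals" as the same position plus its two ring neighbours, and all function names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_h, w_o, x):
    """Rows are [bias, w_1, w_2, w_3]; returns hidden and output activations."""
    h = [sigmoid(r[0] + sum(w * v for w, v in zip(r[1:], x))) for r in w_h]
    o = [sigmoid(r[0] + sum(w * v for w, v in zip(r[1:], h))) for r in w_o]
    return h, o


def squared_error(w_h, w_o, x, t):
    _, o = forward(w_h, w_o, x)
    return 0.5 * sum((o_k - t_k) ** 2 for o_k, t_k in zip(o, t))


def train_step(w_h, w_o, x, t, lr=5.0):
    """One backpropagation update on a single meaning-signal pair (in place)."""
    h, o = forward(w_h, w_o, x)
    # Output deltas for squared error with logistic units.
    d_o = [(o_k - t_k) * o_k * (1 - o_k) for o_k, t_k in zip(o, t)]
    # Hidden deltas, computed before the output weights are changed.
    d_h = [h_j * (1 - h_j) * sum(d_o[k] * w_o[k][j + 1] for k in range(len(w_o)))
           for j, h_j in enumerate(h)]
    for k, row in enumerate(w_o):
        row[0] -= lr * d_o[k]
        for j, h_j in enumerate(h):
            row[j + 1] -= lr * d_o[k] * h_j
    for j, row in enumerate(w_h):
        row[0] -= lr * d_h[j]
        for i, x_i in enumerate(x):
            row[i + 1] -= lr * d_h[j] * x_i


def nearest_teachers(position, n):
    """Assumed reading: same position and both ring neighbours in generation N."""
    return [(position - 1) % n, position, (position + 1) % n]


def learn(w_h, w_o, teachers, exposures=25, lr=5.0, rng=random):
    """Each exposure trains on all meaning-signal pairs of one random teacher."""
    for _ in range(exposures):
        teacher = rng.choice(teachers)  # dict: meaning tuple -> signal tuple
        for meaning, signal in teacher.items():
            train_step(w_h, w_o, meaning, signal, lr)
```

In the model the initial `w_h` and `w_o` would come from the learner's genotype, so the same observed behaviour can lead different learners to different systems.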

5.2 Cultural transmission and cultural stagnation

Previous work suggests that the combination of non-random genetic transmission (natural selection) and cultural transmission (learning) should result in the emergence of optimal communication systems. The average communicative success of members of 10 learning populations over time is shown in Figure 7. Figure 8 shows the communication system types in use over time, averaged over the same 10 learning populations. It is clear from Figures 7 and 8 that the addition of cultural transmission has prevented the populations from developing optimal, unhomonymous communication systems. In fact, adding cultural transmission results in cultural stagnation - suboptimal systems which appear in the first few generations are maintained and come to dominate the population.

[Figure 8 plot: % of Population using system (0 to 100) against Generations (0 to 1000), with series for Unhomonymous, Partially Homonymous and Fully Homonymous systems.]

Figure 8: Communication system types in use over time, averaged over 10 learning populations.

Why? The poor performance of the learning populations is due to the biases of the learners. There are two possible biases which may affect the successful transmission of communication systems - learning biases making certain systems harder to learn than others, and learning biases determining how the learners respond when confronted with conflicting communication systems to learn from.

Figure 9 summarises the success of networks with small, random initial weights in acquiring communication systems of the three types given 25 exposures to single systems of those types. This reflects the ability of the communicative agents to acquire a system of the given type when presented with unified communicative behaviour to learn from. As can be seen from Figure 9, the agents enjoy extremely high success at acquiring systems of all three types in these circumstances, with unhomonymous systems proving fractionally harder to acquire. This small learning bias is clearly not the determining factor in the evolving communicative behaviour of the learning populations.

System Type             % success
Unhomonymous            99.5
Partially Homonymous    100
Fully Homonymous        100

Figure 9: The success of networks at the learning task. 100 random communication systems of each type were each tested on 100 networks.

Figure 10 summarises the percentage of networks with random initial connection weights using communication systems of the three types. This approximates the response of learning agents to training on conflicting communication systems - training on conflicting systems effectively randomises the connection weights in the agents' neural networks. As can be seen from Figure 10, under these circumstances the majority of agents are likely to converge on fully homonymous communication systems. This is the crucial learning bias which results in the convergence of the learning populations on fully homonymous communication systems.

System Type             % population
Unhomonymous            1.8
Partially Homonymous    26.4
Fully Homonymous        71.8

Figure 10: The percentage of a population of agents with small, random weights using communication systems of the given type.

As can be seen from Figure 8, there is a mix of communication systems in the populations at generation 0. Agents at generation 1 therefore observe and learn from a mix of communication systems. Under such circumstances, the agents are likely to acquire fully homonymous systems, due to the learning bias in favour of acquiring fully homonymous systems when presented with mixed communicative behaviour. The same will be true of learners in subsequent generations faced with mixed communicative behaviour. The learning biases of the agents filter out communication systems not conforming to those biases through repeated iterations of cultural transmission.

In Section 4.2, natural selection was characterised as "the differential survival and reproduction of . . . different genotype[s] within a population". This differential reproduction of genotypes is a result of properties of the genotypes themselves - some genotypes have properties which make them more likely to be genetically transmitted through reproduction. Repeated filtering of genotypes by selective breeding results in the evolving populations coming to be dominated by genotypes which encode optimally communicating neural networks - neural networks which conform to the breeding bias in favour of successful communicators. Similarly, some communication systems are more likely than others to be successfully transmitted between generations. This selective transmission of communication systems is determined by the learning biases of the learning agents, with communication systems conforming to the learning biases being more likely to be successfully transmitted. This differential reproduction of different communication systems within the population can be termed cultural selection.
In the learning populations outlined in this section, the learning biases of the agents result in cultural selection for fully homonymous communication systems and as a result these systems come to dominate the populations. It is irrelevant to the cultural transmission process that the communication systems which are likely to be favoured by cultural selection are extremely bad from a perspective of individual organism fitness.

Why does natural selection not counteract the force of cultural selection and weed out poor communicators? Learning in the phenotype masks an individual's genetic makeup - no matter how good an agent's genes are, their effects are likely to be overtaken by learning, which almost fully determines an agent's communicative behaviour. Shielding (Ackley and Littman, 1991) prevents natural selection from identifying good gene combinations and weeding out bad gene combinations. There are, however, certain combinations of genes which make learning a particular communication system impossible - an agent's genes constitute the starting point for learning, and the backpropagation algorithm is sensitive to initial weights to a certain degree. Genetic drift does occasionally result in small numbers of agents being born whose genes are so good they cannot learn fully homonymous communication systems. However, these agents must still communicate with their neighbours, and if those neighbours use a fully homonymous system then using a better system to communicate with them yields no benefit. The good gene combinations do not survive for long due to inter-breeding with agents whose genes allow them to acquire fully homonymous systems. Cultural transmission leads to cultural stagnation in the simulated populations - cultural selection favours fully homonymous communication systems and natural selection is powerless to counteract this.

5.3 Cultural transmission and collapse

The addition of cultural transmission not only prevents the development of an optimal communication system in the simulated populations - it prevents the maintenance of such a system. Figure 11 shows the average fitness of a population of agents who start out with a shared, optimal, innate communication system - all the agents in the population at generation 0 have a hand-selected set of genes which encode an optimal communication system. As in the simulations outlined in the previous section, every generation after the first attempts to learn a communication system from the preceding generation. As can be seen from Figure 11, the population collapses from using an unhomonymous communication system to using a fully homonymous communication system within 12000 generations.

[Figure 11 plot: % Success (0 to 100) against Generation (0 to 12000).]

Figure 11: Average communicative success of members of a learning population over time.

Learning in the phenotype almost completely masks an agent's genes. As mentioned above, there are certain combinations of genes which make learning a particular communication system impossible. In the simulation shown in Figure 11 an agent will eventually be born whose genes are so bad that they cannot learn the unhomonymous communication system in use by the rest of the population. This individual will learn a partially homonymous or fully homonymous communication system instead. While this individual will not breed, due to low success in communicating with its neighbours, its communication system will be learned from by agents in the next generation. As previously mentioned, training on conflicting communication systems has the effect of randomising the weights in an agent's network, resulting in the possibility that the agent will converge on a suboptimal communication system. The communication system of an agent with bad genes will be observed by three individuals. These individuals run the risk of acquiring a suboptimal communication system. If they do acquire a suboptimal system they will not breed, but their communication systems will be used for learning by agents in the next generation. As increasing levels of homonymy result in more successful replication during cultural transmission, suboptimal communication systems spread through the population like a virus due to the processes of cultural transmission and cultural selection, until the whole population converges on a fully homonymous communication system. Figure 12 illustrates the spread of suboptimal communication systems through the population.

[Figure 12 plot: Population Space against Generations (9000 to 12000).]

Figure 12: The population collapses from using an unhomonymous system (white) to a partially homonymous system (grey), and then to a fully homonymous system (black).

6 Adding explicit cultural selection

The cultural selection observed in the simulations outlined in the previous section is purely a result of the bias of the learners towards acquiring fully homonymous communication systems in the presence of conflicting training data. In this section an additional learning bias, a preference by learners to learn from successful communicators, is added to the model. This additional selectional pressure on cultural transmission reduces cultural stagnation to a certain extent and allows optimal communication systems to be maintained indefinitely.

6.1 Cultural selection by discriminating learners

In the simulations outlined in Section 5, natural selection was implemented by only allowing the top 50% of the population to transmit their genetic information to the next generation via breeding. Cultural selection was purely a result of the learners' biases. For the simulations outlined in this section agents in generation N + 1 observe and learn from the 3 nearest generation N agents whose communicative success scores place them in the best 50% of the population, according to the evaluation function outlined in Section 4.1.3. In these populations of discriminating learners there are therefore 3 selectional pressures operating on the evolving populations and communication systems:

1. Natural selection, operating on genetic transmission, favouring genes whose phenotype realizations are successful communicators.
2. Cultural selection for learnability, operating on cultural transmission, favouring communication systems which conform to the learning bias for fully homonymous systems in the presence of conflicting training data.
3. Cultural selection for communicative success, operating on cultural transmission, favouring communication systems which result in successful communication.

Selection pressures 1 and 3 are clearly related, although operating on different modalities of transmission. Selection pressures 2 and 3 operate in the same modality of transmission, but are in direct competition.
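The teacher selection used by discriminating learners can be sketched as follows. The text specifies only that a learner uses the 3 nearest generation N agents in the best 50% of the population. Measuring distance around the ring, breaking ties by position, and the function names are assumptions made for this illustration.

```python
def ring_distance(i, j, n):
    """Shortest distance between positions i and j on a ring of n agents."""
    d = abs(i - j) % n
    return min(d, n - d)


def discriminating_teachers(learner_pos, scores, k=3):
    """Positions of the k nearest agents from the top half by fitness score."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    eligible = ranked[: n // 2]  # best 50% of generation N
    # Nearest first; equal distances are ordered by position (assumed).
    return sorted(eligible,
                  key=lambda j: (ring_distance(learner_pos, j, n), j))[:k]


# Example: fitness scores for a generation of 8 agents.
example_scores = [6, 1, 5, 0, 6, 2, 4, 3]
teachers = discriminating_teachers(5, example_scores)
```

Compared with the undiscriminating learners of Section 5, the only change is the filtering step: poorly communicating agents are removed from the pool of potential teachers before the nearest three are chosen.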

6.2 Conflicting forms of cultural selection

The average communicative success of members of 10 such discriminating learning populations over time is shown in Figure 13. Figure 14 shows the communication system types in use over time, averaged over the same 10 learning populations. It is clear from Figures 13 and 14 that the addition of explicit cultural selection for communicative success has improved the average communicative success of the majority of the populations - most of the populations have moved away from fully homonymous communication systems and one population has converged on an optimal, unhomonymous communication system. However, discriminating learning is not a reliable mechanism for developing optimal communication systems.

[Figure 13 plot: % Success (0 to 100) against Generation (0 to 1000).]

Figure 13: Average communicative success of members of 10 learning populations over time.

[Figure 14 plot: % of Population using system (0 to 100) against Generations (0 to 1000), with series for Unhomonymous, Partially Homonymous and Fully Homonymous systems.]

Figure 14: Communication system types in use over time, averaged over 10 learning populations.

The failure of the majority of populations of discriminating learners to develop optimal, unhomonymous communication systems is due to the direct conflict between the two forms of cultural selection, cultural selection for learnability and cultural selection for communicative success. The agents' bias for learning from successful communicators gives communication systems which result in successful communication an increased probability of being involved in the cultural transmission process. However, these very same systems are least likely to be successfully transmitted due to the learners' bias in favour of acquiring fully homonymous systems when presented with conflicting training examples. The communicative behaviour of the populations therefore represents a compromise between the competing pressures for communicative success and transmission fidelity.

Such populations of discriminating learners do not suffer from the collapse problem which affected the populations outlined in Section 5.3 - populations of discriminating learners were able to maintain an optimal, unhomonymous communication system indefinitely. While the shielding effect does allow deleterious mutations to accumulate in such populations, resulting in the introduction of agents using suboptimal communication systems, these suboptimal systems are never transmitted. The suboptimal systems never occur in sufficient numbers to fight their way into the top 50% of communication systems which enter into the cultural transmission process.

7 Tailoring the learning bias

In the simulations outlined in the previous section there were two learning biases: the intrinsic bias of the learners, which favours increased homonymy, and the bias in favour of learning from successful communicators. In this section the model of a communicative agent is revised to build in a learning bias towards optimal communication systems. This bias results in the rapid and reliable emergence of such systems.

7.1 The new communicative agent

As outlined in Section 3, the communicative agents in all previous simulations were feedforward neural networks mapping from input meanings to output signals. Networks of this type will be referred to as imitators below. Signal production for these imitator agents was merely a matter of propagating an input meaning pattern of activation through the network to produce an output signal. Reception was achieved by presenting all universe of discourse meanings and selecting the meaning which maximises confidence in the received signal. These networks are strongly biased in favour of acquiring fully homonymous communication systems. This learning bias, in the context of cultural transmission, results in cultural selection for fully homonymous systems.

The new model of a communicative agent has exactly the same basic form as the imitator model, being a three-layer feedforward neural network. However, the crucial difference is that the new networks, which will be referred to as obverter networks (Oliphant and Batali, 1997; see footnote 3), map from input signals to output meanings: the direction of the mapping has been reversed.

Footnote 3: Obverter networks are the equivalent of what Hurford (1989) termed Saussurean learners.

System Type            % successes
Unhomonymous           54
Partially Homonymous   16
Fully Homonymous       1

Figure 15: The success of obverter networks at learning communication systems of the three types, with small random initial weights.

Production and reception in these obverter networks operate as follows:

Production: Each of the set of possible signals is propagated through the network, producing a real-numbered output pattern of activation for each signal. The signal which produces the meaning closest to the meaning to be communicated, as determined by the confidence measure outlined in Section 3.1, is used to communicate the given meaning (as for imitator reception).

Reception: The received signal pattern is propagated forward through the network and the output pattern of activation is thresholded to produce a binary pattern of activation corresponding to that agent's interpretation of the received signal (as for imitator production).
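The production and reception procedures just described can be sketched in a few lines. The sketch rests on several assumptions that are not taken from this paper: the confidence measure of Section 3.1 is not reproduced here, so a simple product of per-unit agreement is used as a stand-in; the network has no bias units; the threshold is 0.5; and the weights in `DEMO_WEIGHTS` are hand-picked for illustration rather than learned. All function and variable names are hypothetical.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, signal):
    """Three-layer feedforward pass: signal units -> hidden units -> meaning units."""
    w_hidden, w_out = weights
    hidden = [sigmoid(sum(w * s for w, s in zip(row, signal))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


def confidence(output, meaning):
    """Assumed stand-in for the Section 3.1 measure: product of per-unit agreement."""
    score = 1.0
    for o, m in zip(output, meaning):
        score *= o if m == 1 else 1.0 - o
    return score


def produce(weights, meaning, signals):
    """Obverter production: try every signal, keep the one read back most confidently."""
    return max(signals, key=lambda s: confidence(forward(weights, s), meaning))


def receive(weights, signal):
    """Obverter reception: propagate the signal and threshold the output pattern."""
    return tuple(1 if o > 0.5 else 0 for o in forward(weights, signal))


# Hand-picked illustrative weights for a 2-2-2 network (not learned weights).
DEMO_WEIGHTS = ([[10.0, 0.0], [0.0, 10.0]], [[10.0, -10.0], [-10.0, 10.0]])
SIGNALS = list(product((0, 1), repeat=2))

print(produce(DEMO_WEIGHTS, (1, 0), SIGNALS))  # (1, 0)
print(receive(DEMO_WEIGHTS, (0, 1)))           # (0, 1)
```

Because production is defined in terms of the agent's own reception behaviour, an agent that produces a signal for a meaning will, under this sketch, interpret that signal as the same meaning whenever some signal maps cleanly onto it.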

The learning biases of these agents are shown in Figures 15 and 16. As can be seen from Figure 15, these agents are strongly biased against learning fully homonymous and partially homonymous communication systems (see footnote 4). Figure 16 shows that obverter networks with random weight settings in the range [-1, 1] are strongly biased towards unhomonymous communication systems; as discussed in Section 5.2, these random weight biases approximate the response of networks to exposure to conflicting communication systems.

Footnote 4: The picture is rather more complex than this. Due to the high learning rate and small number of exposures to training data, the networks are incapable of learning systems in which all signals have the same value at a given bit. Such systems constitute 100% of all possible fully homonymous systems, 86% of all possible partially homonymous communication systems and 43% of unhomonymous systems. If systems with this property are excluded, which involves excluding all fully homonymous systems, the obverter networks enjoy 100% success in learning partially homonymous and unhomonymous systems. There is therefore a strong learning bias against fully homonymous systems and partially homonymous or unhomonymous systems with the awkward property, but there is no learning bias against partially homonymous or unhomonymous systems not possessing the property. The learning bias against systems in which all signals share the same value for the same bit disappears as the number of exposures is increased and the learning rate decreased, until the networks enjoy 100% success at learning communication systems of any type (this last point has not been verified). While the learning bias introduced by the high learning rate and small amount of training is clearly a strong one, it is not the bias which determines the overall behaviour of the system: this learning bias does not discriminate between partially homonymous and unhomonymous systems of particular types, and as will be seen there clearly is a bias against partially homonymous systems.
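The percentages quoted in footnote 4 can be checked by brute-force enumeration. The sketch below assumes 3 meanings and 3-bit binary signals; these sizes are an assumption made for illustration and are not stated in this section, although under that assumption the enumeration reproduces the quoted figures. The function names are hypothetical.

```python
from itertools import product

N_MEANINGS = 3  # assumed number of meanings
N_BITS = 3      # assumed signal length in bits


def classify(system):
    """Classify a system (one signal per meaning) by its degree of homonymy."""
    distinct = len(set(system))
    if distinct == len(system):
        return "unhomonymous"
    if distinct == 1:
        return "fully homonymous"
    return "partially homonymous"


def has_shared_bit(system):
    """True if all signals in the system have the same value at some bit position."""
    return any(len({signal[b] for signal in system}) == 1 for b in range(N_BITS))


def shared_bit_percentages():
    """Percentage of systems of each type in which all signals share a bit value."""
    signals = list(product((0, 1), repeat=N_BITS))
    totals, flagged = {}, {}
    for system in product(signals, repeat=N_MEANINGS):
        kind = classify(system)
        totals[kind] = totals.get(kind, 0) + 1
        if has_shared_bit(system):
            flagged[kind] = flagged.get(kind, 0) + 1
    return {kind: 100.0 * flagged.get(kind, 0) / totals[kind] for kind in totals}


for kind, pct in sorted(shared_bit_percentages().items()):
    print(kind, round(pct))
# fully homonymous 100
# partially homonymous 86
# unhomonymous 43
```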

System Type            % population
Unhomonymous           66
Partially Homonymous   33
Fully Homonymous       1

Figure 16: The percentage of a population of obverter agents with small, random weights using communication systems of the given type

Figure 17: Average communicative success of members of 10 obverter populations over time (x-axis: Generation, 0 to 1000; y-axis: % Success, 0 to 100)

7.2 Obverter learning results in optimal communication

In the simulation runs outlined in this section, the new obverter learner is substituted for the imitator learner used in previous sections. Apart from the change in the agent model, all other simulation details are identical to the simulation runs described in Section 5; specifically, the agents are not of the discriminating type outlined in Section 6. The average communicative success of members of 10 populations of obverter agents over time is shown in Figure 17. Figure 18 shows the communication system types in use over time, averaged over the same 10 populations. As can be seen from Figures 17 and 18, obverter populations quickly converge on optimal, unhomonymous communication systems. Further simulations show that populations of obverter networks maintain unhomonymous communication systems indefinitely. The switch from imitator to obverter networks removes the problems of cultural stagnation and collapse experienced by imitator populations.

Figure 18: Communication system types in use over time, averaged over 10 obverter populations (x-axis: Generations, 0 to 1000; y-axis: % of Population using system, 0 to 100; series: Unhomonymous, Partially Homonymous, Fully Homonymous)

As discussed in Section 5.2, learning biases impose selection pressures on the cultural transmission of communication systems. The obverter networks in these simulations have two learning biases: the bias which prevents them from learning fully homonymous systems and partially homonymous or unhomonymous systems with certain characteristics, and the random weight bias which approximates how the networks behave when presented with conflicting communication systems during the training process. The first bias immediately selects against fully homonymous communication systems and some partially homonymous and unhomonymous systems: these systems cannot be learned, so cannot replicate culturally. However, the first bias does not differentiate between partially homonymous and unhomonymous systems in which the 3 signals do not all share the same value for a particular bit.

The random weight bias appears to be the key factor in determining the eventual communicative behaviour of the populations. While certain partially homonymous and unhomonymous systems may be equally transmittable in terms of the learning bias, partially homonymous systems are unstable: their successful transmission depends on the absence of any other communication systems from the pool of communication systems being transmitted. As soon as other communication systems are present, partially homonymous systems are culturally selected against, because an obverter agent trained on conflicting communication systems tends to acquire an unhomonymous communication system. The cultural selection of unhomonymous communication systems results in these systems emerging from random communicative behaviour over time, and also allows the populations to resist the invasion of partially homonymous or fully homonymous communication systems which may be introduced by the process of shielding and genetic drift outlined in Section 5.3.

8 Removing natural selection

In Sections 5, 6 and 7 the eventual communicative behaviour of the populations was described solely in terms of the selection pressures acting on cultural transmission. If the process of cultural selection is truly the determining force behind the behaviour of these populations, we would anticipate that the removal of genetic transmission would not qualitatively affect the behaviour of the populations.

Figure 19: Average communicative success of members of 10 imitator populations over time (x-axis: Generation, 0 to 1000; y-axis: % Success, 0 to 100)

The simulations outlined in Sections 5, 6 and 7 were repeated without genetic transmission of connection weights: rather than inheriting a set of starting weights from their parents, agents start learning with small, random connection weights. The pressures acting on the communicative behaviour of these populations are therefore entirely cultural. The average communicative success of populations in which genetic transmission is absent is illustrated in Figures 19 (imitators), 20 (discriminating imitators) and 21 (obverters). The behaviour of these populations is qualitatively similar to the equivalent populations subject to natural selection for communicative success, confirming that the emergent communicative behaviour of all the simulated populations is due to cultural selection, rather than natural selection.
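The difference between the two conditions comes down to how an agent's starting weights are set before learning begins. The sketch below is illustrative only: the breeding operator is not described in this section, so one-point crossover between two parents is used as an assumed stand-in for inheritance, and the value 0.1 for "small" random weights is likewise an assumption. The function name `starting_weights` is hypothetical.

```python
import random


def starting_weights(n_weights, parents=None, rng=None, scale=0.1):
    """Return the connection weights an agent has before any learning takes place.

    parents=None models the condition with genetic transmission removed:
    weights are small random values. Otherwise the weights are inherited,
    here via an assumed one-point crossover (requires n_weights >= 2).
    """
    rng = rng or random.Random()
    if parents is None:
        # No genetic transmission: small, random starting weights.
        return [rng.uniform(-scale, scale) for _ in range(n_weights)]
    mother, father = parents
    cut = rng.randrange(1, n_weights)  # crossover point, assumed operator
    return list(mother[:cut]) + list(father[cut:])


rng = random.Random(0)
cultural_only = starting_weights(6, rng=rng)
inherited = starting_weights(6, parents=([1.0] * 6, [2.0] * 6), rng=rng)
```

In both conditions the subsequent learning procedure is unchanged, so any difference in outcome between them could only come from the starting point of learning.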

9 Conclusions

This paper outlines a computational model of the emergence of communication in a population of communicative agents. As previous work suggests, natural selection alone is capable of evolving optimal, innate communication systems in such populations. However, the addition of cultural transmission of communication systems does not necessarily assist the emergence of optimal communication systems. The biases of the learners involved in the cultural transmission process result in cultural selection: communication systems which conform to the biases of the learners are more likely to be successfully transmitted than communication systems which do not.

Figure 20: Average communicative success of members of 10 discriminating learning populations over time (x-axis: Generation, 0 to 1000; y-axis: % Success, 0 to 100)

Figure 21: Average communicative success of members of 10 obverter populations over time (x-axis: Generation, 0 to 1000; y-axis: % Success, 0 to 100)

The results of simulations in which cultural selection is in direct conflict with natural selection are outlined in Section 5. In these circumstances, cultural selection proved to be the determining factor in the emergent behaviour of the simulated populations. In Section 6, a second cultural selection pressure was introduced which was in direct conflict with the intrinsic learning biases of the simulated agents. The conflict between these two cultural pressures resulted in communicative behaviour which was a compromise between the two competing forms of cultural selection. In Section 7 the model of the learning agent was modified to build in a bias towards optimal communication systems. In populations of such agents optimal communication systems rapidly and reliably emerged, due to the cultural selection pressures arising from the learners' biases. Finally, in Section 8, the behaviour of these populations was shown to be largely determined by cultural selection, with selection during genetic transmission being relegated to a subsidiary role.

The simulations outlined in this paper suggest that research into the origins and evolution of language should not underestimate the role of cultural selection in this process. These simulations give a clear illustration of the fact that the learning biases of individual learners can have profound and far-reaching effects when placed in the context of iterated cultural transmission, and that in certain circumstances these cultural processes can effectively nullify the influence of natural selection during genetic transmission. This is not to say that natural selection can have no role in the explanation of the evolution of language. In the simulations outlined in this paper, natural selection is restricted to tinkering with the starting point for the learning process. Allowing natural selection to develop learning algorithms, and therefore modify learner biases and determine the precise nature of cultural selection occurring during cultural transmission, will undoubtedly result in complex and interesting interactions between natural selection and cultural selection. The story of the evolution of language may be best told in two parts, with the development of a language acquisition device occurring in a geological time scale and the development of language parasitic on this language acquisition device occurring in a historical time scale.


References

Ackley, D. and M. Littman (1991). Interactions between learning and evolution. In C. Langton, C. Taylor, J. Farmer, and S. Rasmussen (Eds.), Artificial Life 2, pp. 487-509. Redwood City, CA: Addison-Wesley.
Ackley, D. and M. Littman (1994). Altruism in the evolution of communication. In R. Brooks and P. Maes (Eds.), Artificial Life 4: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, pp. 40-48. Redwood City, CA: Addison-Wesley.
Baldwin, J. M. (1896). A new factor in evolution. American Naturalist 30, 441-451.
Batali, J. (1994). Innate biases and critical periods: Combining evolution and learning in the acquisition of syntax. In R. Brooks and P. Maes (Eds.), Artificial Life 4: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, pp. 160-171. Redwood City, CA: Addison-Wesley.
Batali, J. (1998). Computational simulations of the emergence of grammar. Cambridge University Press.
Batali, J. (in press). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. See Briscoe (in press).
Briscoe, E. (1997). Language acquisition: the bioprogram hypothesis and the Baldwin effect. MS, Computer Laboratory, University of Cambridge.
Briscoe, E. (Ed.) (in press). Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press.
Bullock, S. (1997). An exploration of signalling behaviour by both analytic and simulation means for both discrete and continuous models. In P. Husbands and I. Harvey (Eds.), Fourth European Conference on Artificial Life, pp. 454-463. Cambridge, MA: MIT Press.
Cangelosi, A. and D. Parisi (1996). The emergence of a `language' in an evolving population of neural networks. Technical Report NSAL-96-004, Institute of Psychology, National Research Council, Rome.
Cheney, D. and R. Seyfarth (1990). How Monkeys See the World: Inside the Mind of Another Species. Chicago, IL: University of Chicago Press.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1980). Rules and Representations. London: Basil Blackwell.
Chomsky, N. (1987). Knowledge of Language: Its Nature, Origin and Use. Dordrecht: Foris.
de Bourcier, P. and M. Wheeler (1997). The truth is out there: The evolution of reliability in aggressive communication systems. In P. Husbands and I. Harvey (Eds.), Fourth European Conference on Artificial Life, pp. 444-453. Cambridge, MA: MIT Press.
Di Paolo, E. (1997). An investigation into the evolution of communication. Adaptive Behavior 6, 285-324.
Hinton, G. and S. Nowlan (1987). How learning can guide evolution. Complex Systems 1, 495-502.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press.
Hurford, J. R. (1989). Biological evolution of the Saussurean sign as a component of the language acquisition device. Lingua 77, 187-222.
Hurford, J. R. (in press). Expression/induction models of language evolution: Dimensions and issues. See Briscoe (in press).
Hutchins, E. and B. Hazlehurst (1995). How to invent a lexicon: The development of shared symbols in interaction. In N. Gilbert and R. Conte (Eds.), Artificial Societies: the computer simulation of social life. London: UCL Press.
Jones, S., M. Martin, and D. Pilbeam (Eds.) (1992). The Cambridge Encyclopedia of Human Evolution. Cambridge: Cambridge University Press.
Kirby, S. (in prep). The evolution of structured languages has a lot to do with social transmission and not a whole lot to do with language-specific learning biases. LEC talk, Edinburgh.
Kirby, S. (in press a). Learning, bottlenecks and the evolution of recursive syntax. See Briscoe (in press).
Kirby, S. (in press b). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In C. Knight, M. Studdert-Kennedy, and J. R. Hurford (Eds.), The Evolutionary Emergence of Language. Cambridge: Cambridge University Press.
Kirby, S. and J. R. Hurford (1997). Learning, culture and evolution in the origin of linguistic constraints. In P. Husbands and I. Harvey (Eds.), Fourth European Conference on Artificial Life, pp. 493-502. Cambridge, MA: MIT Press.
Levin, M. (1995). The evolution of understanding: a genetic algorithm model of the evolution of communication. BioSystems 36, 167-178.
Livingstone, D. and C. Fyfe (1999). Modelling the evolution of linguistic diversity. In D. Floreano, J. Nicoud, and F. Mondada (Eds.), Advances in Artificial Life: Fifth European Conference on Artificial Life, pp. 704-708. Berlin: Springer.
MacLennan, B. and G. Burghardt (1994). Synthetic ethology and the evolution of cooperative communication. Adaptive Behavior 2, 161-187.
Mitchell, M. (1996). An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press.
Montana, D. J. and L. D. Davis (1989). Training feedforward networks using genetic algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
Noble, J. (1998). Evolved signals: Expensive hype vs. conspiratorial whispers. In C. Adami, R. Belew, H. Kitano, and C. Taylor (Eds.), Artificial Life 6: Proceedings of the Sixth International Conference on Artificial Life. Cambridge, MA: MIT Press.
Oliphant, M. (1996). The dilemma of Saussurean communication. BioSystems 37, 31-38.
Oliphant, M. (in press). The learning barrier: Moving from innate to learned systems of communication. To appear in Adaptive Behavior.
Oliphant, M. and J. Batali (1997). Learning and the emergence of coordinated communication. Center for Research on Language Newsletter 11 (1).
Steels, L. and P. Vogt (1997). Grounding adaptive language games in robotic agents. In P. Husbands and I. Harvey (Eds.), Fourth European Conference on Artificial Life, pp. 474-482. Cambridge, MA: MIT Press.
Werner, G. and M. Dyer (1991). Evolution of communication in artificial organisms. In C. Langton, C. Taylor, J. Farmer, and S. Rasmussen (Eds.), Artificial Life 2, pp. 659-687. Redwood City, CA: Addison-Wesley.
Werner, G. and P. Todd (1997). Too many love songs: Sexual selection and the evolution of communication. In P. Husbands and I. Harvey (Eds.), Fourth European Conference on Artificial Life, pp. 434-443. Cambridge, MA: MIT Press.