To appear in Action To Language via the Mirror Neuron System (Michael A. Arbib, Editor), Cambridge University Press, 2005.

The Origin and Evolution of Language:
A Plausible, Strong-AI Account

Jerry R. Hobbs

USC Information Sciences Institute

Marina del Rey, California

Abstract

A large part of the mystery of the origin of language is the difficulty we experience in trying to imagine what the intermediate stages along the way to language could have been. An elegant, detailed, formal account of how discourse interpretation works in terms of a mode of inference called abduction, or inference to the best explanation, enables us to spell out with some precision a quite plausible sequence of such stages. In this chapter I outline plausible sequences for two of the key features of language: Gricean nonnatural meaning and syntax. I then speculate on the time in the evolution of modern humans at which each of these steps may have occurred.

1 Framework

In this chapter I show in outline how human language as we know it could have evolved incrementally from mental capacities it is reasonable to attribute to lower primates and other mammals. I do so within the framework of a formal computational theory of language understanding (Hobbs et al., 1993). In the first section I describe some of the key elements in the theory, especially as it relates to the evolution of linguistic capabilities. In the next two sections I describe plausible incremental paths to two key aspects of language - meaning and syntax. In the final section I discuss various considerations of the time course of these processes.

1.1 Strong AI

It is desirable for psychology to provide a reduction in principle of intelligent, or intentional, behavior to neurophysiology. Because of the extreme complexity of the human brain, more than the sketchiest account is not likely to be possible in the near future. Nevertheless, the central metaphor of cognitive science, “The brain is a computer”, gives us hope. Prior to the computer metaphor, we had no idea of what could possibly be the bridge between beliefs and ion transport. Now we have an idea. In the long history of inquiry into the nature of mind, the computer metaphor gives us, for the first time, the promise of linking the entities and processes of intentional psychology to the underlying biological processes of neurons, and hence to physical processes. We could say that the computer metaphor is the first, best hope of materialism.

The jump between neurophysiology and intentional psychology is a huge one. We are more likely to succeed in linking the two if we can identify some intermediate levels. A view that is popular these days identifies two intermediate levels - the symbolic and the connectionist.

 

Intentional Level

|

Symbolic Level

|

Connectionist Level

|

Neurophysiological Level

 

The intentional level is implemented in the symbolic level, which is implemented in the connectionist level, which is implemented in the neurophysiological level.[1] From the “strong AI” perspective, the aim of cognitive science is to show how entities and processes at each level emerge from the entities and processes of the level below.[2] The reasons for this strategy are clear. We can observe intelligent activity and we can observe the firing of neurons, but there is no obvious way of linking these two together. So we decompose the problem into three smaller problems. We can formulate theories at the symbolic level that can, at least in a small way so far, explain some aspects of intelligent behavior; here we work from intelligent activity down. We can formulate theories at the connectionist level in terms of elements that are a simplified model of what we know of the neuron's behavior; here we work from the neuron up. Finally, efforts are being made to implement the key elements of symbolic processing in connectionist architecture. If each of these three efforts were to succeed, we would have the whole picture.

In my view, this picture looks very promising indeed. Mainstream AI and cognitive science have taken it to be their task to show how intentional phenomena can be implemented by symbolic processes. The elements in a connectionist network are modeled on certain properties of neurons. The principal problems in linking the symbolic and connectionist levels are representing predicate-argument relations in connectionist networks, implementing variable-binding or universal instantiation in connectionist networks, and defining the right notion of “defeasibility” or “nonmonotonicity” in logic[3] to reflect the “soft corners”, or lack of rigidity, that make connectionist models so attractive. Progress is being made on all these problems (e.g., Shastri and Ajjanagadde, 1993; Shastri, 1999).

Although we do not know how each of these levels is implemented in the level below, nor indeed whether it is, we know that it could be, and that at least is something.

1.2 Logic as the Language of Thought

A very large body of work in AI begins with the assumptions that information and knowledge should be represented in first-order logic and that reasoning is theorem-proving. On the face of it, this seems implausible as a model for people. It certainly doesn't seem as if we are using logic when we are thinking, and if we are, why are so many of our thoughts and actions so illogical? In fact, there are psychological experiments that purport to show that people do not use logic in thinking about a problem (e.g., Wason and Johnson-Laird, 1972).

I believe that the claim that logic is the language of thought comes to less than one might think, however, and that thus it is more controversial than it ought to be. It is the claim that a broad range of cognitive processes are amenable to a high-level description in which six key features are present. The first three of these features characterize propositional logic and the next two first-order logic. I will express them in terms of “concepts”, but one can just as easily substitute propositions, neural elements, or a number of other terms.

· Conjunction: There is an additive effect (P ∧ Q) of two distinct concepts (P and Q) being activated at the same time.

· Modus Ponens: The activation of one concept (P) triggers the activation of another concept (Q) because of the existence of some structural relation between them (P ⊃ Q).

· Recognition of Obvious Contradictions: It can be arbitrarily difficult to recognize contradictions in general, but we have no trouble with the easy ones, for example, that cats aren't dogs.

· Predicate-Argument Relations: Concepts can be related to other concepts in several different ways. We can distinguish between a dog biting a man (bite(D,M)) and a man biting a dog (bite(M,D)).

· Universal Instantiation (or Variable Binding): We can keep separate our knowledge of general (universal) principles (“All men are mortal”) and our knowledge of their instantiations for particular individuals (“Socrates is a man” and “Socrates is mortal”).

Any plausible proposal for a language of thought must have at least these features, and once you have these features you have first-order logic. Note that in this list there are no complex rules for double negations or for contrapositives (if P implies Q then not Q implies not P). In fact, most of the psychological experiments purporting to show that people don't use logic really show that they don't use the contrapositive rule or that they don't handle double negations well. If the tasks in those experiments were recast into problems involving the use of modus ponens, no one would think to do the experiments because it is obvious that people would have no trouble with the task.

There is one further property we need of the logic if we are to use it for representing and reasoning about commonsense world knowledge: defeasibility or nonmonotonicity. Our knowledge is not certain. Different proofs of the same fact may have different consequences, and one proof can be “better” than another.

The mode of defeasible reasoning used here is “abduction”[4], or inference to the best explanation. Briefly, one tries to prove something, but where there is insufficient knowledge, one can make assumptions. One proof is better than another if it makes fewer, more plausible assumptions, and if the knowledge it uses is more plausible and more salient. This is spelled out in detail in Hobbs et al. (1993). The key idea is that intelligent agents understand their environment by coming up with the best underlying explanations for the observables in it. Generally not everything required for the explanation is known, and assumptions have to be made. Typically, abductive proofs have the following structure.

We want to prove R.

We know P ∧ Q ⊃ R.

We know P.

We assume Q.

We conclude R.

A logic is “monotonic” if once we conclude something, it will always be true. Abduction is “nonmonotonic” because we could assume Q and thus conclude R, and later learn that Q is false.

There may be many Q's that could be assumed to result in a proof (including R itself), giving us alternative possible proofs, and thus alternative possible and possibly mutually inconsistent explanations or interpretations. So we need a kind of “cost function” for selecting the best proof. Among the factors that will make one proof better than another are the shortness of the proof, the plausibility and salience of the axioms used, a smaller number of assumptions, and the exploitation of the natural redundancy of discourse. A more complete description of the cost function is found in Hobbs et al. (1993).
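The proof schema and the need for a cost function can be made concrete with a small propositional sketch. Everything specific here is an assumption for illustration: the second rule, the numeric costs, and the function name prove are hypothetical, and the cost is simply the sum of assumption costs, which is far cruder than the cost function of Hobbs et al. (1993).

```python
# Hypothetical toy knowledge base: each rule is (antecedents, consequent).
RULES = [
    (("P", "Q"), "R"),  # P and Q imply R
    (("S",), "R"),      # S implies R (an alternative explanation)
]
FACTS = {"P"}                                  # what is already known
ASSUME_COST = {"Q": 2.0, "S": 5.0, "R": 10.0}  # illustrative assumption costs


def prove(goal, depth=3):
    """Return (cost, assumptions) for the cheapest abductive proof of goal."""
    if goal in FACTS:
        return 0.0, frozenset()
    # Option 1: simply assume the goal, at its cost.
    best = (ASSUME_COST.get(goal, float("inf")), frozenset([goal]))
    if depth == 0:
        return best
    # Option 2: backchain through each rule whose consequent is the goal.
    for antecedents, consequent in RULES:
        if consequent != goal:
            continue
        cost, assumed = 0.0, frozenset()
        for antecedent in antecedents:
            sub_cost, sub_assumed = prove(antecedent, depth - 1)
            cost += sub_cost
            assumed |= sub_assumed
        if cost < best[0]:
            best = (cost, assumed)
    return best


cost, assumptions = prove("R")
print(cost, sorted(assumptions))
```

With these costs, the cheapest proof of R uses the known P and assumes Q, rather than assuming S or assuming R outright. If Q later turned out to be false, the conclusion would have to be withdrawn, which is the nonmonotonic behavior described above.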

1.3 Discourse Interpretation: Examples of Definite Reference

In the “Interpretation as Abduction” framework, world knowledge is expressed as defeasible logical axioms. To interpret the content of a discourse is to find the best explanation for it, that is, to find a minimal-cost abductive proof of its logical form. To interpret a sentence is to deduce its syntactic structure and hence its logical form, and simultaneously to prove that logical form abductively. To interpret suprasentential discourse is to interpret individual segments, down to the sentential level, and to abduce relations among them.

Consider as an example the problem of resolving definite references. The following four examples are sometimes taken to illustrate four different kinds of definite reference.

I bought a new car last week. The car is already giving me trouble.

I bought a new car last week. The vehicle is already giving me trouble.

I bought a new car last week. The engine is already giving me trouble.

The engine of my new car is already giving me trouble.

In the first example, the same word is used in the definite noun phrase as in its antecedent. In the second example, a hypernym is used. In the third example, the reference is not to the “antecedent” but to an object that is related to it, requiring what Clark (1975) called a “bridging inference”. The fourth example is a determinative definite noun phrase, rather than an anaphoric one; all the information required for its resolution is found in the noun phrase itself.

These distinctions are insignificant in the abductive approach. In each case we need to prove the existence of the definite entity. In the first example it is immediate. In the second, we use the axiom

(∀ x) car(x) ⊃ vehicle(x)

In the third example, we use the axiom

(∀ x) car(x) ⊃ (∃ y) engine(y,x)

that is, cars have engines. In the fourth example, we use the same axiom, but after assuming the existence of the speaker's new car.

This last axiom is “defeasible” since it is not always true; some cars don't have engines. To indicate this formally in the abduction framework, we can add another proposition to the antecedent of this rule.

(∀ x) car(x) ∧ etc_i(x) ⊃ (∃ y) engine(y,x)

The proposition etc_i(x) means something like “and other unspecified properties of x”. This particular etc predicate would appear in no other axioms, and thus it could never be proved. But it could be assumed, at a cost, and could thus be a part of the least-cost abductive proof of the content of the sentence. This maneuver implements defeasibility in a set of first-order logical axioms operated on by an abductive theorem prover.
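The treatment of the car examples can be sketched as a search for the cheapest way to prove that the definite entity exists. The data structures, the two cost values, and the function name resolve are assumptions made for illustration only; the sketch is not the abductive theorem prover itself.

```python
# Entity introduced by "I bought a new car last week."
CONTEXT = {"c1": {"car"}}

ISA = {"car": "vehicle"}      # car(x) implies vehicle(x)
HAS_PART = {"car": "engine"}  # car(x) and etc(x) imply that some engine(y, x) exists
ETC_COST = 1.0                # illustrative cost of assuming the etc proposition
NEW_ENTITY_COST = 10.0        # illustrative cost of assuming an unrelated new entity


def resolve(noun):
    """Return (cost, description) for the cheapest proof that a `noun` exists."""
    candidates = [(NEW_ENTITY_COST, f"assume a new {noun}")]
    for entity, types in CONTEXT.items():
        for entity_type in types:
            if entity_type == noun:
                candidates.append((0.0, f"{entity} (same word)"))
            if ISA.get(entity_type) == noun:
                candidates.append((0.0, f"{entity} (every {entity_type} is a {noun})"))
            if HAS_PART.get(entity_type) == noun:
                candidates.append((ETC_COST, f"{noun} of {entity} (etc assumed)"))
    return min(candidates)


for noun in ("car", "vehicle", "engine"):
    print(noun, "->", resolve(noun))
```

All three anaphoric cases go through the same procedure; they differ only in which axiom is used and in whether the defeasible etc proposition has to be assumed.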

1.4 Syntax in the Abduction Framework

Syntax can be integrated into this framework in a thorough fashion, as described at length in Hobbs (1998). In this treatment, the predication

(1)    Syn(w, e, …)

says that the string w is a grammatical, interpretable string of words describing the situation or entity e. For example, Syn(“John reads Hamlet”, e, …) says that the string “John reads Hamlet” (w) describes the event e (the reading by John of the play Hamlet). The arguments of Syn indicated by the dots include information about complements and various agreement features.

Composition is effected by axioms of the form

(2)    Syn(w1, e, …, y, …) ∧ Syn(w2, y, …) ⊃ Syn(w1w2, e, …)

A string w1 whose head describes the eventuality e and which is missing an argument y can be concatenated with a string w2 describing y, yielding a string describing e. For example, the string “reads” (w1), describing a reading event e but missing the object y of the reading, can be concatenated with the string “Hamlet” (w2) describing a book y, to yield a string “reads Hamlet” (w1w2), giving a richer description of the event e in that it does not lack the object of the reading.

The interface between syntax and world knowledge is effected by “lexical axioms” of a form illustrated by

(3)    read'(e,x,y) ∧ text(y) ⊃ Syn(“read”, e, …, x, …, y, …)

This says that if e is the eventuality of x reading y (the logical form fragment supplied by the word “read”), where y is a text (the selectional constraint imposed by the verb “read” on its object), then e can be described by a phrase headed by the word “read” provided it picks up, as subject and object, phrases of the right sort describing x and y.

To interpret a sentence w, one seeks to show it is a grammatical, interpretable string of words by proving there is an eventuality e that it describes, that is, by proving (1). One does so by decomposing it via composition axioms like (2) and bottoming out in lexical axioms like (3). This yields the logical form of the sentence, which then must be proved abductively, the characterization of interpretation we gave in Section 1.3.

A substantial fragment of English grammar is cast into this framework in Hobbs (1998), which closely follows Pollard and Sag (1994).
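As a rough illustration of how composition axioms like (2) and lexical axioms like (3) fit together, the following sketch builds “John reads Hamlet” from a three-word toy lexicon. The class Syn, the functions lexical and attach, the constants j, h and e, and the treatment of the subject as an argument attached on the left are all assumptions for illustration; agreement features and the other arguments of Syn are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syn:
    words: str           # the string w
    entity: str          # the eventuality or entity e that w describes
    missing: tuple = ()  # arguments the head still needs, in pick-up order


# Hypothetical world knowledge: read'(e, j, h) and text(h).
READ_EVENT = ("e", "j", "h")
TEXTS = {"h"}


def lexical(word):
    """Lexical axioms in the spirit of (3), for a three-word toy lexicon."""
    if word == "John":
        return Syn("John", "j")
    if word == "Hamlet":
        return Syn("Hamlet", "h")
    if word == "reads":
        e, x, y = READ_EVENT
        if y not in TEXTS:  # selectional constraint text(y)
            return None
        return Syn("reads", e, (y, x))  # object y first, then subject x
    return None


def attach(head, arg, arg_first=False):
    """Composition in the spirit of (2): fill the head's next missing argument."""
    if head is None or arg is None or arg.missing:
        return None
    if not head.missing or head.missing[0] != arg.entity:
        return None
    words = f"{arg.words} {head.words}" if arg_first else f"{head.words} {arg.words}"
    return Syn(words, head.entity, head.missing[1:])


vp = attach(lexical("reads"), lexical("Hamlet"))        # "reads Hamlet"
sentence = attach(vp, lexical("John"), arg_first=True)  # "John reads Hamlet"
print(sentence)
```

The sentence is accepted only when nothing is left in missing; attaching “John” directly as the object of “reads” fails because John is not the entity the verb is waiting for.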

1.5 Discourse Structure

When confronting an entire coherent discourse by one or more speakers, one must break it into interpretable segments and show that those segments themselves are coherently related.  That is, one must use a rule like

Segment(w1, e1) ∧ Segment(w2, e2) ∧ rel(e,e1,e2) ⊃ Segment(w1w2, e)

That is, if w1 and w2 are interpretable segments describing situations e1 and e2 respectively, and e1 and e2 stand in some relation rel to each other, then the concatenation of w1 and w2 constitutes an interpretable segment, describing a situation e that is determined by the relation. The possible relations are discussed further in Section 4.

This rule applies recursively and bottoms out in sentences.

Syn(w, e, …) ⊃ Segment(w, e)

A grammatical, interpretable sentence w describing eventuality e is a coherent segment of discourse describing e. This axiom effects the interface between syntax and discourse structure.  Syn is the predicate whose axioms characterize syntactic structure; Segment is the predicate whose axioms characterize discourse structure; and they meet in this axiom.  The predicate Segment says that string w is a coherent description of an eventuality e; the predicate Syn says that string w is a grammatical and interpretable description of eventuality e; and this axiom says that being grammatical and interpretable is one way of being coherent.

To interpret a discourse, we break it into coherently related successively smaller segments until we reach the level of sentences. Then we do a syntactic analysis of the sentences, bottoming out in their logical form, which we then prove abductively.[5]
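The recursive structure of these two rules can be sketched as follows. The dictionaries standing in for Syn and rel, the eventuality names buy1, trouble1 and car_story1, and the function name interpret are hypothetical placeholders; in the framework itself both the syntactic analysis and the choice of relation are established abductively rather than looked up.

```python
def interpret(sentences, syn, rel):
    """Return the eventuality described by a list of sentences, or None."""
    if len(sentences) == 1:
        # Base case: Syn(w, e, ...) implies Segment(w, e).
        return syn.get(sentences[0])
    # Recursive case: try every split into two adjacent segments w1 and w2.
    for split in range(1, len(sentences)):
        e1 = interpret(sentences[:split], syn, rel)
        e2 = interpret(sentences[split:], syn, rel)
        if e1 is not None and e2 is not None and (e1, e2) in rel:
            return rel[(e1, e2)]  # the situation e determined by the relation
    return None


# Hypothetical stand-ins for sentence-level analysis and for coherence relations.
SYN = {
    "I bought a new car last week.": "buy1",
    "The engine is already giving me trouble.": "trouble1",
}
REL = {("buy1", "trouble1"): "car_story1"}

print(interpret(list(SYN), SYN, REL))
```

A discourse counts as a segment only if some way of splitting it yields two segments that stand in a known relation; reversing the two sentences here gives no interpretation.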

1.6 Discourse as a Purposeful Activity

This view of discourse interpretation is embedded in a view of interpretation in general in which an agent, to interpret the environment, must find the best explanation for the observables in that environment, which includes other agents.

An intelligent agent is embedded in the world and must, at each instant, understand the current situation. The agent does so by finding an explanation for what is perceived. Put differently, the agent must explain why the complete set of observables encountered constitutes a coherent situation. Other agents in the environment are viewed as intentional, that is, as planning mechanisms, and this means that the best explanation of their observable actions is most likely to be that the actions are steps in a coherent plan. Thus, making sense of an environment that includes other agents entails making sense of the other agents' actions in terms of what they are intended to achieve. When those actions are utterances, the utterances must be understood as actions in a plan the agents are trying to effect. The speaker's plan must be recognized.

Generally, when a speaker says something it is with the goal that the hearer believe the content of the utterance, or think about it, or consider it, or take some other cognitive stance toward it.[6] Let us subsume all these mental terms under the term “cognize”. We can then say that to interpret a speaker A's utterance to B of some content, we must explain the following:

goal(A, cognize(B, content-of-discourse))

Interpreting the content of the discourse is what we described above. In addition to this, one must explain in what way it serves the goals of the speaker to change the mental state of the hearer to include some mental stance toward the content of the discourse. We must fit the act of uttering that content into the speaker's presumed plan.

The defeasible axiom that encapsulates this is

(∀ s, h, e1, e, w)[goal(s, e1) ∧ cognize'(e1, h, e) ∧ Segment(w, e) ⊃ utter(s, h, w)]

That is, normally if a speaker s has a goal e1 of the hearer h cognizing a situation e and w is a string of words that conveys e, then s will utter w to h. So if I have the goal that you think about the existence of a fire, then since the word “fire” conveys the concept of fire, I say “Fire” to you. This axiom is only defeasible because there are multiple strings w that can convey e. I could have said, “Something's burning.”

We appeal to this axiom to interpret the utterance as an intentional communicative act. That is, if A utters to B a string of words W, then to explain this observable event, we have to prove utter(A,B,W).  That is, just as interpreting an observed flash of light is finding an explanation for it, interpreting an observed utterance of a string W by one person A to another person B is to find an explanation for it.  We begin to do this by backchaining on the above axiom. Reasoning about the speaker's plan is a matter of establishing the first two propositions in the antecedent of the axiom. Determining the informational content of the utterance is a matter of establishing the third. The two sides of the proof influence each other since they share variables and since a minimal proof will result when both are explained and when their explanations use much of the same knowledge.
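Backchaining on this axiom can be sketched very simply. The lookup table standing in for Segment, the situation name fire_exists, and the function name explain_utterance are assumptions for illustration; the sketch only lists the three antecedents that would then have to be proved or assumed, and does no plan recognition.

```python
# Hypothetical lexicon: Segment(w, e) for two strings that convey the same situation.
SEGMENT = {"Fire": "fire_exists", "Something's burning": "fire_exists"}


def explain_utterance(speaker, hearer, words):
    """Backchain on goal(s,e1) & cognize'(e1,h,e) & Segment(w,e) => utter(s,h,w)."""
    situation = SEGMENT.get(words)
    if situation is None:
        return None  # no interpretation of the content, so no explanation
    cognizing = ("cognize", hearer, situation)
    return [
        ("goal", speaker, cognizing),   # first antecedent: part of the speaker's plan
        cognizing,                      # second antecedent: the hearer's mental stance
        ("Segment", words, situation),  # third antecedent: the informational content
    ]


for proposition in explain_utterance("A", "B", "Fire"):
    print(proposition)
```

Because two different strings map to the same situation, observing either one leads back to the same goal on the speaker's part, which is the sense in which the choice of string is left open by the axiom.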

1.7 A Structured Connectionist Realization of Abduction

Because of its elegance and very broad coverage, the abduction model is very appealing on the symbolic level. But to be a plausible candidate for how people understand language, there must be an account of how it could be implemented in neurons. In fact, the abduction framework can be realized in a structured connectionist model called shruti developed by Lokendra Shastri (Shastri and Ajjanagadde, 1993; Shastri, 1999). The key idea is that nodes representing the same variable fire in synchrony.  Substantial work must be done in neurophysics to determine whether this kind of model is what actually exists in the human brain, although there is suggestive evidence.  A good recent review of the evidence for the binding-via-synchrony hypothesis is given in Engel and Singer (2001).  A related article by Fell et al. (2001) reports results on gamma band synchronization and desynchronization between parahippocampal regions and the hippocampus proper during episodic memory memorization. 

By linking the symbolic and connectionist levels, one at least provides a proof of possibility for the abductive framework.

There is a range of connectionist models.  Among those that try to capture logical structure in the structure of the network, there has been good success in implementing defeasible propositional logic. Indeed, nearly all the applications to natural language processing in this tradition begin by setting up the problem so that it is a problem in propositional logic. But this is not adequate for natural language understanding in general. For example, the coreference problem, e.g., resolving pronouns to their antecedents, requires the expressivity of first-order logic even to state; it involves recognizing the equality of two variables or a constant and a variable presented in different places in the text. We need a way of expressing predicate-argument relations and a way of expressing different instantiations of the same general principle. We need a mechanism for universal instantiation, that is, the binding of variables to specific entities.  In the connectionist literature, this has gone under the name of the variable-binding problem.

The essential idea behind the shruti architecture is simple and elegant. A predication is represented as an assemblage or cluster of nodes, and axioms representing general knowledge are realized as connections among these clusters. Inference is accomplished by means of spreading activation through these structures.

Figure 1  Predicate cluster for p(x,y), consisting of a collector node (+), an enabler node (?), and argument nodes for x and y. The collector node (+) fires asynchronously in proportion to how plausible it is that p(x,y) is part of the desired proof. The enabler node (?) fires asynchronously in proportion to how much p(x,y) is required in the proof. The argument nodes for x and y fire in synchrony with argument nodes in other predicate clusters that are bound to the same variable.

In the cluster representing predications (Figure 1), two nodes, a collector node and an enabler node, correspond to the predicate and fire asynchronously. That is, they don't need to fire synchronously, in contrast to the “argument nodes” described below; for the collector and enabler nodes, only the level of activation matters. The level of activation on the enabler node keeps track of the “utility” of this predication in the proof that is being searched for. That is, the activation is higher the greater the need to find a proof for this predication, and thus the more expensive it is to assume. For example, in interpreting “The curtains are on fire,” it is very important to prove curtains(x) and thereby identify which curtains are being talked about; the level of activation on the enabler node for that cluster would be high. The level of activation on the collector node is higher the greater the plausibility that this predication is part of the desired proof. Thus, if the speaker is standing in the living room, there might be a higher activation on the collector node for curtains(c1), where c1 represents the curtains in the living room, than on curtains(c2), where c2 represents the curtains in the dining room.

We can think of the activations on the enabler nodes as prioritizing goal expressions, whereas the activations on the collector nodes indicate degree of belief in the predications, or more properly, degree of belief in the current relevance of the predications. The connections between nodes of different predication clusters have a strength of activation, or link weight, that corresponds to strength of association between the two concepts.  This is one way we can capture the defeasibility of axioms in the shruti model.  The proof process then consists of activation spreading through enabler nodes, as we backchain through axioms, and spreading forward through collector nodes from something known or assumed.  In addition, in the predication cluster, there are argument nodes, one for each argument of the predication. These fire synchronously with the argument nodes in other predication clusters to which they are connected. Thus, if the clusters for p(x, y) and q(z, x) are connected, with the two x nodes linked to each other, then the two x nodes will fire in synchrony, and the y and z nodes will fire at an offset with the x nodes and with each other. This synchronous firing indicates that the two x nodes represent variables bound to the same value. This constitutes the solution to the variable-binding problem.  The role of variables in logic is to capture the identity of entities referred to in different places in a logical expression; in shruti this identity is captured by the synchronous firing of linked nodes.
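The idea that linked argument nodes share a phase while unlinked ones fire at an offset can be sketched with integer phase slots. This is only a bookkeeping analogy under stated assumptions: the node names, the function assign_phases, and the use of a union-find grouping are illustrative choices, not a model of how shruti's temporal dynamics actually arise.

```python
def assign_phases(nodes, links):
    """Give linked argument nodes the same phase slot and all others distinct slots."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the representative of this node's group.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        parent[find(a)] = find(b)  # linked nodes join the same group

    slots, phases = {}, {}
    for node in nodes:
        root = find(node)
        if root not in slots:
            slots[root] = len(slots)  # next unused phase slot
        phases[node] = slots[root]
    return phases


# Clusters for p(x, y) and q(z, x), with the two x nodes linked to each other.
NODES = ["p.x", "p.y", "q.z", "q.x"]
LINKS = [("p.x", "q.x")]
print(assign_phases(NODES, LINKS))
```

Two nodes represent the same value exactly when they end up in the same slot, so p.x and q.x coincide while p.y and q.z each get a slot of their own.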

Proofs are searched for in parallel, and winner-takes-all circuitry suppresses all but the one whose collector nodes have the highest level of activation.

There are complications in this model for such things as managing different predications with the same predicate but different arguments. But the essential idea is as described. In brief, the view of relational information processing implied by shruti is one where reasoning is a transient but systematic propagation of rhythmic activity over structured cell-ensembles, each active entity is a phase in the rhythmic activity, dynamic bindings are represented by the synchronous firing of appropriate nodes, and rules are high-efficacy links that cause the propagation of rhythmic activity between cell-ensembles. Reasoning is the spontaneous outcome of a shruti network.

In the abduction framework, the typical axiom in the knowledge base is of the form

(4)    (∀ x,y)[p1(x,y) ∧ p2(x,y) ⊃ (∃ z)[q1(x,z) ∧ q2(x,z)]]

That is, the top-level logical connective will be implication. There may be multiple predications in the antecedent and in the consequent. There may be variables (x) that occur in both the antecedent and the consequent, variables (y) that occur only in the antecedent, and variables (z) that occur only in the consequent. Abduction backchains from predications in consequents of axioms to predications in antecedents. That is, to prove the consequent of such a rule, it attempts to find a proof of the antecedent.  Every step in the search for a proof can be considered an abductive proof where all unproved predications are assumed for a cost. The best proof is the least cost proof.

The implementation of this axiom in shruti requires predication clusters of nodes and axiom clusters of nodes (see Figure 1). A predication cluster, as described above, has one collector node and one enabler node, both firing asynchronously, corresponding to the predicate and one synchronously firing node for each argument. An axiom cluster has one collector node and one enabler node, both firing asynchronously, recording the plausibility and the utility, respectively, of this axiom participating in the best proof.  It also has one synchronously firing node for each variable in the axiom -- in our example, nodes for x, y and z. The collector and enabler nodes fire asynchronously and what is significant is their level of activation or rate of firing.  The argument nodes fire synchronously with other nodes, and what is significant is whether two nodes are the same or different in their phases.

The axiom is then encoded in a structure like that shown in Figure 2. There is a predication cluster for each of the predications in the axiom and one axiom cluster that links the predications of the consequent and antecedent. In general, the predication clusters will occur in many axioms; this is why their linkage in a particular axiom must be mediated by an axiom cluster.

Suppose (Figure 2) the proof process is backchaining from the predication q1(x,z). The activation on the enabler node (?) of the cluster for q1(x,z) induces an activation on the enabler node for the axiom cluster.  This in turn induces activation on the enabler nodes for predications p1(x,y) and p2(x,y). Meanwhile the firing of the x node in the q1 cluster induces the x node of the axiom cluster to fire in synchrony with it, which in turn causes the x nodes of the p1 and p2 clusters to fire in synchrony as well. In addition, a link (not shown) from the enabler node of the axiom cluster to the y argument node of the same cluster causes the y argument node to fire, while links (not shown) from the x and z nodes cause that firing to be out of phase with the firing of the x and z nodes. This firing of the y node of the axiom cluster induces synchronous firing in the y nodes of the p1 and p2 clusters.

 

 

Figure 2  shruti encoding of axiom (∀ x,y)[p1(x,y) ∧ p2(x,y) ⊃ (∃ z)[q1(x,z) ∧ q2(x,z)]]. Activation spreads backward from the enabler nodes (?) of the q1 and q2 clusters to that of the Ax1 cluster and on to those of the p1 and p2 clusters, indicating the utility of this axiom in a possible proof. Activation spreads forward from the collector nodes (+) of the p1 and p2 clusters to that of the axiom cluster Ax1 and on to those of the q1 and q2 clusters, indicating the plausibility of this axiom being used in the final proof. Links between the argument nodes cause them to fire in synchrony with other argument nodes representing the same variable.

 

By this means we have backchained over axiom (4) while keeping distinct the variables that are bound to different values. We are then ready to backchain over axioms in which p1 and p2 are in the consequent. As mentioned above, the q1 cluster is linked to other axioms as well, and in the course of backchaining, it induces activation in those axioms' clusters too. In this way, the search for a proof proceeds in parallel. Inhibitory links suppress contradictory inferences and will eventually force a winner-takes-all outcome.

1.8 Incremental Changes to Axioms

In this framework, incremental increases in linguistic competence, and other knowledge as well, can be achieved by means of a small set of simple operations on the axioms in the knowledge base:

1.  The introduction of a new predicate, where the utility of that predicate can be argued for cognition in general, independent of language.

2.  The introduction of a new predicate p specializing an old predicate q:

(∀ x) p(x) ⊃ q(x)

For example, we learn that a beagle is a type of dog.

(∀ x) beagle(x) ⊃ dog(x)

3.  The introduction of a new predicate p generalizing one or more old predicates qi:

(∀ x) q1(x) ⊃ p(x),  (∀ x) q2(x) ⊃ p(x), …

For example, we learn that dogs and cats are both mammals.

(∀ x) dog(x) ⊃ mammal(x),  (∀ x) cat(x) ⊃ mammal(x)

4.  Increasing the arity of a predicate to allow more arguments.

p(x)  →  p(x,y)

For example, we learn that “mother” is not a property but a relation.

mother(x)  →  mother(x,y)

5.  Adding a proposition to the antecedent of an axiom.

p1(x) ⊃ q(x)  →  p1(x) ∧ p2(x) ⊃ q(x)

For example, we might first believe that a seat is a chair, then learn that a seat with a back is a chair.

seat(x) ⊃ chair(x)  →  seat(x) ∧ back(y,x) ⊃ chair(x)

6.  Adding a proposition to the consequent of an axiom.

p(x) ⊃ q1(x)  →  p(x) ⊃ q1(x) ∧ q2(x)

For example, a child might see snow for the first time and see that it's white, and then go outside and realize it's also cold.

snow(x) ⊃ white(x)  →  snow(x) ⊃ white(x) ∧ cold(x)
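Several of the operations listed above can be sketched as functions on a simple axiom record. The class Axiom, the representation of propositions as strings, and the function names are assumptions for illustration; operations 1 and 4, which change the vocabulary of predicates rather than a particular axiom, are left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Axiom:
    antecedent: tuple  # conjoined propositions on the left of the implication
    consequent: tuple  # conjoined propositions on the right


def specialize(new, old):
    """Operation 2: new(x) implies old(x)."""
    return Axiom((f"{new}(x)",), (f"{old}(x)",))


def generalize(new, olds):
    """Operation 3: each old predicate implies the new, more general one."""
    return [Axiom((f"{old}(x)",), (f"{new}(x)",)) for old in olds]


def add_to_antecedent(axiom, proposition):
    """Operation 5: add a proposition to the antecedent."""
    return replace(axiom, antecedent=axiom.antecedent + (proposition,))


def add_to_consequent(axiom, proposition):
    """Operation 6: add a proposition to the consequent."""
    return replace(axiom, consequent=axiom.consequent + (proposition,))


chair = add_to_antecedent(Axiom(("seat(x)",), ("chair(x)",)), "back(y,x)")
snow = add_to_consequent(Axiom(("snow(x)",), ("white(x)",)), "cold(x)")
print(specialize("beagle", "dog"))
print(generalize("mammal", ["dog", "cat"]))
print(chair)
print(snow)
```

Each operation returns a new axiom and leaves the old one untouched, which keeps every step small and local in the way the list describes.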

It was shown in Section 1.7 that axioms such as these can be realized at the connectionist level in the shruti model.  To complete the picture, it must be shown that these incremental changes to axioms could also be implemented at the connectionist level.  In fact, Shastri and his colleagues have demonstrated that incremental changes such as these can be implemented in the shruti model via relatively simple means involving the recruitment of nodes, by strengthening latent connections as a response to frequent simultaneous activations (Shastri, 2001; Shastri and Wendelken, 2003; Wendelken and Shastri, 2003).

These incremental operations can be seen as constituting a plausible mechanism for both the development of cognitive capabilities in individuals and, whether directly or indirectly through developmental processes, their evolution in populations. In this chapter, I will show how the principal features of language could have resulted from a sequence of such incremental steps, starting from the cognitive capacity one could expect of ordinary primates.

1.9 Summary of Background

To summarize, the framework assumed in this chapter has the following features:

A detailed, plausible, computational model for a large range of linguistic behavior.

A possible implementation in a connectionist model.

An incremental model of learning, development (physical maturation), and evolution.

An implementation of that in terms of node recruitment.

In the remainder of the chapter it is shown how two principal features of language, Gricean meaning and syntax, could have arisen from nonlinguistic cognition through the action of three mechanisms:

incremental changes to axioms,

folk theories required independent of language,

compilation of proofs into axioms.

These two features of language are, in a sense, the two key features of language. The first, Gricean meaning, tells how single words convey meaning in discourse. The second, syntax, tells how multiple words combine to convey complex meanings.

2 The Evolution of Gricean Meaning

In Gricean non-natural meaning, what is conveyed is not merely the content of the utterance, but also the intention of the speaker to convey that meaning, and the intention of the speaker to convey that meaning by means of that specific utterance. When A shouts “Fire!” to B, A expects that

1. B will believe there is a fire

2. B will believe A wants B to believe there is a fire

3. 1 will happen because of 2

Five steps take us from natural meaning, as in “Smoke means fire,” to Gricean meaning (Grice, 1948). Each step depends on certain background theories being in place, theories that are motivated even in the absence of language. Each new step in the progression introduces a new element of defeasibility. The steps are as follows:

1. Smoke means fire

2. “Fire!” means fire

3. Mediation by belief

4. Mediation by intention

5. Full Gricean meaning

Once we get into theories of belief and intention, there is very little that is certain.  Thus, virtually all the axioms used in this section are defeasible.  That is, they are true most of the time, and they often participate in the best explanation produced by abductive reasoning, but they are sometimes wrong.  They are nevertheless useful to intelligent agents.

The theories that will be discussed in this section – belief, mutual belief, intention, and collective action – are some of the key elements of a theory of mind (e.g., Premack and Woodruff, 1978; Heyes, 1998; Gordon, this volume).  I discuss the possible courses of evolution of a theory of mind in Section 4. 

2.1 Smoke Means Fire

The first required folk theory is a theory of causality (or rather, a number of theories with causality). There will be no definition of the predicate cause, that is, no set of necessary and sufficient conditions.

cause(e1, e2) ≡ …

Rather there will be a number of domain-dependent theories saying what sorts of things cause what other sorts of things. There will be lots of necessary conditions