To appear in Action To Language via the Mirror Neuron System (Michael A. Arbib, Editor), Cambridge University Press, 2005.
The Origin and Evolution of Language:
A Plausible, Strong-AI Account
Jerry R. Hobbs
USC Information Sciences Institute
Marina del Rey, California
A large part of
the mystery of the origin of language is the difficulty we experience in trying
to imagine what the intermediate stages along the way to language could have
been. An elegant, detailed, formal account of how discourse interpretation
works in terms of a mode of inference called abduction, or inference to the
best explanation, enables us to spell out with some precision a quite plausible
sequence of such stages. In this chapter I outline plausible sequences for two
of the key features of language - Gricean nonnatural meaning and syntax. I
then speculate on the time in the evolution of modern humans at which each of these
steps may have occurred.
In this chapter
I show in outline how human language as we know it could have evolved
incrementally from mental capacities it is reasonable to attribute to lower
primates and other mammals. I do so within the framework of a formal
computational theory of language understanding (Hobbs et al., 1993). In the
first section I describe some of the key elements in the theory, especially as
it relates to the evolution of linguistic capabilities. In the next two
sections I describe plausible incremental paths to two key aspects of language - meaning and
syntax. In the final section I discuss various considerations of the time
course of these processes.
It is desirable
for psychology to provide a reduction in principle of intelligent, or
intentional, behavior to neurophysiology. Because of the extreme complexity of
the human brain, more than the sketchiest account is not likely to be possible
in the near future. Nevertheless, the central metaphor of cognitive science,
"The brain is a computer", gives us hope. Prior to the computer metaphor, we
had no idea of what could possibly be the bridge between beliefs and ion
transport. Now we have an idea. In the long history of inquiry into the nature
of mind, the computer metaphor gives us, for the first time, the promise of
linking the entities and processes of intentional psychology to the underlying
biological processes of neurons, and hence to physical processes. We could say
that the computer metaphor is the first, best hope of materialism.
The jump between
neurophysiology and intentional psychology is a huge one. We are more likely to
succeed in linking the two if we can identify some intermediate levels. A view
that is popular these days identifies two intermediate levels - the
symbolic and the connectionist.
Intentional Level
|
Symbolic Level
|
Connectionist Level
|
Neurophysiological Level
The intentional
level is implemented in the symbolic level, which is implemented in the
connectionist level, which is implemented in the neurophysiological level.[1] From the "strong AI" perspective, the
aim of cognitive science is to show how entities and processes at each level
emerge from the entities and processes of the level below.[2] The
reasons for this strategy are clear. We can observe intelligent activity and we
can observe the firing of neurons, but there is no obvious way of linking these
two together. So we decompose the problem into three smaller problems. We can
formulate theories at the symbolic level that can, at least in a small way so
far, explain some aspects of intelligent behavior; here we work from
intelligent activity down. We can formulate theories at the connectionist level
in terms of elements that are a simplified model of what we know of the
neuron's behavior; here we work from the neuron up. Finally, efforts are being
made to implement the key elements of symbolic processing in connectionist
architecture. If each of these three efforts were to succeed, we would have the
whole picture.
In my view, this
picture looks very promising indeed. Mainstream AI and cognitive science have
taken it to be their task to show how intentional phenomena can be implemented
by symbolic processes. The elements in a connectionist network are modeled on
certain properties of neurons. The principal problems in linking
the symbolic and connectionist levels are representing predicate-argument relations
in connectionist networks, implementing variable-binding or universal
instantiation in connectionist networks, and defining the right notion of
"defeasibility" or "nonmonotonicity" in logic[3] to reflect the "soft corners", or lack
of rigidity, that make connectionist models so attractive. Progress is being
made on all these problems (e.g., Shastri and Ajjanagadde, 1993; Shastri, 1999).
Although we do
not know how each of these levels is implemented in the level below, nor indeed
whether it is, we know
that it could be, and
that at least is something.
A very large
body of work in AI begins with the assumptions that information and knowledge
should be represented in first-order logic and that reasoning is theorem-proving.
On the face of it, this seems implausible as a model for people. It certainly
doesn't seem as if we are using logic when we are thinking, and if we are, why
are so many of our thoughts and actions so illogical? In fact, there are
psychological experiments that purport to show that people do not use logic in
thinking about a problem (e.g., Wason and Johnson-Laird, 1972).
I believe that
the claim that logic is the language of thought comes to less than one might
think, however, and that thus it is more controversial than it ought to be. It
is the claim that a broad range of cognitive processes are amenable to a
high-level description in which six key features are present. The first three
of these features characterize propositional logic and the next two first-order
logic. I will express them in terms of "concepts", but one can just as easily
substitute propositions, neural elements, or a number of other terms.
· Conjunction: There is an additive effect (P ∧ Q) of two distinct concepts (P and Q) being activated at the same time.
· Modus Ponens: The activation of one concept (P) triggers the activation of another concept (Q) because of the existence of some structural relation between them (P ⊃ Q).
· Recognition of Obvious Contradictions: It
can be arbitrarily difficult to recognize contradictions in general, but we
have no trouble with the easy ones, for example, that cats aren't dogs.
· Predicate-Argument Relations: Concepts
can be related to other concepts in several different ways. We can distinguish
between a dog biting a man (bite(D,M)) and a man biting a dog (bite(M,D)).
· Universal Instantiation (or Variable
Binding): We can keep separate our knowledge of general (universal) principles
("All men are mortal") and our knowledge of their instantiations for particular
individuals ("Socrates is a man" and "Socrates is mortal").
Any plausible
proposal for a language of thought must have at least these features, and once
you have these features you have first-order logic. Note that in this list
there are no complex rules for double negations or for contrapositives (if P
implies Q then not Q implies not P). In fact, most of the psychological
experiments purporting to show that people don't use logic really show that
they don't use the contrapositive rule or that they don't handle double
negations well. If the tasks in those experiments were recast into problems
involving the use of modus ponens, no one would think to do the experiments
because it is obvious that people would have no trouble with the task.
There is one further
property we need of the logic if we are to use it for representing and
reasoning about commonsense world knowledge -- defeasibility or
nonmonotonicity. Our knowledge is not certain. Different proofs of the same
fact may have different consequences, and one proof can be "better" than
another.
The mode of
defeasible reasoning used here is "abduction"[4], or inference to the best explanation.
Briefly, one tries to prove something, but where there is insufficient
knowledge, one can make assumptions. One proof is better than another if it
makes fewer, more plausible assumptions, and if the knowledge it uses is more
plausible and more salient. This is spelled out in detail in Hobbs et al.
(1993). The
key idea is that intelligent agents understand their environment by coming up
with the best underlying explanations for the observables in it. Generally not everything required for
the explanation is known, and assumptions have to be made. Typically, abductive proofs have the
following structure.
We want to prove R.
We know P ∧ Q ⊃ R.
We know P.
We assume Q.
We conclude R.
A logic is "monotonic" if once we conclude something, it will always be true. Abduction is "nonmonotonic" because we
could assume Q and thus conclude R, and later learn that Q is false.
There may be many Q's that could be assumed to result in a proof (including R itself), giving us alternative possible
proofs, and thus alternative possible and possibly mutually inconsistent
explanations or interpretations.
So we need a kind of "cost function" for selecting the best proof. Among the factors that will make one
proof better than another are the shortness of the proof, the plausibility and
salience of the axioms used, a smaller number of assumptions, and the
exploitation of the natural redundancy of discourse. A more complete description of the cost function is found in
Hobbs et al. (1993).
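The proof schema above can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: it is propositional, the rule set and the numeric assumption costs are invented for the example, and the cost of a proof is taken to be simply the summed cost of its assumptions, which is a crude stand-in for the cost function of Hobbs et al. (1993).

```python
from itertools import product

# Toy propositional knowledge base (illustrative, not from the chapter's system).
RULES = [(("P", "Q"), "R"),   # P ∧ Q ⊃ R
         (("S",), "R")]       # S ⊃ R, a rival explanation
FACTS = {"P"}                 # known, so free to use
ASSUME_COST = {"P": 1.0, "Q": 2.0, "S": 5.0, "R": 10.0}  # made-up costs


def cost_of(assumed):
    """Total cost of a set of assumptions; each assumption is paid for once."""
    return sum(ASSUME_COST.get(a, float("inf")) for a in assumed)


def proofs(goal, depth=3):
    """Yield the set of assumptions made by each abductive proof of `goal`."""
    if goal in FACTS:
        yield frozenset()
    yield frozenset({goal})            # any goal can simply be assumed
    if depth == 0:
        return
    for antecedents, consequent in RULES:
        if consequent == goal:
            # Pick one proof for every antecedent and merge their assumptions.
            options = [list(proofs(a, depth - 1)) for a in antecedents]
            for combo in product(*options):
                yield frozenset().union(*combo)


def best_proof(goal):
    """Return (cost, assumptions) for the least-cost proof of `goal`."""
    return min(((cost_of(p), p) for p in proofs(goal)), key=lambda cp: cp[0])


print(best_proof("R"))
```

With these made-up costs the cheapest proof of R uses the known P and assumes only Q, in preference to assuming S or assuming R outright. Because assumptions are collected in a set, an assumption shared by several subproofs is paid for once, a very rough analogue of exploiting redundancy.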
In the
"Interpretation as Abduction" framework, world knowledge is expressed as
defeasible logical axioms. To interpret the content of a discourse is to find
the best explanation for it, that is, to find a minimal-cost abductive proof of
its logical form. To interpret a sentence is to deduce its syntactic structure
and hence its logical form, and simultaneously to prove that logical form
abductively. To interpret suprasentential discourse is to interpret individual
segments, down to the sentential level, and to abduce relations among them.
Consider as an
example the problem of resolving definite references. The following four
examples are sometimes taken to illustrate four different kinds of definite
reference.
I bought a new car last week. The car is already giving me trouble.
I bought a new car last week. The vehicle is already giving me trouble.
I bought a new car last week. The engine is already giving me trouble.
The engine of my new car is already giving me trouble.
In the first
example, the same word is used in the definite noun phrase as in its
antecedent. In the second example, a hypernym is used. In the third example, the
reference is not to the "antecedent" but to an object that is related to it,
requiring what Clark (1975) called a "bridging inference". The fourth example
is a determinative definite noun phrase, rather than an anaphoric one; all the
information required for its resolution is found in the noun phrase itself.
These
distinctions are insignificant in the abductive approach. In each case we need
to prove the existence of the definite entity. In the first example it is
immediate. In the second, we use the axiom
(∀ x) car(x) ⊃ vehicle(x)
In the third example, we use the axiom
(∀ x) car(x) ⊃ (∃ y) engine(y,x)
that is, cars have engines. In the fourth example, we use the same axiom, but after
assuming the existence of the speaker's new car.
This last axiom
is "defeasible" since it is not always true; some cars don't have engines. To indicate this formally in the
abduction framework, we can add another proposition to the antecedent of this
rule.
(∀ x) car(x) ∧ etc_i(x) ⊃ (∃ y) engine(y,x)
The proposition etc_i(x) means something like "and other unspecified properties of x".
This particular etc
predicate would appear in no other axioms, and thus it could never be
proved. But it could be assumed,
at a cost, and could thus be a part of the least-cost abductive proof of the
content of the sentence. This
maneuver implements defeasibility in a set of first-order logical axioms
operated on by an abductive theorem prover.
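The treatment of the definite-reference examples can be sketched as a tiny first-order abductive backchainer. This is a minimal illustration, not the prover of Hobbs et al. (1993): the constant c1, the predicate name etc1 (standing in for the etc predicate of the engine axiom), the assumption costs, and the depth bound are all assumptions made for the example, and arguments are restricted to constants and variables.

```python
import itertools

AXIOMS = [
    ([("car", "?x")], ("vehicle", "?x")),                  # car(x) ⊃ vehicle(x)
    ([("car", "?x"), ("etc1", "?x")],                      # car(x) ∧ etc1(x)
     ("engine", "?y", "?x")),                              #   ⊃ (∃ y) engine(y,x)
]
ASSUMABLE = {"etc1": 1.0, "car": 5.0}   # made-up assumption costs
_fresh = itertools.count()


def is_var(t):
    return t.startswith("?")


def walk(t, s):
    """Follow variable bindings in substitution s to their current value."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Unify two atoms; return an extended substitution, or None on failure."""
    if len(a) != len(b) or a[0] != b[0]:
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s


def rename(antecedent, consequent):
    """Give an axiom fresh variable names each time it is used."""
    n = next(_fresh)
    r = lambda atom: tuple(f"{t}_{n}" if is_var(t) else t for t in atom)
    return [r(a) for a in antecedent], r(consequent)


def prove(goals, facts, s=None, cost=0.0, depth=3):
    """Yield (cost, substitution) for each abductive proof of all goals."""
    s = {} if s is None else s
    if not goals:
        yield cost, s
        return
    goal, rest = goals[0], goals[1:]
    for fact in facts:                                   # 1. already known
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield from prove(rest, facts, s2, cost, depth)
    if goal[0] in ASSUMABLE:                             # 2. assume, at a cost
        yield from prove(rest, facts, s, cost + ASSUMABLE[goal[0]], depth)
    if depth > 0:                                        # 3. backchain
        for axiom in AXIOMS:
            antecedent, consequent = rename(*axiom)
            s2 = unify(goal, consequent, s)
            if s2 is not None:
                yield from prove(antecedent + rest, facts, s2, cost, depth - 1)


def best(goals, facts):
    """Least-cost proof of the goals, as (cost, substitution)."""
    return min(prove(goals, facts), key=lambda p: p[0])
```

For instance, with facts = [("car", "c1")], calling best([("engine", "?e", "?c")], facts) binds ?c to c1 and pays only for assuming etc1, leaving ?e as a new entity (the bridging case). With no facts at all, the same goal can still be proved, but only by also assuming the car, at a higher cost (the determinative case).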
Syntax can be
integrated into this framework in a thorough fashion, as described at length in
Hobbs (1998). In this treatment, the predication
(1) Syn(w, e, …)
says that the string w is a grammatical, interpretable string of words describing the situation or entity
e. For example, Syn("John reads Hamlet", e, …) says that the string "John reads Hamlet." (w) describes the event e (the reading by John of the play Hamlet).
The arguments of Syn indicated by the dots include information about complements and various agreement
features.
Composition is
effected by axioms of the form
(2) Syn(w1, e, …, y, …) ∧ Syn(w2, y, …) ⊃ Syn(w1w2, e, …)
A string w1 whose head describes the eventuality e and which is missing an argument y can
be concatenated with a string w2 describing y, yielding a string describing e. For example, the string "reads" (w1), describing a reading event e but missing the object y of the reading, can be concatenated with
the string "Hamlet" (w2) describing a book y, to yield a string "reads Hamlet" (w1w2), giving a richer description of the
event e in that it does not lack the object of the reading.
The interface
between syntax and world knowledge is effected by "lexical axioms" of a form
illustrated by
(3) read'(e,x,y) ∧ text(y) ⊃ Syn("read", e, …, x, …, y, …)
This says that if e is the eventuality of x reading y (the logical form
fragment supplied by the word "read"), where y is a text (the selectional constraint
imposed by the verb "read" on its object), then e can be described by a phrase headed by
the word "read" provided it picks up, as subject and object, phrases of the
right sort describing x and y.
To interpret a
sentence w, one seeks
to show it is a grammatical, interpretable string of words by proving there is
an eventuality e that
it describes, that is, by proving (1). One does so by decomposing it via
composition axioms like (2) and bottoming out in lexical axioms like (3). This
yields the logical form of the sentence, which then must be proved abductively,
the characterization of interpretation we gave in Section 1.3.
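The way composition axioms like (2) and lexical axioms like (3) fit together can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the grammar of Hobbs (1998): the Syn record keeps only a word string, the entity described, a list of unfilled arguments, and the logical form collected so far; the lexicon entries, the constants j, h and e, and the left/right bookkeeping are all assumptions made for the example. Selectional constraints such as text(y) are simply added to the logical form, to be proved abductively afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syn:
    words: tuple     # the string w
    ref: str         # the entity or eventuality that w describes
    missing: tuple   # unfilled arguments: (side, variable, selectional constraint)
    lf: tuple        # logical-form atoms collected so far


# Hypothetical lexicon; the entry for "reads" plays the role of lexical axiom (3).
LEXICON = {
    "John": Syn(("John",), "j", (), (("John", "j"),)),
    "Hamlet": Syn(("Hamlet",), "h", (), (("Hamlet", "h"),)),
    "reads": Syn(("reads",), "e",
                 (("right", "?y", "text"), ("left", "?x", None)),
                 (("read'", "e", "?x", "?y"),)),
}


def compose(head, comp, side):
    """Composition axiom (2): a head missing an argument plus a string describing it."""
    if not head.missing or comp.missing:
        return None
    want_side, var, constraint = head.missing[0]
    if want_side != side:
        return None
    # Identify the missing argument with the entity the complement describes.
    bind = lambda atom: tuple(comp.ref if t == var else t for t in atom)
    lf = tuple(bind(a) for a in head.lf) + comp.lf
    if constraint:
        lf += ((constraint, comp.ref),)   # left for the abductive prover
    words = head.words + comp.words if side == "right" else comp.words + head.words
    return Syn(words, head.ref, head.missing[1:], lf)


vp = compose(LEXICON["reads"], LEXICON["Hamlet"], "right")   # "reads Hamlet"
sentence = compose(vp, LEXICON["John"], "left")              # "John reads Hamlet"
print(" ".join(sentence.words), sentence.lf)
```

The resulting logical form contains read'(e, j, h) together with text(h); proving these against world knowledge is the abductive step described above.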
A substantial
fragment of English grammar is cast into this framework in Hobbs (1998), which
closely follows Pollard and Sag (1994).
When confronting
an entire coherent discourse by one or more speakers, one must break it into
interpretable segments and show that those segments themselves are coherently
related. That is, one must use a
rule like
Segment(w1, e1) ∧ Segment(w2, e2) ∧ rel(e, e1, e2) ⊃ Segment(w1w2, e)
That is, if w1 and w2 are
interpretable segments describing situations e1 and e2 respectively, and e1 and e2 stand in some relation rel to each other, then the concatenation of w1 and w2 constitutes an interpretable segment,
describing a situation e that
is determined by the relation. The possible relations are discussed further in
Section 4.
This rule
applies recursively and bottoms out in sentences.
Syn(w, e, …) ⊃ Segment(w, e)
A grammatical,
interpretable sentence w describing
eventuality e is a
coherent segment of discourse describing e. This axiom effects the interface between syntax and
discourse structure. Syn is the predicate whose axioms
characterize syntactic structure; Segment is the predicate whose axioms characterize discourse
structure; and they meet in this axiom.
The predicate Segment says that string w is a coherent description of an eventuality e;
the predicate Syn says that string w is a grammatical and interpretable description of eventuality e;
and this axiom says that being grammatical and interpretable is one way of
being coherent.
To interpret a
discourse, we break it into coherently related successively smaller segments
until we reach the level of sentences. Then we do a syntactic analysis of the
sentences, bottoming out in their logical form, which we then prove
abductively.[5]
This view of
discourse interpretation is embedded in a view of interpretation in general in
which an agent, to interpret the environment, must find the best explanation
for the observables in that environment, which includes other agents.
An intelligent
agent is embedded in the world and must, at each instant, understand the
current situation. The agent does so by finding an explanation for what is
perceived. Put differently, the agent must explain why the complete set of
observables encountered constitutes a coherent situation. Other agents in the
environment are viewed as intentional, that is, as planning mechanisms, and
this means that the best explanation of their observable actions is most likely
to be that the actions are steps in a coherent plan. Thus, making sense of an
environment that includes other agents entails making sense of the other
agents' actions in terms of what they are intended to achieve. When those
actions are utterances, the utterances must be understood as actions in a plan
the agents are trying to effect. The speaker's plan must be recognized.
Generally, when
a speaker says something it is with the goal that the hearer believe the
content of the utterance, or think about it, or consider it, or take some other
cognitive stance toward it.[6] Let us subsume all these mental terms
under the term "cognize". We can then say that to interpret a speaker A's utterance to B of some content, we must explain the
following:
goal(A, cognize(B, content-of-discourse))
Interpreting the
content of the discourse is what we described above. In addition to this, one
must explain in what way it serves the goals of the speaker to change the
mental state of the hearer to include some mental stance toward the content of
the discourse. We must fit the act of uttering that content into the speaker's
presumed plan.
The defeasible
axiom that encapsulates this is
(∀ s, h, e1, e, w)[goal(s, e1) ∧ cognize'(e1, h, e) ∧ Segment(w, e) ⊃ utter(s, h, w)]
That is, normally if a speaker s has a goal e1 of the hearer h
cognizing a situation e
and w is a string of words that conveys e, then s will utter w to h. So if I have
the goal that you think about the existence of a fire, then since the word
"fire" conveys the concept of fire, I say "Fire" to you. This axiom is only defeasible because
there are multiple strings w that can convey e. I could have said, "Something's
burning."
We appeal to
this axiom to interpret the utterance as an intentional communicative act. That
is, if A utters to B
a string of words W, then to explain this observable event,
we have to prove utter(A,B,W).
That is, just as interpreting an observed flash of light is finding an
explanation for it, interpreting an observed utterance of a string W by one person A to another person B is to find an explanation for it. We begin to do this by backchaining on
the above axiom. Reasoning about the speaker's plan is a matter of establishing
the first two propositions in the antecedent of the axiom. Determining the
informational content of the utterance is a matter of establishing the third.
The two sides of the proof influence each other since they share variables and
since a minimal proof will result when both are explained and when their
explanations use much of the same knowledge.
Because of its
elegance and very broad coverage, the abduction model is very appealing on the
symbolic level. But to be a plausible candidate for how people understand
language, there must be an account of how it could be implemented in neurons.
In fact, the abduction framework can be realized in a structured connectionist
model called shruti developed by
Lokendra Shastri (Shastri and Ajjanagadde, 1993; Shastri, 1999). The key idea
is that nodes representing the same variable fire in synchrony. Substantial work must be done in
neurophysics to determine whether this kind of model is what actually exists in
the human brain, although there is suggestive evidence. A good recent review of the evidence
for the binding-via-synchrony hypothesis is given in Engel and Singer
(2001). A related article by Fell
et al. (2001) reports results on gamma band synchronization and
desynchronization between parahippocampal regions and the hippocampus proper
during episodic memory memorization.
By linking the
symbolic and connectionist levels, one at least provides a proof of possibility
for the abductive
framework.
There is a range
of connectionist models. Among
those that try to capture logical structure in the structure of the network,
there has been good success in implementing defeasible propositional logic. Indeed, nearly all the
applications to natural language processing in this tradition begin by setting
up the problem so that it is a problem in propositional logic. But this is not
adequate for natural language understanding in general. For example, the
coreference problem, e.g., resolving pronouns to their antecedents, requires
the expressivity of first-order logic even to state; it involves recognizing
the equality of two variables or a constant and a variable presented in
different places in the text. We need a way of expressing predicate-argument
relations and a way of expressing different instantiations of the same general
principle. We need a mechanism for universal instantiation, that is, the
binding of variables to specific entities. In the connectionist literature, this has gone under the
name of the variable-binding problem.
The essential
idea behind the shruti
architecture is simple and elegant. A predication is represented as an
assemblage or cluster of nodes, and axioms representing general knowledge are
realized as connections among these clusters. Inference is accomplished by
means of spreading activation through these structures.

Figure 1 Predicate cluster for p(x,y). The
collector node (+) fires asynchronously in proportion to how plausible it is
that p(x,y) is part of the desired proof. The enabler node (?) fires
asynchronously in proportion to
how much p(x,y) is required in the proof. The argument nodes for x and y
fire in synchrony with argument nodes in other predicate clusters that are
bound to the same variable.
In the
cluster representing predications (Figure 1), two nodes, a collector node and
an enabler node, correspond to the predicate and fire asynchronously. That is, they don't need to fire
synchronously, in contrast to the "argument nodes" described below; for the
collector and enabler nodes, only the level of activation matters. The level of activation on the enabler
node keeps track of the "utility" of this predication in the proof that is
being searched for. That is, the activation is higher the greater the need to
find a proof for this predication, and thus the more expensive it is to assume.
For example, in interpreting "The curtains are on fire," it is very important
to prove curtains(x) and thereby identify which curtains are
being talked about; the level of activation on the enabler node for that
cluster would be high. The level
of activation on the collector node is higher the greater the plausibility that
this predication is part of the desired proof. Thus, if the speaker is standing in the living room, there
might be a higher activation on the collector node for curtains(c1) where c1 represents the curtains in the living
room than on curtains(c2), where c2 represents the curtains in the dining
room.
We can think of
the activations on the enabler nodes as prioritizing goal expressions, whereas
the activations on the collector nodes indicate degree of belief in the
predications, or more properly, degree of belief in the current relevance of
the predications. The connections between nodes of different predication
clusters have a strength of activation, or link weight, that corresponds to
strength of association between the two concepts. This is one way we can capture the defeasibility of axioms
in the shruti model. The proof process then consists of
activation spreading through enabler nodes, as we backchain through axioms, and
spreading forward through collector nodes from something known or assumed. In
addition, in the predication cluster, there are argument nodes, one for each
argument of the predication. These fire synchronously with the argument nodes
in other predication clusters to which they are connected. Thus, if the
clusters for p(x,
y) and q(z, x) are connected, with the two x nodes linked to each other, then the two x
nodes will fire in
synchrony, and the y and
z nodes will fire at
an offset with the x nodes
and with each other. This synchronous firing indicates that the two x nodes represent variables bound to the
same value. This constitutes the solution to the variable-binding problem. The role of variables in logic is to
capture the identity of entities referred to in different places in a logical
expression; in shruti this
identity is captured by the synchronous firing of linked nodes.
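The idea that shared phase stands in for shared variable binding can be illustrated with a very abstract sketch. The code below does not simulate spiking nodes or the shruti architecture itself; it is a discrete stand-in, written under the assumption that linked argument nodes end up in one common phase slot and unlinked ones in different slots. The node names p.x, p.y, q.z and q.x are invented labels for the argument nodes of the clusters for p(x,y) and q(z,x).

```python
def assign_phases(nodes, links):
    """Give linked argument nodes one shared phase and all others distinct phases."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative of n's group.
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)           # linked nodes join one group

    phase_of_root, phases = {}, {}
    for n in nodes:
        root = find(n)
        # Each group gets the next unused phase slot the first time it is seen.
        phases[n] = phase_of_root.setdefault(root, len(phase_of_root))
    return phases


def same_binding(phases, a, b):
    """Two argument nodes are bound to the same value iff they share a phase."""
    return phases[a] == phases[b]


nodes = ["p.x", "p.y", "q.z", "q.x"]
phases = assign_phases(nodes, links=[("p.x", "q.x")])
print(phases)
```

Here the two x nodes receive the same phase, while the y and z nodes are offset from them and from each other, mirroring the description above.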
Proofs are
searched for in parallel, and winner-takes-all circuitry suppresses all but the
one whose collector nodes have the highest level of activation.
There are
complications in this model for such things as managing different predications
with the same predicate but different arguments. But the essential idea is as
described. In brief, the view of relational information processing implied by shruti is one where reasoning is a
transient but systematic propagation of rhythmic activity over structured cell-ensembles,
each active entity is a phase in the rhythmic activity, dynamic bindings are
represented by the synchronous firing
of appropriate nodes, and rules are high-efficacy links that cause the
propagation of rhythmic activity between cell-ensembles. Reasoning is the
spontaneous outcome of a shruti
network.
In the abduction
framework, the typical axiom in the knowledge base is of the form
(4) (∀ x,y)[p1(x,y) ∧ p2(x,y) ⊃ (∃ z)[q1(x,z) ∧ q2(x,z)]]
That is, the top-level logical connective will be implication. There may be multiple
predications in the antecedent and in the consequent. There may be variables (x) that occur in both the antecedent and
the consequent, variables (y)
that occur only in the antecedent, and variables (z) that occur only in the consequent.
Abduction backchains from predications in consequents of axioms to predications
in antecedents. That is, to prove the consequent of such a rule, it attempts to
find a proof of the antecedent.
Every step in the search for a proof can be considered an abductive
proof where all unproved predications are assumed for a cost. The best proof is
the least cost proof.
The
implementation of this axiom in shruti
requires predication clusters of nodes and axiom clusters of nodes (see Figure
1). A predication cluster, as described above, has one collector node and one
enabler node, both firing asynchronously, corresponding to the predicate and
one synchronously firing node for each argument. An axiom cluster has one
collector node and one enabler node, both firing asynchronously, recording the
plausibility and the utility, respectively, of this axiom participating in the
best proof. It also has one
synchronously firing node for each variable in the axiom -- in our example,
nodes for x, y and z. The collector and enabler nodes fire
asynchronously and what is significant is their level of activation or rate of
firing. The argument nodes fire
synchronously with other nodes, and what is significant is whether two nodes
are the same or different in their phases.
The axiom is
then encoded in a structure like that shown in Figure 2. There is a predication
cluster for each of the predications in the axiom and one axiom cluster that
links the predications of the consequent and antecedent. In general, the
predication clusters will occur in many axioms; this is why their linkage in a
particular axiom must be mediated by an axiom cluster.
Suppose (Figure
2) the proof process is backchaining from the predication q1(x,z). The activation on the enabler node (?) of the cluster for
q1(x,z) induces an activation on the enabler
node for the axiom cluster. This in turn induces activation on the enabler nodes
for predications p1(x,y) and p2(x,y). Meanwhile the firing of the x node in the q1 cluster induces the x node of the axiom cluster to fire in
synchrony with it, which in turn causes the x nodes of the p1 and p2 clusters to fire in synchrony as well.
In addition, a link (not shown) from the enabler node of the axiom cluster to
the y argument node
of the same cluster causes the y argument
node to fire, while links (not shown) from the x and z nodes cause that firing to be out of
phase with the firing of the x and
z nodes. This firing of
the y node of the
axiom cluster induces synchronous firing in the y nodes of the p1 and p2 clusters.

Figure 2 shruti encoding of axiom (∀ x,y)[p1(x,y) ∧ p2(x,y) ⊃ (∃ z)[q1(x,z) ∧ q2(x,z)]]. Activation spreads backward from the enabler nodes (?) of
the q1 and q2 clusters to that of the Ax1
cluster and on to those of the p1 and p2 clusters,
indicating the utility of this axiom in a possible proof. Activation spreads forward from the
collector nodes (+) of the p1 and p2 clusters to that of
the axiom cluster Ax1 and on to those of the q1 and q2
clusters, indicating the plausibility of this axiom being used in the final
proof. Links between the argument
nodes cause them to fire in synchrony with other argument nodes representing
the same variable.
By this means we
have backchained over axiom (4) while keeping distinct the variables that are
bound to different values. We are then ready to backchain over axioms in which p1 and p2 are in the consequent. As mentioned
above, the q1 cluster
is linked to other axioms as well, and in the course of backchaining, it
induces activation in those axioms' clusters too. In this way, the search for a
proof proceeds in parallel. Inhibitory links suppress contradictory inferences
and will eventually force a winner-takes-all outcome.
In this
framework, incremental increases in linguistic competence, and other knowledge
as well, can be achieved by means of a small set of simple operations on the
axioms in the knowledge base:
1. The introduction of a new predicate,
where the utility of that predicate can be argued for cognition in general,
independent of language.
2. The introduction of a new predicate p specializing an old predicate q:
(∀ x) p(x) ⊃ q(x)
For example, we learn that a beagle is a type of dog.
(∀ x) beagle(x) ⊃ dog(x)
3. The introduction of a new predicate p generalizing one or more old predicates qi:
(∀ x) q1(x) ⊃ p(x),
(∀ x) q2(x) ⊃ p(x), …
For example, we learn that dogs and cats are both mammals.
(∀ x) dog(x) ⊃ mammal(x),
(∀ x) cat(x) ⊃ mammal(x)
4. Increasing the arity of a predicate to allow more arguments.
p(x) ⇒ p(x,y)
For example, we learn that "mother" is not a property but a relation.
mother(x) ⇒ mother(x,y)
5. Adding a proposition to the antecedent of an axiom.
p1(x) ⊃ q(x) ⇒
p1(x) ∧ p2(x) ⊃ q(x)
For example, we might first believe that a seat is a chair, then learn that a seat
with a back is a chair.
seat(x) ⊃ chair(x) ⇒
seat(x) ∧ back(y,x) ⊃ chair(x)
6. Adding a proposition to the consequent of an axiom.
p(x) ⊃ q1(x) ⇒
p(x) ⊃ q1(x) ∧ q2(x)
For example, a child might see snow for the first time and see that it's white, and
then go outside and realize it's also cold.
snow(x) ⊃ white(x) ⇒
snow(x) ⊃ white(x) ∧ cold(x)
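Operations 2 through 6 are simple enough to write down directly as functions over axioms. The sketch below is only an illustration of the operations as symbolic edits to a knowledge base; the Axiom record and the function names are invented for the example, every axiom is assumed to be implicitly universally quantified, and nothing here models how such changes would be realized by node recruitment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Axiom:
    antecedent: tuple   # atoms read as a conjunction
    consequent: tuple   # atoms read as a conjunction


def specialize(new, old):
    """Operation 2: new(x) ⊃ old(x)."""
    return Axiom(((new, "x"),), ((old, "x"),))


def generalize(new, olds):
    """Operation 3: one axiom q(x) ⊃ new(x) per old predicate q."""
    return [Axiom(((q, "x"),), ((new, "x"),)) for q in olds]


def increase_arity(axiom, pred, arg):
    """Operation 4: give every occurrence of pred one more argument."""
    widen = lambda atoms: tuple(a + (arg,) if a[0] == pred else a for a in atoms)
    return Axiom(widen(axiom.antecedent), widen(axiom.consequent))


def add_antecedent(axiom, atom):
    """Operation 5: strengthen the conditions under which the axiom applies."""
    return replace(axiom, antecedent=axiom.antecedent + (atom,))


def add_consequent(axiom, atom):
    """Operation 6: let the axiom conclude one more thing."""
    return replace(axiom, consequent=axiom.consequent + (atom,))


beagle = specialize("beagle", "dog")
mammals = generalize("mammal", ["dog", "cat"])
chair = add_antecedent(specialize("seat", "chair"), ("back", "y", "x"))
snow = add_consequent(specialize("snow", "white"), ("cold", "x"))
print(chair)
print(snow)
```

Each function returns a new axiom rather than changing the old one, so a learner's knowledge base at any stage is just the current collection of such records.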
It was shown in
Section 1.7 that axioms such as these can be realized at the connectionist
level in the shruti model. To complete the picture, it must be
shown that these incremental changes to axioms could also be implemented at the
connectionist level. In fact,
Shastri and his colleagues have demonstrated that incremental changes such as
these can be implemented in the shruti
model via relatively simple means involving the recruitment of nodes, by
strengthening latent connections as a response to frequent simultaneous
activations (Shastri, 2001; Shastri and Wendelken, 2003; Wendelken and Shastri,
2003).
These
incremental operations can be seen as constituting a plausible mechanism for
both the development of cognitive capabilities in individuals and, whether
directly or indirectly through developmental processes, their evolution in
populations. In this paper, I will show how the principal features of language
could have resulted from a sequence of such incremental steps, starting from
the cognitive capacity one could expect of ordinary primates.
To summarize,
the framework assumed in this chapter has the following features:
· A detailed, plausible, computational model for a large range of linguistic behavior.
· A possible implementation in a connectionist model.
· An incremental model of learning, development (physical maturation), and evolution.
· An implementation of that in terms of node recruitment.
In the remainder
of the paper it is shown how two principal features of language - Gricean
meaning and syntax - could have arisen from nonlinguistic cognition
through the action of three mechanisms:
· incremental changes to axioms,
· folk theories required independent of language,
· compilation of proofs into axioms.
These two
features of language are, in a sense, the two key features of language. The
first, Gricean meaning, tells how single words convey meaning in discourse. The
second, syntax, tells how multiple words combine to convey complex meanings.
In Gricean non-natural
meaning, what is conveyed is not merely the content of the utterance, but also
the intention of the speaker to convey that meaning, and the intention of the
speaker to convey that meaning by means of that specific utterance. When A shouts
"Fire!" to B, A expects that
1. B will believe there is a fire
2. B will believe A wants B to believe there is fire
3. 1 will happen because of 2
Five steps take
us from natural meaning, as in "Smoke means fire," to Gricean meaning (Grice,
1948). Each step depends on certain background theories being in place,
theories that are motivated even in the absence of language. Each new step in
the progression introduces a new element of defeasibility. The steps are as
follows:
1. Smoke means fire
2. "Fire!" means fire
3. Mediation by belief
4. Mediation by intention
5. Full Gricean meaning
Once we get into
theories of belief and intention, there is very little that is certain. Thus, virtually all the axioms used in
this section are defeasible. That
is, they are true most of the time, and they often participate in the best
explanation produced by abductive reasoning, but they are sometimes wrong. They are nevertheless useful to
intelligent agents.
The theories
that will be discussed in this section – belief, mutual belief,
intention, and collective action – are some of the key elements of a
theory of mind (e.g., Premack and Woodruff, 1978; Heyes, 1998; Gordon, this
volume). I discuss the possible
courses of evolution of a theory of mind in Section 4.
The first
required folk theory is a theory of causality (or rather, a number of theories with
causality). There will
be no definition of the predicate cause, that is, no set of necessary and sufficient conditions.
cause(e1, e2) ≡ …
Rather
there will be a number of domain-dependent theories saying what sorts of things
cause what other sorts of things. There will be lots of necessary conditions