Appendix 1. First-Order Logic
1. Predicate-Argument Relations
This book is necessarily heavy in logic, but we would like it to
appeal to a wide audience, including those who have not studied logic.
Fortunately, the logic we use here is neither deep nor complicated.
In this appendix we assume nothing more than a vague memory of high
school algebra, and present all of the logic the reader needs for
understanding the formulas in this book.
The two headlines
Dog Bites Man
and
Man Bites Dog
both convey the same lexical content. In both situations there are a
dog, a man, and a biting event. The two situations differ in the
respective roles of the dog and the man in the biting event. We can
represent the situations formally by positing a "predicate" -- "bite"
-- that takes two "arguments" -- a biter and a bitee. Then if a dog
named D1 bites a man named M2, we can represent the first situation by
the expression
(bite Biter: D1 Bitee: M2)
Suppose we know that for the predicate "bite" the Biter argument
always comes first and the Bitee second. Then we can dispense with
the labels and write more economically
(bite D1 M2)
This is the notation used in CommonLogic (Menzel, 20??). A more
common notation puts the left parenthesis after the predicate and
separates the arguments with a comma.
bite(D1,M2)
In this book we use the first notation in order to support the
CommonLogic standardization effort.
We generally try to have the order of the arguments be the same as in
the simplest English sentence that would express its meaning. Thus,
this expression means
D1 bites M2.
(A linguist one of the authors once worked with was intimidated by
logic at first, until he realized, as he put it, "logic is English in
Verb-Subject-Object order.")
Each predicate has a fixed number of arguments. For example, we might
say
(sleep M2)
to mean "M2 sleeps," and
(give M2 D1 W3)
to mean that M2 gives D1 to W3. As we are using them, the predicate
"sleep" always takes one argument, and the predicate "give" always
takes three. If a predicate takes n arguments, we call it an "n-ary
predicate".
The term in logic for a predicate applied to the appropriate number of
arguments is "positive literal", but this is very unintuitive
terminology for the nonlogician. In this book we will refer to such
an expression as a "proposition" or a "predication".
A proposition can be true or false. If we are able to identify the
dog D1 and the man M2, then the expression "(bite D1 M2)" is true
exactly when D1 bites M2.
2. Logical Connectives
Propositions can be combined by what are called "logical connectives".
The logical connectives we use in this book are "and", "not", "or",
"if" and "iff" (pronounced "if and only if"). These connectives are,
in a sense, cleaned-up versions of the corresponding word(s) in
English. In logic, they are given precise definitions. The
connective "and" is used to combine two or more propositions, and the
resulting expression is true exactly when all of the propositions it
combines are true. Thus, the expression
(and (bite D1 M2)(bite M2 D1))
is true exactly when D1 bites M2 and M2 bites D1. This semantics of
"and" can be expressed in a truth table as follows:
(and P Q):

        Q:
      T   F
    ---------
  T | T | F |
P:  ---------
  F | F | F |
    ---------
That is, if P and Q are both true (T), then so is (and P Q).
Otherwise, (and P Q) is false.
In ordinary English "and" usually means "and then", as in
Pat opened the window and a bird flew in.
or "and similarly", as in
Sugar is sweet and lemons are sour.
The two propositions being linked generally have something to do with
each other. But in logic this is not necessary. The sentence
Sugar is sweet and Los Angeles is diverse.
is a bit strange, but from the point of view of logic, it is simply
true.
The other connectives can be defined similarly.
(not P):

    -----
  T | F |
P:  -----
  F | T |
    -----

(or P Q):

        Q:
      T   F
    ---------
  T | T | T |
P:  ---------
  F | T | F |
    ---------

(if P Q):

        Q:
      T   F
    ---------
  T | T | F |
P:  ---------
  F | T | T |
    ---------

(iff P Q):

        Q:
      T   F
    ---------
  T | T | F |
P:  ---------
  F | F | T |
    ---------
Several things should be pointed out about these definitions. In
English, the word "or" can be the exclusive "or", where both P and Q
can't be true, or the inclusive "or", where they can. Our logical
connective "or" is the latter. If both P and Q are true, then so is
(or P Q).
The connectives "and" and "or" can link two or more propositions. The
expression "(or P Q R)" is just an abbreviation for "(or P (or Q R))".
It's true when at least one of the three propositions is true.
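All of these connectives can be written as one-line Boolean functions. The following Python sketch (the underscored function names are ours, chosen only to avoid colliding with Python keywords) reproduces the truth tables above:

```python
from itertools import product

# Each connective as a Boolean function, mirroring the truth tables.
def and_(*props): return all(props)      # n-ary, like (and P Q R)
def or_(*props):  return any(props)      # n-ary, the inclusive "or"
def not_(p):      return not p
def if_(p, q):    return (not p) or q    # false only when P is T and Q is F
def iff_(p, q):   return p == q          # true exactly when P and Q agree

# Print the truth table for (if P Q).
for p, q in product([True, False], repeat=2):
    print(p, q, if_(p, q))
```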
The connective "if" differs the most from standard English usage. In
ordinary English, the two propositions joined by "if" usually have
some causal connection, as in
If Pat goes to the party, Chris will go too.
In logic there's no such requirement. The connective "if" is defined
in strictly logical terms. If P and Q are both true, then (if P Q) is
true, just as we'd expect. If P is true and Q is false, then (if P Q)
is false, again just as we'd expect. But if P is false and Q is
false, we might be inclined to call (if P Q) true, or we might call it
nonsense. If P is false and Q is true, we would probably call the
sentence either deceptive or nonsense. But in logic, in the latter
two cases, i.e., when P is false, we say that (if P Q) is true. For
example, by our definition the sentences
If salt is sweet, then sugar is bitter.
If salt is sweet, then sugar is sweet too.
are both true, even though most of us would say they are nonsense.
But we can perhaps see the motivation for this move from sentences like
If salt is sweet, then I'll eat my hat.
in which P and Q are both false and the sentence as a whole strikes us
as true. You're safe -- since salt isn't sweet, you don't have to eat
your hat.
We call "(and P Q)" a conjunction and P and Q its conjuncts. We call
"(or P Q)" a disjunction, and P and Q its disjuncts. We call
"(if P Q)" an implication, P its antecedent, and Q its consequent.
The connective "iff" is sometimes called an "equivalence" or a
"biconditional".
Expressions with logical connectives can be embedded arbitrarily
deeply in other expressions, and their truth or falsity can be
determined from the truth tables. For example, the following
expression is true:
(iff (or P (and Q R))(and (or P Q)(or P R)))
Just plug in all eight possible combinations of values T and F for P,
Q, and R, and use the tables to compute the truth value of the
expression. They all turn out to be true.
The expression (iff P Q) is equivalent to the expression
(and (if P Q)(if Q P))
as the reader can verify from the truth tables.
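Both of these claims can be verified mechanically. The Python sketch below enumerates every combination of truth values, exactly as described:

```python
from itertools import product

def if_(p, q):  return (not p) or q
def iff_(p, q): return p == q

# (iff (or P (and Q R)) (and (or P Q) (or P R))) -- all eight cases.
dist = all(iff_(p or (q and r), (p or q) and (p or r))
           for p, q, r in product([True, False], repeat=3))

# (iff P Q) is equivalent to (and (if P Q) (if Q P)) -- all four cases.
bicond = all(iff_(p, q) == (if_(p, q) and if_(q, p))
             for p, q in product([True, False], repeat=2))

print(dist, bicond)   # True True
```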
3. Variables and Quantifiers
In Section 1 we used the "names" of the dog and the man as arguments
of predications. In logic, those are called "constants". We can also
use variables as arguments. In high school algebra, variables stood
for numbers. In logic, variables can stand for any kind of entity.
In high school algebra, there were two distinct uses for variables.
We saw them in equations that had to be solved, as in
x^2 - 7x + 12 = 0
where x has to be 3 or 4. We also saw them in identities, such as
(x + y)(x - y) = x^2 - y^2
which is true for any x and y. The first formula is to be read as
saying, "there exists an x such that ...." The second formula is to
be read as saying, "for all x and y, ...."
In logic, we say the first formula is "existentially quantified", and
the second is "universally quantified". We represent this by means of
two "quantifiers" that in CommonLogic are called "exist" and
"forall". A quantified expression begins with a quantifier and is
followed by a list of variables and then the expression that the
variables occur in. Thus,
(exist (x y) (p x y))
says that there exist entities x and y such that the predicate p is
true of x and y. The expression
(forall (x y) (q x y))
says that for all entities x and y, the predicate q is true of x and
y. In this book we do not use variables that are not within the
"scope" of a quantifier that tells how they are to be interpreted.
These expressions can be embedded in other logical expressions,
arbitrarily deeply. Suppose we want to say that cars have engines.
Then we can write
(forall (x) (if (car x) (exist (y) (engineOf y x))))
This says that for all entities x, if x is a car, then there exists an
entity y such that y is the engine of x. If there is more than one
car, there can be more than one engine.
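Over a finite domain, "forall" corresponds to Python's all and "exist" to any. Here is a sketch that checks the cars-have-engines axiom in a tiny invented world (the entity names are made up for illustration):

```python
# A small invented domain: two cars, their engines, and an unrelated rock.
domain = {"car1", "car2", "eng1", "eng2", "rock"}
car = {"car1", "car2"}
engine_of = {("eng1", "car1"), ("eng2", "car2")}   # (engine, car) pairs

# (forall (x) (if (car x) (exist (y) (engineOf y x))))
axiom = all(x not in car or any((y, x) in engine_of for y in domain)
            for x in domain)
print(axiom)   # True: each car has an engine; the rock needs none
```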
4. Rules of Inference
The principal use of logic is for reasoning. Suppose we know some
propositions to be true. Then logic provides the means for deriving
other true propositions. The means are "rules of inference". The two
most important rules of inference are Modus Ponens and Universal
Instantiation.
Modus Ponens is the rule
P, (if P Q) |- Q
That is, if we know P and we know (if P Q), then we can conclude
Q.
Universal Instantiation can be illustrated by the following rule,
where A is a constant:
(forall (x) (p x)) |- (p A)
This says that if we know that p is true of x for all x, then p must
be true of A, where A can be any constant. In the general version of
the rule, A is substituted for all occurrences of x in the expression
that forall scopes over. The "Universal" in Universal Instantiation
is because of the universal quantifier on the left side of the rule.
The "Instantiation" is because the right side of the rule is one
instance of the general pattern expressed on the left side.
Suppose we know that all persons are mortal.
(forall (x) (if (person x)(mortal x)))
and we know about an entity Socrates (S), that he is a person.
(person S).
Then by Universal Instantiation we can conclude
(if (person S)(mortal S))
and by Modus Ponens we can conclude
(mortal S).
That is, Socrates is mortal.
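The Socrates derivation can be sketched as a tiny forward-chaining program. The encoding of facts as (predicate, argument) tuples is our own, not part of any standard:

```python
# Facts are (predicate, argument) tuples.
facts = {("person", "S")}

# Universal Instantiation: instantiate
# (forall (x) (if (person x) (mortal x))) at every known constant.
constants = {arg for (_, arg) in facts}
rules = [(("person", c), ("mortal", c)) for c in constants]

# Modus Ponens: from P and (if P Q), conclude Q.
for antecedent, consequent in rules:
    if antecedent in facts:
        facts.add(consequent)

print(("mortal", "S") in facts)   # True: Socrates is mortal
```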
5. Axioms and Logical Theories
An axiom is a logical expression that we stipulate to be true. A
theorem is a logical expression that we can derive from axioms by
means of rules of inference.
A logical theory is characterized by a set of predicates and a set of
axioms in which the predicates occur. We can also talk about the
theorems of a logical theory as the set of theorems derivable from its
axioms.
A simple example of a logical theory is one that contains the
predicates "car", "engineOf" and "vehicle", plus the two axioms
(forall (x) (if (car x) (vehicle x)))
(forall (x) (if (car x) (exist (y) (engineOf y x))))
That is, cars are vehicles and have engines.
6. Models
An English description of a landscape is an object in a symbolic system
(English) being used to characterize some part of the world (the
hills, bushes and trees). Similarly, a painting of a landscape is an
object in a symbolic system (paintings) being used to characterize
some part of the world. We can look at the description or the
painting and at the hills, trees and bushes, and we can decide how
accurately the symbolic object corresponds to reality. Suppose the
English description says "There is a bush next to the tree." We can
check whether there really is a bush next to the tree. Similarly, if
there is a bush next to the tree, we can check whether the painting
depicts that.
Logic is also a symbolic system, and logical theories are objects
within that system. We can ask about a logical theory, does it give a
true description of that part of the world it purports to describe?
To answer this, we first have to say what set, or "domain", of
entities in the world the variables are intended to range over. We
have to say what the predicate symbols in the theory mean. That's
called an "interpretation". Then we have to check the world to see if
the axioms thus interpreted are really true. If they are, the
interpretation is a "model".
For example, consider the logical theory consisting of the predicates
"tree", "bush" and "nextTo", and the single axiom
(forall (x) (if (tree x) (exist (y) (and (bush y)(nextTo y x)))))
That is, every tree has a bush next to it. Let's say the variables x
and y range over all physical objects in Los Angeles. Suppose I've
decided a variable x denotes a particular tree. The interpretation of
x is easy enough: it's just the tree. But what about predicates? How
do I say what the interpretations of the predicates "tree" and
"nextTo" are? For this we use set-theoretic constructions. We will
say that the meaning of "tree" is the set of all those physical
objects that happen to be trees and similarly the meaning of "bush" is
the set of all the bushes. The meaning of "nextTo" is the set of all
ordered pairs of physical objects in Los Angeles such that the first
is next to the second. Now suppose we know what physical object we
want x to correspond to; then we will say that "(tree x)" is true
exactly when that physical object is in the interpretation of "tree".
That is, when it is in the set of all trees in Los Angeles. That is,
when it is in fact a tree. Similarly, if we know what two physical
objects we want x and y to correspond to, we will say "(nextTo x y)"
is true exactly when the ordered pair of those two physical objects is
in the interpretation of "nextTo", i.e., exactly when the first of
those physical objects is next to the second. Now we have our
interpretation. We can now check if it is a model by seeing if the
axiom holds. We do this by looking at every tree in Los Angeles and
checking whether it has a bush next to it. If so, we have a model.
If not, we don't.
In general, an interpretation of a theory is a set of entities, called
a "domain", together with a mapping from every n-ary predicate in the
theory to a set of n-tuples of the entities, for all n, namely, the
n-tuples of arguments for which the predicate is true. If the axioms
of the theory hold in the interpretation, it is a model.
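Checking whether an interpretation is a model is mechanical once the domain and the predicate interpretations are given as finite sets. Here is a Python sketch with an invented miniature domain:

```python
# An invented miniature "Los Angeles": a domain plus interpretations
# of the predicates as a set of entities and a set of ordered pairs.
domain  = {"t1", "t2", "b1", "b2", "rock"}
tree    = {"t1", "t2"}
bush    = {"b1", "b2"}
next_to = {("b1", "t1"), ("b2", "t2")}   # (bush, tree) pairs

# (forall (x) (if (tree x) (exist (y) (and (bush y) (nextTo y x)))))
def is_model():
    return all(x not in tree or
               any(y in bush and (y, x) in next_to for y in domain)
               for x in domain)

print(is_model())   # True: every tree has a bush next to it
```

Removing one of the pairs from next_to leaves a tree with no adjacent bush, and the same check reports that the interpretation is no longer a model.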
Suppose we have a predicate "sum" with three arguments. We intend
the expression "(sum w x y)" to mean that w is the sum of x and y. We
posit two axioms involving "sum". The first is the Associative Law.
(forall (x y z w)                                          (A1)
   (iff (exist (u) (and (sum u y z)(sum w x u)))
        (exist (v) (and (sum v x y)(sum w v z)))))
The second line says that u is the sum of y and z and w is the sum of
x and u. That is, we add y and z together first and add x to the sum
of those. The third line says that v is the sum of x and y and w is
the sum of v and z. That is, we add x and y together first and add
the sum of those and z. The fact that the same variable w, the sum of
all three, occurs in both lines says that we get the same answer
either way we do it.
The second axiom is the Commutative Law.
(forall (w x y)                                            (A2)
   (iff (sum w x y)(sum w y x)))
This says that if we take the sum of x and y, we get the same answer
w as when we take the sum of y and x. Order doesn't matter.
We said we intend "sum" to mean addition. Let's consider the theory
consisting of the predicate "sum" and the axioms (A1) and (A2). For
our domain, we pick the nonnegative integers. The interpretation of
the predicate "sum" is the set of all triples <w,x,y> such that
w = x + y. Thus, the set would include triples such as <4,2,2> and
<10,4,6> but not triples such as <1,1,1> or <2,5,3>. The two axioms
are true under this interpretation, so the interpretation is a model.
But addition is not the only model. These two axioms are true for
multiplication as well. We could also say that (sum w x y) is true
whenever w is the product of x and y. Multiplication is also
associative and commutative, so multiplication of nonnegative integers
is also a model.
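Both models can be verified mechanically over a finite domain. Since the nonnegative integers are infinite, the sketch below works with arithmetic mod 7 instead; this is a variant we introduce only to keep the check finite, and it is also a model of both axioms:

```python
from itertools import product

# Arithmetic mod 7, so that every sum and product stays in the domain.
domain = range(7)

def check_axioms(sum_rel):
    """Check (A1) and (A2) for a three-place relation sum_rel(w, x, y)."""
    a1 = all(
        any(sum_rel(u, y, z) and sum_rel(w, x, u) for u in domain) ==
        any(sum_rel(v, x, y) and sum_rel(w, v, z) for v in domain)
        for x, y, z, w in product(domain, repeat=4))
    a2 = all(sum_rel(w, x, y) == sum_rel(w, y, x)
             for w, x, y in product(domain, repeat=3))
    return a1, a2

print(check_axioms(lambda w, x, y: w == (x + y) % 7))   # (True, True)
print(check_axioms(lambda w, x, y: w == (x * y) % 7))   # (True, True)
```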
We can think of models of a theory as examples of the theory.
Two uses of models are in establishing the consistency of a theory and
in establishing the independence of axioms from each other.
A theory is inconsistent if one can prove a contradiction from its
axioms. It is a theorem in logic that if a theory has a model, then
it is consistent. So suppose we want to know if the theory of "sum"
with Axioms (A1) and (A2) is consistent. We could try to prove a
contradiction from the axioms. We would fail to do so, but that
wouldn't tell us the theory is consistent. Maybe we needed to try
harder to prove a contradiction. But we do know the theory is
consistent, because it has a model -- addition. Multiplication as
well.
Two axioms are independent if you can't prove either from the other.
So we can ask, are Axioms (A1) and (A2) independent? We try to prove
(A2) from (A1). Then we try to prove (A1) from (A2). We fail at
both. But that doesn't allow us to conclude they are independent.
Maybe we needed to try harder.
But it is a theorem of logic that two axioms are independent if there
is an interpretation that is a model of one but not the other.
Suppose we interpret "sum" as concatenation of strings. Axiom (A1)
holds under this interpretation, since concatenation is associative.
If we concatenate "ab" with "cd" yielding "abcd" and then concatenate
that with "ef", we get "abcdef". If we concatenate "cd" with "ef",
yielding "cdef", and then concatenate "ab" with that, we again get
"abcdef".
But Axiom (A2) does not hold under this interpretation. If we
concatenate "ab" with "cd" we get "abcd", whereas if we concatenate
"cd" with "ab", we get "cdab", a different result. Axiom (A1) is true
under this interpretation while Axiom (A2) is false, and therefore the
two axioms must be independent.
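The concatenation argument can also be checked mechanically. For brevity the sketch below checks the operation directly rather than the relational form of the axioms, over a small sample of strings:

```python
from itertools import product

samples = ["ab", "cd", "ef", ""]

# (A1) as an operation: grouping does not change the result.
a1 = all((x + y) + z == x + (y + z) for x, y, z in product(samples, repeat=3))

# (A2) as an operation: order does matter for strings, so this fails.
a2 = all(x + y == y + x for x, y in product(samples, repeat=2))

print(a1, a2)   # True False
```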
7. Definition and Characterization
The definition of a concept is an "if and only if" statement. If we
want to define "person" as a featherless biped, the axiom would be
(forall (x) (iff (person x) (and (biped x)(featherless x))))
That is, x is a person if and only if x is a biped and x is
featherless.
In commonsense knowledge very few concepts can be defined precisely.
For example, by the above definition a plucked chicken is a person.
The most we can hope to do is characterize the predicates by means of
axioms that involve them and thereby constrain what they can mean.
With no axioms, the predicate "person" can mean anything -- person,
dog, bird, telephone, and so on. Suppose we have the axiom
(forall (x) (if (person x) (biped x)))
where the predicate "biped" has been constrained to mean having
exactly two legs. Then "person" can no longer mean "dog" or
"telephone", because they do not have exactly two legs. But it can
still mean "bird". By adding the axiom
(forall (x) (if (person x) (featherless x)))
we rule out normal birds as persons. We have thus further constrained
the meaning of the predicate "person" by adding an axiom involving it.
The role of axioms in constraining predicates can be illustrated by an
example from an early version of a well-known ontology. The predicate
"near" was characterized by two axioms:
(forall (x y) (if (near x y) (not (equal x y))))
(forall (x y) (if (near x y) (near y x)))
The first axiom says something cannot be near itself. The second says
that "near" is symmetric; if x is near y, then y is near x. Because
of the first axiom, "near" cannot mean "equal". Because of the
second, it cannot mean "lessThan"; if x is less than y, then it is not
the case that y is less than x. But "near" can still mean "far"; both
axioms are still true if we substitute "far" for "near".
In general, increasing the number of axioms reduces the set of models.
If we have only Axiom (A1), we have ruled out interpreting "sum" as
subtraction; subtraction is not associative. But we have not ruled
out addition, multiplication, or the concatenation of strings. They
are all associative. When we add Axiom (A2), we can still interpret
"sum" as addition or multiplication, but it can no longer be
concatenation. Concatenating "ab" and "cd" does not give the same
result as concatenating "cd" and "ab". We have added an axiom and
thereby eliminated some of the models.
Where we have an axiom of the form
(forall (x) (if (p x) (q x)))
we say that (q x) is a necessary condition for (p x). If p is true of
x, then q is necessarily true as well. Where we have an axiom of the
form
(forall (x) (if (q x) (p x)))
we say that (q x) is a sufficient condition for (p x). q being true
of x is sufficient for making p be true of x. Where we have an axiom
of the form
(forall (x) (iff (p x) (q x)))
we say that (q x) is a necessary and sufficient condition for (p x).
For most interesting concepts in commonsense knowledge, we will not
have necessary and sufficient conditions, but we will have lots of
necessary conditions and lots of sufficient conditions. Even if we
cannot state precisely the meaning of "near", we can state enough
axioms involving "near" to rule out its being interpreted as "far".
The idea of characterizing concepts rather than defining them
contrasts with the "Euclidean" program familiar from geometry and set
theory. There we take certain predicates to be primitive, and define
all other predicates, ultimately, in terms of these. For example, in
set theory, we take the predicate "member" to be primitive; we don't
attempt to say formally what it means. Then we define a more complex
concept like "subset" in terms of this.
(forall (s1 s2)
   (iff (subset s1 s2)
        (forall (x) (if (member x s1)(member x s2)))))
That is, a set s1 is a subset of another set s2 if and only if every
member of s1 is also a member of s2.
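The definition translates directly into Python over finite sets:

```python
def member(x, s):
    return x in s

# (iff (subset s1 s2) (forall (x) (if (member x s1) (member x s2))))
def subset(s1, s2):
    return all(member(x, s2) for x in s1)

print(subset({1, 2}, {1, 2, 3}))   # True
print(subset({1, 4}, {1, 2, 3}))   # False: 4 is not in the second set
```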
In our approach it is as if nearly every predicate is a primitive.
But the set of axioms they participate in constrain their possible
interpretations more or less tightly. If we have only a small number
of axioms constraining our predicates, we may be able to invent some
bizarre interpretation in which the axioms are true. But the more
axioms we add, the more likely it is that the only interpretation we
can find is the one we intend.
8. Common Patterns in Axioms
Some of the axioms in this book can appear very formidable, being ten
or more lines long and having a large number of embeddings. But most
axioms can be seen as examples of a fairly small number of patterns.
Here we point out some of the most common patterns.
Let's start off with the simplest pattern. Suppose we want to say a
car is a vehicle. We can write
(forall (x) (if (car x) (vehicle x)))
That is, if x is a car, it's also a vehicle.
If we are able to state that two conditions are equivalent, we use
"iff" in place of "if".
(forall (x) (iff (car x) (automobile x)))
That is, x is a car if and only if it's an automobile.
Often there will be conjunctions of conditions in the antecedent and
consequent of "if" or "iff". For example,
(forall (x) (if (and (car x)(intact x))
                (and (vehicle x)(motorized x))))
says that an intact car is a motorized vehicle.
Suppose we want to say a car has an engine. We might take "engineOf"
to be a two-place predicate -- "y is the engine of x". Then the axiom
would be
(forall (x) (if (car x)
                (exist (y) (engineOf y x))))
That is, if x is a car, there is a y that is the engine of x. Or we
could say that "engine" is a one-place predicate describing the kind
of thing y is, and there is another predicate "in" that will be used
to say the engine is in the car.
(forall (x) (if (car x)
                (exist (y) (and (engine y)(in y x)))))
That is, if x is a car, there is a y that is an engine, and y is in
x.
In general, when you see a universal quantifier, you should expect to
see an "if" as the next logical operator.
(forall (x ...) (if ...))
Very few, if any, properties are true of everything in the universe, so
the antecedent of the "if" operator constrains the universal
quantifier to only those things that we think should have the
property. This will be the top-level structure of almost every axiom.
When you see an existential quantifier, you will often see an "and" as
the next logical operator.
(exist (x ...) (and ...))
When we posit the existence of something, we generally have more than
one property in mind that we want it to have.
Now suppose we want to say a car has four wheels. There is a set.
The set has four members; this will be represented by the predication
(card 4 s), i.e., 4 is the cardinality of s. Each member is a wheel.
The following axiom expresses this.
(forall (x)
   (if (car x)
       (exist (s)
          (and (set s)
               (card 4 s)
               (forall (y) (if (member y s)
                               (and (wheel y)(part y x))))))))
Line 2 introduces a car x. Lines 3 and 4 posit the existence of a set
s for that car. Line 5 says that 4 is the cardinality of that set.
Lines 6-7 say that every member y of s is a wheel and a part of x.
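The axiom can be checked in a miniature invented world by constructing the witness set s directly -- the set of wheels that are parts of the car -- rather than searching over all subsets of the domain:

```python
# An invented miniature world: one car and its four wheels.
domain = {"c1", "w1", "w2", "w3", "w4"}
car    = {"c1"}
wheel  = {"w1", "w2", "w3", "w4"}
part   = {("w1", "c1"), ("w2", "c1"), ("w3", "c1"), ("w4", "c1")}

def axiom_holds():
    # For each car x, take s to be the set of wheels that are parts
    # of x, then check that its cardinality is 4.
    for x in car:
        s = {y for y in domain if y in wheel and (y, x) in part}
        if len(s) != 4:
            return False
    return True

print(axiom_holds())   # True
```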
In general, when we posit the existence of a set, we will see an
"exist" quantifier and a variable (s) standing for the set. Then there
will usually be a conjunction ("and") of properties of s, and a
"forall" expression that says something about all the members of s.
Sometimes we want to overload a predicate and have it mean something
slightly different for different kinds of arguments. For example, we
make use of the two-place predicate "part" in several different
contexts. A member or subset is a part of a set. A component is part
of a device, e.g., a wheel is a part of a car. A bit of material is a
part of the larger bit of material it is embedded in. There may be
other uses we want for "part" as we develop the ontology. Let us
define the predicate just for sets.
(forall (x s)
   (if (set s)
       (iff (part x s)
            (or (member x s)(subset x s)))))
That is, if s is a set, then x is part of s if and only if x is a
member or subset of s. Line 2 of this axiom has the effect of
restricting this definition of "part" to only those cases where the
second argument is a set. The general pattern is
(forall (...)
   (if <restriction of an argument to a type>
       <what the predicate means for that type>))
The predicate of interest is then available to be defined in
different ways for different types of arguments.
These patterns, together with the glosses we give for every axiom,
should be enough to enable the reader to understand the axioms in
this book, though sometimes with a little bit of effort.