Appendix 1. First-Order Logic

1. Predicate-Argument Relations

This book is necessarily heavy in logic, but we would like it to appeal to a wide audience, including those who have not studied logic. Fortunately, the logic we use here is neither deep nor complicated. In this appendix we assume nothing more than a vague memory of high school algebra, and present all of the logic the reader needs for understanding the formulas in this book.

The two headlines Dog Bites Man and Man Bites Dog both convey the same lexical content. In both situations there are a dog, a man, and a biting event. The two situations differ in the respective roles of the dog and the man in the biting event. We can represent the situations formally by positing a "predicate" -- "bite" -- that takes two "arguments" -- a biter and a bitee. Then if a dog named D1 bites a man named M2, we can represent the first situation by the expression

   (bite Biter: D1 Bitee: M2)

Suppose we know that for the predicate "bite" the Biter argument always comes first and the Bitee second. Then we can dispense with the labels and write more economically

   (bite D1 M2)

This is the notation used in CommonLogic (Menzel, 20??). A more common notation puts the left parenthesis after the predicate and separates the arguments with a comma.

   bite(D1,M2)

In this book we use the first notation in order to support the CommonLogic standardization effort. We generally try to have the order of the arguments be the same as in the simplest English sentence that would express its meaning. Thus, this expression means D1 bites M2. (A linguist one of the authors once worked with was intimidated by logic at first, until he realized, as he put it, "logic is English in Verb-Subject-Object order.")

Each predicate has a fixed number of arguments. For example, we might say

   (sleep M2)

to mean "M2 sleeps," and

   (give M2 D1 W3)

to mean that M2 gives D1 to W3.
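In implementations, predications like these are often represented as simple tuples, predicate first and arguments in order. A minimal sketch (the tuple representation and helper names are ours, not part of CommonLogic):

```python
# A predication as a tuple: predicate first, then arguments in order,
# mirroring the CommonLogic-style notation used in this book.
bite = ("bite", "D1", "M2")         # (bite D1 M2): D1 bites M2
sleep = ("sleep", "M2")             # (sleep M2): M2 sleeps
give = ("give", "M2", "D1", "W3")   # (give M2 D1 W3): M2 gives D1 to W3

def predicate(prop):
    return prop[0]

def args(prop):
    return prop[1:]

print(predicate(bite), args(bite))  # bite ('D1', 'M2')
```

The argument order in the tuple follows the convention in the text: the same order as in the simplest English sentence expressing the proposition.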
As we are using them, the predicate "sleep" always takes one argument, and the predicate "give" always takes three. If a predicate takes n arguments, we call it an "n-ary predicate". The term in logic for a predicate applied to the appropriate number of arguments is "positive literal", but this is very unintuitive terminology for the nonlogician. In this book we will refer to such an expression as a "proposition" or a "predication". A proposition can be true or false. If we are able to identify the dog D1 and the man M2, then the expression "(bite D1 M2)" is true exactly when D1 bites M2.

2. Logical Connectives

Propositions can be combined by what are called "logical connectives". The logical connectives we use in this book are "and", "not", "or", "if" and "iff" (pronounced "if and only if"). These connectives are, in a sense, cleaned-up versions of the corresponding word(s) in English. In logic, they are given precise definitions.

The connective "and" is used to combine two or more propositions, and the resulting expression is true exactly when all of the propositions it combines are true. Thus, the expression

   (and (bite D1 M2)(bite M2 D1))

is true exactly when D1 bites M2 and M2 bites D1. This semantics of "and" can be expressed in a truth table as follows:

   (and P Q):
               Q:
              T   F
             -------
         T | T | F |
   P:        -------
         F | F | F |
             -------

That is, if P and Q are both true (T), then so is (and P Q). Otherwise, (and P Q) is false.

In ordinary English "and" usually means "and then", as in

   Pat opened the window and a bird flew in.

or "and similarly", as in

   Sugar is sweet and lemons are sour.

The two propositions being linked generally have something to do with each other. But in logic this is not necessary. The sentence

   Sugar is sweet and Los Angeles is diverse.

is a bit strange, but from the point of view of logic, it is simply true. The other connectives can be defined similarly.
   (not P):
             ---
         T | F |
   P:        ---
         F | T |
             ---

   (or P Q):
               Q:
              T   F
             -------
         T | T | T |
   P:        -------
         F | T | F |
             -------

   (if P Q):
               Q:
              T   F
             -------
         T | T | F |
   P:        -------
         F | T | T |
             -------

   (iff P Q):
               Q:
              T   F
             -------
         T | T | F |
   P:        -------
         F | F | T |
             -------

Several things should be pointed out about these definitions. In English, the word "or" can be the exclusive "or", where both P and Q can't be true, or the inclusive "or", where they can. Our logical connective "or" is the latter. If both P and Q are true, then so is (or P Q). The connectives "and" and "or" can link two or more propositions. The expression "(or P Q R)" is just an abbreviation for "(or P (or Q R))". It's true when at least one of the three propositions is true.

The connective "if" differs the most from standard English usage. In ordinary English, the two propositions joined by "if" usually have some causal connection, as in

   If Pat goes to the party, Chris will go too.

In logic there's no such requirement. The connective "if" is defined in strictly logical terms. If P and Q are both true, then (if P Q) is true, just as we'd expect. If P is true and Q is false, then (if P Q) is false, again just as we'd expect. But if P is false and Q is false, we might be inclined to call (if P Q) true, or we might call it nonsense. If P is false and Q is true, we would probably call the sentence either deceptive or nonsense. But in logic, in the latter two cases, i.e., when P is false, we say that (if P Q) is true. For example, by our definition the sentences

   If salt is sweet, then sugar is bitter.
   If salt is sweet, then sugar is sweet too.

are both true, even though most of us would say they are nonsense. But we can perhaps see the motivation for this move in sentences like

   If salt is sweet, then I'll eat my hat.

in which P and Q are both false and the sentence as a whole strikes us as true. You're safe -- since salt isn't sweet, you don't have to eat your hat.

We call "(and P Q)" a conjunction and P and Q its conjuncts.
We call "(or P Q)" a disjunction, and P and Q its disjuncts. We call "(if P Q)" an implication, P its antecedent, and Q its consequent. The connective "iff" is sometimes called an "equivalence" or a "biconditional".

Expressions with logical connectives can be embedded arbitrarily deeply in other expressions, and their truth or falsity can be determined from the truth tables. For example, the following expression is true:

   (iff (or P (and Q R))(and (or P Q)(or P R)))

Just plug in all eight possible combinations of values T and F for P, Q, and R, and use the tables to compute the truth value of the expression. They all turn out to be true. The expression

   (iff P Q)

is equivalent to the expression

   (and (if P Q)(if Q P))

as the reader can verify from the truth tables.

3. Variables and Quantifiers

In Section 1 we used the "names" of the dog and the man as arguments of predications. In logic, those are called "constants". We can also use variables as arguments. In high school algebra, variables stood for numbers. In logic, variables can stand for any kind of entity.

In high school algebra, there were two distinct uses for variables. We saw them in equations that had to be solved, as in

   x^2 - 7x + 12 = 0

where x has to be 3 or 4. We also saw them in identities, such as

   (x + y)(x - y) = x^2 - y^2

which is true for any x and y. The first formula is to be read as saying, "there exists an x such that ...." The second formula is to be read as saying, "for all x and y, ...." In logic, we say the first formula is "existentially quantified", and the second is "universally quantified". We represent this by means of two "quantifiers" that in CommonLogic are called "exist" and "forall". A quantified expression begins with a quantifier and is followed by a list of variables and then the expression that the variables occur in. Thus,

   (exist (x y) (p x y))

says that there exist entities x and y such that the predicate p is true of x and y.
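Over a finite domain, the two quantifiers correspond directly to Python's any and all. The sketch below (the predicate p is an arbitrary illustration of ours) shows the existential pattern just given, and then uses "forall" over the truth values to carry out mechanically the eight-case check of the equivalence from Section 2:

```python
from itertools import product

domain = (True, False)   # a finite domain: here, the truth values themselves

# (exist (x y) (p x y)) over a finite domain is Python's any().
def p(x, y):
    return x and not y   # an arbitrary illustrative predicate
exists_xy = any(p(x, y) for x, y in product(domain, repeat=2))
print(exists_xy)  # True: p holds of the pair (True, False)

# (forall ...) is all(): check (iff (or P (and Q R)) (and (or P Q) (or P R)))
# by plugging in all eight combinations of values for P, Q, and R.
distributes = all(
    (P or (Q and R)) == ((P or Q) and (P or R))
    for P, Q, R in product(domain, repeat=3)
)
print(distributes)  # True: the equivalence holds in all eight cases
```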
The expression

   (forall (x y) (q x y))

says that for all entities x and y, the predicate q is true of x and y. In this book we do not use variables that are not within the "scope" of a quantifier that tells how they are to be interpreted.

These expressions can be embedded in other logical expressions, arbitrarily deeply. Suppose we want to say that cars have engines. Then we can write

   (forall (x)
      (if (car x)
          (exist (y) (engineOf y x))))

This says that for all entities x, if x is a car, then there exists an entity y such that y is the engine of x. If there is more than one car, there can be more than one engine.

4. Rules of Inference

The principal use of logic is for reasoning. Suppose we know some propositions to be true. Then logic provides the means for deriving other true propositions. The means are "rules of inference". The two most important rules of inference are Modus Ponens and Universal Instantiation. Modus Ponens is the rule

   P, (if P Q) |- Q

That is, if we know P and we know (if P Q), then we can conclude Q. Universal Instantiation can be illustrated by the following rule, where A is a constant:

   (forall (x) (p x)) |- (p A)

This says that if we know that p is true of x for all x, then p must be true of A, where A can be any constant. In the general version of the rule, A is substituted for all occurrences of x in the expression that forall scopes over. The "Universal" in Universal Instantiation is because of the universal quantifier on the left side of the rule. The "Instantiation" is because the right side of the rule is one instance of the general pattern expressed on the left side.

Suppose we know that all persons are mortal,

   (forall (x) (if (person x)(mortal x)))

and we know about an entity Socrates (S) that he is a person:

   (person S)

Then by Universal Instantiation we can conclude

   (if (person S)(mortal S))

and by Modus Ponens we can conclude

   (mortal S)

That is, Socrates is mortal.
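The Socrates derivation can be mimicked in a few lines of code. In this sketch (the tuple representation and function names are ours), propositions are tuples; Universal Instantiation corresponds to applying the axiom to a constant, and Modus Ponens to checking the instantiated antecedent against the known facts:

```python
# Known facts, as predicate-argument tuples in the book's notation.
facts = {("person", "S")}            # (person S)

def mortal_axiom(a):
    # (forall (x) (if (person x) (mortal x))), instantiated at constant a.
    return ("person", a), ("mortal", a)

# Universal Instantiation: substitute the constant S for x.
antecedent, consequent = mortal_axiom("S")   # (if (person S) (mortal S))

# Modus Ponens: we know the antecedent, so we may add the consequent.
if antecedent in facts:
    facts.add(consequent)

print(("mortal", "S") in facts)  # True: Socrates is mortal
```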
5. Axioms and Logical Theories

An axiom is a logical expression that we stipulate to be true. A theorem is a logical expression that we can derive from axioms by means of rules of inference. A logical theory is characterized by a set of predicates and a set of axioms in which the predicates occur. We can also talk about the theorems of a logical theory as the set of theorems derivable from its axioms. A simple example of a logical theory is one that contains the predicates "car", "engineOf" and "vehicle", plus the two axioms

   (forall (x) (if (car x) (vehicle x)))

   (forall (x) (if (car x) (exist (y) (engineOf y x))))

That is, cars are vehicles and have engines.

6. Models

An English description of a landscape is an object in a symbolic system (English) being used to characterize some part of the world (the hills, bushes and trees). Similarly, a painting of a landscape is an object in a symbolic system (paintings) being used to characterize some part of the world. We can look at the description or the painting and at the hills, trees and bushes, and we can decide how accurately the symbolic object corresponds to reality. Suppose the English description says "There is a bush next to the tree." We can check whether there really is a bush next to the tree. Similarly, if there is a bush next to the tree, we can check whether the painting depicts that.

Logic is also a symbolic system, and logical theories are objects within that system. We can ask about a logical theory, does it give a true description of that part of the world it purports to describe? To answer this, we first have to say what set, or "domain", of entities in the world the variables are intended to range over. We have to say what the predicate symbols in the theory mean. That's called an "interpretation". Then we have to check the world to see if the axioms thus interpreted are really true. If they are, the interpretation is a "model".
For example, consider the logical theory consisting of the predicates "tree", "bush" and "nextTo", and the single axiom

   (forall (x)
      (if (tree x)
          (exist (y) (and (bush y)(nextTo y x)))))

That is, every tree has a bush next to it. Let's say the variables x and y range over all physical objects in Los Angeles. Suppose I've decided a variable x denotes a particular tree. The interpretation of x is easy enough: it's just the tree. But what about predicates? How do we say what the interpretations of the predicates "tree" and "nextTo" are? For this we use set-theoretic constructions. We will say that the meaning of "tree" is the set of all those physical objects that happen to be trees, and similarly the meaning of "bush" is the set of all the bushes. The meaning of "nextTo" is the set of all ordered pairs of physical objects in Los Angeles such that the first is next to the second.

Now suppose we know what physical object we want x to correspond to; then we will say that "(tree x)" is true exactly when that physical object is in the interpretation of "tree". That is, when it is in the set of all trees in Los Angeles. That is, when it is in fact a tree. Similarly, if we know what two physical objects we want x and y to correspond to, we will say "(nextTo x y)" is true exactly when the ordered pair of those two physical objects is in the interpretation of "nextTo", i.e., exactly when the first of those physical objects is next to the second.

Now we have our interpretation. We can now check if it is a model by seeing if the axiom holds. We do this by looking at every tree in Los Angeles and checking whether it has a bush next to it. If so, we have a model. If not, we don't.

In general, an interpretation of a theory is a set of entities, called a "domain", together with a mapping from every n-ary predicate in the theory to a set of n-tuples of the entities, for all n, namely, the n-tuples of arguments for which the predicate is true.
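This model-checking procedure is easy to carry out on a toy interpretation. A sketch, with a made-up five-object domain standing in for Los Angeles:

```python
# A toy interpretation: a domain plus set-theoretic meanings for the predicates.
domain = {"t1", "t2", "b1", "b2", "rock"}
tree = {"t1", "t2"}                      # interpretation of "tree"
bush = {"b1", "b2"}                      # interpretation of "bush"
next_to = {("b1", "t1"), ("b2", "t2")}   # interpretation of "nextTo"

# Check the axiom: every tree has a bush next to it.
is_model = all(
    any(y in bush and (y, x) in next_to for y in domain)
    for x in domain
    if x in tree
)
print(is_model)  # True: this interpretation is a model of the theory
```

Removing the pair ("b2", "t2") from next_to would leave the tree t2 with no bush next to it, and the same check would report that the interpretation is not a model.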
If the axioms of the theory hold in the interpretation, it is a model.

Suppose we have a predicate "sum" with three arguments. We intend the expression "(sum w x y)" to mean that w is the sum of x and y. We posit two axioms involving "sum". The first is the Associative Law.

   (forall (x y z w)                                     (A1)
      (iff (exist (u) (and (sum u y z)(sum w x u)))
           (exist (v) (and (sum v x y)(sum w v z)))))

The second line says that u is the sum of y and z and w is the sum of x and u. That is, we add y and z together first and add x to the sum of those. The third line says that v is the sum of x and y and w is the sum of v and z. That is, we add x and y together first and add the sum of those and z. The fact that the same variable w, the sum of all three, occurs in both lines says that we get the same answer either way we do it.

The second axiom is the Commutative Law.

   (forall (w x y)                                       (A2)
      (iff (sum w x y)(sum w y x)))

This says that if we take the sum of x and y, we get the same answer w as when we take the sum of y and x. Order doesn't matter.

We said we intend "sum" to mean addition. Let's consider the theory consisting of the predicate "sum" and the axioms (A1) and (A2). For our domain, we pick the nonnegative integers. The interpretation of the predicate "sum" is the set of all triples <w,x,y> such that w = x + y. Thus, the set would include triples such as <4,2,2> and <10,4,6> but not triples such as <1,1,1> or <2,5,3>. The two axioms are true under this interpretation, so the interpretation is a model.

But addition is not the only model. These two axioms are true for multiplication as well. We could also say that (sum w x y) is true whenever w is the product of x and y. Multiplication is also associative and commutative, so multiplication of nonnegative integers is also a model. We can think of models of a theory as examples of the theory.

Two uses of models are in establishing the consistency of a theory and in establishing the independence of axioms from each other.
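The claim that addition and multiplication both satisfy (A1) and (A2) can be spot-checked over a finite slice of the nonnegative integers. In this sketch we treat "sum" as a function rather than a relation, which is faithful here because each pair of arguments has a unique sum:

```python
from itertools import product

def satisfies_axioms(op, domain):
    # (A1): both groupings yield the same w, for all x, y, z.
    a1 = all(op(op(x, y), z) == op(x, op(y, z))
             for x, y, z in product(domain, repeat=3))
    # (A2): the order of the last two arguments doesn't matter.
    a2 = all(op(x, y) == op(y, x)
             for x, y in product(domain, repeat=2))
    return a1 and a2

small = range(6)  # a small slice of the nonnegative integers
print(satisfies_axioms(lambda x, y: x + y, small))  # True: addition is a model
print(satisfies_axioms(lambda x, y: x * y, small))  # True: so is multiplication
```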
A theory is inconsistent if one can prove a contradiction from its axioms. It is a theorem in logic that if a theory has a model, then it is consistent. So suppose we want to know if the theory of "sum" with Axioms (A1) and (A2) is consistent. We could try to prove a contradiction from the axioms. We would fail to do so, but that wouldn't tell us the theory is consistent. Maybe we needed to try harder to prove a contradiction. But we do know the theory is consistent, because it has a model -- addition (and multiplication as well).

Two axioms are independent if you can't prove either from the other. So we can ask, are Axioms (A1) and (A2) independent? We try to prove (A2) from (A1). Then we try to prove (A1) from (A2). We fail at both. But that doesn't allow us to conclude they are independent. Maybe we needed to try harder. But it is a theorem of logic that an axiom cannot be proved from another if there is an interpretation that is a model of the second but not the first.

Suppose we interpret "sum" as concatenation of strings. Axiom (A1) holds under this interpretation, since concatenation is associative. If we concatenate "ab" with "cd", yielding "abcd", and then concatenate that with "ef", we get "abcdef". If we concatenate "cd" with "ef", yielding "cdef", and then concatenate "ab" with that, we again get "abcdef". But Axiom (A2) does not hold under this interpretation. If we concatenate "ab" with "cd" we get "abcd", whereas if we concatenate "cd" with "ab", we get "cdab", a different result. Axiom (A1) is true under this interpretation while Axiom (A2) is false, and therefore (A2) cannot be proved from (A1). (Showing that (A1) cannot be proved from (A2), the other half of independence, requires an interpretation that is a model of (A2) but not of (A1).)

7. Definition and Characterization

The definition of a concept is an "if and only if" statement. If we want to define "person" as a featherless biped, the axiom would be

   (forall (x)
      (iff (person x)
           (and (biped x)(featherless x))))

That is, x is a person if and only if x is a biped and x is featherless. In commonsense knowledge very few concepts can be defined precisely.
For example, by the above definition a plucked chicken is a person. The most we can hope to do is characterize the predicates by means of axioms that involve them and thereby constrain what they can mean. With no axioms, the predicate "person" can mean anything -- person, dog, bird, telephone, and so on. Suppose we have the axiom

   (forall (x) (if (person x) (biped x)))

where the predicate "biped" has been constrained to mean having exactly two legs. Then "person" can no longer mean "dog" or "telephone", because they do not have exactly two legs. But it can still mean "bird". By adding the axiom

   (forall (x) (if (person x) (featherless x)))

we rule out normal birds as persons. We have thus further constrained the meaning of the predicate "person" by adding an axiom involving it.

The role of axioms in constraining predicates can be illustrated by an example from an early version of a well-known ontology. The predicate "near" was characterized by two axioms:

   (forall (x y) (if (near x y) (not (equal x y))))

   (forall (x y) (if (near x y) (near y x)))

The first axiom says something cannot be near itself. The second says that "near" is symmetric; if x is near y, then y is near x. Because of the first axiom, "near" cannot mean "equal". Because of the second, it cannot mean "lessThan"; if x is less than y, then it is not the case that y is less than x. But "near" can still mean "far"; both axioms are still true if we substitute "far" for "near".

In general, increasing the number of axioms reduces the set of models. If we have only Axiom (A1), we have ruled out interpreting "sum" as subtraction; subtraction is not associative. But we have not ruled out addition, multiplication, or the concatenation of strings. They are all associative. When we add Axiom (A2), we can still interpret "sum" as addition or multiplication, but it can no longer be concatenation. Concatenating "ab" and "cd" does not give the same result as concatenating "cd" and "ab".
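This narrowing of models is easy to confirm on a small sample of strings: concatenation satisfies the associativity axiom (A1) but not the commutativity axiom (A2).

```python
from itertools import product

strings = ["ab", "cd", "ef"]  # a small sample; + is string concatenation

# (A1) holds: concatenation is associative.
a1 = all((x + y) + z == x + (y + z) for x, y, z in product(strings, repeat=3))

# (A2) fails: "ab" + "cd" is "abcd", but "cd" + "ab" is "cdab".
a2 = all(x + y == y + x for x, y in product(strings, repeat=2))

print(a1, a2)  # True False: a model of (A1) that is not a model of (A2)
```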
We have added an axiom and thereby eliminated some of the models.

Where we have an axiom of the form

   (forall (x) (if (p x) (q x)))

we say that (q x) is a necessary condition for (p x). If p is true of x, then q is necessarily true as well. Where we have an axiom of the form

   (forall (x) (if (q x) (p x)))

we say that (q x) is a sufficient condition for (p x). q being true of x is sufficient for making p be true of x. Where we have an axiom of the form

   (forall (x) (iff (p x) (q x)))

we say that (q x) is a necessary and sufficient condition for (p x). For most interesting concepts in commonsense knowledge, we will not have necessary and sufficient conditions, but we will have lots of necessary conditions and lots of sufficient conditions. Even if we cannot state precisely the meaning of "near", we can state enough axioms involving "near" to rule out its being interpreted as "far".

The idea of characterizing concepts rather than defining them contrasts with the "Euclidean" program familiar from geometry and set theory. There we take certain predicates to be primitive, and define all other predicates, ultimately, in terms of these. For example, in set theory, we take the predicate "member" to be primitive; we don't attempt to say formally what it means. Then we define a more complex concept like "subset" in terms of this.

   (forall (s1 s2)
      (iff (subset s1 s2)
           (forall (x) (if (member x s1)(member x s2)))))

That is, a set s1 is a subset of another set s2 if and only if every member of s1 is also a member of s2.

In our approach it is as if nearly every predicate is a primitive. But the set of axioms they participate in constrains their possible interpretations more or less tightly. If we have only a small number of axioms constraining our predicates, we may be able to invent some bizarre interpretation in which the axioms are true. But the more axioms we add, the more likely it is that the only interpretation we can find is the one we intend.
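The definition of "subset" in terms of "member" can be rendered directly as code: the embedded "forall" over members becomes a check of every member of the first set. A minimal sketch:

```python
def member(x, s):
    return x in s

def subset(s1, s2):
    # (iff (subset s1 s2) (forall (x) (if (member x s1) (member x s2))))
    return all(member(x, s2) for x in s1)

print(subset({1, 2}, {1, 2, 3}))  # True: every member of s1 is a member of s2
print(subset({1, 4}, {1, 2, 3}))  # False: 4 is not a member of s2
```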
8. Common Patterns in Axioms

Some of the axioms in this book can appear very formidable, being ten or more lines long and having a large number of embeddings. But most axioms can be seen as examples of a fairly small number of patterns. Here we point out some of the most common patterns.

Let's start off with the simplest pattern. Suppose we want to say a car is a vehicle. We can write

   (forall (x) (if (car x) (vehicle x)))

That is, if x is a car, it's also a vehicle. If we are able to state that two conditions are equivalent, we use "iff" in place of "if".

   (forall (x) (iff (car x) (automobile x)))

That is, x is a car if and only if it's an automobile. Often there will be conjunctions of conditions in the antecedent and consequent of "if" or "iff". For example,

   (forall (x)
      (if (and (car x)(intact x))
          (and (vehicle x)(motorized x))))

says that an intact car is a motorized vehicle.

Suppose we want to say a car has an engine. We might take "engineOf" to be a two-place predicate -- "y is the engine of x". Then the axiom would be

   (forall (x)
      (if (car x)
          (exist (y) (engineOf y x))))

That is, if x is a car, there is a y that is the engine of x. Or we could say that "engine" is a one-place predicate describing the kind of thing y is, and there is another predicate "in" that will be used to say the engine is in the car.

   (forall (x)
      (if (car x)
          (exist (y) (and (engine y)(in y x)))))

That is, if x is a car, there is a y that is an engine, and y is in x.

In general, when you see a universal quantifier, you should expect to see an "if" as the next logical operator.

   (forall (x ...)
      (if ...))

Very few if any properties are true of every thing in the universe, so the antecedent of the "if" operator constrains the universal quantifier to only those things that we think should have the property. This will be the top-level structure of almost every axiom. When you see an existential quantifier, you will often see an "and" as the next logical operator.

   (exist (x ...)
      (and ...))

When we posit the existence of something, we generally have more than one property in mind that we want it to have.

Now suppose we want to say a car has four wheels. There is a set. The set has four members; this will be represented by the predication (card 4 s), i.e., 4 is the cardinality of s. Each member is a wheel. The following axiom expresses this.

   (forall (x)
      (if (car x)
          (exist (s)
             (and (set s)
                  (card 4 s)
                  (forall (y)
                     (if (member y s)
                         (and (wheel y)(part y x))))))))

Line 2 introduces a car x. Lines 3 and 4 posit the existence of a set s for that car. Line 5 says that 4 is the cardinality of that set. Lines 6-7 say that every member y of s is a wheel and a part of x.

In general, when we posit the existence of a set, we will see an "exist" quantifier and a variable (s) standing for the set. Then there will usually be a conjunction ("and") of properties of s, and a "forall" expression that says something about all the members of s.

Sometimes we want to overload a predicate and have it mean something slightly different for different kinds of arguments. For example, we make use of the two-place predicate "part" in several different contexts. A member or subset is a part of a set. A component is part of a device, e.g., a wheel is a part of a car. A bit of material is a part of the larger bit of material it is embedded in. There may be other uses we want for "part" as we develop the ontology. Let us define the predicate just for sets.

   (forall (x s)
      (if (set s)
          (iff (part x s)
               (or (member x s)(subset x s)))))

That is, if s is a set, then x is part of s if and only if x is a member or subset of s. Line 2 of this axiom has the effect of restricting this definition of "part" to only those cases where the second argument is a set. The general pattern is

   (forall (...)
      (if <restriction on the type of an argument>
          <definition for arguments of that type>))

The predicate of interest is then available to be defined in different ways for different types of arguments.
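The type-restricted definition of "part" for sets can be sketched the same way. Here (in a representation of our own devising) we use Python frozensets so that sets can themselves be members of sets, and we leave the non-set cases undefined, mirroring the (set s) restriction in the axiom:

```python
def subset(s1, s2):
    return isinstance(s1, frozenset) and all(x in s2 for x in s1)

def part(x, s):
    # Defined only when the second argument is a set -- the (set s) restriction;
    # the other senses of "part" would be characterized elsewhere.
    if not isinstance(s, frozenset):
        raise NotImplementedError("part is characterized elsewhere for non-sets")
    return x in s or subset(x, s)

s = frozenset({1, 2, 3})
print(part(2, s))                  # True: a member is a part of the set
print(part(frozenset({1, 2}), s))  # True: a subset is a part of the set
print(part(frozenset({4}), s))     # False: neither a member nor a subset
```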
These patterns, together with the glosses we give for every axiom, should be enough to enable the reader to understand the axioms in this book, though sometimes with a little bit of effort.