Peirce's rules of inference
sowa@watson.ibm.com
Message-id: <199208280557.AA09863@cs.umn.edu>
Date: Fri, 28 Aug 92 01:57:48 EDT
From: sowa@watson.ibm.com
To: PHAYES@HERODOTUS.CS.UIUC.EDU
Cc: INTERLINGUA@ISI.EDU, SRKB@ISI.EDU, CG@cs.umn.edu
Subject: Peirce's rules of inference
Pat,
> First, your comparison with resolution doesn't seem quite fair, since you
> include the cost of transforming from Fregean notation to clausal form,
> whereas you ignore the cost of transforming from it to Peirce's notation.
Not from Fregean notation, but from Peirce's original 1883 notation.
Frege's notation was very different. It had only three primitives:
implication, negation, and the universal quantifier. To express
(Ex)(p(x) & q(x)),
you had to say the equivalent of
^(Ax)(p(x) -> ^q(x)).
Ernst Schroeder was aware of Frege's system, but he chose Peirce's
notation because Frege's was so awkward for very simple statements.
Peirce started with Boolean algebra with "+" for "or" and "." for "and",
and used Sigma for the existential quantifier (generalized "or") and
Pi for the universal (generalized "and"). It was Peano who changed
the symbols because he wanted to mix mathematical operators and logical
operators in the same formula.
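To put the three ways of writing that same statement side by side:

    (Ex)(p(x) & q(x))         -- Peano's symbols, essentially the modern form
    Sigma_x (p(x) . q(x))     -- roughly Peirce's 1883 algebraic notation
    ^(Ax)(p(x) -> ^q(x))      -- Frege's primitives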
In any case, I agree with you that the conversion to clause form is a
simple compilation step that can be done algorithmically in linear time.
I really meant that note as a comparison with the Principia system,
which I really believe is "the world's worst proof procedure",
especially considering that it came 14 years after Peirce's system.
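As for that compilation step, here is how the propositional core of it
might be coded (an illustrative sketch only; the representation and the
function names are mine):

def nnf(f, positive=True):
    # Push negations inward.  A formula is a nested tuple:
    # ('not', g), ('and', g, h), ('or', g, h), ('implies', g, h),
    # or a plain string naming an atom.
    if isinstance(f, str):
        return f if positive else ('not', f)
    op = f[0]
    if op == 'not':
        return nnf(f[1], not positive)
    if op == 'implies':
        return nnf(('or', ('not', f[1]), f[2]), positive)
    # 'and'/'or': De Morgan's laws flip the connective under negation
    flipped = {'and': 'or', 'or': 'and'}[op]
    return (op if positive else flipped,
            nnf(f[1], positive), nnf(f[2], positive))

def clauses(f):
    # Distribute 'or' over 'and' to get a list of clauses (lists of literals).
    f = nnf(f)
    if isinstance(f, str) or f[0] == 'not':
        return [[f]]
    if f[0] == 'and':
        return clauses(f[1]) + clauses(f[2])
    return [c1 + c2 for c1 in clauses(f[1]) for c2 in clauses(f[2])]

# Example: (p -> r) & (q -> s) compiles to the two clauses ^p|r and ^q|s:
print(clauses(('and', ('implies', 'p', 'r'), ('implies', 'q', 's'))))
# -> [[('not', 'p'), 'r'], [('not', 'q'), 's']]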
> (And by the way, how do you get the figure of eleven steps to do that
> conversion? This suggests that this conversion is an inference process,
> whereas it is of course a simple algorithm which involves no search or
> heuristics, and can be made to happen extremely quickly.) Theorem-proving
> folk quite rapidly get used to writing axioms in clausal form, so that
> Prolog for example is written in Horn clauses, as you know. Therefore if we
> compare the resolution world to Peirce's, we find that the proof of
> proposition 3.47 is rather shorter in resolution than in Peirce's notation.
Yes, it is true that proficient programmers in language X learn to think,
eat, sleep, and dream in language X. But as I said, the main point
was to compare Peirce's system to the Principia (14 years later) and
Gentzen's system (38 years later) and only incidentally to resolution
(69 years later).
> But moreover, crucially for computation, it is conservative: that is, at no
> step is the inference process required to introduce a new symbol,
> exercising creativity. That was exactly the great advance that resolution
> represented over earlier theorem-proving methods, since without this
> property, the machine must search through the Herbrand universe, a hopeless
> task. Starting from a blank sheet of paper is a recipe for disaster in a
> mechanical system.
Actually, Peirce's system is very similar to Gentzen's natural deduction,
which some people (including Woody Bledsoe) have suggested as a useful
alternative to resolution for many purposes. Alonzo Church defined a
system of "natural deduction" as one that starts with a blank sheet as
its only axiom. In that sense, Peirce's system qualifies.
In any case, when you prove theorems in Peirce's system or in Gentzen's
system, a common approach is to take the theorem to be proved as the
goal and to work backwards to see what steps are needed to construct it.
That usually makes the proof quite straightforward.
Furthermore, the first two steps in a proof by P's rules are always the
same: (1) draw a double negation around a blank area of the sheet; (2) insert the
antecedent of the implication into the if-context. With that approach,
there is no searching whatever for the first 3 steps of that example.
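To make those opening moves concrete, here is the complete proof of the
trivial theorem "if p then p", written with the ^ marker for a negative
context:

    0. Start with the blank sheet (the only "axiom").
    1. Double negation around a blank area:          ^[ ^[ ] ]
    2. Insertion into the outer (negative) context:  ^[ p ^[ ] ]
    3. Iteration of p into the inner context:        ^[ p ^[ p ] ]

Step 3 yields the graph for "if p then p", and none of the three steps
involves any search.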
> While you correctly observe that clausal form sometimes looks rather unlike
> the nonclausal version of an expression, there are two quite reasonable
> replies to this. One is that this doesn't matter in practice (cf. Prolog).
> The other is that what makes clausal form sometimes rather unnatural is
> that it involves distributive 'multiplication' of AND over OR. But it is
> fairly easy to define a less awkward normal form which has negations pushed
> inwards using De Morgan's laws but does not do this distribution, and
> redefine resolution on that form. This is rarely done for the first reason,
> but see D. Wilkins, "A nonclausal theorem-proving system", 1st AISB
> conference 1974. Wilkins' system has the 'remarkable property' you mention,
> of allowing inferences to move down many depths of nesting.
Yes, but. The excerpt that I sent in my last note was taken from a paper
about the use of P's system for reasoning about nests of encapsulated
contexts in object-oriented systems where the nesting gets rather deep.
For that purpose, it is useful to have a system where the rules of
inference are stated in terms of the conditions for moving things in
and out of contexts.
> You say that Peirce's graphs have no variables, but you don't tell us how
> then they express quantification. Your account leaves them with only
> propositional expressibility.
Peirce started with graphs in 1882 for expressing his work on relations.
In that form, he assumed the existential as the default quantifier.
That is also the default that I assume for conceptual graphs: a single
concept node [CAT] may be translated to (Ex)cat(x).
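For example, a graph that links [CAT] through an (ON) relation to [MAT]
-- in the linear notation, [CAT]->(ON)->[MAT] -- is translated to
(Ex)(Ey)(cat(x) & on(x,y) & mat(y)).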
His original graphs were very similar to many of the semantic networks
of the 1960s that could express relations quite nicely, but had trouble
with negation, disjunction, and the universal quantifier. His discovery
of 1896 was that by combining the relational graphs with an oval for
a negative context, you got a complete system of FOL with some very
nice rules of inference.
His equivalent of a variable was a coreference link between nodes in
different contexts. In conceptual graphs, I draw the coreference links
as dotted lines, but allow variables as an option to minimize line
crossing in complex diagrams. Peirce's same 5 rules extend to erasing
and drawing coreference links in positive or negative contexts.
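To illustrate how the links replace variables, "every cat is on a mat"
puts [CAT] in an outer negative context with a coreference link to a
concept in the nested context. With the ^ marker and the optional
variables written in, it reads

    ^[ [CAT: *x]  ^[ [?x]->(ON)->[MAT] ] ]

which maps to ^(Ex)(cat(x) & ^(Ey)(on(x,y) & mat(y))), i.e.
(Ax)(cat(x) -> (Ey)(on(x,y) & mat(y))).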
> Can a 'context' contain more than two graphs? For example, how does the
> notation express p&q&r&s ? If it does it by simply writing p q r s on the
> plain paper, then you don't need the negation prefix ^ at all, since it is
> implicit in the bracketing. That simplifies the notation somewhat.
Yes. In his notation of 1896, his only enclosure was for negations, for
which he used an oval. But in his later work, he introduced colors for
different modalities. In order to avoid colors, I mark the negative
contexts with the ^ symbol, possibility with PSBL, necessity with NECS,
etc. Contexts can also be used to enclose the description of a situation
as in situation semantics; in that case, they are linked to relations
such as PTIM for point in time and LOC for location.
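For example, "possibly, a cat is on a mat" could be written in the same
prefix style as PSBL[ [CAT]->(ON)->[MAT] ], and a context describing a
situation could be linked by (PTIM) to a [TIME] concept and by (LOC) to
a [PLACE] concept.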
> Finally, I protest at your usage of the word 'context' here. This word is
> rapidly getting covered with more confusion than 'state' or even
> 'ontology'. Even in your brief message it has several meanings. First, it
> means something like the scope of a negation sign, ie a subexpression. But
> you also imply that it has some operational significance, so that a
> 'context' somehow restricts the attention of a search process:
I use the word "context" in a very narrow sense -- I mean it as nothing
more than a notation for packaging a collection of graphs. Its semantics
is similar to the semantics of (QUOTE ...) in LISP. And as in LISP, the
real meaning comes from the relation that you attach to the package.
In the following sentence, you can read "package" for "context":
>> Furthermore, they allow graphs to be imported
>> into a small context where only the information relevant to a proof
>> needs to be considered.
> But this is quite misleading. By the very permissive nature of the
> importation rule, these 'small' contexts can be made as large as we like.
That's true. A package could contain all the world's knowledge or it
could contain just one simple atom.
> There is nothing in the inference system to keep them small. They are
> merely scopes of negation symbols. If we translate Peirce's notation into a
> simpler form by pushing all negations inwards, we get a notation consisting
> of dual clausal form, with all this 'context' structure eliminated. The
> inference search problem is exactly the same as it was before we did this
> translation: every inference we can now do has a corresponding inference we
> could do previously. (In fact, since this translation can be done using
> Peirce's rules, we can regard the process of performing this simplification
> as the utilisation of a certain proof-theoretic normalisation theorem which
> simplifies the search space: do all inferences of type A before any of type
> B.)
As a matter of fact, this dual form is another one of Peirce's alternate
notations -- he called them the "entitative graphs" with the three
primitives being disjunction, negation, and the universal quantifier.
However, he preferred the form with conjunction, negation, and
existential because it gave simpler representations for most of the
kinds of statements he wanted to make.
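In the entitative form, graphs written side by side on the sheet are
read as a disjunction and the default quantifier is the universal; in
the existential form, the same juxtaposition is read as a conjunction
with the existential as the default, which is why ordinary assertions
come out so much simpler.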
> Look at Peirce's version of your example (I omit '^')
>
> [ [p [r]] [q [s]] [ [p q [r s]] ] ]
>
> Apply double-negation elimination twice and we get
>
> [p[r]] [q[s]] p q [rs]
>
> which is simply the dual of the clausal form
>
> ^p|r, ^q|s, p, q, ^r|^s.
>
> Surely, to apply double-negation elimination wherever possible might seem
> to be a useful heuristic. In general, these 'contexts' just create
> difficulty in the search process.
> The theorem-proving community has developed a useful set of technical ideas
> for analysing the search properties of logical spaces.
The theorem-proving community has tended to focus on one narrow type
of problem: Given a collection of axioms A, prove a proposition p.
But it's also important to consider large systems of knowledge of the
kind that Cyc is trying to build. For such systems, it is important
to analyze the permissible operations for moving information in and
out of various contexts (i.e. packages), reasoning within one of those
packages, and then exporting an answer to another package. In such a
system, a proof procedure that preserves the package structure can
be very helpful.
But in any case, you can mix proof procedures if you like. Since P's
rules allow you to prove resolution as a derived rule of inference, you
could use them in a hybrid system. First use P's rules to move the
information in and out of the packages. Then once you settle on a
particular package, you could apply a resolution theorem prover or any
other kind of theorem prover to that package.
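To make that concrete, here is a toy sketch in Python (the data layout
and the names are mine, purely for illustration): a package holds
clauses and named sub-packages, one routine exports the clauses visible
along a path of packages, and a plain resolution loop runs on the
exported set. The clauses are the ones from your example above.

from itertools import combinations

class Package:
    # A "package" (context) is just a bundle of clauses plus named sub-packages.
    def __init__(self, clauses=(), children=()):
        self.clauses = [frozenset(c) for c in clauses]   # a clause is a set of literals; "~p" negates p
        self.children = dict(children)

def export(pkg, path):
    # Collect the clauses visible along a path of nested packages.
    visible = list(pkg.clauses)
    for name in path:
        pkg = pkg.children[name]
        visible.extend(pkg.clauses)
    return visible

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def refute(clause_list):
    # Plain propositional resolution: True if the clause set is unsatisfiable.
    clauses = set(clause_list)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for lit in c1:
                if neg(lit) in c2:
                    resolvent = frozenset((c1 - {lit}) | (c2 - {neg(lit)}))
                    if not resolvent:
                        return True            # derived the empty clause
                    new.add(resolvent)
        if new <= clauses:
            return False                       # nothing new: no refutation
        clauses |= new

# Outer package asserts p and q; a sub-package holds p->r and q->s.
# To prove r & s, export the clauses and refute the negated goal ~r | ~s.
kb = Package(clauses=[{"p"}, {"q"}],
             children={"rules": Package(clauses=[{"~p", "r"}, {"~q", "s"}])})
print(refute(export(kb, ["rules"]) + [frozenset({"~r", "~s"})]))   # prints True

The point is only that the packages survive intact: the prover sees a
flat clause set, but nothing in the original structure is disturbed.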
> Talk of the
> 'structure of encapsulated contexts' where 'only the information relevant
> to a proof
> needs to be considered' just creates confusion. If we have some way of
> knowing what information is relevant to a proof, we can use it in any
> normal form we like. And to think that a notation will control exponential
> growth of search spaces by allowing 'large problems to be broken down into
> smaller contexts' is just fantasy.
It often happens that you find information in convenient packages.
I happened to be thinking about object-oriented systems where everything
already comes in neat little bundles. When you're reasoning about such
systems, it is useful to have a proof procedure that allows you to move
information around without disturbing the pre-existing structure. And
in fact, the methods in an O-O system are often very small -- it is
common to have a procedure that does nothing but access the value of
one simple variable and pass it back to the caller.
I agree that the general problem of finding what is relevant may be
very difficult. But if you have a well-designed system that has already
been organized in prepackaged modules, it is useful to preserve that
structure and take advantage of it in the reasoning process.
John