Re: Standardizing FOL

Reply-To: cg@cs.umn.edu
Date: Wed, 22 Sep 93 09:28:16 EDT
From: sowa <sowa@turing.pacss.binghamton.edu>
Message-id: <9309221328.AA11444@turing.pacss.binghamton.edu>
To: cg@cs.umn.edu, interlingua@isi.edu, sowa@turing.pacss.binghamton.edu
Subject: Re: Standardizing FOL
Cc: phayes@cs.uiuc.edu
Pat,

My summertime reading has strengthened my conviction that a typed or
sorted FOL with metalevels is the way to go.  I found the following
book at a Barnes & Noble (totally uncharacteristic of them), and I
bought it to give them some encouragement:

   T. Gergely and L. Ury, _First-Order Programming Theories_,
   Springer-Verlag, 1991.

A quote from the preface gives the general outlook:

   Many different approaches with different mathematical frameworks
   have been proposed as a basis for programming theory.  They differ
   in the mathematical machinery they use....  These logics, however,
   are very eclectic since they use special entities to reflect a
   special world of programs, and also they are incomparable with one
   another....  This work reflects our journey from the eclectic
   programming logics to the simple classical first-order logic,
   which furthermore turned out to be more powerful than the 
   nonclassical ones.

In the book, they develop the logic and model theory of many-sorted
FOL, use it to develop a theory of computability, and apply it to
program schemas.  Then in Part II, they consider extended dynamic logics,
and in Part III, they consider temporal logic -- neither of which
is expressible in the other.  Then in Part IV, they develop time
logic -- FOL with time as an explicit sort -- and they embed both
temporal logic and dynamic logic in it.

I bought the next two books at the AAAI book exhibit (there wasn't
much else besides books there anymore):

   K. Meinke & J. V. Tucker, eds., _Many-Sorted Logic and its Applications_,
   John Wiley & Sons, 1993.

This book contains a variety of papers on theoretical topics
concerning many-sorted logic, proof procedures, etc.  But the first
chapter is a very nice 80-page tutorial that summarizes the standard
results, relates sorted logic to other logics, and discusses the
trade-offs.  In particular, the author of that chapter gives a good
treatment of second-order logic in its usual sense and of the
corresponding sorted version, in which you introduce a special sort
for functions of various kinds.

The usual SOL is extremely powerful:  you can, for example, define
finiteness, say that the domain is uncountable, etc.  But that power
costs you compactness:  for each n, you can write a sentence saying
that at least n individuals exist, and you can also say that the
domain is finite.  Every finite subset of those sentences is
satisfiable, yet the whole set has no model -- exactly what
compactness forbids.  SOL also violates my favorite theorem, the
Loewenheim-Skolem theorem, since a sentence saying that the domain
is uncountable cannot have a countable model.
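To spell out the compactness failure (standard textbook material,
stated here in my own notation):

   Let Fin be the second-order sentence asserting that the domain
   is finite, and for each n let

      \lambda_n \;=\; \exists x_1 \ldots \exists x_n \bigwedge_{i<j} x_i \neq x_j

   say that at least n individuals exist.  Every finite subset of

      \Sigma \;=\; \{\mathrm{Fin}\} \cup \{\lambda_n : n \geq 1\}

   has a model (any sufficiently large finite domain), but \Sigma
   itself has none.  Compactness says exactly that this cannot
   happen, so full SOL is not compact.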

The many-sorted mapping lets you say the same things, but the
theory has all the nice properties:  compactness, countable models,
etc.  You can write down the definition of finiteness, if you like,
but it no longer does what you want:  infinite models may still
satisfy it.

The reason for these differences is that true SOL lets you quantify
over uncountably many functions -- of which at most countably many
could ever be defined or even named.  But the many-sorted version of
SOL requires you to specify up front what your sorts are, and there
is no way to specify an uncountable set.
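Schematically, the sorted simulation works as follows (a sketch in
the style of Henkin's general models; the symbol apply is my own
notation):

   A second-order quantifier over unary functions,

      \forall F\; \varphi(F),

   becomes a first-order quantifier relativized to a sort fun,

      \forall f\, (\mathrm{fun}(f) \rightarrow \varphi^{*}(f)),

   where every application F(x) inside \varphi is rewritten as
   \mathrm{apply}(f, x) for a distinguished binary function symbol
   apply.  A model interprets the sort fun by some set of
   "functions" -- not necessarily all of them -- and that freedom
   is precisely why the first-order metatheorems survive.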

The third book (which I also got at AAAI):

   Kurt Jensen, _Coloured Petri Nets_, Springer-Verlag, 1992.

This book presents an extended version of Petri Nets (CPN) that
has been implemented and used for some practical applications.
The "colors" in CPN are arbitrary specifications that can be
associated with each node and with the tokens that pass between
nodes.  In this book, they use ML, a purely functional programming
language, to define the colors.  Since ML is purely functional, it
can be written in KIF or CG notation in the usual way.  Its lack
of global variables means that any ML code written inside a node of
the Petri net operates only on the tokens passed to it, without
any possible side effects.

The global network structure, however, gives you a very powerful
and natural way to specify and visualize interactions of parallel
processes.  They have implemented graphic tools that let you draw
CPNs, simulate them, and compile them into executable code (also 
in ML).  And at the end of the book, they discuss several major
(non-toy) applications in production use.

I like the CPN approach because the nets can be mapped directly
to conceptual graphs with nodes of type EVENT and STATE, connected
by relations of type Successor.  The Gergely-Ury time logic can be
used to axiomatize the interactions, and the CGs can be translated
directly to KIF.  I won't claim that this is the only or even the
best way to do industrial-strength procedural programming in logic,
but it is one way that is both easy to use and efficient.  There
may very well be many others that could be developed once KIF and
CGs become standards.
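As a rough illustration (my own sketch, not an example from Jensen's
book), a small fragment of a net might map to the linear form

   [Event: Deposit]->(Succ)->[State: Funds-Available]->(Succ)->[Event: Withdraw]

with Succ axiomatized in the Gergely-Ury style of time logic, e.g.
in KIF:

   (forall (?x ?y)
      (=> (Succ ?x ?y)
          (before (end-time ?x) (start-time ?y))))

Here before, end-time, and start-time are placeholders for whatever
temporal vocabulary one adopts; nothing hangs on the names.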

Some comments on your comments:

> I understand, but the meta-meta hierarchy you outline isn't the same
> one as the higher-order hierarchy (of functions of functions of..) that
> we were talking about.

True.  But I would challenge you to find any real application
requiring full HOL that cannot be handled with either the many-sorted
version of HOL or the metalevels of KIF and CGs.

> The context mechanism doesn't go far beyond first-order logic, by the 
> way, in fact it never goes beyond first-order logic. All it amounts to, 
> as far as I can see, is a kind of quotation; but that hardly extends 
> the semantic base at all. (Perhaps I am missing something, however.)

True.  It is purely first order.  But what it does do is let you
partition your knowledge base into modules.  Outside a module, you
can use the metalanguage to talk about what's inside, axiomatize the
contents, reason about them, etc.  Furthermore, you can have families
of contexts that are inconsistent with one another, and outside the
modules you can talk about how they are related.
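For example (my own sketch; the relation names are placeholders, and
the truth predicate wtr is the one from the KIF 3.0 proposal):

   ; a quoted sentence stored in the module named physics-naive
   (in-context physics-naive
      '(forall (?x) (=> (heavy ?x) (falls-fast ?x))))

   ; metalevel statements made outside the module
   (believes aristotle physics-naive)
   (not (wtr '(forall (?x) (=> (heavy ?x) (falls-fast ?x)))))

The quoted material is just first-order data, so other first-order
axioms can describe it, relate it to other modules, or contradict it
without any global inconsistency.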
 
> Maybe, but you can't define ANY kind. For example, it doesn't have the 
> syntactic power to enable you to define lambda-conversion. And it's not 
> clear that this notion of 'define' always captures the semantics of the 
> logics in question. (Although I'm sure that it can be proved to do so 
> in many cases.)

True.  There may be cases where the logics are fully embedded, as in
the treatment of temporal logic and dynamic logic by Gergely & Ury,
and there may be cases where you have statements that "look the same"
but have different semantics, as in the mapping from HOL to a many-sorted
simulation of HOL.  And yes, indeed, you can define lambda conversion.
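For the record, beta-conversion is a purely syntactic rewrite over
quoted expressions (standard notation, my formulation):

   (\lambda x.\, M)\, N \;\rightarrow_{\beta}\; M[x := N]

As a relation between two quoted terms, it is definable by recursion
on the structure of M at the metalevel, with the usual provisos that
N replaces only free occurrences of x and that bound variables are
renamed to avoid capture.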

> Yes, but your scare quotes are quite appropriate here. It's not clear 
> that this notion of 'true' is the one which a genuine semantic theory 
> would provide. Suppose I have a language LP with a model 
> theory defined on it, and suppose we can describe the syntax of LP and its 
> model theory in L1: call this the lp-theory. Now, take a first-order 
> model of the lp-theory: will this be suitably isomorphic to a model 
> of LP? Will every model of LP be suitably isomorphic to a first-order 
> model of lp-theory? Neither of these questions has an obvious answer.

I would say that the answer is intuitively obvious, although verifying
the intuition might take some work.  Let me ask how you (or anyone
else) would define this language LP and its model theory.  Standard
practice in mathematical textbooks and research papers is to use some
natural language as the metalanguage for defining the theory (or for
defining the language that specifies the theory).  That use of NL is
almost always a rather simple version of a sorted FOL, using NL words
and syntax intermixed with variables and mathematical expressions.  If
you can specify your LP in such a way, you should be able to map it
into the KIF-CG version of FOL with metalevels.
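To give the flavor (my example, not from any published source), the
textbook sentence "for every model M of T and every sentence s, if T
proves s, then M satisfies s" is already sorted FOL in disguise:

   \forall M\!:\!\mathrm{Model}\;\; \forall s\!:\!\mathrm{Sentence}\;\;
   (\mathrm{ModelOf}(M, T) \wedge \mathrm{Proves}(T, s)
      \rightarrow \mathrm{Satisfies}(M, s))

The sorts Model and Sentence and the three predicates are exactly the
kind of vocabulary that the metalevel machinery lets you introduce.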

> No, you don't have them. I was imagining a 'context' syntax which would 
> allow higher-order hierarchies to exist but with a limited range. The 
> semantic difficulty with (fullblown) higher-order logic is ....

Yes, but.  The quoting mechanism of KIF and CGs gives you the
limited range you were asking for.  Outside the context box, you
can specify exactly what kinds of expressions and rules of inference
you will permit inside the box.  You can limit it in any way you like.
Writing the specifications that define exactly the limits you want
may not be easy, but it can be done -- "A simple matter of programming,"
as they say in AI.
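A one-line sketch of the idea (placeholder names again, in the same
style as the earlier example):

   ; everything asserted in the module horn-box must be a Horn clause
   (forall (?s) (=> (in-context horn-box ?s) (horn-clause ?s)))

where horn-clause is a metalevel predicate defined by recursion on
the syntax of the quoted sentence ?s.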

> What you can make the logic talk about is one thing, but what that talk 
> means is another. Until you provide a model theory, it's not clear what 
> these internal translations and descriptions amount to. (Now do you see 
> why I make such a fuss about the importance of semantics? :-)

Yes, of course.  But that is exactly why we have adopted FOL for KIF
and CGs.  We have a very nice, very simple, fully defined model theory.
If you use our notations to define your logic inside one of our contexts,
you are free to use our constructions to create your models, but then
defining the model theory for your logic is your responsibility, not ours.
We give you the lego blocks and let you play with them in our context 
boxes.  But if your construction collapses, our responsibility is limited
to ensuring that your local disaster doesn't affect anybody else's
contexts.  KIF and CGs come with a limited warranty.

> ... any semantics, be it fuzzy or modal or whatever, thinks about 
> a sentence as having some kind of property - truth, for example - which
> inference rules preserve somehow. So sets of sentences which are 
> semantically interesting are always ones which are closed under the 
> application of these rules. But we seem often to be dealing with sets 
> of sentences which don't have this recursive-closure character. So how 
> can they be characterised semantically? (Maybe this is not true of linear 
> logics, and I confess to not yet understanding what the semantic base 
> of linear logic could be.)

All that I am claiming is that if you can define your semantics in
mathematical English, you can translate your English into CGs and
use it to specify the semantics of what is inside a context box.
There are a lot of interesting logics around, such as linear logic,
which we haven't fully assimilated to current practice.  That is a
job for the next century.  All that we are trying to do with KIF and
CGs is to give the stamp of approval to the current state of the art
and use it as a base for industrial-strength applications.

> I largely agree, defining theories is where the real work needs to be done. 
> My worries were inspired by the thought that maybe some of our recurrent 
> problems in doing this well are the result of not having the proper 
> conceptual 'tools' (not the implementational stuff you are talking about, 
> but the languages that underlie them.)

That is very possibly true.  On the other hand, we can see very large
application areas where the current technology is not being pushed
anywhere near its limits.  Just good old-fashioned FOL is a major
conceptual step for most programmers.  What I have been trying to say
again and again in all these notes is that we are not trying to
develop standards for research, but standards for industrial-strength
applications.

> If building your tools requires standardising my tools, then it's too early 
> to be standardising. But let's stop arguing about this: I agree, if we 
> can save some wheel-reinventing, that won't be a bad thing. Let's just bear 
> in mind that research hasn't finished in this area, and not give students 
> the wrong impression.

Pat, we are not stopping you from building any kind of tools you want.
But what we want to do is to develop standards for commercial tools so
that you can buy off-the-shelf parsers, theorem provers, etc., in the
same kinds of stores where you can buy the latest spreadsheets, word
processors, and megadeath games.

As for giving things to our students, it is a crying shame that
the DB teachers can give their students SQL to play with, and the
programming language teachers can give their students all sorts of
languages to use, but there are very few AI tools that we can give
our students for their homework exercises.  The available expert
system shells are based on the technology of the 1970s, which the
experience of the 1980s showed would not scale up to decent
applications.

I'm less worried about giving PhD students "the wrong impression"
than I am about not having decent educational tools for the freshmen.

John