The Seed and the Scaffold

Tom Gruber <Gruber@sumex-aim.stanford.edu>
Message-id: <2900651950-212935@KSL-Mac-69>
Date: Mon, 2 Dec 91  00:39:10 PST
From: Tom Gruber <Gruber@sumex-aim.stanford.edu>
To: Doug Lenat <lenat@mcc.com>, "R. V. Guha" <Guha@mcc.com>
Cc: Tracy Schwartz <schwartz@surya.cyc-west.mcc.com>, interlingua@isi.edu,
        kr-advisory@isi.edu, srkb@isi.edu, krd@ai.mit.edu,
        Peter Friedland <friedland@ptolemy.arc.nasa.gov>
Subject: The Seed and the Scaffold
In-reply-to: Msg of Wed, 27 Nov 1991 09:56-0800 from Tracy Schwartz <schwartz@surya.cyc-west.mcc.com>
> If there is sufficient call for it, we'd like to try to find some way to
> share Cyc -- its content and context mechanism, as well as the
> less-important syntax and vocabulary of its language -- with you.
> Think of it either as a seed, or as scaffolding, but in any case we feel
> that something like it (in both breadth and size, which is currently over
> a million axioms) is going to be needed to serve as the semantic glue to
> enable the sort of knowledge sharing we all have in mind.

> Sincerely,

> Doug Lenat  and   R. V. Guha

Fantastic. 

I'd like to start that call with a specific wish list for ontologies and
test examples, and a mechanism for evaluating them in the context of
the DARPA Knowledge Sharing Effort.

1.  ONTOLOGIES

An ontology is a coherent set of definitions of terms (in KIF,
relation, function, or object constants) including axioms constraining
their usage and meaning, and English text describing it for humans.  So
we're talking about the Epistemological Level rendition of a bunch of
units (more on mechanisms later).  Cyc has several well-thought-out
ontologies in it, although they are not explicitly partitioned as such
(more on microtheories later).  Going for generality first, I would
wish for a sharable form of the basic Cyc ontologies that were outlined
in the book and subsequently refined.  There are two flavors of these
ontologies: knowledge-organization conventions and beliefs about the
World-According-To-Cyc.

  (a) KNOWLEDGE ORGANIZING ONTOLOGIES: collections, second order
relations over slots, etc.  These establish valuable conventions, for
both machine and human efficiency.  For machine efficiency, these
ontologies represent the cliches that have been hacked in the Heuristic
Level inference mechanisms.  The Cyc project has been busy implementing
versions of most special-purpose reasoning hacks in the literature.
Whether or not they have the best implementation of each, they cover
the space (and it's the EL definitions that are sharable anyway).
   They are also useful for human efficiency as a sharable library of
cliches.  Common Lisp programs share a lot of functions and macros that
aren't in the kernel (special forms & hooks to environment).  COND,
CASE, WITH-OPEN-FILE, SORT, etc.  all could be written over and over
again, and be slightly different in every application.  But the de
facto library that comes with Common Lisp makes it easier for humans to
share code, as well as enabling optimizations by implementors.  That's
why it would be useful to be able to incorporate relations like
#%canHaveSlots and #%transfersThro into the public discourse in shared
knowledge bases.

  (b) HIGH-LEVEL CONTENT ONTOLOGIES: InternalMachineThing versus
RepresentedThing, flavors of time, etc.  These will be more
contentious as something a community might want to share, but if we
put the axiomatizations into a sharable format (an ontology), then
they are available for experimentation and public debate.
Isn't it the case that microtheories share a common "base theory"?
This is it.  It would be very instructive to see the axioms that get
lifted out of the base theory.  That might be the first Cyc ontology to
start work on.

2. EXAMPLES
  In order to understand the ontologies, we would need examples of
microtheories, wearing their contexts on their sleeves.  Where
ontologies are sets of term definitions, example files are sentences
using those terms on instances.  A family of related microtheories, in
sharable format, would provide a valuable testbed for the knowledge
sharing effort.  They would need to be delivered in such a way that one
wouldn't need the entire Cyc KB (the mother of all microtheories) to
understand the microtheories.  The context mechanism that Guha has
been working on is designed to make this possible.  The context
mechanism serves to encapsulate a microtheory, and the "lifting axioms"
identify how it couples with the other theories in the world.  So, to
make use of the example of a family of microtheories, we would need the
lifting axioms, too.

3. MECHANISM
  The basic strategy of the Knowledge Sharing Effort is 
  (a) provide a declarative formalism for the base representation
     language (KIF), optimized for expressiveness and clean semantics
  (b) capture content theories and wrap existing knowledge bases with
     ontologies
  (c) factor out issues of implementing reasoning systems (KRSS) and
    communication/invocation protocols (KQML).

The mechanism for sharing Cyc's context mechanism is (a), the
interlingua working group, to which Guha already contributes.  The
mechanism for sharing Cyc's content is (b), in the form of ontologies
evaluated and made available via the SRKB working group.  The question
is, how can we get Cyc's ontologies into a "sharable" format?

To make use of Cyc ontologies, we need only put them into the form of
KIF definitions.  I have written a system called Ontolingua which can
help with this task, but the main requirement is that the definitions
consist of KIF sentences associated with terms.  At a minimum, a
definition should include the type (relation/function/object) and
arity, and restrictions on the types of its arguments, if known.
For relations with EL axiomatizations, the axioms should be given as well.
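
As a rough illustration of the minimum content of a definition
described above, here is a sketch in Python.  The record layout, the
field names, and the problems() check are assumptions made for this
example; they are not Ontolingua's actual format or API.  The sample
axiom is an ordinary transitivity sentence written in KIF style.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three kinds of constants a definition can introduce.
KINDS = ("relation", "function", "object")


@dataclass
class TermDefinition:
    """Hypothetical record for one term of an ontology."""
    name: str
    kind: str                                            # one of KINDS
    arity: Optional[int] = None                          # None for object constants
    arg_types: List[str] = field(default_factory=list)   # type restrictions, if known
    axioms: List[str] = field(default_factory=list)      # KIF sentences, kept as text
    documentation: str = ""                              # English text for humans

    def problems(self) -> List[str]:
        """Return a list of ways this record falls short of the minimum."""
        found = []
        if self.kind not in KINDS:
            found.append(f"unknown kind: {self.kind}")
        if self.kind in ("relation", "function") and self.arity is None:
            found.append("relations and functions need an arity")
        if self.arg_types and self.arity is not None \
                and len(self.arg_types) != self.arity:
            found.append("argument type count does not match arity")
        return found


# Illustrative definition; the names and the axiom are examples only.
subclass_of = TermDefinition(
    name="subclass-of",
    kind="relation",
    arity=2,
    arg_types=["class", "class"],
    axioms=[
        "(=> (and (subclass-of ?a ?b) (subclass-of ?b ?c))"
        " (subclass-of ?a ?c))"
    ],
    documentation="Every instance of the first class is an instance of the second.",
)
```

Calling subclass_of.problems() returns an empty list, while a function
record with no arity is flagged.  Keeping the axioms as plain text
leaves the parsing and checking of the sentences to whatever KIF tools
are available.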

Cyc already has the architectural hooks to translate its KB and MTs
into Epistemological Level CycL sentences.  These should be able to be
transformed into KIF sentences; if not, we learn something.  MCC would
not have to release any Cyc code or fragments of CycL, just the KIF
equivalents at the EL.  For a first cut at partitioning the namespace
(so that the Cyc-specific names don't get confused with names from
other ontologies), one can use the Common Lisp package system.  This
would allow us to begin defining the mappings among various names for
subclass-of, instance-of, etc.
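
The namespace idea can be sketched as follows, in Python rather than
Common Lisp so that it stands alone.  A name is a (package, symbol)
pair, and a table maps each source name onto a shared vocabulary.  The
package names KB-A, KB-B, and SHARED and all of the table entries are
hypothetical placeholders, not actual Cyc or Ontolingua names.

```python
from typing import Dict, Optional, Tuple

# A package-qualified name, in the spirit of a Common Lisp symbol.
QualifiedName = Tuple[str, str]   # (package, symbol)

SHARED = "SHARED"                 # hypothetical package for the common vocabulary

# Hypothetical mapping table.  The same symbol "isa" appears in two
# packages with different meanings, which is why names are qualified.
EQUIVALENTS: Dict[QualifiedName, QualifiedName] = {
    ("KB-A", "isa"): (SHARED, "instance-of"),
    ("KB-A", "kind-of"): (SHARED, "subclass-of"),
    ("KB-B", "isa"): (SHARED, "subclass-of"),
    ("KB-B", "member-of"): (SHARED, "instance-of"),
}


def translate(name: QualifiedName) -> Optional[QualifiedName]:
    """Map a source name to the shared vocabulary, or None if unmapped."""
    return EQUIVALENTS.get(name)


def same_meaning(a: QualifiedName, b: QualifiedName) -> bool:
    """True when both names are mapped to the same shared term."""
    ta, tb = translate(a), translate(b)
    return ta is not None and ta == tb
```

With this table, same_meaning(("KB-A", "isa"), ("KB-B", "member-of"))
is true, while the two unqualified "isa" symbols are kept apart.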

If this is acceptable to the Cyc folks, I can volunteer the KSL as a
place for processing and distributing the ontologies.  Once there are
some examples available, then the SRKB group can coordinate evaluations
and experiments.

						Tom Gruber


P.S.  The original message from MCC was sent to a long list of names.
The addresses above are the same people, with redundancies removed.