Tools to Enable Knowledge Sharing

Message-id: <199111272245.AA28919@venera.isi.edu>
Date: Wed, 27 Nov 91 17:39:00 EST
From: sowa@watson.ibm.com
To: GINSBERG@t.stanford.edu, AI.LENAT@mcc.com, AI.GUHA@mcc.com
Cc: INTERLINGUA@isi.edu, KR-ADVISORY@isi.edu, SRKB@isi.edu
Subject: Tools to Enable Knowledge Sharing

The recent notes by Matt Ginsberg and by Lenat and Guha are promising
contributions towards a possible consensus about knowledge sharing.

I agree with Matt that we should not do anything to inhibit ANY lines
of research.  If the term "Tools to Enable Knowledge Sharing" (TEKS)
is more palatable than the word "standards", that is fine with me.
But we have to recognize that there is an enormous push for TEKS
that is coming from many directions.  Boeing, for example, is working
very hard through ANSI, PDES, and ISO (all organizations with the
word "standards" in their names).  In an earlier note, I sent a copy
of Roger Burkhart's impassioned plea for some kind of logic-based TEKS
for manufacturing companies like Deere & Co.  Roger made the point that
if we don't take a leadership role in designing a logically clean system,
we will have a very dirty system like SQL2 thrust upon us.

I agree with Lenat and Guha that the most important prerequisite for
knowledge sharing is a common vocabulary or ontology.  It is in fact
so important that I would not even attempt to standardize it.  The
Cyc ontology is a very big one that should provide many instructive
examples, but I would not want to adopt it or any other proposed
vocabulary (such as the list of Common Lisp functions) as a standard.
Instead, I would only want to provide a common logic-based syntax and
allow people to develop libraries of ontologies for any purpose they
like.  A common syntax would allow ontologies to be written, shared, and
exchanged in packages for different application areas.  In that sense,
the common syntax would be a facilitator or enabler for sharing, but
anyone who bought a package could modify it, extend it, or translate it
into any other representation that might be appropriate.

Matt raised a question about the need for levels of metalanguages,
since he did not see any prototypes or pressing requirements for them.
I'll admit that there aren't very many prototypes in the AI community,
but there are commercial packages with levels of metalanguages in the
database world, especially within the scope of the ANSI IRDS (Information
Resource Dictionary Systems).  As an example, I'll cite IBM's AD/Cycle
(Application Development/Cycle), which is a framework for CASE tools
and products, some of which are in use today and others of which are
still being built.
AD/Cycle is probably the largest such framework, but there are others
developed by other vendors, and the purpose of ISO and ANSI IRDS is to
provide some guidance for these Tools to Enable Knowledge Sharing.

In describing the levels of metalanguage in AD/Cycle and related systems
within the scope of IRDS, I'll start from the bottom up:

 1. At the bottom is the real world or some restricted subset of it.
    In logic, it's called the domain of discourse; and in business
    data processing, it is called the Enterprise.

 2. The next level is the database, which may be viewed as a collection
    of ground-level clauses, usually with only non-negated atomic
    predicates.  Some people talk about the database as a theory, but
    because of its simple logical structure, I prefer to consider it
    a model of a first-order theory.  (If you call it a theory, that
    just adds one more level of metalanguage.)

 3. At the next level are the database constraints, such as "Every
    person must have two parents, one male and one female."  These are
    quantified, first-order axioms which constitute a theory of which
    the database is a model.  Many database management systems provide
    automatic checks for at least some of the constraints.  But the
    more complex constraints must often be enforced or checked by
    procedural code.

 4. Next are the database design aids, which have been in existence
    in some form or other for nearly 20 years.  These are expressed in
    some language for writing DB constraints, such as Entity-Relationship
    diagrams, NIAM diagrams, etc.  A number of companies sell tools that
    support these languages, allow the diagrams to be drawn on a screen,
    generate database definitions from the diagrams, and send them to
    the DBMS.

 5. Next is the AD/Cycle Information Model.  This is a metalanguage for
    defining the languages at Level 4 so that systems built by different
    vendors can exchange database definitions.  IBM has several "business
    partners" who have agreed to make their CASE tools conform to the
    AD/Cycle Information Model.  Among them are companies such as
    KnowledgeWare, Bachman, and Index Technology.  They sell tools like
    the KnowledgeWare Application Development Workbench, which generates
    E-R diagrams that obey axioms written in the AD/Cycle metalanguage.
    (There are about 500 axioms written in a kind of first-order logic,
    but whose domain of discourse consists of the nodes and arcs of
    the E-R diagrams at level 4.)

 6. Nothing has ever been implemented at the highest level, but the
    goal of the ANSI IRDS committee is to define a metametalanguage
    that would be able to describe information models at level 5
    and the transformation rules that would enable one vendor's
    metalanguage to be translated to another's.  This is important
    not only for sharing data between vendors, but also for sharing
    data between different releases of systems from the same vendor.

 7. Since the IRDS metametalanguage should be general enough to describe
    any system, it should also be general enough to describe itself.
    Such generality would be necessary to enable an IRDS system to be
    migrated from one version to another as the implementations evolve.
    This capability is similar to the ability of a virtual machine to
    emulate itself in order to run two different releases of an operating
    system at the same time while applications are being migrated from
    one to the other.
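
Levels 2 and 3 can be made concrete with a small sketch.  Everything in
it is an assumption made for illustration: the predicate names, the
sample rows, and the function violations() are hypothetical and are not
taken from AD/Cycle or any IRDS product.  The database is a set of
ground atomic facts, and the level 3 constraint quoted above is checked
against it by simple enumeration.

```python
# Level 2: the database as ground atomic facts (a finite model).
# All names and rows are hypothetical examples.
person = {"alice", "bob", "carol"}
male = {"bob"}
female = {"alice", "carol"}
parent = {("alice", "carol"), ("bob", "carol")}  # (parent, child) pairs


def violations(to_check):
    """Return the members of to_check that break the level 3 constraint:
    exactly two recorded parents, one male and one female."""
    bad = []
    for p in sorted(to_check):
        # Collect every recorded parent of p by scanning the parent relation.
        ps = {a for (a, c) in parent if c == p}
        ok = len(ps) == 2 and len(ps & male) == 1 and len(ps & female) == 1
        if not ok:
            bad.append(p)
    return bad


print(violations({"carol"}))  # carol has one male and one female parent
print(violations(person))     # alice and bob have no recorded parents
```

The check only scans the stored relations once per person, so its cost
grows polynomially with the size of the database.  Note also that no
finite database can satisfy the constraint for every person it records,
which is why the sketch takes the set of persons to check as an argument.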

The first five levels are in commercial use today, and the next two
levels are part of the requirements that the ANSI IRDS must support.
Knowledge sharing among AI systems is likely to require a hierarchy of
metalanguages that is at least this complex.

Matt expressed doubts about the need for metalanguages because he
does not "believe that machines will be able to make decisions at the
metalevel any time soon -- if I axiomatize my language and you axiomatize
yours, there is simply no way that a machine is going to understand the
connection."

That may be true, but the DB people have found a need for metalanguages
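even without any kind of deep reasoning capability.  Theorem proving
is intractable in general (satisfiability is NP-complete even for
propositional logic, and full first-order logic is undecidable), but
checking a fixed set of constraints against a finite model (such as a
conventional DB) can be done in polynomial time.  That is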
all they're doing now -- checking constraints.  And that alone is very
important for the existing CASE tools.  Deep reasoning is certainly more
fun from a research point of view, but constraint checking is enough
justification for metalanguages in current applications.
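
The same kind of checking works one level up, where the domain of
discourse is the diagram rather than the enterprise.  The following is
a minimal sketch under stated assumptions: the diagram, the two axioms,
and the function check_diagram() are invented for illustration and are
not drawn from the AD/Cycle Information Model.

```python
# A level 4 object: an E-R diagram stored as plain data (nodes and arcs).
# The node names and both axioms below are hypothetical examples.
nodes = {
    "Employee": "entity",
    "Department": "entity",
    "WorksIn": "relationship",
    "Orphan": "relationship",
}
arcs = {
    ("WorksIn", "Employee"),
    ("WorksIn", "Department"),
    ("Orphan", "Employee"),
}


def check_diagram(nodes, arcs):
    """Check two level 5 style axioms whose domain is the diagram itself."""
    errors = []
    for src, dst in sorted(arcs):
        # Axiom A: every arc runs from a relationship node to an entity node.
        if nodes.get(src) != "relationship" or nodes.get(dst) != "entity":
            errors.append(f"bad arc {src}->{dst}")
    for name, kind in sorted(nodes.items()):
        # Axiom B: every relationship node links at least two entities.
        if kind == "relationship":
            degree = sum(1 for s, _ in arcs if s == name)
            if degree < 2:
                errors.append(f"{name} links fewer than two entities")
    return errors


print(check_diagram(nodes, arcs))
```

No inference is involved here.  Each axiom is evaluated directly against
the nodes and arcs, which is the sense in which constraint checking needs
a metalanguage but not a theorem prover.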

John Sowa