Tools to Enable Knowledge Sharing
sowa@watson.ibm.com
Message-id: <199111272245.AA28919@venera.isi.edu>
Date: Wed, 27 Nov 91 17:39:00 EST
From: sowa@watson.ibm.com
To: GINSBERG@t.stanford.edu, AI.LENAT@mcc.com, AI.GUHA@mcc.com
Cc: INTERLINGUA@isi.edu, KR-ADVISORY@isi.edu, SRKB@isi.edu
Subject: Tools to Enable Knowledge Sharing
The recent notes by Matt Ginsberg and by Lenat and Guha are promising
contributions towards a possible consensus about knowledge sharing.
I agree with Matt that we should not do anything to inhibit ANY lines
of research. If the term "Tools to Enable Knowledge Sharing" (TEKS)
is more palatable than the word "standards", that is fine with me.
But we have to recognize that there is an enormous push for TEKS
that is coming from many directions. Boeing, for example, is working
very hard through ANSI, PDES, and ISO (all organizations with the
word "standards" in their names). In an earlier note, I sent a copy
of Roger Burkhart's impassioned plea for some kind of logic-based TEKS
for manufacturing companies like Deere & Co. Roger made the point that
if we don't take a leadership role in designing a logically clean system,
we will have a very dirty system like SQL2 thrust upon us.
I agree with Lenat and Guha that the most important prerequisite for
knowledge sharing is a common vocabulary or ontology. It is in fact
so important that I would not even attempt to standardize it. The
Cyc ontology is a very big one that should provide many instructive
examples, but I would not want to adopt it or any other proposed
vocabulary (such as the list of Common Lisp functions) as a standard.
Instead, I would only want to provide a common logic-based syntax and
allow people to develop libraries of ontologies for any purpose they
like. A common syntax would allow ontologies to be written, shared, and
exchanged in packages for different application areas. In that sense,
the common syntax would be a facilitator or enabler for sharing, but
anyone who bought a package could modify it, extend it, or translate it
into any other representation that might be appropriate.
Matt raised a question about the need for levels of metalanguages,
since he did not see any prototypes or pressing requirements for them.
I'll admit that there aren't very many prototypes in the AI community,
but there are commercial packages with levels of metalanguages in the
database world, especially within the scope of the ANSI IRDS (Information
Resource Dictionary System). As an example, I'll cite IBM's AD/Cycle
(Application Development/Cycle), which is a framework for CASE tools
and products, some of which are in use today, while others are still
being built.
AD/Cycle is probably the largest such framework, but there are others
developed by other vendors, and the purpose of ISO and ANSI IRDS is to
provide some guidance for these Tools to Enable Knowledge Sharing.
In describing the levels of metalanguage in AD/Cycle and related systems
within the scope of IRDS, I'll start from the bottom up:
1. At the bottom is the real world or some restricted subset of it.
In logic, it's called the domain of discourse; and in business
data processing, it is called the Enterprise.
2. The next level is the database, which may be viewed as a collection
of ground-level clauses, usually with only non-negated atomic
predicates. Some people talk about the database as a theory, but
because of its simple logical structure, I prefer to consider it
a model of a first-order theory. (If you call it a theory, that
just adds one more level of metalanguage.)
3. At the next level are the database constraints, such as "Every
person must have two parents, one male and one female." These are
quantified, first-order axioms which constitute a theory of which
the database is a model. Many database management systems provide
automatic checks for at least some of the constraints. But the
more complex constraints must often be enforced or checked by
procedural code.
4. Next are the database design aids, which have been in existence
in some form or other for nearly 20 years. Designs are expressed in
some language for writing DB constraints, such as Entity-Relationship
diagrams, NIAM diagrams, etc. A number of companies sell tools that
support these languages, allow the diagrams to be drawn on a screen,
generate database definitions from the diagrams, and send them to
the DBMS.
5. Next is the AD/Cycle Information Model. This is a metalanguage for
defining the languages at level 4 so that systems built by different
vendors can exchange database definitions. IBM has several "business
partners" who have agreed to make their CASE tools conform to the
AD/Cycle Information Model. Among them are companies such as
KnowledgeWare, Bachman, and Index Technology. They sell tools like
the KnowledgeWare Application Development Workbench, which generates
E-R diagrams that obey axioms written in the AD/Cycle metalanguage.
(There are about 500 axioms, written in a kind of first-order logic
whose domain of discourse consists of the nodes and arcs of the
E-R diagrams at level 4.)
6. Nothing has ever been implemented at the highest level, but the
goal of the ANSI IRDS committee is to define a metametalanguage
that would be able to describe information models at level 5
and the transformation rules that would enable one vendor's
metalanguage to be translated to another's. This is important
not only for sharing data between vendors, but for sharing data
between different releases of systems from the same vendor.
7. Since the IRDS metametalanguage should be general enough to describe
any system, it should also be general enough to describe itself.
Such generality would be necessary to enable an IRDS system to be
migrated from one version to another as the implementations evolve.
This capability is similar to the ability of a virtual machine to
emulate itself in order to run two different releases of an operating
system at the same time while applications are being migrated from
one to the other.
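Levels 2 and 3 of the list above can be sketched in a few lines of Python.
This is only a minimal illustration, assuming a database stored as a set of
ground atomic facts; the individuals, the predicate names, and the helper
functions parents_of and violators are all hypothetical.

```python
# Level 2: the database as a set of ground, non-negated atomic facts.
# Every name and fact below is hypothetical, invented for this sketch.
FACTS = {
    ("person", "ann"), ("person", "bob"), ("person", "cal"),
    ("female", "ann"), ("male", "bob"),
    ("parent", "ann", "cal"), ("parent", "bob", "cal"),
}


def parents_of(db, child):
    """Return the recorded parents of `child`, in sorted order."""
    return sorted(f[1] for f in db if f[0] == "parent" and f[2] == child)


def violators(db):
    """Level 3: check the axiom 'every person has two parents, one male
    and one female' against the model and return the persons who break it."""
    bad = []
    for person in sorted(f[1] for f in db if f[0] == "person"):
        ps = parents_of(db, person)
        ok = (
            len(ps) == 2
            and sum(("male", p) in db for p in ps) == 1
            and sum(("female", p) in db for p in ps) == 1
        )
        if not ok:
            bad.append(person)
    return bad


print(violators(FACTS))  # prints ['ann', 'bob']
```

With these facts, cal satisfies the constraint, while ann and bob are
reported because no parents are recorded for them.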
The first five levels are in commercial use today, and the next two
levels are part of the requirements that the ANSI IRDS must support.
Knowledge sharing among AI systems is likely to require a hierarchy of
metalanguages that is at least this complex.
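The step from level 4 to level 5 can be illustrated the same way: the
diagram itself becomes the model, and the axioms range over its nodes and
arcs. The sketch below is a toy under stated assumptions. The diagram, the
two meta-axioms, and every function name are hypothetical and are not taken
from the AD/Cycle Information Model.

```python
# Level 4: a tiny E-R diagram as typed nodes plus arcs (relationship -> entity).
# The diagram and both axioms are hypothetical; they are not taken from
# the AD/Cycle Information Model.
NODES = {"Employee": "entity", "Department": "entity", "WorksIn": "relationship"}
ARCS = [("WorksIn", "Employee"), ("WorksIn", "Department")]


def arcs_well_typed(nodes, arcs):
    """Meta-axiom 1: every arc runs from a relationship node to an entity node."""
    return all(
        nodes.get(src) == "relationship" and nodes.get(dst) == "entity"
        for src, dst in arcs
    )


def relationships_connected(nodes, arcs):
    """Meta-axiom 2: every relationship node has at least two outgoing arcs."""
    return all(
        sum(1 for src, _ in arcs if src == name) >= 2
        for name, kind in nodes.items()
        if kind == "relationship"
    )


def conforms(nodes, arcs):
    """Level 5 check: the diagram is a model of the meta-level axioms."""
    return arcs_well_typed(nodes, arcs) and relationships_connected(nodes, arcs)


print(conforms(NODES, ARCS))      # prints True
print(conforms(NODES, ARCS[:1]))  # prints False
```

The second call fails because the relationship WorksIn is left with only
one arc, so the diagram is no longer a model of meta-axiom 2.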
Matt expressed doubts about the need for metalanguages because he
does not "believe that machines will be able to make decisions at the
metalevel any time soon -- if I axiomatize my language and you axiomatize
yours, there is simply no way that a machine is going to understand the
connection."
That may be true, but the DB people have found a need for metalanguages
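even without any kind of deep reasoning capability. Theorem proving is
intractable in general (propositional satisfiability is already
NP-complete, and first-order validity is only semidecidable), but
checking a fixed set of constraints against a finite model (such as a
conventional DB) can be done in time polynomial in the size of the
model. That is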
all they're doing now -- checking constraints. And that alone is very
important for the existing CASE tools. Deep reasoning is certainly more
fun from a research point of view, but constraint checking is enough
justification for metalanguages in current applications.
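The polynomial bound on constraint checking can be seen in a brute-force
sketch: a constraint with k quantified variables over a domain of n
individuals needs at most n^k tests, which is polynomial in n for a fixed
constraint. The domain, the facts, and the names check_forall and
no_mutual_parents are hypothetical, and real systems would use indexes
rather than plain enumeration.

```python
from itertools import product

# Hypothetical model: four individuals and a handful of parent facts.
DOMAIN = ["ann", "bob", "cal", "dee"]
PARENT = {("ann", "cal"), ("bob", "cal"), ("cal", "dee")}


def check_forall(domain, k, constraint):
    """Check a constraint with k universally quantified variables by
    enumeration; return (holds, number_of_tuples_examined)."""
    examined = 0
    for values in product(domain, repeat=k):
        examined += 1
        if not constraint(*values):
            return False, examined
    return True, examined


def no_mutual_parents(x, y):
    """For all x, y: x and y are not each other's parent."""
    return not ((x, y) in PARENT and (y, x) in PARENT)


print(check_forall(DOMAIN, 2, no_mutual_parents))  # prints (True, 16)
```

Here n = 4 and k = 2, so the check examines 4^2 = 16 pairs and involves no
search for a proof.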
John Sowa