Sharing ontologies and software

Message-id: <9208271740.AA15562@SMI.Stanford.EDU>
Date: Thu, 27 Aug 92 13:19:35 EDT
From: sowa@watson.ibm.com
To: PETRIE@INFORMATIK.UNI-KL.DE, GRUBER@SUMEX-AIM.STANFORD.EDU
Cc: SRKB@ISI.EDU, CG@CS.UMN.EDU
Subject: Sharing ontologies and software
Charles and Tom,

I strongly endorse the points that Charles Petrie made about the
issues of sharing ontologies.   And I want to go further in saying
that the problems in sharing ontologies are no different from the
problems of sharing any other kind of software.

As an example, every bank in the world has programs for handling
checking accounts, passbook accounts, mortgages, loans, etc.  But
every bank has an "ontology" embedded in those programs that is
different from every other bank's.  In all the bank mergers that
have been occurring during the past decade, it is noteworthy that
no two banks were ever able to share software that was developed
before the merger.  Even after the merger, their database formats
were invariably the union of the two previous databases, and the
intersections of the formats were almost empty.

A few comments on some of Charles's notes:

>  The examples you address suggest that, as usual, the problem is less
> with the adequacy of the modeling language, than with agreement on the
> domain semantics: even with a smallish niche like bibliographies.  But
> the problem is that no one is offering a bibliography service.

But if there were two such services, you can be sure that they would be
as incompatible as the Dewey Decimal System and the Library of Congress
classification for books.  Recently the Library of Congress scheme seems
to have won out, at least in the U.S., for one very simple reason:  the
price is right -- they classify every book published in the U.S. and
distribute the classification to anyone who asks for it.

>  Folks are just not likely to agree upon an ontology in the abstract.
> They may use an Internet service and Ontolingua may be a good way to
> formally describe the service so that it can be ported from one
> machine to another rather than just used as a remote server.  There is
> a commitment to an ontology to the extent there is use of the service.

Yes, and the example of the Library of Congress classification scheme
is the only way such things ever get shared in practice.

>  None of this directly addresses the crucial problem of merging
> software, which Bob Neches raised, again. But, it does address this
> problem indirectly.

Fundamental principle:  Independently developed software is NEVER merged.
At best, it coexists with other software on the same machine, provided
that there are strong firewalls between contexts.  If there is a strong
need, people may write conversion programs for the data.  But the
programs themselves are never merged -- they coexist until one or the
other (or both) becomes obsolete.
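The "conversion programs for the data" point can be made concrete with a
minimal sketch.  The record layouts and field names below are invented for
illustration; no real bank format is being described.  The two systems are
never merged -- a small shim just maps one system's records into the
other's format so they can coexist.

```python
# Hypothetical sketch: system A and system B keep their own code and
# formats; only the data is converted at the boundary between them.

def convert_account(record_a):
    """Map a record from system A's (invented) format into system B's."""
    return {
        "acct_no": record_a["account_id"],
        "holder":  record_a["customer_name"].upper(),
        # System A stores an integer count of cents; system B expects dollars.
        "balance": record_a["balance_cents"] / 100,
    }

rec = {"account_id": "12345",
       "customer_name": "Ada Lovelace",
       "balance_cents": 250075}
print(convert_account(rec))
# → {'acct_no': '12345', 'holder': 'ADA LOVELACE', 'balance': 2500.75}
```

Note that the conversion runs in one direction at the data level only; the
two programs on either side of it remain untouched, which is exactly the
coexistence-until-obsolescence pattern described above.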

>  Pick a university that has an expert systems class. Every year, say,
> fifty student expert systems get built. Say, five of them are on
> agricultural management. Over ten years, we get fifty expert systems
> on various aspects of farm management, perhaps in the same language.  Is
> there any hope of combining these fifty into a network that comprises
> a super farm management system? Rhetorical question.

>  Ontolingua will provide a "complete" solution for future systems to
> the extent that person 1 will use it to describe his/her current
> system and person 2 will use that description to guide his/her
> construction of the next system.  Apart from the obstacles of tool
> properties (e.g., adequacy, ease-of-use), why would person 2 do so?
> Not because he/she wants to avoid the bother of reconstructing the
> ontology. Every programmer knows that it's easier to write the program
> from scratch than really understand what someone else has done.
> Besides they can always do a better job.  And the new task is a little
> different.  And...  For that matter, why should person 1 go to the
> trouble?

>  One possible reason is that the system from person 1 provides a
> service S1. Person 2 wants service S2 to use S1, or part of it, as a
> utility.**  Whatever the motivation for offering (Internet or local)
> services, Ontolingua might be a good way to describe services for
> reuse.  However, such a scenario makes it clear that Ontolingua alone
> is not sufficient. There must be a way of advertising services and
> making them available as servers.  Then there is a motivation, and a
> way, to share ontologies.

There is some evidence of shared ontologies among the conceptual graph
people.  The 37 relations in Appendix B of my CS book have formed the
core of essentially every implementation of CG systems over the past 8
years.  However, the extensions beyond that are all incompatible with
one another.

> * In the Enterprise Modeling conference in June at Hilton Head, the
> "traditional" enterprise modeling people came to the startling
> conclusion that the reason enterprise models were always "shelfware"
> (much less integrated), was because they were not written to be used!
> One of this subgroup's resolutions was to emphasize that enterprise
> models should be built with some _use_ in mind. Models should be able
> to make inferences from domain knowledge and answer questions. This
> was a radical breakthrough for them.  I suggested that they look at
> expert system technology, which was also a new thought.  On the other
> hand, there was some evidence that users already found IDEF0 too
> complex.  Sigh.

In the IRDS effort and in the CG projects, we have been emphasizing
the need to have natural language generated automatically by the formal
system.  When Tom G. said that the documentation string was an essential
part of the formal definition, I objected that a character string that
could not be interpreted by the formal system could not be considered
part of the formal definition.

Tom and others replied that the commentary in the documentation was
absolutely essential for the human users.  I have no quarrel with that.
But the point I have been making is that the English or other NL comments
should be generated automatically from the formal definition.

The job of translating NL into a formal language is still a major
research issue.  But the task of translating a formal language such
as CGs into readable, if not elegant, English prose has been done
many times in quite straightforward ways.  As an example, the DANTE
project at the IBM Rome Scientific Center required several person-years
to develop a system for mapping Italian to and from CGs.  But
then they hired one student for one summer who was able to write a
grammar for generating English from the graphs that had the same
coverage as the Italian generator.

One point that has always been true of comments in programs and will
certainly be true of comments in Ontolingua is that the comments are
usually incompatible with the formal statement.  Furthermore, if the
comments happen to be exactly compatible with the implementation of
version 1.0, they will begin to diverge from the implementation with
version 1.0001.  And the divergence will escalate as the number of
users, reviewers, maintainers, etc., increases.

Another example:  Walling Cyre at Virginia Tech has a project that is
translating patent descriptions into conceptual graphs.  The patents
typically have English prose, diagrams, and mathematical equations,
and they are translating all three forms into CGs.  Interesting point:
they discovered that many published patents are inconsistent.  In the
process of doing the translations (by hand), they discovered that the
prose, the diagrams, and the equations were inconsistent with one another.

Another fundamental principle:  The documentation and the implementation
will never be in exact agreement unless the documentation is automatically
generated from the implementation.
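The principle can be sketched in a few lines.  The definition format below
is invented for illustration -- it is not Ontolingua or CG syntax -- but it
shows the mechanism: the English gloss is a function of the formal
definition, so there is no separate comment string that can drift out of
agreement with it.

```python
# Hypothetical sketch: the documentation string is computed from the
# formal definition rather than written and maintained by hand.

def gloss(relation, signature):
    """Generate an English sentence from a (relation, argument-types) pair."""
    return f"The {relation} relation links {' and '.join(signature)}."

# The definition is stated once, in one place:
AGNT = ("AGNT", ["Act", "Animate"])

print(gloss(*AGNT))
# → The AGNT relation links Act and Animate.
```

If the signature changes in version 1.0001, the regenerated sentence
changes with it; a hand-written comment would not.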

> ** Reuse of software is a different problem from that of coordinating
> heterogeneous systems to accomplish a task, a la SHADE and PACT, in
> which the systems need to agree to some extent upon the current state
> of computation. But we need to think about the relationship between
> the problems, especially with respect to design rationales.

Absolutely!  And the experience in large software projects of any kind
is very directly applicable to shared KBs and "ontologies".

John Sowa