Skuce on Creating and Sharing Ontologies

Robert Neches <neches@ISI.EDU>
Message-id: <199203031953.AA01955@quark.isi.edu>
To: srkb@ISI.EDU
Subject: Skuce on Creating and Sharing Ontologies 
Cc: doug@csi.uottawa.ca
Reply-To: neches@ISI.EDU
Date: Tue, 03 Mar 92 11:53:08 PST
From: Robert Neches <neches@ISI.EDU>

------- Forwarded Message

Date:    02 Mar 92 13:15:53 -0400 
From:    "doug skuce" <doug@csi.uottawa.ca>
To:      cg@cs.umn.edu, interlingua
Subject: creating and sharing ontologies

How Shall We Discover Very General Ontologies and 
How Shall We Ever Agree On Them?

Doug Skuce, Computer Science, Univ. of Ottawa (doug@csi.uottawa.ca)

This is just a quick note describing my ideas on how to attack the above 
problems. (March 1, 1992)

Knowledge sharing will need agreement, assuming this is possible, on the 
most general categories that ontologies can have, notions like: thing, 
object, entity, property, attribute, event, process, state, situation, 
collection, relation, etc., to name a few favorites. At the moment, everyone 
uses these concepts and terms, but probably a) in very different ways and b) 
without being able to tell anyone else what they mean by them. A well-known 
example would be the top levels in the CYC ontology, which I find difficult 
to understand. (My review of CYC for the AI J discusses this at length. See 
also my paper in the '90 Banff Workshop, coauthored with Ira Monarch.) I 
believe this problem is very deep and critical to knowledge sharing, yet I 
have not seen much discussion of it. 

I believe that such notions can only be clarified by studying linguistic and 
psychological data, first, say, for English, but ultimately seeking true 
linguistic universals. At bottom, we are talking about word meanings. The 
only AI ontology I know of based on linguistic research is the Penman 
ontology based on Halliday's studies of English. It is also the best 
documented one, with reasonable if minimal descriptions of each of some 
fifty categories. (Contact Ed Hovy at ISI.) By contrast, the CYC ontology 
gives no indication of where its ideas came from. I would characterize 
such idiosyncratic ontologies as "ad hoc", and feel we should not continue 
to work in this manner. 

Instead, there should be proposals, just like the knowledge sharing proposals, that 
can be debated, iterated, and, hopefully, accepted by some large community. 
They should be based on linguistic data, preferably multilingual, so that, 
say, Japanese speakers won't say "there's no notion of ___ in Japanese!!". I 
have been working on such a proposal as a "background" task for a number of 
years. The sources include "AI" ontologies like CYC's and Penman's, and 
research such as:

1. George Miller's WordNet system. This has 40K English nouns and verbs 
arranged in hierarchies and is on line.
2. Dixon's book, A New Approach to English Grammar, on Semantic Principles. 
Clarendon Press, Oxford, 1991.
3. Rosch's work on psychological categories (approach this through George 
Lakoff's book, Women, Fire, and Dangerous Things.)

By the end of the summer (92) I hope to have an initial proposal with maybe 
25 categories (25 would be a big step!). But the next problem is: how to 
communicate them? Existing attempts to do this in informal, unstructured 
English leave a lot to be desired. At the moment, we have little choice but 
to attempt to describe each in natural language, in my case, English. Now, 
how should these descriptions be worded? (I would not want to call these 
definitions, since they will still be pretty vague.) Should examples be used? 
Should any formal notation be used? Should a restricted vocabulary (e.g. in 
LDOCE) be used? How to deal with circularity (i.e. descriptions that are 
mutually dependent)? In other words, we need also to agree on a "KIF" or 
format for describing these very general categories. It will have to use 
natural language, but I would suggest it should not be totally unstructured. 
(Something along the lines of Mel'cuk's ECD for natural language 
dictionaries would probably be useful.) Hence two proposals are needed, one 
for the format, and then the actual set of descriptions of categories 
themselves. I will try to do both, but even having an agreed format would be 
a big step forward. I am using my CODE knowledge management system, a tool 
that greatly facilitates the job.
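
To make the idea of a structured (rather than free-text) description format 
a little more concrete, here is a minimal sketch in Python. It is only an 
illustration under stated assumptions: the field names, the tiny restricted 
vocabulary, and the three sample entries are all hypothetical. It is not the 
format of CODE, of Mel'cuk's ECD, or of any actual proposal. It shows one 
entry per category, a check for description words that fall outside a 
restricted vocabulary, and a check for descriptions that are mutually 
dependent.

```python
from dataclasses import dataclass, field

# Hypothetical restricted defining vocabulary (a stand-in for a real
# controlled word list such as the one used in LDOCE).
RESTRICTED_VOCAB = {
    "a", "an", "in", "is", "of", "that",
    "something", "exists", "happens", "time",
}


@dataclass
class CategoryEntry:
    """One structured description of a very general category (hypothetical format)."""
    name: str
    description: str                                 # natural-language wording
    examples: list = field(default_factory=list)     # illustrative instances
    refers_to: list = field(default_factory=list)    # other categories the wording relies on


def words_outside_vocab(entry, vocab, category_names):
    """Return description words that are neither in vocab nor names of categories."""
    words = [w.strip(".,;()").lower() for w in entry.description.split()]
    return sorted({w for w in words if w and w not in vocab and w not in category_names})


def find_circular(entries):
    """Return names of categories whose descriptions depend, directly or
    indirectly, on themselves."""
    graph = {e.name: list(e.refers_to) for e in entries}
    circular = set()
    for start in graph:
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                circular.add(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, []))
    return sorted(circular)


# Invented sample entries, for illustration only.
ENTRIES = [
    CategoryEntry("thing", "something that exists", examples=["a rock"]),
    CategoryEntry("event", "something that happens in time and changes a state",
                  examples=["a storm"], refers_to=["thing", "state"]),
    CategoryEntry("state", "a condition of a thing between one event and the next",
                  examples=["being asleep"], refers_to=["thing", "event"]),
]
```

With these sample entries, find_circular(ENTRIES) reports "event" and 
"state" as mutually dependent, and words_outside_vocab flags "and" and 
"changes" in the wording for "event". Whether circularity should be 
forbidden or merely made visible is exactly the kind of question a format 
proposal would have to settle; the sketch only makes it visible.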

The key to finding the categories, I believe, is to identify some primitive 
semantic properties or predicates that are necessary to identify the 
essential nature of each category. Probably each has just one distinguishing 
property, e.g. existence for things. I have in mind notions like existence 
(in one or more of its senses), part-whole, grouping into collections, being 
dependent on something else, natural numbers, equality, etc. Mathematics can 
be built up from a few such primitive notions, so possibly more general 
knowledge can be as well. 
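
As a rough sketch of the "one distinguishing property per category" idea, 
the following Python fragment attaches a single primitive to each category 
and lets a category inherit the primitives of its more general ancestors. 
The category names, the primitives, and their arrangement are invented for 
illustration and are not a proposed ontology.

```python
# Hypothetical table: category name -> (more general category, distinguishing primitive).
CATEGORIES = {
    "thing": (None, "exists"),
    "collection": ("thing", "groups_members"),
    "dependent_thing": ("thing", "depends_on_something_else"),
    "part": ("dependent_thing", "is_part_of_a_whole"),
}


def primitives_of(name, categories):
    """Return the primitives that characterize a category, most general first:
    those inherited from its ancestors followed by its own distinguishing one."""
    chain = []
    while name is not None:
        parent, primitive = categories[name]
        chain.append(primitive)
        name = parent
    return list(reversed(chain))


def shared_primitives(categories):
    """Return primitives claimed as 'distinguishing' by more than one category.
    An empty result means each category has its own distinguishing primitive."""
    seen, shared = set(), set()
    for _, primitive in categories.values():
        if primitive in seen:
            shared.add(primitive)
        else:
            seen.add(primitive)
    return sorted(shared)
```

In this toy table, primitives_of("part", CATEGORIES) yields existence, 
dependence on something else, and part-whole, in that order, and 
shared_primitives(CATEGORIES) is empty because no primitive is used twice.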

Unfortunately, semanticists have been searching for these holy grails for 
some time and have not turned up much, so it is possible that such an 
enterprise is premature in 1992. 

I would appreciate hearing from anyone who has similar interests. We need 
more cooperation, and fewer competing ontologies.

Thanks from
Doug Skuce                      tel 613 564 5418
Dept of Computer Science        fax 613 564 9486
University of Ottawa
Ottawa, Ont, Canada, K1N 6B5    doug@csi.uottawa.ca

------- End of Forwarded Message