ANSI X3H4 standards meeting
sowa@watson.ibm.com
Message-id: <9108232050.AA19051@venera.isi.edu>
Date: Fri, 23 Aug 91 16:45:47 EDT
From: sowa@watson.ibm.com
To: SRKB@isi.edu, INTERLINGUA@isi.edu, KR-ADVISORY@isi.edu
Cc: SKPEREZ@mcimail.com
Subject: ANSI X3H4 standards meeting
To the SRKB participants:
In my note of July 17, I mentioned the ANSI X3H4 meeting in Seattle
and their work on developing standards. In response to that note,
I received several comments from SRKB participants. In one note,
Matt Ginsberg quoted the position against standards that was taken
at the KR '91 meeting in April:
   We believe KR standards to be inappropriate at the present time, and
   that although we feel there are interesting research questions in
   knowledge sharing, they are just that: research questions.  As such,
   they should be funded, evaluated and reported as research.
I forwarded those notes to Sandra Perez, the chair of the X3H4.6 Task
Group (which is the subcommittee that I was working with most closely).
She had the following reply:
   I appreciate the views of the KR community and will pass along the
   notes.  However, I see our use of CG not as standardizing KR but
   rather as an application of KR to solve a problem.  KR techniques,
   and specifically CG, seem to address our issues and problems.
At the end of this note is a memo I wrote, which summarizes the work
in the X3H4.6 Task Group from August 5 to 9. I circulated the memo
to several of the committee members, updated it with some of their
comments, and received their acknowledgment that it accurately
represents the discussions and conclusions.
During the discussions, I reported on the SRKB work and on KIF.
Most of the X3H4.6 members are in favor of declarative, logic-based
representations. But since the languages in current use are graphic
languages with strong typing, such as entity-relationship diagrams,
NIAM, and others, they wanted something much richer than an untyped
version of predicate calculus with a LISP-like notation. No one
objected to considering KIF as one of the languages to write interfaces
to, but it did not seem suitable for the primary language.
The X3H4.6 group drew a distinction that may also be useful for SRKB.
They distinguished a DEFINING language, which could be a very simple,
primitive version of predicate calculus with a model-theoretic semantics,
and a NORMATIVE language, which would be a much richer language with
many more built-in features, extended quantifiers like EXACTLY-n or
UNIQUE, and strong typing, including multiple inheritance. KIF or
something like it might be suitable as the defining language, but the
normative language would have to be much richer.
Before continuing, I should mention a term that is widely used in the
database community: conceptual schema. It has a long history, dating
from the early 1970s (see the bibliography at the end of this note).
Following is the definition proposed by X3H4.6:
   A conceptual schema is an ontology for the objects and relationships
   belonging to a universe of discourse together with a set of necessary
   propositions about those objects and relationships.
In terms of predicate calculus, you can think of a conceptual schema as
a set of axioms that define the constraints on a database. The database
itself may be viewed as a collection of ground-level atomic sentences
that are consistent with the schema (the axioms). The ANSI/SPARC three
schema approach also distinguishes the conceptual schema from the
internal schema (how the data elements are actually stored) and the
external schema (how an application program accesses them). These
notions have proved to be quite useful in the database community, and
similar notions may be useful for knowledge bases as well.
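A minimal sketch of this view, assuming a toy schema: the database is a set of ground atoms, and each axiom is a function that tests the whole set. The predicate names (`manufacturer`, `car_model`, `builds`) and both axioms are hypothetical and do not come from the X3H4.6 material.

```python
# Toy illustration: a database as ground atomic sentences, a schema as axioms.
# All predicate names and axioms below are hypothetical examples.

# Each fact is a ground atom written as a tuple: (predicate, arg1, arg2, ...)
database = {
    ("manufacturer", "acme"),
    ("car_model", "roadster"),
    ("builds", "acme", "roadster"),
}


def builder_is_manufacturer(db):
    """Axiom: builds(x, m) implies manufacturer(x)."""
    return all(("manufacturer", f[1]) in db for f in db if f[0] == "builds")


def built_thing_is_model(db):
    """Axiom: builds(x, m) implies car_model(m)."""
    return all(("car_model", f[2]) in db for f in db if f[0] == "builds")


# The conceptual schema, reduced here to a list of axioms.
schema = [builder_is_manufacturer, built_thing_is_model]


def violated_axioms(db, axioms):
    """Return the names of the axioms that the database fails to satisfy."""
    return [ax.__name__ for ax in axioms if not ax(db)]


def is_consistent_with_schema(db, axioms):
    """A database is acceptable when no axiom is violated."""
    return not violated_axioms(db, axioms)
```

Adding a fact such as `("builds", "ghost", "roadster")` without a matching `manufacturer` fact would make `violated_axioms` report `builder_is_manufacturer`.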
The primary task of X3H4.6 is to develop standards for the languages
used to express a conceptual schema. The following diagram was used
to show the relationships:
   Stylized                     Normative                     Existing
   Natural  <-----------------> Language  <------------------ Schema
   Language(s)                      A     ..................> Languages
                                    |
                                    |
                                    V
                                Defining
                                Language
The existing languages on the right include things like E-R diagrams,
SQL, Express, NIAM, and the languages used in various vendor systems.
The normative language must be rich enough to include a superset of the
semantics of the existing languages. The defining language, however,
may be more primitive, and some constructs in the normative language
may have to map to complex expressions in the defining language.
The dotted line from the normative language to the existing schema
languages indicates that some constructs in the normative language
might not be mappable into certain schema languages, since many of
the existing languages have limited expressive power.
Extended quantifiers, such as UNIQUE, or their equivalents in other
terminology are very important for database specifications. Most of
the schema languages on the right have some way of expressing them.
But the defining language might only have simple universal and
existential quantifiers. If you want to go from schema language A
to schema language B, it would be a waste of effort to map A's
specification for uniqueness into the primitives of the defining
language and then try to recover B's definition of uniqueness from
the low-level primitives. Instead, it would be much more direct to
use the representation of uniqueness in the normative language as
the intermediate stage between A and B.
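To see the cost of going through the primitives, consider the standard expansion of a uniqueness quantifier into universal and existential quantifiers with equality. The predicate P is a placeholder, not a construct from any of the schema languages discussed here.

```latex
\exists! y\, P(y) \;\equiv\;
  \exists y\, \bigl( P(y) \land \forall z\, ( P(z) \rightarrow z = y ) \bigr)
```

A translator that starts from the right-hand side has to recognize this whole pattern before it can emit language B's uniqueness construct. A translator that starts from a single UNIQUE construct in the normative language does not.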
For readability, every construct in the normative language should have
an equivalent representation in a stylized natural language, including
versions for English, French, and others. The smooth mapping between
conceptual graphs and natural languages is one reason why the X3H4.6
group is considering them as a basis for the normative language.
As an example, following is a typical sentence that might appear in
a database specification written in English:
   Any specific car model is constructed by only one manufacturer.
There are several points that require clarification:
   What does the word "specific" mean?  Is it needed?
   Does "only one" mean "exactly one"?
   A manufacturer constructs cars, not models.  What is the exact
      relationship between a model and a car?
Such vagueness occurs even in fairly well-written specifications, and
it is the main reason for Alan Perlis's remark "You can't map informal
specifications into formal specifications by any formal algorithm!"
Although computerized tools may be helpful, the distinctions needed
to formalize such a statement must be made by some human programmer,
designer, or knowledge engineer. Once those distinctions are made,
a conceptual graph (or other formalism) could be derived. Then the
graph could be mapped automatically into a stylized English sentence
like the following:
   For every car-model x, there is exactly one manufacturer y,
   who constructs every car z of model x.
This sentence is readable and precise. But it avoids the major research
problems that make unrestricted natural language difficult to process:
the syntax is unambiguous; anaphoric references are made explicit with
variables like x, y, and z; the vocabulary is open-ended, but every word
must be defined in advance; metaphor, metonymy, ellipsis, and other
difficult constructions are not permitted.
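As one possible literal transcription, the stylized sentence above could be written in predicate calculus as follows. The predicate names (CarModel, Manufacturer, Car, ofModel, constructs) are illustrative assumptions; the note itself does not fix a vocabulary.

```latex
\forall x\, \Bigl( \mathrm{CarModel}(x) \rightarrow
  \exists! y\, \bigl( \mathrm{Manufacturer}(y) \land
    \forall z\, ( ( \mathrm{Car}(z) \land \mathrm{ofModel}(z, x) )
      \rightarrow \mathrm{constructs}(y, z) ) \bigr) \Bigr)
```

The variables x, y, and z in the formula correspond one-to-one to the variables in the stylized English, which is what makes the anaphoric references explicit.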
Besides conceptual graphs, the X3H4.6 group is also considering NIAM
as a basis for the normative language (see references at the end).
NIAM is a graphic language widely used for database design, especially
for manufacturing databases; it is essentially a superset of
entity-relationship diagrams with support for a type hierarchy with
inheritance. Unlike conceptual graphs, NIAM is not a complete system
of logic. In fact, NIAM has many similarities to KL-ONE, and the
combination of NIAM with conceptual graphs forms a hybrid language,
analogous to the combination of KL-ONE with predicate calculus.
Since every construct in NIAM could be defined in conceptual graphs,
NIAM is, strictly speaking, redundant. Such redundancy, however, is
useful for the same kinds of reasons that KL-ONE is useful: NIAM can
represent certain constraints that are especially important for database
design; consistency checks and other transformations on NIAM diagrams
are much faster than consistency checks on full first-order logic;
and finally, NIAM has a long history of practical use in the database
community, including commercial implementations in production use.
Trade-offs between expressive power and computational efficiency arise
in any discussion of knowledge representation. For a specification
language (which is essentially what a conceptual schema must be), the
full power of first-order logic is essential -- some of the existing
languages, such as SQL, already have the power of first-order logic,
and the normative language cannot offer anything less. But different
operations can be done with different levels of efficiency:
1. For the NIAM subset, consistency checking can be highly efficient.
2. For conceptual graph constraints, full consistency checking is as
difficult as checking any other version of F.O.L. -- i.e. in the
general case, it may be undecidable.
3. But certain kinds of checking can be done efficiently, even for
full F.O.L: evaluating the denotation of a formula in terms of a
model can be done in polynomial time (with an exponent equal to the
number of quantifiers in prenex normal form). That is essentially
what a DB processor does when it answers an SQL query, and special
optimization techniques can reduce the time even further in most
practical cases.
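The evaluation described in item 3 can be sketched as nested iteration over a finite domain. This is a minimal illustration, not a description of how any particular DB processor works; the function `evaluate`, the sample domain, and the relation names are all hypothetical. With k quantifiers and a domain of n elements, the quantifier-free part is tested at most n^k times.

```python
def evaluate(prefix, matrix, domain, env=None):
    """Evaluate a prenex-normal-form formula in a finite model.

    prefix: list of (quantifier, variable) pairs, where quantifier is
            "forall" or "exists", in left-to-right order.
    matrix: function from a dict of variable bindings to bool
            (the quantifier-free part of the formula).
    domain: the finite set of individuals in the model.
    """
    env = dict(env or {})
    if not prefix:
        return matrix(env)
    (quantifier, variable), rest = prefix[0], prefix[1:]
    # One loop level per quantifier: this is where the n^k bound comes from.
    results = (
        evaluate(rest, matrix, domain, {**env, variable: d}) for d in domain
    )
    if quantifier == "forall":
        return all(results)
    if quantifier == "exists":
        return any(results)
    raise ValueError(f"unknown quantifier: {quantifier}")


# Hypothetical model: two car models and two manufacturers.
domain = ["m1", "m2", "acme", "zenith"]
models = {"m1", "m2"}
built_by = {("m1", "acme"), ("m2", "zenith")}

# forall x exists y: model(x) -> built_by(x, y)
prefix = [("forall", "x"), ("exists", "y")]
matrix = lambda e: e["x"] not in models or (e["x"], e["y"]) in built_by

holds = evaluate(prefix, matrix, domain)
```

Because `all` and `any` stop at the first decisive value, many formulas are settled well before the worst-case bound is reached.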
For the conceptual schema, we can guarantee efficient consistency checks
as in #1, and we can efficiently check, as in #3, whether a given model
meets all the constraints, in both the NIAM and CG forms. In fact, if any
such model exists, then that automatically proves consistency of all the
constraints.
Following is the memo on the meeting. References are at the end.
An ANSI technical report with more detail is due in January 1992.
Meanwhile, I would appreciate any comments or suggestions that
anyone might have.
John Sowa
________________________________________________________________________
Towards an IRDS Conceptual Schema
August 21, 1991
John F. Sowa
From August 5 to 9, 1991, the ANSI X3H4.6 Task Group on the conceptual
schema met in Seattle, in conjunction with a plenary session of the
ANSI X3H4 IRDS (Information Resource Dictionary Systems) committee.
This memo summarizes the discussions in the task group, which were
presented to the full committee on August 9.
Most of the time in Task Group X3H4.6 was spent on technical
presentations and discussions. On Monday, August 5, I gave a tutorial
on conceptual graphs, their relationship to logic and entity-relationship
diagrams, and their use in specifications for databases, knowledge bases,
and conventional software. On Tuesday, Robert Meersman from Tilburg
University gave a talk about NIAM (Nijssen Information Analysis
Methodology). On Wednesday, Bruce Jorgenson from Boeing gave a talk
about BERM (Boeing Entity Relationship Model). Besides the formal
talks, there were fruitful discussions of problems and issues and some
work on a draft of a technical report to be finished by January 1992.
Then on Thursday afternoon, Sandra Perez, Roger Burkhart, and I
presented the X3H4.6 results to the full X3H4 committee.
The conceptual schema must satisfy a wide variety of requirements:
readability by practitioners; a rigorous foundation that would satisfy
theoreticians; and enough expressive power to support all existing
conceptual schema languages. No single language can meet all these
requirements simultaneously. Instead, the IRDS standards should permit
different languages to be used to express a conceptual schema:
1. Existing schema languages: Many languages have been implemented in
vendor systems, and many others have been proposed in the research
literature. These include variations and extensions to SQL, Express,
Entity-Relationship diagrams, NIAM diagrams, conceptual graphs, and
others. The IRDS conceptual schema languages must support and
coexist with these languages.
2. Stylized natural languages: For readability by programmers and
designers who have not been trained in formal logic, a conceptual
schema should be definable in a stylized version of some natural
language, such as Structured English or Structured French. These
stylized languages should have an unambiguous syntax, and their
semantics should be defined by their mapping to a formal schema
language.
3. Defining language: For theoretical precision, the semantics of
the conceptual schema languages should be defined by their mapping
to a language with a model-theoretic semantics, such as a version
of predicate calculus. The defining language will be used only to
establish the foundations of the other conceptual schema languages,
and no practitioner should ever be required to learn it in order to
read or write a conceptual schema. The work by PDES on the Semantic
Unification Meta Model (SUMM) is quite promising and may lead to a
basis for a defining language.
4. Normative language: Since the defining language must have a limited
number of primitives in order to simplify the semantic foundations,
it is likely to be too low-level a language to provide all the
features needed in a practical conceptual schema language. To
support all those features, the X3H4.6 Task Group will define a
normative language whose semantics will be defined by its mapping
to the defining language, but it will be rich enough to include a
superset of the semantics of all the existing schema languages.
Basic position presented by the X3H4.6 Task Group to the full ANSI X3H4
committee:
1. Of the existing schema languages, NIAM and conceptual graphs
were considered to be the best candidates to express the IRDS
conceptual schema. They are both highly readable graphic languages
with a well-defined theoretical basis, a 15-year history of
research publications, world-wide user communities, and commercial
applications in production use. Furthermore, they complement one
another: NIAM is a superset of Entity-Relationship diagrams that
can specify the type hierarchy, functional dependencies, and certain
kinds of constraints in a compact way, but it cannot represent all
of first-order logic. Conceptual graphs can represent all of logic,
and their basis in natural language semantics can simplify the
mapping to structured English or French. Certain implementations
of NIAM supplement the diagrams with a linear language RIDL*, and
conceptual graphs can be used as a replacement for RIDL*.
2. A new normative language for the IRDS conceptual schema should be
based on conceptual graphs and NIAM. Some work is needed to merge
them in a seamless way and to add any and all features necessary
to support the existing schema languages. Although any feature of
NIAM could be defined in CGs, the two languages have complementary
strengths: NIAM is very concise for representing the type hierarchy
and certain constraints, while CGs are a good intermediate language
for mapping to other forms of logic and to stylized natural language.
3. To avoid endless arguments about the advantages of diamonds, ovals,
decorated boxes, solid lines vs. dotted lines, etc., the normative
language will be defined only by an abstract syntax. Examples will
be given in the current notations for NIAM and conceptual graphs,
but the details of the concrete syntax will not be standardized until
much later, if at all.
4. The stylized NL should have an unambiguous syntax that could express
every feature of the normative language. It could therefore be used
for comments, help facilities, and documentation. Programmers and
designers who do not have a strong background in logic could use it
for both reading and writing specifications that would be translated
into the normative language.
5. Other languages may also be used to define an IRDS conceptual schema,
but for each such language, there must be a pair of mapping grammars
that specify the translations to and from the normative language.
The translation from the normative language to another schema
language need not be complete, since the existing schema languages
typically express only a subset of what the normative language should
express. The IRDS committee will specify the formats for mapping
other schema languages to the normative language.
6. The X3H4.6 Task Group will prepare a technical report by Jan. 1992
that presents this position, the rationale for it, and a discussion
of all the issues, problems, etc.
7. If the January technical report is approved in February, X3H4.6
will work on a draft proposed standard. The first draft should be
finished sometime in mid 1992, but then it will have to be reviewed,
revised, reviewed, revised, etc. Don't expect an official ANSI
standard until mid 1993 at the earliest. 1994 is more likely.
To clarify the terminology, X3H4.6 also proposed some basic definitions:
   A conceptual schema is an ontology for the objects and relationships
   belonging to a universe of discourse together with a set of necessary
   propositions about those objects and relationships.

   An ontology is a set of types for classifying everything that exists
   or may exist in a universe of discourse.

   The object types and propositions belonging to a schema are expressed
   in a conceptual schema language.

   A proposition is an abstract object that has a truth value as its
   denotation.  A proposition must be stated in some language.  Two
   statements in different languages are considered equivalent if and
   only if the propositions they express have the same denotation in all
   possible circumstances.

   A populated schema contains the conceptual schema plus propositions
   about specific objects in a universe of discourse.
These definitions define a conceptual schema in terms of abstract
propositions rather than "sentences", as in ISO/TR 9007. This change
allows the IRDS standards to accommodate a variety of systems, which
may use different languages: two schemas in different languages could
still be equivalent even if they contained different sentences. The
denotation of a proposition is a truth value, but the question of
two-valued or many-valued logic is left open. For the first draft
of the proposed standards, only a two-valued logic will be used, but
multiple truth values could be considered in future versions.
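For the two-valued case, the equivalence test in the definitions above can be sketched in a toy propositional setting. The assumption here, which is not part of the X3H4.6 definitions, is that "all possible circumstances" means all truth assignments to a finite set of atoms; the function `equivalent` and the atoms `p` and `q` are hypothetical.

```python
from itertools import product


def equivalent(stmt_a, stmt_b, atoms):
    """True if both statements denote the same truth value in every
    circumstance, where a circumstance is one truth assignment to atoms."""
    for values in product([False, True], repeat=len(atoms)):
        circumstance = dict(zip(atoms, values))
        if stmt_a(circumstance) != stmt_b(circumstance):
            return False
    return True


# Two differently worded statements of the same proposition.
if_p_then_q = lambda c: (not c["p"]) or c["q"]
never_p_without_q = lambda c: not (c["p"] and not c["q"])

# A statement that looks similar but expresses a different proposition.
if_q_then_p = lambda c: (not c["q"]) or c["p"]

same = equivalent(if_p_then_q, never_p_without_q, ["p", "q"])
different = equivalent(if_p_then_q, if_q_then_p, ["p", "q"])
```

The first pair contains different sentences but is equivalent; the second pair is not. This exhaustive check only works for a finite propositional vocabulary and does not extend to full first-order schemas.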
At the plenary session of the X3H4 group on Thursday, we presented this
position with some examples, further details, and discussion of how the
various E-R models, vendor systems, etc., would fit into this framework.
The full X3H4 group said that it looked promising, and that X3H4.6 should
continue to work on it and produce a technical report by January.
Robert Meersman was also one of the members of the ISO committee that
produced ISO/TR 9007 on the conceptual schema. He believes that if
X3H4.6 can make good progress in working out the details, there is an
excellent chance of getting ISO as well as ANSI support for the approach
outlined here.
________________________________________________________________________
BIBLIOGRAPHY
Following are some of the principal references for the systems mentioned
here (conceptual schemas, E-R diagrams, NIAM, and conceptual graphs).
Conceptual schema issues:
Tsichritzis, Dennis, & Anthony Klug, eds. (1978) "The ANSI/X3/SPARC
DBMS Framework: Report of the study group on database management
systems," _Information Systems_, vol. 3.
van Griethuysen, J.J., ed. (1985) "Assessment guidelines for
conceptual schema language proposals," ISO Report No. ISO
TC97/SC21/WG5-3/N991, International Organization for Standardization.
ISO/TC 97, "Information processing systems: Concepts and Terminology
for the conceptual schema and the information base," ISO Report No.
ISO/TR 9007, International Organization for Standardization, July 1987.
Entity-relationship diagrams:
Chen, Peter Pin-Shan (1976) "The entity-relationship model -- toward
a unified view of data," _ACM Transactions on Database Systems_,
vol. 1, no. 1, pp. 9-36.
Teorey, Toby J. (1990) _Database Modeling and Design: The
Entity-Relationship Approach_, Morgan Kaufmann Publishers, San Mateo.
NIAM:
Nijssen, G. M. (1976) "A gross architecture for the next generation
database management systems," in G. M. Nijssen, ed., _Modeling in
Database Management Systems_, North-Holland Publishing Co., pp. 1-24.
Verheijen, G. & J. van Bekkum (1982) "NIAM: an information
analysis method," in T.W. Olle, H. Sol, & A.A. Verrijn-Stuart, eds.,
_Proceedings of the first IFIP CRIS Conference_, North-Holland
Publishing Co.
Meersman, R. A. (1988) "Towards models for practical reasoning about
conceptual database design," in R. A. Meersman & A. C. Sernadas, eds.,
_Data and Knowledge (DS-2)_, North Holland, pp. 245-263.
Wintraecken, J.J. (1990) _The NIAM Information Analysis Method:
Theory & Practice_, Kluwer Academic Publishers.
Conceptual graphs:
Sowa, John F. (1976) "Conceptual graphs for a data base interface,"
_IBM J. of Research and Development_, vol. 20, no. 4, pp. 336-357.
Sowa, John F. (1984) _Conceptual Structures: Information Processing
in Mind and Machine_, Addison-Wesley, Reading, MA.
Sowa, John F. (1991) "Towards the expressive power of natural
language," in J. F. Sowa, ed., _Principles of Semantic Networks_,
Morgan Kaufmann Publishers, San Mateo, CA.
_Knowledge Based Systems_, the December 1991 issue will be devoted to
10 papers from the recent Sixth Annual Workshop on Conceptual Graphs.