ANSI IRDS Working Paper

sowa@watson.ibm.com

Message-id: <9110182056.AA04643@venera.isi.edu>
Date: Fri, 18 Oct 91 16:51:47 EDT
From: sowa@watson.ibm.com
To: SRKB@isi.edu, INTERLINGUA@isi.edu, KR-ADVISORY@isi.edu
Subject: ANSI IRDS Working Paper

Another meeting of the ANSI X3H4 committee on IRDS was held at NIST
(National Institute of Standards and Technology) from October 7 to 11.
I have been participating in the X3H4.6 Task Group on the conceptual schema.

At the end of this note is a working paper produced by X3H4.6, which was
approved by the full X3H4 committee for presentation to the corresponding
ISO IRDS committee in Tokyo in November.  We are now extending this paper
to produce a more detailed ANSI Technical Report by January.

The acronym IRDS stands for Information Resource Dictionary System,
but we are proposing that the D be changed to "definition", since it
is becoming a true definition system, rather than just a data dictionary.

The 1988 ANSI standard was based on Entity-Relationship diagrams, which
are too weak to express everything that needs to be expressed for
definitions.  ISO never approved the old ANSI standard, and they have
been proposing a system based on SQL, extended from a query language
to a specification language.  As an example, if a(x), b(x), and c(x)
are three predicates that express possible constraints on x, following
is the SQL equivalent for (Ax)(a(x) -> (b(x) xor c(x))):

     CREATE TABLE A
     ( A1 IRDS_KEY PRIMARY KEY,
       A2 CHAR NOT NULL
          CHECK (A2 IN ('B','C') ),
       UNIQUE (A1, A2),

     CONSTRAINT constraint-name
       CHECK ( ( SELECT COUNT (*) FROM B
                   WHERE A1=B1 )
             = ( SELECT COUNT (*) FROM C
                   WHERE A1=C1 )
               = 1 )
     )

     CREATE TABLE B
     ( BK IRDS_KEY PRIMARY KEY,
       B1 IRDS_KEY NOT NULL,
       B2 CHAR NOT NULL
          CHECK (B2='B'),
          CONSTRAINT constraint-name
       FOREIGN KEY (B1, B2)
         REFERENCES A (A1, A2)
     )

     CREATE TABLE C
     ( CK IRDS_KEY PRIMARY KEY,
       C1 IRDS_KEY NOT NULL,
       C2 CHAR NOT NULL
          CHECK (C2='C'),
          CONSTRAINT constraint-name
       FOREIGN KEY (C1, C2)
         REFERENCES A (A1, A2)
     )

The ISO position is that "Standards should feed on other standards."
Since SQL is an international standard for query languages, it is
the obvious choice for a specification language.  As one can see from
this example (taken from ISO DIS 10728, IRDS Services Interface, p. 9),
it is much easier to understand than predicate calculus.
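
For comparison, here is a minimal sketch (not taken from the ISO draft
or from the X3H4 paper) of how the constraint
(Ax)(a(x) -> (b(x) xor c(x))) could be checked directly over a finite
domain.  It is written in Python; the function name and the sample sets
are invented for illustration.

```python
# Check (Ax)(a(x) -> (b(x) xor c(x))) over a finite domain.
# a, b, c are modeled as sets: x is in the set iff the predicate holds for x.

def satisfies_constraint(domain, a, b, c):
    """Return True if every x with a(x) satisfies exactly one of b(x), c(x)."""
    return all((x in b) != (x in c) for x in domain if x in a)

# Hypothetical sample data, for illustration only.
domain = {1, 2, 3, 4}
a = {1, 2}
b = {1, 3}
c = {2, 3}

ok = satisfies_constraint(domain, a, b, c)              # 1 is only in b, 2 only in c
violated = satisfies_constraint(domain, a, b | {2}, c)  # 2 is now in both b and c
```

Element 3 belongs to both b and c, but it does not violate the
constraint because a(3) does not hold.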

The ANSI X3H4 committee acknowledges that E-R diagrams are inadequate,
but they have not yet been persuaded that SQL is an improvement.
Instead, they are proposing a logic-based approach using predicate
calculus and conceptual graphs.  Following is the working paper.

John Sowa
________________________________________________________________________

ISO/IEC JTC1/SC21/WG3 N

Source:   USA
Date:     October 18, 1991
Subject:  Working Paper on IRDS Conceptual Schema

IRDS Conceptual Schema
WORKING PAPER

The following persons participated in the development of this report:

Roger Burkhart    Deere & Company
Scott Dickson     Ontek Corporation
John Hanna        Vitro Corporation
Sandra Perez      Concept Technology, Inc.
Tony Sarris       Ontek Corporation
Madhu Singh       Bell Communications Research
John Sowa         IBM Corporation
Cliff Sundberg    Digital Equipment Corporation

Acknowledgment:

Robert Meersman, a contributor to current and earlier work on ISO IRDS
and database standards, contributed to the description of the conceptual
schema layers and to the interpretation of ISO background documents.

TABLE OF CONTENTS

1.0 Introduction
2.0 Scope
3.0 Definitions and Abbreviations
4.0 Foundations of the Conceptual Schema
5.0 Roles of the Conceptual Schema
6.0 Conceptual Schema Taxonomy
6.1 Layers of Schema Definition
6.2 Types of Model Expressiveness
6.3 The Three Schema Architecture
6.4 Views and View Integration
6.5 Life Cycle Phases
6.6 Model Generality
7.0 Levels of Description
8.0 Conceptual Schema of the IRDS
9.0 Schema Language Framework
10.0 References

1.0 Introduction

2.0 Scope

An IRDS describes and manages an organization's information system
resources and information describing other entities such as
engineering designs, equipment, manufacturing processes, operational
procedures, databases and documents. It serves as a repository for all
the information about these resources that users and computer systems
need to share. Because this information comes from many sources both
within and outside the enterprise, it is important that it be sharable
on as wide a basis as possible. The IRDS supports the integration of an
enterprise's information environment and information models conforming
to national and international standards. The IRDS conceptual schema
specifies the basic concepts, definitions, rules and integration
algorithms that make this integration and sharing possible.

An IRDS serves as a communications path between senders and receivers
of information who may be widely separated in time and place.
Communication cannot take place, however, unless the sender and
receiver share a common understanding, or interpretation, of the data
they transfer. The IRDS puts the meaning of information at the core of
its design, and keeps this interpretation entirely separate from
details of its representation, storage, processing or presentation. The
IRDS separates the conceptual view of information so that it can be
used and applied in many different ways. It can be presented in
multiple external forms, for either users or computer programs, and it
can be stored and processed using any selected technology, including
separate systems such as database managers.

Because it may be used across many different industries and for many
needs throughout the systems life cycle, the IRDS is based on a generic
design that is not constrained by current technology or a short-term
vision of needs. The contents of an IRDS are entirely customizable, and
the range of information systems it describes is unrestricted. This
means that its conceptual schema must be general enough to cover the
needs of any potential user community.

A solution to the need for generality is to base the IRDS conceptual
schema on the basic structures of meaning inherent in any attempt to
communicate. This approach assumes that the structures of meaning
captured by an information system are the same as those used by people
when they communicate. The IRDS need not discover these principles on
its own, but can instead draw from established sources in logic,
linguistics, mathematics, computer science and philosophy. The IRDS
conceptual schema must be based on such foundations to remain neutral
for any selected application.

3.0 Definitions and Abbreviations

Application Schema - A schema in which domain-specific types of objects
and the rules obeyed by those objects are described.

Conceptual Model - The composite of the conceptual schema and the
information base which together describe objects in a universe of
discourse.

Conceptual Schema - An ontology for the objects and relationships
belonging to a universe of discourse along with necessary propositions
about those objects and relationships.

IRDS Definition Schema - A schema in which primitive concepts, object types
and operations, and fundamental lexical and syntactic categories are
defined. Collectively these define the basic modeling capability of an
IRDS.

IRDS Normative Schema - A schema which defines formal modeling
constructs available for defining models and for translating between
models.

Layers of Schema Definition - An approach to describing the IRDS
Conceptual Schema.  This approach divides the IRDS Conceptual Schema
into four layers, each having unique roles and characteristics.

Level of Description - The relative position of a conceptual model in
a series of conceptual models that describe each other.

Modeling Schema - A schema that defines a framework or system for
capturing abstract conceptual content of a model.

Ontology - A system for classifying everything that exists in a
universe of discourse.

Proposition - A description of a state of affairs.  Formally, a
proposition is a primitive, abstract entity on which logical operations
may be performed.

Universe of Discourse - Those entities and happenings that have been,
are or ever might be and about which there exists a collection of
represented information having a common understanding.

4.0 Foundations of the Conceptual Schema

The IRDS conceptual schema is founded on the assumption that the
meaning of information can be separated, at least conceptually, from
the specific forms or languages used to represent it. This separation
is a necessary base for managing the conceptual schema as a separate resource,
and for independently mapping the conceptual schema to its external and
internal interfaces.

Talking about the conceptual schema can be somewhat difficult, since it
is separate from language, yet some language must always be used to
conduct the discussion. The first requirement in defining the
conceptual schema is to understand precisely its relation both to
language and to the objects belonging to an application domain. These
relationships were expressed by Ogden and Richards [1] in a diagram
called the meaning triangle:

Figure 1.  The Meaning Triangle.

Each of the points on the triangle indicates a separate component that
may be involved in thought or communication. The Object is any entity
from some real or imagined world about which an idea is held. The
Concept is the idea or thought of the object as held in the mind of a
person. The Symbol is an auditory, visual, or other form of utterance
which is taken to stand for the object when communicated as part of a
language.  Any one of these components may be present without the
others.  An object does not depend on ideas formed about it, a concept
may be formed about an object which does not exist (such as a unicorn),
and a symbol may be held without knowing what object it stands for
(such as a word from a foreign language).  The relation between symbol
and object is not direct, but is imputed as the combination of the
relation between symbol and concept and the relation between concept
and object.  The conceptual schema is concerned with defining the
concepts that lie at the center of meaning and which are separate both
from the symbols that express them and the objects they refer to.
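
The imputed relation between symbol and object can be illustrated with
a small sketch.  It is an informal illustration in Python, not part of
the IRDS definition; the dictionary contents and the function name
referent are hypothetical.

```python
# Symbol -> Concept and Concept -> Object are the two direct relations;
# Symbol -> Object is only imputed by composing them.

symbol_to_concept = {
    "horse": "HORSE",
    "unicorn": "UNICORN",
    # "caballo" is deliberately absent: a symbol held without knowing
    # what it stands for (a word from a foreign language).
}
concept_to_object = {
    "HORSE": "the animal in the stable",
    # "UNICORN" is deliberately absent: a concept with no existing object.
}

def referent(symbol):
    """Return the object a symbol stands for, or None if either link is missing."""
    concept = symbol_to_concept.get(symbol)
    if concept is None:
        return None
    return concept_to_object.get(concept)
```

Calling referent("horse") follows both links, while referent("unicorn")
and referent("caballo") each fail at a different side of the triangle.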

The corners of the triangle each represent a single symbol, concept, or
object that may be involved in communication.  Typical application
domains, however, contain a rich assortment of objects that requires a
complex structure of concepts and symbols to describe them. The meaning
triangle can be extended to show these larger collections of elements:

Figure 2.  Collections of Meaning Elements.

The application domain is the collection of all real or imagined
objects of interest. These objects include not only particular
individuals, but changes to these objects and associations between
them. The conceptual schema is the collection of types and generic
rules for objects that may exist in a domain, and the information base
is the collection of concepts for the individual objects that exist in
the domain.  The terminology of conceptual schema and information base
is derived from the ISO technical report TR9007, "Concepts and
Terminology for the Conceptual Schema" [2].
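
The distinction between conceptual schema and information base can be
sketched as follows.  This is an assumed, simplified representation in
Python; the type "Employee", its rule, and the sample individuals are
hypothetical and do not come from TR9007.

```python
# Conceptual schema: types plus generic rules that objects of a type must obey.
# Information base: concepts of the individual objects that exist in the domain.

conceptual_schema = {
    "Employee": {
        "attributes": {"name", "salary"},
        "rules": [lambda e: e["salary"] >= 0],
    },
}

information_base = [
    {"type": "Employee", "name": "Smith", "salary": 40000},
    {"type": "Employee", "name": "Jones", "salary": -1},
]

def conforms(individual, schema):
    """Return True if the individual has its type's attributes and obeys its rules."""
    type_def = schema.get(individual["type"])
    if type_def is None:
        return False
    has_attributes = type_def["attributes"].issubset(individual)
    return has_attributes and all(rule(individual) for rule in type_def["rules"])

results = [conforms(ind, conceptual_schema) for ind in information_base]
```

The first individual obeys the rule of its type; the second violates it
and would be rejected from the information base.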

Language is a structure of symbols used to communicate concepts, either
general concepts belonging to the conceptual schema, or concepts of
individuals belonging to the information base.  As a structure of
communicated symbols, language is the only element of the meaning
triangle that can be stored and processed in a computer system. Forms
of representation as computer data are specialized forms of language.
The conceptual schema and information base correspond to ideas held in
somebody's mind; they must be reduced to a particular form of language
before they can be processed by a computer. The role of computer
processing is to manipulate language strings, including translation
from one form of language to another.  The final interpretation of data
processed by a computer can be performed only by a person to whom they
are presented externally.

5.0 Roles of the Conceptual Schema

A conceptual schema identifies the types of objects that exist in some
domain of interest and the rules these objects must obey. (See the ISO
technical report TR9007 [2] for an extended treatment.) There is not
just one conceptual schema, but many different ones, each defined by
the particular domain of interest to which it applies. The way its
domain is selected defines a variety of roles for the conceptual
schemas managed by an IRDS.

For the IRDS, the domain of interest consists of the information
systems used by an enterprise, along with the structure and behavior of
the enterprise itself and its surrounding environment. In principle,
the IRDS may be used to record any information about these resources
that an enterprise chooses.  The content is entirely open-ended so that
an enterprise can define anything it wants. An initial role of the IRDS
conceptual schema is to define the information stored in the IRDS,
including these customized contents.

In this role, the IRDS conceptual schema is much like the data
definition facilities of a traditional database system. The IRDS can
always be used as a database management system, the contents of which
happen to be descriptions of other information systems. Unlike some
database systems, IRDS requires the ability to dynamically modify the
definition of its content as it evolves, and it is generally much
richer in the structure of its conceptual schema. In some respects,
however, the IRDS can be regarded as a database that contains
information about information resources or anything else of interest.

Because the subject matter of the IRDS consists of information
resources, much of its content is subject to potential standardization.
Most enterprises rely on standard techniques to model the enterprise
and specify its information and other systems. There is an assortment
of standard models for documenting the requirements of a system and for
defining its logical and physical design. An additional role of the
IRDS conceptual schema is to support the definition of standardized
contents.

Because potentially standard contents can come from many different
sources, the IRDS must provide a generic framework in which any portion
of an information resource description can be standardized by a
responsible organization. Contents which are so general as to apply to
any information system, or which are widely used but not represented by
a formal standards group, are likely to be standardized as an inherent
part of IRDS. An important role of the IRDS conceptual schema
architecture is to supply an overall framework in which particular
standards can be positioned and related to each other.

Certain aspects of an information system are so basic that there is no
escaping them even in a minimal description. One of these is its
conceptual schema, along with the implementation of the conceptual
schema in internal and external interfaces. The techniques the IRDS
supplies for defining its own conceptual schema can be used just as
well to specify the conceptual schema of another system. Using the IRDS
to detail the conceptual schema of an information system, along with
its external and internal interfaces, yields several major advantages.

One compelling advantage from describing conceptual schemas of the
enterprise is the ability to integrate partial views of the
enterprise.  Each information system deals with only a portion of the
enterprise, which typically overlaps portions of the enterprise covered
by other systems.  Each such partial view may adopt its own conventions
or rules for the objects under discussion, and may represent these
rules under a variety of formalisms or languages.  Because the IRDS
breaks the conceptual schema down to its basic elements of meaning, the
IRDS provides the basis for deciphering what is really meant by all
these partial views and for specifying how they relate to each other.

Providing the ability to map views to each other and to automatically
translate between them is a major goal that drives reduction of the
IRDS conceptual schema to its fundamental level. The IRDS conceptual
schema defines a canonical form in which to capture the meaning of all
enterprise views, which can then be expressed in a variety of forms.
Once captured in this way, the conceptual schema becomes a major
resource for the enterprise in its own right. It has many potential
uses not only for supporting information systems, but for
understanding, controlling, and managing the enterprise.
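
The idea of translating between views by way of a canonical form can be
sketched as follows.  The sketch is in Python and reduces a view to a
renaming of fields, which is a strong simplification; the view names,
field names, and function names are hypothetical.

```python
# Each view is related to a common canonical form rather than to every other view.

TO_CANONICAL = {
    "payroll":   {"emp_no": "employee_id", "pay": "salary"},
    "personnel": {"id": "employee_id", "wage": "salary"},
}

def to_canonical(view, record):
    """Rename a view's fields to their canonical names."""
    mapping = TO_CANONICAL[view]
    return {mapping[field]: value for field, value in record.items()}

def from_canonical(view, record):
    """Rename canonical fields back to a view's own names."""
    inverse = {canon: field for field, canon in TO_CANONICAL[view].items()}
    return {inverse[field]: value for field, value in record.items()}

def translate(source_view, target_view, record):
    """Translate a record between two views by way of the canonical form."""
    return from_canonical(target_view, to_canonical(source_view, record))

personnel_record = translate("payroll", "personnel", {"emp_no": 7, "pay": 40000})
```

With n views, n mappings to the canonical form suffice, where direct
translation would need a separate mapping for each ordered pair of views.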

A more technical use of the conceptual schema is to specify the
function of an information system separately from its implementation.
By rigorously separating the "what" of an information system from its
"how," the conceptual schema enables the selection of any technology
that can best perform the job. If the description of the technology is
complete enough, the IRDS can even be used to locate data at run time
and to retrieve or process data using its own standard interfaces.
Carried to this extent, the IRDS becomes a virtual database manager for
all the enterprise information.

6.0 Conceptual Schema Taxonomy

The IRDS conceptual schema taxonomy is a system for classifying the
variety of conceptual schemas managed by the IRDS. This taxonomy
establishes several independent dimensions for classifying the IRDS
contents. Following are these basic dimensions:

1. Layers of Schema Definition

2. Types of Model Expressiveness

3. Three Schema Architecture

4. Views and View Integration

5. Life Cycle Phases

6. Model Generality

These dimensions are each independent systems of classification for
possible contents of the IRDS conceptual schema. Because they are
independent, they may be combined with each other to define many
fine-grained subdivisions of the total content. The most fundamental
dimensions are listed first, followed by dimensions useful in the
specific context of an IRDS. The following sections present each of
these dimensions.
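
How independent dimensions combine can be shown with a short sketch in
Python.  It uses only the first three dimensions, with values taken
from sections 6.1 through 6.3; treating every combination as a
meaningful subdivision is an assumption made for illustration.

```python
from itertools import product

# Values of three of the six dimensions, as named in sections 6.1 to 6.3.
layers = ["IRDS Definition Schema", "IRDS Normative Schema",
          "Modeling Schema", "Application Schema"]
expressiveness = ["static", "dynamic", "higher-order"]
three_schema = ["external", "conceptual", "internal"]

# Because the dimensions are independent, each combination of values
# identifies one subdivision of the total content.
subdivisions = list(product(layers, expressiveness, three_schema))
```

The number of subdivisions is the product of the sizes of the
dimensions, which is why a few dimensions yield a fine-grained
classification.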

6.1 Layers of Schema Definition

Layer 1 - IRDS Definition Schema

The IRDS Definition Schema defines the primitive concepts, object types and
operations, and fundamental lexical and syntactical categories that
define the basic modeling capability of an IRDS. It contains the
primitives used to define the IRDS Normative Schema layer.  All IRDS
modeling constructs are ultimately defined in terms of the theoretical
constructs defined by this layer. These constructs are taken from
philosophy, logic, linguistics, mathematics, computer science and other
disciplines.

This layer supplies a theoretical foundation for the normative schema;
it has no direct operational role within the IRDS. It captures the
basic structures of meaning that implicitly lie behind any attempt to
communicate by either natural or formal languages. It is expected that
there are multiple sets of primitives that can capture these
structures, with each set being internally complete and consistent, but
equivalent to the other sets. Equivalence of sets means that the
primitives of one set can be defined using the primitives of the other,
and vice versa. The IRDS standard will note these equivalent
formulations, but will select and use one of them as the basis for
defining the constructs of the IRDS Normative Schema layer.
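
A familiar illustration of equivalent primitive sets, offered here as
an analogy and not as the IRDS primitives themselves, comes from
propositional logic: by De Morgan's laws, {NOT, AND} can define OR, and
{NOT, OR} can define AND.  The Python sketch below checks both
definitions on every combination of truth values; the function names
are invented for illustration.

```python
from itertools import product

# Set 1 takes NOT and AND as primitives and defines OR from them.
def or_from_and(p, q):
    return not ((not p) and (not q))

# Set 2 takes NOT and OR as primitives and defines AND from them.
def and_from_or(p, q):
    return not ((not p) or (not q))

# Compare both definitions with Python's built-in operators on all inputs.
equivalent = all(
    or_from_and(p, q) == (p or q) and and_from_or(p, q) == (p and q)
    for p, q in product([False, True], repeat=2)
)
```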

The ultimate definition of the modeling primitives in this layer can
only be expressed informally, using as precise a form of language as
possible. All meaning captured by the IRDS is ultimately reducible to
these constructs, but their primitive level would likely make such a
reduction burdensome and inconvenient. No construct is included in this
layer if it can be defined as a combination of other constructs; this
is what it means for a construct to be primitive. To support the
definition of the IRDS Normative Schema and subsequent layers, the
primitives must include an ability to define further constructs that
are abbreviations or macros for combinations of these primitives. A
language is supplied for specifying these macro constructs. This
language, called the Defining Language, is used to define the
constructs of the subsequent normative schema layer.

Example contents: Object, Type, Type-Instance Relation, Proposition,
Event, Symbol, Lambda Abstraction
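
The reduction of defined constructs to primitives can be sketched as a
macro expansion.  The sketch is in Python and is not the Defining
Language itself.  The primitive names come from the example contents
above and the defined names from the Layer 2 example contents, but the
definitions given for them are hypothetical.

```python
# Layer 1 primitives and a few constructs defined as combinations of them.
PRIMITIVES = {"Object", "Type", "Type-Instance Relation", "Proposition"}

# Hypothetical definitions: each construct abbreviates a list of parts,
# which may be primitives or other defined constructs.
DEFINITIONS = {
    "Attribute": ["Type", "Type-Instance Relation"],
    "Relationship": ["Type", "Object", "Object"],
    "Subtype": ["Relationship", "Proposition"],
}

def expand(construct):
    """Reduce a construct to the list of primitives it abbreviates."""
    if construct in PRIMITIVES:
        return [construct]
    primitives = []
    for part in DEFINITIONS[construct]:
        primitives.extend(expand(part))
    return primitives

subtype_primitives = expand("Subtype")
```

A construct that appears as a key of DEFINITIONS is by that fact not
primitive, since it can be defined as a combination of other constructs.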

Layer 2 - IRDS Normative Schema

The IRDS Normative Schema supplies the complete set of formal modeling
constructs available for defining models used with the IRDS. This model
supplies a common interpretation, or semantics, for all modeling
languages used with the IRDS. It supports the unification of models
expressed in different languages, and may be used to translate between
them. The constructs include all those of the IRDS Definition Schema
layer plus additional ones which are not primitive but are defined for
convenience in unifying a wide variety of common models.  The
constructs of the IRDS Normative Schema are fixed as part of any given
version of the IRDS standard.

The constructs belonging to this layer are fully specified as part of
the IRDS standard. There is no specific rule to decide which constructs
must be included in this layer, but at a minimum any construct included
in two or more modeling frameworks is a candidate. The main role of
this layer is to make unification between modeling frameworks more
convenient than unification directly at the layer of primitives would
be. No essential modeling capability can be lost from an incomplete
selection, since all the primitives are automatically included.

Constructs belonging to this layer are defined either directly in terms
of primitives from the IRDS Definition Schema layer, or in a language
that can express all the contents of this layer. The IRDS directly
supports such a language, called the IRDS Normative Language, to permit
the input and mapping definitions for the modeling schemas in layer 3.

Example contents: Attribute, Relationship, Subtype, Aggregation,
Process, Trigger

Layer 3 - Modeling Schemas

Each Modeling Schema defines a framework or system for capturing the
conceptual content of a model. Each such modeling system is a set of
constructs and their definitions that express all or some of the
modeling capability defined by the IRDS Definition Schema layer. Each
such definition can also refer to the predefined constructs supplied by
the IRDS Normative Schema layer.  These models are called modeling
schemas because their domain is concerned with the process of modeling
itself, separate from details of any particular domain being
modeled.

Many modeling approaches will be needed to cover the variety of IRDS
application domains. While these modeling approaches may all be
equivalent in some fundamental sense, particular ones may be
considerably more convenient or familiar in specific modeling domains.
Once defined and registered as part of the IRDS, these modeling
frameworks may be used to specify information systems that can still be
integrated with systems defined under other frameworks.

Modeling schemas are defined using a specification language provided by
layer 2. Modeling schemas may be standardized by the user of the IRDS, the
supplier of the IRDS, or any national or international
standardization group. The modeling schemas defined as part of the IRDS
standard will be those that represent widely used approaches but lack a
clearly identifiable standards group to take responsibility for their
definition.

Example contents: E-R Schema, Data Flow Schema, SQL Schema, ANS138,
Programming Language Schema, ISO IRDS, Object Oriented Schema

Layer 4 - Application Schemas

Application Schemas define the types of objects that exist in some
chosen domain, plus the rules that those objects must obey. These
objects refer to both tangible kinds of objects such as airplanes or
drawings, and intangible kinds of objects such as plans or allocations.
Each such schema may be communicated to or from the IRDS by one or more
schema languages, but the IRDS Normative Schema defines an abstract
conceptual meaning for the schema that is language-independent. The
domain over which the schema applies may be specified as part of a
larger domain on which other schemas are defined, and the schema may be
subdivided into subschemas that apply to selected portions of its
domain.

Application domains for real-world problems require extensive type
structures to capture the complexity of their object structure and
behavior. Types are built out of elementary types, which classify
single objects, associations between objects, or changes to objects.
Types asserted about objects define propositions, which are the basic
units of meaning in an information system. Complex propositions can be
built by combining simpler propositions. Types may be specified not
only for the static state of some application world, but for changes
which occur in that world and for processes that define a sequence of
changes.

Example contents: Enterprise schemas, Parts of enterprise schemas

6.2 Types of Model Expressiveness

Static Models vs. Dynamic Models vs. Higher-Order Models

This dimension classifies models according to the expressiveness of
their constructs. Rather than being a classification of models, it is
really a classification of theoretical constructs in the IRDS
Definition Schema layer. Complete models, however, can be built using a
subset of the available constructs. Such models utilize only a selected
portion of the concepts expressible in language, but these concepts may
be adequate for many purposes. The various types of expressiveness can
be used to simplify presentation of concepts.

One type of expressiveness is defined by static objects, which either
do not change or whose change is not described formally by the system.
Objects which do not change include mathematical entities such as
numbers, or the state of some object as recorded at some point in time.
The operations defined on static objects are to assert or test the
truth of propositions recorded about them. The static level is captured
in the data storage structures of an information system.

An additional type of expressiveness is reached when the concept of
change is included.  Changes to objects can be classified under event
types, and event types linked in cause and effect relations. Modeling
the dynamic aspects of a domain specifies the processes which can occur
in it. The dynamic level is captured by operations that simulate or
control the behavior of objects in a domain.

Subsequent types of expressiveness are defined when propositions at
either the static or dynamic level are defined as objects about which
further propositions can be asserted. These include propositions that
modify other propositions, such as stating a level of evidence or mode
of belief, and reasoning about change and time. This level of
complexity is captured in knowledge representation systems and by
systems of modal and temporal logic.
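
The three types of expressiveness can be contrasted in a small sketch.
It is written in Python under the assumption that a proposition is
represented as a tuple; the valve example and all names are
hypothetical.

```python
# Static level: propositions recorded about objects, tested for truth.
facts = {("valve-1", "state", "closed")}

def holds(proposition):
    """Test the truth of a proposition against the recorded facts."""
    return proposition in facts

# Dynamic level: an event type that changes the state of an object.
def open_valve(current_facts, valve):
    """Return a new set of facts in which the valve's state is 'open'."""
    changed = {f for f in current_facts if f[:2] != (valve, "state")}
    changed.add((valve, "state", "open"))
    return changed

# Higher-order level: a proposition treated as the object of another proposition.
belief = ("operator", "believes", ("valve-1", "state", "closed"))

facts_after = open_valve(facts, "valve-1")
```

After the event, the believed proposition belief[2] is no longer among
the facts, which is the kind of situation that reasoning about belief
and change has to describe.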

6.3 The Three Schema Architecture

External vs. Conceptual vs. Internal

The conceptual schema has been widely discussed as part of a
three-schema architecture for data management, as derived from the work
of ANSI/SPARC [3][4]. In this architecture, the conceptual schema is at
the middle of a three-level structure, between the external schema,
directed to users of the database, and the internal schema, directed to
internal storage systems of the database.

The IRDS conceptual schema fully supports the three-schema view, and
extends its scope beyond data management to include events and
processes. Any portion of the conceptual schema, including all the
parts defined by the conceptual schema architecture, can be presented
in both external and internal forms. Multiple external or internal
schemas may be defined, so that the same conceptual schema can be
presented in multiple alternative forms either to external users or to
the internal machinery of a computer.

Both the external and internal schemas, in fact, are mappings of the
conceptual schema to a linguistic representation. The basic difference
between them is that the external schema is defined for communication
with an external user, and the internal schema is defined for direct
execution or storage on a machine. Both kinds of presentation may take
many alternative forms, from strings of text to graphical displays to
messages on a network or inside a computer. Both may include events in
addition to static information; a keystroke or mouse click in a user
interface is as much a linguistic event as a spoken word, and a
process of transforming data in a computer system can simulate events
occurring in an application domain.
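
The view of external and internal schemas as two mappings of the same
conceptual content can be sketched as follows.  The sketch is in
Python; the sample fact, the sentence format, and the choice of JSON
bytes as an internal form are assumptions made for illustration.

```python
import json

# One conceptual fact, independent of any representation (hypothetical example).
conceptual = {"type": "Employee", "name": "Smith", "salary": 40000}

def external_form(fact):
    """External schema: a sentence meant for a human reader."""
    return f"{fact['name']} is an {fact['type']} with a salary of {fact['salary']}."

def internal_form(fact):
    """Internal schema: a compact byte string meant for storage."""
    return json.dumps(fact, sort_keys=True, separators=(",", ":")).encode("utf-8")

def from_internal(data):
    """Recover the conceptual fact from its internal representation."""
    return json.loads(data.decode("utf-8"))
```

Both functions start from the same conceptual fact; only the intended
recipient of the representation differs.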

The external and internal schemas are not properly a part of the
conceptual schema they communicate, but they are closely related to it.
Given any conceptual schema, the application domain can be expanded to
include not only the objects in the application domain, but the
external or internal forms used to communicate about these objects. The
forms of representation can also be selected as special domains for
detailed description.  A new conceptual schema can be established to
cover the structures of symbols that communicate the original domain.
The meaning triangle discussed earlier can be used to illustrate the
relationship of this new conceptual schema to the original one:

Figure 3.  Conceptual Schema for a Representation.

In the IRDS conceptual schema architecture, the layers of abstraction
are defined by their subject matter, as established by the contents of
the application domain at each layer. The schemas for languages used to
communicate about these domains define a separate subdivision of the
conceptual schema content. By convention, the domain at each
application layer is considered to include the forms of representation
used to communicate about it. Each layer is defined by a core
conceptual schema stripped of all representation issues, plus
additional specialized schemas for each form of external or internal
representation. Following is an illustration of this structure:

Figure 4. External and Internal Subschemas For A Conceptual Schema.

Languages for communicating information are important potential
candidates for standardization. For any such language, both the
information it communicates and the symbols it employs to communicate
the information must be specified. The information communicated defines
the semantics of the language, and the structure of symbols defines the
grammar or syntax of the language.  The semantics can be based on the
core structures of meaning defined by the IRDS conceptual schema, but
each language must specify its own conventions for the mappings between
symbols and the meanings they carry.

Even though the external and internal schemas always include the
definition of a linguistic or representational form, this does not
exhaust their content. Each schema can include further information
about the uses of the interface or what lies behind it. The external
schema, for example, can include information about users of a
particular interface, their skill level, or expectations.  The internal
interface can include extensive information about performance of
machines in manipulating its representations, or any other
implementation-related information. Compiling a specification into a
directly executable form is a special case of mapping from a conceptual
to internal schema, and embeds many assumptions beyond merely
representational ones.  The external interface should be understood as
including all aspects of how an information system relates to its
external environment, and the internal interface should be understood
as encompassing all aspects of the technology base on which it is
implemented.

6.4 Views and View Integration

The conceptual schema for an entire enterprise is large and complex.
Given its scale, many persons must contribute to its development over
many years. As the enterprise changes, its conceptual schema must
evolve to reflect new rules and structures for operating. There is no
global perspective that can capture the many kinds of activities and
components the enterprise includes. To be practical, the conceptual
schema must support the definition of partial views that can be defined
independently of each other, and yet be combined later when their
overlap or relationship to each other is discovered or resolved.

A view is a part of a conceptual schema whose relationship to the rest
of the schema is either known or still unspecified. Like any part of a
conceptual schema, a view can also be presented in many different
linguistic forms, but this mapping is classified under the three-schema
distinction and should not be regarded as defining a separate view. As
defined here, a view is simply a subschema belonging to some larger
schema.

A view can be related to the rest of the conceptual schema in many
different ways. It can select a different but equivalent set of defined
modeling constructs in which to express its information. The modeling
constructs can be used to express its contents as being wholly
dependent on the contents of another view. Separate views that contain
the same information can be mapped to a third view to explain their
precise relationship to each other.

Integration of separate views is a major role for the IRDS. The mappings
between views can be chosen as a specialized domain, and detailed
structures of information built to describe the mappings. All
integration between views is specified by relating them to a common set
of underlying concepts. The meaning of these concepts is ultimately
defined by the core set of concepts from the IRDS Definition Schema
Layer.
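
As a rough illustration of relating two views through a common set of
underlying concepts, the following Python sketch maps the local terms of
each view to shared concept names and then pairs up the terms that
denote the same concept. The view names, terms, and concept names are
all hypothetical, and the sketch assumes that each view uses at most one
term per concept.

```python
# Each view maps its local term to a shared underlying concept.
# All names here are hypothetical examples, not part of the IRDS.
PAYROLL_VIEW = {"staff_member": "EMPLOYEE", "wage": "SALARY"}
PERSONNEL_VIEW = {"employee": "EMPLOYEE", "office": "LOCATION"}


def overlap(view_a, view_b):
    """Pair local terms from two views that denote the same concept."""
    # Invert view_b so that a concept leads back to view_b's local term.
    by_concept = {concept: term for term, concept in view_b.items()}
    return {
        term: by_concept[concept]
        for term, concept in view_a.items()
        if concept in by_concept
    }


print(overlap(PAYROLL_VIEW, PERSONNEL_VIEW))
```

Terms that map to no shared concept are simply left out of the result,
which corresponds to a relationship that is still unspecified.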

6.5 Life Cycle Phases

The development of the total enterprise system is an important process
that needs to be managed over time. The development of an information
system can be split into distinct life cycle phases according to the
amount and kind of information a specification contains and the tasks
that must be performed on it. This spectrum extends from front-end
analysis and design through all stages of physical design, operation,
and support for previously deployed versions. The types of system
specifications and the transformations that take them from stage to
stage can be defined as part of an application model for the systems
development process. Many of these transformations require the addition
of information by the systems developer, but some of them can be
partially or wholly automated if the description is complete enough.
The application model can retain a complete forward and backward trace
of the steps by which the specification was generated.

The life cycle needs do not end once the initial development of a
system has ended. Systems evolve in response to changes in requirements
or in the most effective solution. The facilities of the IRDS must be
complete enough to handle the version and configuration control needs
of the systems development process. Evolution of a system from one
version to another can be modeled by defining each version as a view
that derives in part from the preceding version. The facilities of the
IRDS conceptual schema are complete enough for an entire existing
information system to be modeled as part of a new system. This provides
a powerful capability for subsuming old systems or for emulating their
functions and interfaces.

6.6 Model Generality

Generic vs. Industry vs. User

Conceptual schemas differ in the generality of the domain over which
they apply. Enterprises are not all different; most enterprises contain
processes and substructures similar to those of other enterprises. As a
practical matter, there is no reason every enterprise should repeat the
specification of portions of its business that match those of other
businesses. Additionally, an enterprise does not exist in isolation.
Many of the processes it conducts include interaction with other
enterprises. To communicate between enterprises, they must share a
portion of their conceptual schemas. The generality of a domain is
measured by the number of times it occurs. A general domain is
encountered many times, either within an enterprise or across many
different enterprises. By sharing conceptual schemas, the cost of
formalizing the conceptual schema for a general domain can be invested
only once but recouped many times. Additionally, the users of a shared
conceptual schema will have already established the basis for their
communication.

The formal construction of the conceptual schema clarifies the precise
meaning of the information that users or enterprises communicate. In
addition, the ability to build and integrate conceptual views provides
a high degree of flexibility in adopting a general schema as a core
definition but extending and customizing it with local definitions.
Because it decouples specifications from the entanglements of language,
technology, and interfaces, a conceptual schema based approach is the
most promising foundation for widespread software reuse.

General models result not merely from omitting details that differ, but
from analysis that identifies fundamental building blocks that can be
assembled in many different ways. In-depth analysis can result in a
wide assortment of basic concepts that virtually everybody shares in
common. The structures of meaning at the core of the IRDS conceptual
schema are one such example, but other universally shared concepts
include those that describe the physical world and the basic ways that
people organize and plan their activity. An IRDS should not be regarded
as an empty container into which an enterprise must pour everything
that fills it. The initial population of standardized concepts may
become an important measure for an IRDS.

These standardized concepts can flow from many sources. The concepts
that apply across any enterprise, or are used to communicate between
enterprises, would likely be established by a formal standardization
process. The concepts needed for specific application domains could be
established by industry groups dealing with that domain (e.g.
automotive, aerospace/defense, consumer electronics, etc.). Users can
also represent concepts particular to their enterprise. In specialized
domains for which public standardization is not possible, an enterprise
could standardize its own concepts for use within the enterprise or
with selected partners. A conceptual schema based approach provides a
complete and rigorous foundation for expanding such efforts, and an
IRDS can assist in managing and integrating the models they produce.

7.0 Levels of Description

A conceptual schema supplies basic definitions for the content of an
IRDS, but an IRDS holds more than just conceptual schemas.  A
conceptual schema specifies the fundamental types for classifying
objects in some domain, along with rules that define proper usage of
those types.  To describe the actual contents of a domain, the
conceptual schema must be supplemented by a collection of instances it
applies to.  The ISO technical report TR9007 [2] refers to the
collection of instances as the information base for a conceptual
schema.  In this report, the combination of a conceptual schema and its
associated information base will be defined as the conceptual model for
a domain.

In the ISO technical report TR9007, both the conceptual schema and the
information base are related formally to a universe of discourse
containing the objects they describe.  The universe of discourse is the
subject matter for the entire conceptual model.  This subject-matter
relation is entirely distinct from any internal structure within the
conceptual model.  For example, the internal structure of a conceptual
schema, including all the dimensions of the conceptual schema taxonomy,
is contained within the model, as is the population of objects
identified by the model and related to the schema through the
type-instance relation of the IRDS Definition Schema.  The
subject-matter relation positions the entire conceptual model in its
containing environment, by specifying the universe of objects that it
describes.

The information systems belonging to an enterprise comprise a
conceptual model, which has as its subject matter the people,
processes, entities, or relationships that belong to the enterprise.
For an IRDS, the principal subject matter is ordinarily the information
systems constructed by an enterprise, but may also include the
enterprise itself as necessary to explain or manage its information
systems.  The IRDS is also an information system that may be described
by the same or a different IRDS.

The use of information systems to describe other information systems is
a special case of the recursiveness with which the domain of a
conceptual model may be defined.  A new conceptual model is defined
whenever a new collection of objects is identified to serve as its
subject matter.  Once defined, the conceptual model itself, along with
each of its elements, can be identified as objects belonging to a new
universe of discourse.  A new level of conceptual model can be
constructed that has the previous conceptual model as its universe of
discourse.
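
The recursion can be pictured with a small Python sketch. It treats a
conceptual model as a set of types plus an information base of
instances, and builds a new model whose universe of discourse is an
existing model together with its elements. The class, the function
names, and the three upper-level types (MODEL, TYPE, INSTANCE) are
assumptions made for illustration only; they are not taken from the
IRDS Definition Schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptualModel:
    """A conceptual schema (types) plus its information base (instances)."""
    name: str
    types: set = field(default_factory=set)
    instances: dict = field(default_factory=dict)  # instance name -> type name
    subject: Optional["ConceptualModel"] = None    # model being described, if any

    def add_instance(self, instance, type_name):
        if type_name not in self.types:
            raise ValueError(f"unknown type: {type_name}")
        self.instances[instance] = type_name


def describe(model, name):
    """Build a model whose universe of discourse is `model` and its elements."""
    upper = ConceptualModel(name, types={"MODEL", "TYPE", "INSTANCE"}, subject=model)
    # Prefixes keep a type and an instance with the same name apart.
    upper.add_instance(f"model:{model.name}", "MODEL")
    for t in sorted(model.types):
        upper.add_instance(f"type:{t}", "TYPE")
    for i in sorted(model.instances):
        upper.add_instance(f"instance:{i}", "INSTANCE")
    return upper


def depth(model):
    """Count how many conceptual models lie beneath this one."""
    n = 0
    while model.subject is not None:
        model = model.subject
        n += 1
    return n


payroll = ConceptualModel("payroll", types={"EMPLOYEE"})
payroll.add_instance("Bob", "EMPLOYEE")
irds = describe(payroll, "irds")
irds_definition = describe(irds, "irds-definition")
print(depth(payroll), depth(irds), depth(irds_definition))
```

Nothing in the sketch limits how many times describe can be applied,
which mirrors the open-ended number of levels.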

The recursiveness of domain definition means that any fixed
prescription of levels of IRDS description is overly restrictive.  An
IRDS can be used to describe any information system, regardless of what
that information system describes.  Though normally used to describe
the information systems of an enterprise, an IRDS can also be used to
describe the enterprise itself, or another IRDS.  The description of
one IRDS by another, any number of times, can be useful to deal with
heterogeneous or distributed systems.  To provide upward compatibility,
a newer IRDS could be used to describe an older installed version.  An
IRDS can also be used to describe itself.

A level of description is defined by the pair of a conceptual model and
the domain it describes.  While the number of levels is not fixed, some
basic levels can be defined by their relation to the enterprise and to
the definition of the IRDS itself.  These levels can show the most
likely positioning of an IRDS in an overall information system
architecture.  Working up from the basic level of the enterprise, four
distinct levels can be identified:

  4. IRDS Definition
  3. IRDS                       Example Contents
  2. Information Systems            ...
  1. Enterprise

The Enterprise level consists of the actual people, processes,
entities, and relationships that information systems are ultimately
constructed to describe or control.  The Information Systems level
consists of the databases and processing systems that hold encoded
descriptions of the enterprise objects.  The IRDS level describes these
systems and may establish partial or total control over them.  The IRDS
level can also hold information about the enterprise itself.  The IRDS
level may be split as many times as necessary for one IRDS to describe
another.  The top level, IRDS Definition, defines the structure and
capabilities of an IRDS.  If an IRDS contains a description of itself,
this level consists of the subset of the IRDS that is used to define
itself.  The definition of an IRDS in a standard is another form of
description at this level.

Each IRDS level in this structure can be a complete conceptual model,
containing both the types of a conceptual schema and the instances of a
corresponding information base.  The inclusion of both types and
instances within an upper level is a departure from earlier approaches
to defining information systems levels, such as those of the ISO IRDS
Framework [5] and the draft ISO Reference Model of Data Management
[6].  These define an upper level as containing only the types or
schema for the lower level of a level pair.

The departure from earlier approaches is due to the expansion of IRDS
scope from an Information Resource Dictionary System to a system for
building a comprehensive description of information resources and their
surrounding enterprise environment.  An information systems dictionary
limits the role of an upper level to defining types for a lower level,
but an expanded IRDS supports open-ended description of lower levels by
upper levels, including information about particular instances.
Metadata about both types and instances may be distributed throughout
upper levels.  For example, the directory component of an IRDS
specifies locations or addresses for particular instances belonging to
other systems.  Support for three-schema views might require the IRDS
to record the particular format in which a particular user wants to see
some piece of information.  On the implementation side, a database
might be recorded as residing on a particular host machine having a
particular network address.  An IRDS could also hold a history of
changes to information contained in a base system.  While there is a
natural progression in limiting upper levels to have less information
than lower levels, an expanded IRDS does not enforce this limitation,
and does not exclude instance information from upper levels.  Each
level provides a complete conceptual model of the lower level, and the
description of one level by another can be repeated as many times as
necessary.

8.0 Conceptual Schema of the IRDS

An IRDS can be one of the information systems that belongs to an
enterprise. All capabilities the IRDS provides for describing other
information systems can equally well be used to describe itself. The
IRDS conceptual schema and its external and internal interfaces can be
defined using the same facilities as provided for direct use by the
enterprise. The self-application of IRDS facilities is a major
simplification in the architecture of the IRDS: it means that one
generic set of facilities can be furnished to satisfy all needs, and
that IRDS facilities are tested and exercised as an inherent part of
IRDS development.

The core part of the IRDS conceptual schema describes the structure of
schemas managed and administered by the IRDS, including the
composition, organization, behavior, construction, evolution and
presentation of the elements that make up these descriptions. Generic
portions of the IRDS conceptual schema describe means for interpreting
the contents of modeling schemas, application schemas, and even
application instances in a unified manner across all modeled enterprise
domains.  Unified interpretation views include the directory,
dictionary, thesaurus, and encyclopedia. The IRDS conceptual schema
describes how to derive these views from the framework of IRDS models,
including those that represent the business, technology, system, and
implementation aspects of an enterprise and its products.

Other parts of the complete IRDS conceptual schema describe the
external and internal schemas that define the IRDS services and
technology interfaces. External schemas of the IRDS describe
communication interfaces to application programs, other IRDS instances,
or human users. Internal schemas describe the interfaces to storage and
processing systems within the implementation domain of an IRDS. The
external and internal interfaces support a variety of abstraction
levels and modes of interaction.

Multiple IRDS instances may be needed by an enterprise to support
decentralized development and control of conceptual schemas. Each
instance provides conceptual integrity for a designated enterprise
domain. IRDS domains would be selected to provide appropriate islands
of independent development while facilitating the enterprise-wide
integration of understanding between domains. Enterprise policy
concerning the balance between top-down control and bottom-up learning
could also be implemented through the IRDS. One IRDS could be used to
integrate or control the contents of another through their external
services interfaces.

The services interface component of the IRDS encompasses both
traditional import-export mechanisms and a new class of IRDS management
services. These services include, but are not limited to, object version
and configuration management, work flow and life cycle control
mechanisms, security policy enforcement, generalized inquiry
facilities, and directory management services. The conceptual schema of
IRDS provides the definition capability to support enforcement of IRDS
management service policies.

The simplest IRDS services interface functionality is data
import-export through file transfer. Enhanced functionality supports
shared access to information resources through a query/response
dialog. More complex is the interaction between multiple IRDS instances
to identify a common means for communicating about their domains. And
finally, the most complex interaction is between an IRDS and a human user
for learning about existing models, or for adding new descriptions of
enterprise knowledge and experience.

9.0 Schema Language Framework

The IRDS conceptual schema must satisfy a wide variety of requirements:
readability by practitioners; a rigorous foundation that would satisfy
theoreticians; and enough expressive power to support all existing
conceptual schema languages. No single language can meet all these
requirements simultaneously. Instead, the IRDS standard permits
different languages to be used to express a conceptual schema:

1. Existing schema languages: Many languages have been implemented in
vendor systems, and many others have been proposed in the research
literature.  These include the ISO IRDS SQL Data Model, Express,
Entity-Relationship diagrams, NIAM diagrams, many vendor languages, and
research prototypes. The IRDS conceptual schema languages must support
and coexist with these languages.

2. Defining language: For theoretical precision, the semantics of the
conceptual schema languages are defined by their mapping to a language
with a model-theoretic semantics, such as a version of predicate
calculus. The defining language is used only to establish the
foundations of the other conceptual schema languages, and no
practitioner is ever required to learn it in order to read or write a
conceptual schema.

3. Normative languages: Since the defining language must have a limited
number of primitives in order to simplify the semantic foundations, it
is likely to be too low-level a language to provide all the features
needed in a practical conceptual schema language. To support all those
features, normative languages are defined. The semantics of a normative
language is defined by its mapping to the IRDS Normative Schema.  It
will be rich enough to include a superset of the semantics of all
existing schema languages.

Of the existing schema languages, Conceptual Graphs [7][8] were
selected as a basis for developing the initial IRDS Normative
Language.  Conceptual graphs are a highly readable graphic language
with a well-defined theoretical basis, a 15-year history of research
publications, a world-wide user community, and commercial applications in
production use.  Because conceptual graphs have a direct mapping to
predicate calculus, and since the defining language is a version of
predicate calculus, a normative form of predicate calculus is defined
as an additional normative language.  Other normative languages can be
defined, provided they supply a complete mapping to the semantics
defined by the IRDS Normative Schema.  For readability by programmers
and designers who have not been trained in formal logic, a conceptual
schema is definable in a stylized version of some natural language,
such as Structured English or Structured French.  These stylized
languages would have an unambiguous syntax, and their semantics would
be defined by a formal mapping to the IRDS Normative Schema.

Figure 5. Conceptual Schema Languages

The schema languages on the right of the diagram have been used to
express conceptual schemas for various systems. The normative language
must be rich enough to include a superset of the semantics of all of
them. The defining language, however, would be a more primitive version
of predicate calculus, and some constructs in the normative language
may have to map to complex expressions in the defining language.

The solid arrow from the schema languages on the right indicates that
all of their semantics can be fully represented by constructs in the
normative language. The dotted arrow from the normative language to the
schema languages on the right indicates that some constructs in the
normative language might not be mappable into certain schema languages,
since many of them have limited expressive power.

For readability, every construct in the normative language has an
equivalent representation in a stylized natural language, including
versions for English, French, and others. The readability of conceptual
graphs, their great expressive power, and their smooth mapping to and
from natural languages are among the reasons for proposing them as a
basis for the IRDS normative language.

9.1 Conceptual Graphs as a Normative Language

The normative language is based on conceptual graphs, but it also
supports layers of representations to accommodate type hierarchies,
entity-relationship diagrams, and NIAM diagrams. In effect, each of
these other notations can represent a view of the conceptual schema at
a different level of detail. A type hierarchy has the least amount of
detail. It shows the entity types with their subtype-supertype links.
Entity-relationship diagrams show the permissible relations between
types and the cardinality constraints on the relations. NIAM is
essentially a superset of the two: it combines type hierarchies with
cardinality constraints on relations. By showing inheritance through
the type hierarchy, NIAM can also support object-oriented schemas.

The difference between conceptual graphs and the other graphic
notations is based on the distinction between first-order logic and
higher-order logic. First-order logic describes instances of entities
and relationships, as in the sentence "Bob has a green Volvo". With
quantifiers such as "every" and "some", first-order logic can also
state general principles: "Every vehicle has some color." The following
diagram shows conceptual graphs and predicate calculus notations for
these two sentences:

Figure 6. Conceptual Graph Notation.

In a conceptual graph, boxes represent concepts and circles represent
conceptual relations. The POSS relation represents possession, and the
ATTR relation represents attribute. Conceptual graphs also have a fully
equivalent linear notation that uses square brackets for the boxes and
parentheses for the relations.  Following is the linear notation for
the two graphs in Figure 6:

[PERSON: Bob]->(POSS)->[VOLVO]->(ATTR)->[COLOR: green].

[VEHICLE: @every]->(ATTR)->[COLOR].
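
The correspondence between boxes, circles, and the linear notation can
be sketched in a few lines of Python. The helper names concept and
linear are hypothetical, and the sketch handles only a simple chain of
concepts and relations, not branching graphs.

```python
def concept(type_label, referent=None):
    """Render a concept box, e.g. [PERSON: Bob] or [VOLVO]."""
    if referent is None:
        return f"[{type_label}]"
    return f"[{type_label}: {referent}]"


def linear(first, *links):
    """Render a chain; each link is a (relation, concept) pair."""
    parts = [first]
    for relation, target in links:
        parts.append(f"({relation})")  # circle becomes parentheses
        parts.append(target)           # box becomes square brackets
    return "->".join(parts) + "."


print(linear(concept("PERSON", "Bob"),
             ("POSS", concept("VOLVO")),
             ("ATTR", concept("COLOR", "green"))))
print(linear(concept("VEHICLE", "@every"),
             ("ATTR", concept("COLOR"))))
```

The two print calls reproduce the two linear forms given above.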

Second-order logic makes statements about types of entities and
relationships, such as "CAR is a subtype of VEHICLE" or "VEHICLE has
attribute COLOR." Whenever a fact can be stated in either a first-order
or a second-order form, the second-order version is usually shorter:

First-order:  (Ax)(car(x) -> vehicle(x)).
              (Ax)(Ey)(vehicle(x) -> (color(y) & attribute(x,y))).

Second-order:  car < vehicle.
               has_attribute(vehicle, color).

As these examples illustrate, a simple second-order statement can
express information that requires quantifiers in first-order logic.
Because of the absence of quantifiers, many proofs in this restricted
version of second-order logic can be simpler and faster than in
first-order logic. From the statements that every car is a vehicle and
every vehicle has a color, first-order logic can derive the conclusion
that every car has a color:

(Ax)(Ey)(car(x) -> (color(y) & attribute(x,y))).

This inference requires a multistep proof, but the equivalent inference
in second-order logic follows by the simpler principle of inheritance:
since CAR is a subtype of VEHICLE, all attributes of VEHICLE are
inherited by CAR:

has_attribute(car, color).

Second-order logic with quantifiers over types can become highly
complex, but the restricted version without quantifiers is a concise
and efficient language for expressing many kinds of constraints.
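
The inheritance step can be sketched as a lookup that walks up the type
hierarchy, with no quantifiers involved. The following Python fragment
is only an illustration: it assumes single inheritance and an acyclic
hierarchy, and the table and function names are hypothetical.

```python
# car < vehicle
SUPERTYPE = {"car": "vehicle"}

# has_attribute(vehicle, color)
HAS_ATTRIBUTE = {("vehicle", "color")}


def supertypes(type_name):
    """Yield the type itself, then each of its ancestors in turn."""
    while type_name is not None:
        yield type_name
        type_name = SUPERTYPE.get(type_name)


def has_attribute(type_name, attribute):
    """True if the attribute is stated for the type or inherited by it."""
    return any((t, attribute) in HAS_ATTRIBUTE for t in supertypes(type_name))


print(has_attribute("car", "color"))
```

The query for car succeeds only because the walk reaches vehicle, where
the attribute was stated.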

Some graphic notations, such as type hierarchies, entity-relationship
diagrams, and NIAM diagrams, express second-order statements without
quantifiers. Since these notations are useful for many applications,
the normative language does support them. Conceptual graphs are a
general graphic language that can express both first-order and
higher-order statements. Following are first-order graphs for the
statements that every car is a vehicle and every vehicle has some
color:

[CAR: @every]- - -[VEHICLE].

[VEHICLE: @every]->(ATTR)->[COLOR].

The equivalents in second-order graphs are statements about types. As
in predicate calculus notation, the first-order graphs require
quantifiers, but the second-order graphs have no quantifiers:

[TYPE: car]<-(SUBT)<-[TYPE: vehicle].

[TYPE: vehicle]->(HAS-ATTR)->[TYPE: color].

The relation HAS-ATTR between types states that the ATTR relation holds
between instances of those types. These second-order conceptual graphs
map directly to type hierarchies and E-R diagrams: the SUBT relation
represents the subtype links in a type hierarchy, and the HAS-ATTR
relation represents the attribute links in an E-R diagram. Similar
second-order relations in conceptual graphs can be used to represent
every kind of link in an E-R or NIAM diagram.
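
The way a quantifier-free second-order statement stands for a quantified
first-order one can be shown by generating the first-order formula
mechanically. This Python sketch uses the ASCII notation of the earlier
examples; the function names are hypothetical and only the two relations
SUBT and HAS-ATTR are covered.

```python
def expand_subtype(subtype, supertype):
    """First-order reading of: subtype < supertype."""
    return f"(Ax)({subtype}(x) -> {supertype}(x))."


def expand_has_attribute(type_name, attribute):
    """First-order reading of: has_attribute(type_name, attribute)."""
    return (f"(Ax)(Ey)({type_name}(x) -> "
            f"({attribute}(y) & attribute(x,y))).")


print(expand_subtype("car", "vehicle"))
print(expand_has_attribute("vehicle", "color"))
```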

Since conceptual graphs can represent both first-order and higher-order
statements with or without quantifiers, they can represent all the
constraints that can be expressed in type hierarchies, E-R diagrams,
and NIAM in essentially the same way. But they can also state
constraints that are not expressible in those notations. Some typical
examples include:

If a car has color yellow, it is a taxicab.

No person under the age of 18 may drive a car that has more than 150
horsepower.

No vehicle may have more wheels on its front axle than its rear axle.

These sentences use negations, if-then rules, comparisons, and
constants such as 18 or 150. None of them could be represented in E-R
diagrams or NIAM, but all of them could be stated in predicate calculus
or conceptual graphs.
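
To make the role of negation, if-then rules, comparisons, and constants
concrete, the three constraints can be written as Boolean checks over
instances. The Python sketch below is one possible reading; the Car
record and its field names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Car:
    color: str
    horsepower: int
    is_taxicab: bool = False


def yellow_implies_taxicab(car):
    # If a car has color yellow, it is a taxicab.
    return car.color != "yellow" or car.is_taxicab


def may_drive(age, car):
    # No person under the age of 18 may drive a car that has more
    # than 150 horsepower.
    return not (age < 18 and car.horsepower > 150)


def axle_wheels_ok(front_wheels, rear_wheels):
    # No vehicle may have more wheels on its front axle than its rear axle.
    return not (front_wheels > rear_wheels)


print(yellow_implies_taxicab(Car("yellow", 100)))
print(may_drive(17, Car("red", 200)))
print(axle_wheels_ok(2, 4))
```

Each check needs a constant or a comparison, which is exactly what the
type-level links of an E-R or NIAM diagram cannot carry.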

9.2 Mapping Grammars

For each schema language, there must be a pair of mapping grammars that
determine how it is to be mapped to and from conceptual graphs (the
normative language). The next diagram shows the languages and their
mapping grammars.

Figure 7. Mapping Grammars.

Since the normative language is more expressive, the mappings from
right to left are total mappings, but the mappings from left to right
are only partial mappings. Since the normative language can be mapped
into stylized natural languages, it is possible to annotate any of the
translations with comments in natural language.

Suppose, for example, that someone wanted to map E-R diagrams into
NIAM. Since NIAM is a more expressive language, it would be possible to
map everything in E-R into the normative language and then into NIAM:

E-R -> Normative Language -> NIAM

But since NIAM is a more expressive language than E-R, not all the
information can be mapped from NIAM into E-R. Whatever is not
expressible in E-R would therefore be mapped into stylized English:

NIAM -> Normative Language -> E-R + English

The stylized natural languages can therefore supplement the existing
schema languages with comments that are readable by both people and
machines.
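
The split between what a target language can express and what falls
back to stylized English can be sketched as follows. The capability
table and the English templates are toy assumptions made for this
example (they treat subtype links as the only construct that E-R lacks);
they are not the actual mapping grammars.

```python
# Hypothetical table of constructs each target language can express.
SUPPORTED = {
    "E-R": {"has_attr"},
    "NIAM": {"has_attr", "subtype"},
}

# Hypothetical stylized-English templates, one per construct.
ENGLISH = {
    "has_attr": "Every {0} has some {1}.",
    "subtype": "Every {0} is a {1}.",
}


def map_from_normative(statements, target):
    """Return (statements the target expresses, English comments for the rest)."""
    expressed, comments = [], []
    for kind, a, b in statements:
        if kind in SUPPORTED[target]:
            expressed.append((kind, a, b))
        else:
            comments.append(ENGLISH[kind].format(a, b))
    return expressed, comments


schema = [("subtype", "car", "vehicle"), ("has_attr", "vehicle", "color")]
print(map_from_normative(schema, "NIAM"))
print(map_from_normative(schema, "E-R"))
```

Mapping to NIAM leaves no comments, while mapping to E-R moves the
subtype statement into an English sentence.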

10.0 References

[1]  Ogden, C. K., Richards, I. A., The Meaning of Meaning, Harcourt
     Brace Jovanovich, New York, 1989 (First published 1923)

[2]  ISO, Concepts and Terminology for the Conceptual Schema,
     ISO TR9007, 1987

[3]  ANSI/X3/SPARC, Study Group on Data Base Management Systems:
     Interim Report 75-02-08, in ACM SIGMOD Newsletter, Vol 7, No. 2, 1975

[4]  Tsichritzis, D., Klug, A. (eds.), The ANSI/X3/SPARC DBMS
     Framework, Report of Study Group on Database Management Systems,
     AFIPS Press, Montvale, NJ 1977

[5]  ISO, Information Technology - Information Resource Dictionary
     Standard (IRDS) Framework, International Standard ISO/IEC 10027, 1990

[6]  ISO, Information Technology - Reference Model of Data Management,
     Draft International Standard ISO/IEC DIS 10032, 1991

[7]  Sowa, J. F., Conceptual Structures: Information Processing in
     Mind and Machine, Addison-Wesley, Reading, MA, 1984

[8]  Sowa, J. F., "Towards the Expressive Power of Natural Language,"
     in Principles of Semantic Networks (J.F. Sowa, ed.), Morgan
     Kaufmann Publishers, San Mateo, CA, 1991, pp. 157-189