Re: The semantics of new-edi

Fritz Lehmann <flehmann@orion.oac.uci.edu>
Date: Sun, 18 Dec 1994 18:54:52 -0800
From: Fritz Lehmann <flehmann@orion.oac.uci.edu>
Message-id: <199412190254.AA06245@orion.oac.uci.edu>
Newsgroups: bit.listserv.edi-l
Subject: Re: The semantics of new-edi
References: <Chameleon.4.01.1.941218124429.kds@kds.cs.mu.oz.au>
Organization: GRANDAI Software
Apparently-To: ulerydl@arl.mil
Apparently-To: agc@scs.leeds.ac.uk
Apparently-To: cg@cs.umn.edu
Apparently-To: srkb@cs.umbc.edu
Sender: owner-srkb@cs.umbc.edu
Precedence: bulk
In article <Chameleon.4.01.1.941218124429.kds@kds.cs.mu.oz.au>,
Ken Steel  <ksteel@CS.MU.OZ.AU> wrote (on bit.listserv.edi-l):
>Fritz,
>Would you please comment on how you believe we should tackle our
>two great puzzlements of "new-edi":
>1.      How do we devise a set of rules which will enable us
>        to construct a label for any data elements given the
>        semantic definition. The idea is that any programmer
>        or business systems designer anywhere in the world
>        (speaking English) would be able to come up with
>        exactly the same label. I'm thinking in terms of
>        35 characters, but it's flexible.
>2.        Should such a label include/exclude the transaction
>        context in which it is used?
>Fritz, having had the chance to discuss your ontology theories with
>you in person quite recently, you may well have the answer we are
>seeking. However, would you please interpret the answer so my poor
>brain can take it in. :-).
>Regards,
>Ken

     Your idea of developing some means of automatically
generating a code (unambiguously and reliably) for a data
element, from the text descriptions of the intended concept
given by multiple users worldwide, is a first approximation of
what needs to be done, but I'm afraid any simplistic attempt
is doomed.  Different people will use different
words, and their descriptions will have different structures.

     Instead of cooking up a "label" which loses the structure
of the original definition, why not retain it?

     Problems with the simple "unique label" notion:
1. Synonyms and homonyms.  One definer says "part", another
says "component".  Or one says "bank marker" and that is
ambiguous between river bank and financial bank, and even if the
latter can be established from the context, no one but the
ultra-specialist knows what "marker" actually means.

2.  Loss of structure.  The semantic structure of a description
is an abstract network of concepts and relations (a semantic network).
In natural language this net must be crammed into a linear string
of words -- this is done in numerous ways.  If I define an element as
"The location at which an acknowledgement of lading is received
when the billing address and the shipping address are the same and
the goods are accepted by the carrier." somebody else can define
exactly the same meaning in scores, perhaps hundreds, of different
English sentences.  The only way, mathematically, to create one
reliable code for all of these is to encode the semantic net itself.

3.  Hidden relationships.  People, especially business people, are
forever describing data with noun pairs, without stating the 
relationship between the nouns.  A famous linguistic example:
Do you know what "alligator shoes" means?  Then what are "horse
shoes"?  EDIFACT has "quota value"; how would this differ
from "value quota"? In DE 1001, 580 means "cover note".  
X12 has "haulage movement", "section profile", "slip sheet" etc.
It's often obvious what's meant, but not always and certainly not
to a computer with logical reasoning capability (AI).  Suppose
one of your definers says "consignment labor" -- there's
no clue what that actually means -- the relationship between
"consignment" and "labor" is unstated, hence hidden.
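
     As a minimal sketch of the third problem (Python, with
relation names that are purely illustrative assumptions and not
EDIFACT or X12 terms), a noun pair can be stored together with the
relation that links its two nouns.  A pair whose relation was never
stated then shows up as an explicit gap instead of passing silently:

```python
from typing import NamedTuple, Optional


class NounPair(NamedTuple):
    modifier: str
    head: str
    relation: Optional[str]  # None means the definer never stated it


# Hypothetical sample data; the relation names are assumptions.
PAIRS = [
    NounPair("alligator", "shoes", "made_from"),
    NounPair("horse", "shoes", "worn_by"),
    NounPair("consignment", "labor", None),
]


def hidden_relations(pairs):
    """Return the noun pairs whose linking relation was left unstated."""
    return [f"{p.modifier} {p.head}" for p in pairs if p.relation is None]


print(hidden_relations(PAIRS))
```

The two "shoes" entries have the same surface form but different
relations, which is exactly what a label built from the words alone
cannot show.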

     The way to encode the meaning of data elements is to try to
do it right instead of relying on a shortcut or hack.  That
means that any terse code-label points to its formal meaning
expressed in a knowledge representation language such as conceptual
graphs (or KIF, etc.).  The conceptual graph is a "label" but it is much
more, since it (1) has homonyms and synonyms eliminated, (2) preserves
the semantic network structure of the defining phrase, and (3)
makes all relations explicit, without hiding them.  No short
string derived ad hoc from English will accomplish this, and
in particular it would not generate the same code for all
people defining the same concept.
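
     A rough sketch of how a terse code could be derived from the
graph itself, rather than from the English wording, might look like
the following Python.  Everything named here (the synonym table
CONCEPT_OF, the relation names, the function canonical_code) is a
hypothetical illustration and not part of any conceptual graph tool
or EDI standard.  It also assumes each concept type occurs only once
in a definition; a real graph with several nodes of the same type
needs a proper canonical form, which this does not attempt.

```python
import hashlib

# Hypothetical synonym table: every surface word maps to one concept id.
CONCEPT_OF = {
    "part": "Component",
    "component": "Component",
    "supplier": "Supplier",
    "vendor": "Supplier",
}


def canonical_code(triples):
    """Derive a short code from a set of (word, relation, word) edges.

    Words are first mapped to concept ids (removing synonyms), then the
    edges are deduplicated and sorted so that the order in which a
    definer happened to state them does not matter.
    Raises KeyError for a word missing from the synonym table.
    """
    edges = sorted(
        {(CONCEPT_OF[a.lower()], rel, CONCEPT_OF[b.lower()]) for a, rel, b in triples}
    )
    text = ";".join("|".join(edge) for edge in edges)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


definer_a = [("part", "supplied_by", "vendor")]
definer_b = [("Component", "supplied_by", "supplier")]
definer_c = [("part", "ordered_from", "vendor")]

print(canonical_code(definer_a) == canonical_code(definer_b))
print(canonical_code(definer_a) == canonical_code(definer_c))
```

Definers A and B use different words for the same graph and get the
same code; definer C states a different relation between the same
concepts and gets a different one.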

>2.        Should such a label include/exclude the transaction
>        context in which it is used?

     Conceptual graphs have a "context" mechanism which is directly
suited for this kind of thing.  The "context" contains a set
of background assumptions.  Although it's not necessary technically,
I assume that EDI would exploit this to avoid redundant
descriptions of terms with similar and related content.  (You
have met Gerard Ellis of the Royal Melbourne Institute of 
Technology down the street from you; he's an expert on conceptual
graphs and he knows other experts.)
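
     To make the redundancy point concrete, here is a simplified
Python stand-in for the idea.  It is not the conceptual graph context
mechanism itself: the Context and Element classes, the nesting via
"parent", and the sample assumptions are all assumptions made for
illustration.  Background assumptions are stated once in a context,
and each data element adds only what is specific to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    name: str
    assumptions: frozenset
    parent: Optional["Context"] = None

    def all_assumptions(self):
        """Assumptions of this context plus those of every enclosing one."""
        inherited = self.parent.all_assumptions() if self.parent else frozenset()
        return inherited | self.assumptions


@dataclass(frozen=True)
class Element:
    label: str
    context: Context
    assertions: frozenset

    def full_meaning(self):
        """The element's own assertions combined with its background."""
        return self.context.all_assumptions() | self.assertions


# Hypothetical contexts and elements.
trade = Context("trade", frozenset({("Goods", "moved_by", "Carrier")}))
shipment = Context(
    "shipment", frozenset({("Carrier", "accepts", "Goods")}), parent=trade
)

pickup = Element("pickup location", shipment,
                 frozenset({("Location", "site_of", "Pickup")}))
dropoff = Element("delivery location", shipment,
                  frozenset({("Location", "site_of", "Delivery")}))

print(len(pickup.full_meaning()))
```

Both elements carry the shipment and trade assumptions without
restating them, and they differ only in their own assertions.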

     Numerous people are working right now on automatically
generating the conceptual graph (or similar semantic network,
canonical logic form, or frame representation) from simple English
descriptions.  For _general_ text, accuracy is perhaps around 60%;
for specialized "sublanguage" text it is now over 90%.
By the time EDI is ready for this, this will be ready for EDI.

     My belief is that even if people are very precise in their
definitions, the definitions of closely related concepts will almost
never coincide exactly, which is what led me to propose the
EGG/YOLK theory which Tony Cohn and I presented at CIKM-94.  Its
purpose is to allow translations between defined concepts when
the definitions match pretty well but not precisely.
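
     A toy illustration of "match pretty well but not precisely",
in Python.  It assumes each concept is given as two sets of example
instances, an inner "yolk" of central cases and an outer "egg" of
everything the concept may cover.  The matching test used here (each
yolk must fall inside the other concept's egg) is one plausible
criterion chosen for this sketch, not necessarily the one from the
CIKM-94 paper, and the class and instance names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EggYolk:
    name: str
    yolk: frozenset  # central, uncontroversial instances
    egg: frozenset   # everything the concept may cover

    def __post_init__(self):
        # The yolk must always lie inside the egg.
        if not self.yolk <= self.egg:
            raise ValueError(f"{self.name}: yolk must be a subset of egg")


def roughly_equivalent(a, b):
    """Assumed test: each concept's yolk lies within the other's egg."""
    return a.yolk <= b.egg and b.yolk <= a.egg


part = EggYolk("part",
               frozenset({"bolt", "gear"}),
               frozenset({"bolt", "gear", "washer", "bearing"}))
component = EggYolk("component",
                    frozenset({"gear", "bearing"}),
                    frozenset({"bolt", "gear", "bearing", "housing"}))
document = EggYolk("document",
                   frozenset({"invoice"}),
                   frozenset({"invoice", "receipt"}))

print(roughly_equivalent(part, component))
print(roughly_equivalent(part, document))
```

"part" and "component" are not defined identically, yet their
central cases are mutually acceptable, so a translation between them
is tolerated; "part" and "document" share nothing and are rejected.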

     Further, I think it's OK to use text-to-conceptual-graph for
novel definitions developed in machine-negotiated EDI (as 
described in my recent EC Workshop lecture which you saw), but
the _standard_ definitions (of widely useful concepts as is now
intended by the Basic Semantic Repository) should be done directly
in conceptual graphs by people who are skilled semantic analysts.
I'm not inclined to let just anyone define the EDI "core".  For
one thing, the analysts will need a pretty good understanding of
the underlying (non-EDI) ontologies.


                          Yours truly,   Fritz Lehmann
GRANDAI Software, 4282 Sandburg Way, Irvine, CA 92715, U.S.A.
Tel:(714)-733-0566  Fax:(714)-733-0506  fritz@rodin.wustl.edu
=============================================================