Re: Storing Names Internationally

Date: Wed, 12 Apr 95 03:45:18 CDT
From: fritz@rodin.wustl.edu (Fritz Lehmann)
Message-id: <9504120845.AA09010@rodin.wustl.edu>
Newsgroups: comp.databases.theory
Subject: Re: Storing Names Internationally
References: <1995Apr7.134446.4477@inet.d48.lilly.com> <3m4boa$4t@nuhou.aloha.net> <3m629b$70o@bigfoot.ecl.wustl.edu> <3mchvm$lfa@ephor.tusc.com.au>
Organization: Center for Optimization and Semantic Control, Washington University
Apparently-To: srkb@cs.umbc.edu
Apparently-To: fritz@rodin.wustl.edu
Sender: owner-srkb@cs.umbc.edu
Precedence: bulk

In article <3mchvm$lfa@ephor.tusc.com.au>,
Nigel McFarlane <nrm@ephor.tusc.com.au> wrote, in comp.databases.theory:

>|>      If I have failed to account for ANY possible name, of anyone in the
>|> world, please let me know and post corrections here.
>	Amazing ... such an effort demands at least feedback, if not medical
>	attention :-).

You've got the right idea!  In order to form a genuine union of the
data elements/fields/columns which can be used for something like
addresses, you have to try to anticipate anything that might ever arise.
Of course there will eventually be something not anticipated, but to make
this exercise have some value, you need a certain fanaticism about
tracking down every possibility in advance.  That's how the YY-Address
specification grew from an expected two pages to 100 pages (if all the
crosslinking "tags" to other ontologies, standards, address formats and data
dictionaries are included).  Keep in mind: this is not intended to be a data
format; rather, it is a set of semantic classes which may be used in a
metadata schema to annotate any data format in the world dealing with
address information.  The purpose is to fix the meaning of the classes
as carefully as possible, so that once two different systems have been
annotated, automatic translation from one to the other is possible,
even if the systems have totally different formats and conceptualizations
of the domain.  A further goal is to provide formal semantic definitions
in a logical language like Conceptual Graphs, KIF, HOL, etc. so that the 
meaning can be accessed and used by a knowledge-based program.  The
computer will "know" that an American county-equivalent is a 
geographic part of a state-equivalent, and suchlike.
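
To make the annotation idea concrete, here is a minimal Python sketch. It
is only an illustration under stated assumptions, not part of the
specification: the two "systems", their field names, and the helper
functions are hypothetical; only the class names COUNTY-EQUIVALENT and
STATE-EQUIVALENT echo the text above.

```python
# Hypothetical annotations: each system maps its own field names to shared
# semantic classes. Only the two class names echo the text above; the field
# names and both "systems" are invented for illustration.
SYSTEM_A = {"cnty": "COUNTY-EQUIVALENT", "st": "STATE-EQUIVALENT"}
SYSTEM_B = {"province": "STATE-EQUIVALENT", "district": "COUNTY-EQUIVALENT"}

# One "geographic part of" fact, as a knowledge-based program might hold it.
PART_OF = {"COUNTY-EQUIVALENT": "STATE-EQUIVALENT"}


def translate(record, source, target):
    """Re-key a record from the source format to the target format by
    matching fields that carry the same semantic class."""
    class_to_field = {cls: field for field, cls in target.items()}
    result = {}
    for field, value in record.items():
        cls = source.get(field)  # None if the field was never annotated
        if cls in class_to_field:
            result[class_to_field[cls]] = value
    return result


def is_part_of(child, parent):
    """Follow PART_OF links upward from child, looking for parent."""
    while child in PART_OF:
        child = PART_OF[child]
        if child == parent:
            return True
    return False
```

Calling translate({"cnty": "St. Louis", "st": "Missouri"}, SYSTEM_A,
SYSTEM_B) re-keys the record into System B's field names without either
system knowing the other's format; fields with no annotation are simply
dropped in this sketch.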

>	I couldn't see where you specify the character set for various
>	name bits.  Do you want to cover the bushmen of the Kalahari Desert
>	whose names can include '!', e.g. nx!au.

Because these classes are semantic, issues of language, syntax, writing,
etc. are excluded and relegated to the metadata specification.  (There
are a few exceptions, where the written form interacts with meaning.)
So yes of course we must account for the bushmen (after all, one could
have a package delivered to a bushman), but the semantic classes don't
need to deal explicitly with spelling or orthography peculiarities,
assuming that nx!au is recognized by a parser as being a name.

>	Did you want to cover Balkan names where in some places the trailing
>	syllable of the surname indicates the sex of the person (someone
>	will correct me here).

Since the version I posted, I have added:
------------------
DESCENT-INDICATOR-NAMES (Simonovitch, Simonovna, Ilyich, etc.)
  [Laffal.73 KIN YNG|FL]
    DESCENT-INDICATOR-BETWEEN-NAMES (Abdul bin Said, David Ben Gurion,
          Mustapha ibn Sa'ud, Joshua bar Joseph, Leah beth Ruth, etc.)
------------------
Do my "vitch" and "ovna" entries address your Balkan concern?  I had not
considered these to be "surnames" as such, thinking of Pyotr Ilyich Tchaikovsky.
But, clearly, Great Minds Think Alike!
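
As a rough illustration of how a parser might hang these two classes on
the parts of a name, here is a naive Python sketch. The particle and
suffix lists are assumptions drawn only from the example values above, and
the matching rule is deliberately crude; a real parser would need far more
linguistic knowledge than this.

```python
# Assumed lookup lists, taken only from the example values quoted above.
BETWEEN_PARTICLES = {"bin", "ben", "ibn", "bar", "beth"}
DESCENT_SUFFIXES = ("ovitch", "ovna", "yich")


def tag_descent_indicators(full_name):
    """Return (token, class-or-None) pairs for a whitespace-split name."""
    tokens = full_name.split()
    tagged = []
    for i, token in enumerate(tokens):
        lower = token.lower()
        # A particle counts only when it sits between two other name parts.
        if 0 < i < len(tokens) - 1 and lower in BETWEEN_PARTICLES:
            tagged.append((token, "DESCENT-INDICATOR-BETWEEN-NAMES"))
        elif lower.endswith(DESCENT_SUFFIXES):
            tagged.append((token, "DESCENT-INDICATOR-NAMES"))
        else:
            tagged.append((token, None))
    return tagged
```

Under these assumptions, "Pyotr Ilyich Tchaikovsky" gets the tag
DESCENT-INDICATOR-NAMES on "Ilyich" only, and "David Ben Gurion" gets
DESCENT-INDICATOR-BETWEEN-NAMES on "Ben".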

>	I couldn't see where you catered for the few who still wish to be
>	known as John Smith, Esquire or Esq. - e.g. Bugs Bunny Esq.

That is not in the NAME section at all, but is in the PERSONAL-TITLE 
addressee qualifier TRADITIONAL-NON-HEREDITARY-TITLE-WORD:
-----------------
TRADITIONAL-NON-HEREDITARY-TITLE-WORD
<qualification by sources of title>
     KNIGHTHOOD (Sir, Lady, Dame, K.G., etc.)
       [Wilkins1668 RC.I.3.A{GENTLEMAN, Esquire, Sir, Madam, Worshipful, }|FL]
     GENDER/MARITAL-HONORIFIC  (Mister, Missus, Miss, Master, Madam)
       --(Warning: In Thailand, MR. refers to the great-grandchild of a king.)
       PLURAL-GENDER/MARITAL-HONORIFIC (Misses, Messrs, Mesdames)
       <dependent encodings>
         ABBREVIATIONS (Mr., Mrs., Ms., Messrs., M., Mlle., Mon Senhor, 
                         Nosso Senhor, etc.)
     PERSONAL-RESPECT-HONORIFIC  (Don, -san, Esq., Sri, etc.)
-----------------
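
For illustration only, the excerpt can be held as a small class hierarchy
and queried, e.g. to see where "Esq." lands. This is a hedged Python
sketch: the dictionary layout and the function names are assumptions, and
the value lists simply repeat the examples shown in parentheses above.

```python
ROOT = "TRADITIONAL-NON-HEREDITARY-TITLE-WORD"

# Example values copied from the parenthesized lists in the excerpt.
TITLE_WORDS = {
    "KNIGHTHOOD": ["Sir", "Lady", "Dame", "K.G."],
    "GENDER/MARITAL-HONORIFIC": ["Mister", "Missus", "Miss", "Master", "Madam"],
    "PLURAL-GENDER/MARITAL-HONORIFIC": ["Misses", "Messrs", "Mesdames"],
    "PERSONAL-RESPECT-HONORIFIC": ["Don", "-san", "Esq.", "Sri"],
}

# Parent links follow the indentation of the excerpt.
PARENT = {
    "KNIGHTHOOD": ROOT,
    "GENDER/MARITAL-HONORIFIC": ROOT,
    "PLURAL-GENDER/MARITAL-HONORIFIC": "GENDER/MARITAL-HONORIFIC",
    "PERSONAL-RESPECT-HONORIFIC": ROOT,
}


def classify_title(word):
    """Return the class whose example list contains word, else None."""
    for cls, words in TITLE_WORDS.items():
        if word in words:
            return cls
    return None


def ancestry(cls):
    """Return the chain of classes from cls up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain
```

Here classify_title("Esq.") returns PERSONAL-RESPECT-HONORIFIC, so a name
like "Bugs Bunny Esq." keeps its honorific outside the NAME section.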

>	Also you note 'St.' as a Name-Preceder, but 'St. John' is also
>	a christian name (pronounced 'sinjin') and the 'St.' is not an
>	abbreviation, and the space is required.

I had not made the distinction between British "St. John Smith" and, say,
Jill St. John, or Viscount St. Davids.  Why should one distinguish these?
Any surname can also be used as a forename.

>	Further Name-Preceders are D' and d' (some frenchman will correct
>	me here) as in D'artangion.(Australians are the natural enemy of
>	the French language ...)

Thanks, I hadn't thought of it.  But note that the _values_ of fields,
shown in parentheses, are just frills.  The main thing is to define a place
for them, semantically.

>	Finally, I hope you never meet a person who requires that the
>	correct written or pictographic form of their name must be
>	captured specifically in blue ink!

So do I.  It is enough to deal with "the musician formerly known as
Prince", Prince Leonard of the Hutt River Province Principality, and
the American who changed his name to a four-digit number!  Time for
that medical attention ....

Thanks for your response, and let me know if you have a need for
this sort of detailed data meaning specification.

>	cheers, Nigel.

                          Yours truly,   Fritz Lehmann
GRANDAI Software, 4282 Sandburg Way, Irvine, CA 92715, U.S.A.
Tel:(714)-733-0566  Fax:(714)-733-0506  fritz@rodin.wustl.edu
=============================================================