Wednesday, July 8, 2009

What's my name? Who am I?

They seem like such simple, obvious questions: What's your name? Who are you? In the "real" world the answers are easy, and in casual online use they are also easy, but in a hard-core semantic sense, boy, are they tough problems. Sure, there is no problem if all you are using a name for is a text label, or where the context provides qualifying information, but in a general, abstract sense names and identities are very hard problems.

So, what is my name?

Casually, as you see at the bottom of my blog posts, I am Jack Krupansky. Simple enough.

But... Jack is just my nickname and not suitable for any legal documents. My driver's license and bills and credit cards and financial accounts all have my legal first name, John. So, I am "really" John Krupansky.

Actually, I almost never use John Krupansky. In formal, legal contexts, including my driver's license, bills, voter registration, etc., I always use my middle initial: W. So, legally I refer to myself as John W. Krupansky, with the period.

Actually, my driver's license says: KRUPANSKY, JOHN W, without the period.

And my credit cards say JOHN W KRUPANSKY, also without the period.

Personally, I never abbreviate my first name, but in some contexts my name could also be any of:

  • J. Krupansky
  • J. KRUPANSKY
  • J Krupansky
  • J KRUPANSKY
  • J. W. Krupansky
  • J. W. KRUPANSKY
  • J W Krupansky
  • J W KRUPANSKY
  • Krupansky, J.
  • KRUPANSKY, J.
  • Krupansky, J
  • KRUPANSKY, J
  • Krupansky, J. W.
  • KRUPANSKY, J. W.
  • Krupansky, J W
  • KRUPANSKY, J W
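
To show how mechanical these variations are, here is a minimal Python sketch that generates the sixteen forms listed above from the three name components. The function name initial_forms and its parameters are my own invention for illustration, not part of any existing library.

```python
def initial_forms(first, middle, last):
    """Return the abbreviated-first-name forms of a name, in the order listed above."""
    f, m = first[0], middle[0]
    forms = []
    # Two orderings: initials first, then "Surname, Initials" with a comma.
    for surname_first in (False, True):
        # With and without the middle initial.
        for use_middle in (False, True):
            # With and without a period after each initial.
            for dot in (".", ""):
                # Mixed-case surname, then all capitals.
                for surname in (last, last.upper()):
                    initials = f + dot
                    if use_middle:
                        initials += " " + m + dot
                    if surname_first:
                        forms.append(f"{surname}, {initials}")
                    else:
                        forms.append(f"{initials} {surname}")
    return forms


forms = initial_forms("John", "William", "Krupansky")
for form in forms:
    print(form)
```

Four independent yes/no choices (ordering, middle initial, periods, capitals) already multiply out to sixteen strings, and that is before nicknames, titles, or misspellings enter the picture.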

In some contexts, such as publication of a letter or comment, a publisher might abbreviate my last name as:

  • Jack K.
  • John K.
  • John W. K.

Oh, I forgot to mention that my middle W. stands for William. So my birth certificate says John William Krupansky. My passport says:

KRUPANSKY
JOHN WILLIAM

Please note that "J. Krupansky", "J Krupansky" and "J KRUPANSKY" are not necessarily my name. In some contexts the "J" is really an abbreviation for Judge. There are only two examples I know of, but they are (were) real: Judge Robert Brazil Krupansky and Judge Blanche Krupansky. They are not relatives as far as I know. They might be distant relatives, but that is not known.

Did I say that John Krupansky is my name? Well, yes, but it is not only my name. A Web search shows that there are at least two other people who "have" that name, so I cannot technically claim exclusive ownership. There is a John Krupansky from upstate NY or Kentucky and there is a John Joseph Krupansky out there somewhere.

Almost forgot, there was another John Krupansky, even before I was born, a John F. Krupansky or John Frank Krupansky, my grandfather. That may be part of the reason I became known as "Jack". The rest of the reason was that in first grade of elementary school, there were four Johns out of 20 kids.

As far as I know, there are no other John W. Krupanskys out there. But that is not something that we can count on.

You would think that with all of the "intelligence" and horsepower in modern computers that all of these variations could be sorted out with no effort required on our part, but that is not the case. Sure, various pieces of software do have varying degrees of smarts for dealing with names, but the emphasis is on varying.

Each of the various John Krupanskys does indeed have a distinct identity (probably at least social security number, driver's license state and number, and residential address), but automatically mapping from John Krupansky or J Krupansky or Krupansky, J. to each of us is, in general, an as yet unsolved problem.

As far as I know, the Semantic Web and the various Semantic Web technologies as well as the various prototype semantic search engines do not even offer a proposed solution to this problem of mapping an informal textual name reference to a specific identity. In theory, on the Semantic Web there should be a specific concept or URI for each of us Johns or Krupansky, J., for each of our identities. In fact, the situation is so complex that even Google does not offer a name search capability that is able to deal with the simple variations I have detailed here.

Oops, I forgot another variation: back in Europe, there was an accent on the y of Krupansky, and you can even use Google to find some of those European Krupanskys. Semantic search needs to be able to handle both the accented and unaccented forms, with an option for whether to require the accents to match.
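
As a rough illustration of that option, here is a small Python sketch using the standard unicodedata module. The helper names strip_accents and names_match and the require_accents flag are hypothetical, and the acute accent in the example is only an assumption made for the demonstration.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents) after decomposing the text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def names_match(a, b, require_accents=False):
    """Compare two names, optionally ignoring accents."""
    # Normalize first so precomposed and decomposed spellings compare equal.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    if not require_accents:
        a, b = strip_accents(a), strip_accents(b)
    return a == b


# The acute accent here is an assumption chosen for illustration.
print(names_match("Krupansk\u00fd", "Krupansky"))
print(names_match("Krupansk\u00fd", "Krupansky", require_accents=True))
```

With the default setting the accented and unaccented spellings are treated as the same name; with require_accents=True they are kept apart.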

The good news, for me personally, is that it does not appear that there is any other Jack Krupansky out there, at least right now.

Oh, and who is Jack Krupanski? Well, it's actually me, but spelled wrong. What computer software knows that?
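
Software can at least guess. Here is a minimal sketch using get_close_matches from Python's standard difflib module to suggest known name forms for a misspelled one. The list of known forms, the function name suggest_name_forms, and the 0.9 cutoff are all assumptions picked for this example.

```python
import difflib

# Hypothetical list of name forms the software already knows about.
KNOWN_NAME_FORMS = ["Jack Krupansky", "John Krupansky", "John W. Krupansky"]


def suggest_name_forms(typed, known=KNOWN_NAME_FORMS, cutoff=0.9):
    """Return known name forms that closely resemble the typed text, best first."""
    return difflib.get_close_matches(typed, known, n=3, cutoff=cutoff)


print(suggest_name_forms("Jack Krupanski"))
```

String similarity only says that two spellings look alike. It cannot tell whether Jack Krupanski is a typo for me or a different person whose name really is spelled that way, which is the harder identity problem.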

To some people I am Mr. John Krupansky. Is the Mr. part of my name? Good question.

Almost forgot... there are also people out there who insist that my name is jack krupansky without any capitals. In general, capitalization does not matter, but it can matter when text is being parsed to be indexed and software is attempting to recognize names.

At this stage, I think we need to consider the following for any semantic web:

  1. Ultimately, each person needs to have a unique URI that represents their identity.
  2. That identity needs to include all of the name components, such as first name, middle name, last name, suffix, title, nickname, etc. as attributes.
  3. Each of the various forms of your name needs to have its own URI. That should include misspellings, for example, Jack Krupanski. That also includes variations in titles and suffixes.
  4. There should be RDF for many-to-many mappings between name forms and identities, covering both the various name forms of each identity and the various identities that share each name form, so that given a name form the possible identities can be examined and given an identity the possible name forms can be examined.
  5. Whether in a UI or an API, given a name form, it should be possible to examine the various name forms that might be equivalent.
  6. Have the concept of preferred name form. But there could be multiple preferred forms, such as nickname vs. legal name.
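
The list above can be sketched as a small in-memory index. This is only an illustration in plain Python rather than RDF: the NameIndex class, its method names, and the example.org URIs are all hypothetical, and for brevity the name forms are plain strings rather than having URIs of their own.

```python
from collections import defaultdict


class NameIndex:
    """Many-to-many mapping between name forms and identity URIs."""

    def __init__(self):
        self._identities_by_form = defaultdict(set)
        self._forms_by_identity = defaultdict(set)
        self._preferred = defaultdict(dict)

    def add(self, identity_uri, name_form, preferred_for=None):
        """Record that an identity may be referred to by a name form."""
        self._identities_by_form[name_form].add(identity_uri)
        self._forms_by_identity[identity_uri].add(name_form)
        if preferred_for is not None:
            # One preferred form per context, e.g. "legal" or "casual".
            self._preferred[identity_uri][preferred_for] = name_form

    def identities_for(self, name_form):
        """Given a name form, return the identities that might be meant."""
        return set(self._identities_by_form.get(name_form, ()))

    def forms_for(self, identity_uri):
        """Given an identity, return all of its recorded name forms."""
        return set(self._forms_by_identity.get(identity_uri, ()))

    def equivalent_forms(self, name_form):
        """Return other name forms that might refer to the same person."""
        forms = set()
        for identity in self.identities_for(name_form):
            forms |= self.forms_for(identity)
        forms.discard(name_form)
        return forms

    def preferred(self, identity_uri, context):
        """Return the preferred name form for a context, or None."""
        return self._preferred.get(identity_uri, {}).get(context)


# Hypothetical identity URIs, made up for this example.
ME = "https://example.org/id/person/1"
OTHER = "https://example.org/id/person/2"

index = NameIndex()
index.add(ME, "Jack Krupansky", preferred_for="casual")
index.add(ME, "John W. Krupansky", preferred_for="legal")
index.add(ME, "John Krupansky")
index.add(ME, "Jack Krupanski")  # known misspelling
index.add(OTHER, "John Krupansky")
index.add(OTHER, "John Joseph Krupansky")

print(sorted(index.identities_for("John Krupansky")))
print(index.preferred(ME, "legal"))
```

Looking up "John Krupansky" returns both identities, which is exactly the ambiguity the list is meant to expose, while the two preferred forms show how a nickname and a legal name can coexist for one identity.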

Back to the headline question, for any legal context I always use John W. Krupansky. But, sometimes, I actually run into a form that does not request a middle initial, so then I am John Krupansky. Even then, legal contexts tend to include one or more of social security number, driver's license state and number, and residential address. Still, it feels odd using a form of name that I know is not unique.

In non-legal contexts, such as random social networking web sites, I almost always use Jack Krupansky. I do the same for business cards as well, although I have thought of switching to using my legal first name on business cards.

My resume has John William Krupansky plus Jack Krupansky and happens to use John W. Krupansky in the copyright notice.

The other answer to the question is that I respond by asking what field format you need my name in (and whether it is for a "legal" context). Actually, I usually respond with Jack Krupansky and then optionally revise to John if it becomes clear that it is a legal context.

In any case, I am dubious when I run into a single field such as name, author, or creator that doesn't seem to care what form a name is in. That is fine for famous names, but for everybody else it is a recipe for confusion. The solution is to require the identity URI for the person and to have a convenient UI for looking up names.

If it were up to me, I would ban simple text name fields. Or maybe not ban them but require a validation rule that checks for uniqueness and then automatically maps to the true identity.
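
A validation rule of that kind might look something like the following Python sketch. The function resolve_identity, the two exception classes, and the example.org URIs are hypothetical names of my own, and the lookup table stands in for whatever name directory a real system would consult.

```python
class UnknownNameError(LookupError):
    """No identity is recorded for the name form."""


class AmbiguousNameError(ValueError):
    """More than one identity shares the name form."""


def resolve_identity(name_form, identities_by_form):
    """Map a name form to exactly one identity URI, or refuse."""
    candidates = identities_by_form.get(name_form, set())
    if not candidates:
        raise UnknownNameError(f"no identity known for {name_form!r}")
    if len(candidates) > 1:
        # Not unique: the caller should ask the user to pick an identity.
        raise AmbiguousNameError(
            f"{name_form!r} could be any of {sorted(candidates)}"
        )
    (identity,) = candidates
    return identity


# Hypothetical directory of name forms and identity URIs.
directory = {
    "Jack Krupansky": {"https://example.org/id/person/1"},
    "John Krupansky": {
        "https://example.org/id/person/1",
        "https://example.org/id/person/2",
    },
}

print(resolve_identity("Jack Krupansky", directory))
```

A unique name form maps straight to its identity, while an ambiguous one such as John Krupansky is rejected so that a UI can ask which person is meant.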

-- Jack Krupansky
