Tuesday, August 4, 2009

Timestamping knowledge in the Semantic Web

Context is an important element of knowledge. Time is an important element of context. If we really want to understand a piece of knowledge, we need to know its context and its timing in the flow of events. Data and knowledge need to be timestamped.

There is no single time or timestamp for any piece of knowledge. Various timings include:

  • When the observation was made, that is, when the raw observation data was captured. This may be from a hardware sensor monitoring the real physical world, a process monitoring some data stream, or even a user interface.
  • When the raw observation data was analyzed to derive the nominal observation, the nominal knowledge.
  • When the knowledge was stored.
  • When the knowledge was validated.
  • When the knowledge was published or otherwise made available.
  • When the knowledge was calculated from other knowledge.
  • When the validity of the observation is expected to expire.
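
As a rough sketch only, the timings listed above could be carried in a single record with one optional field per timing. The class name KnowledgeTimestamps, its field names, and the is_expired helper are all hypothetical, chosen for illustration; nothing here is a Semantic Web standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class KnowledgeTimestamps:
    """Hypothetical record with one optional field per timing in the list."""
    observed_at: Optional[datetime] = None    # raw observation data captured
    analyzed_at: Optional[datetime] = None    # nominal knowledge derived
    stored_at: Optional[datetime] = None
    validated_at: Optional[datetime] = None
    published_at: Optional[datetime] = None
    calculated_at: Optional[datetime] = None  # derived from other knowledge
    expires_at: Optional[datetime] = None     # validity expected to end

    def is_expired(self, now: datetime) -> bool:
        """True only if an expiry time is recorded and has been reached."""
        return self.expires_at is not None and now >= self.expires_at


stamps = KnowledgeTimestamps(
    observed_at=datetime(2009, 8, 4, 12, 0, tzinfo=timezone.utc),
    expires_at=datetime(2009, 8, 5, 12, 0, tzinfo=timezone.utc),
)
```

Every field is optional because any given piece of knowledge may have only some of these timings.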

In some cases, the raw observation data might be preserved and re-analyzed at a later date with "improved" analytic capabilities and the nominal knowledge re-generated. In such cases there would then be multiple pieces of knowledge for each observation, each qualified by the time of analysis or re-analysis.
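
One way to picture re-analysis is an observation that keeps its raw data plus a list of analyses, each qualified by when it was made. This is a minimal sketch; the names Observation, Analysis, latest, and as_of are assumptions for illustration, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Analysis:
    analyzed_at: datetime  # time of analysis or re-analysis
    value: float           # the nominal knowledge derived from the raw data


@dataclass
class Observation:
    captured_at: datetime
    raw_data: bytes
    analyses: List[Analysis] = field(default_factory=list)

    def add_analysis(self, analyzed_at: datetime, value: float) -> None:
        self.analyses.append(Analysis(analyzed_at, value))

    def latest(self) -> Analysis:
        """The most recently produced analysis."""
        return max(self.analyses, key=lambda a: a.analyzed_at)

    def as_of(self, when: datetime) -> Optional[Analysis]:
        """The most recent analysis made at or before 'when', or None."""
        known = [a for a in self.analyses if a.analyzed_at <= when]
        return max(known, key=lambda a: a.analyzed_at) if known else None


obs = Observation(datetime(2009, 8, 4, tzinfo=timezone.utc), b"raw sensor bytes")
obs.add_analysis(datetime(2009, 8, 5, tzinfo=timezone.utc), 20.5)
obs.add_analysis(datetime(2010, 1, 1, tzinfo=timezone.utc), 20.7)  # re-analysis
```

Nothing is overwritten, so a later reader can ask both "what do we believe now?" and "what did we believe at the time?"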

In some cases there may be latency between the raw sensor capture of the data and the reading of that raw data from the sensor device by the computational device that will record it. Typically that latency will be too small to matter, but for high-speed capture sequences it may be significant. Two separate timestamps may be needed, or even a discrete timestamp for each processing step along the way.
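
A discrete timestamp per processing step might look like the following sketch, which also computes the latency between consecutive steps. The step names and the microsecond values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Ordered (step name, timestamp) pairs; names and times are illustrative.
steps = [
    ("sensor_capture", datetime(2009, 8, 4, 12, 0, 0, 0, tzinfo=timezone.utc)),
    ("device_read", datetime(2009, 8, 4, 12, 0, 0, 250, tzinfo=timezone.utc)),
    ("recorded", datetime(2009, 8, 4, 12, 0, 0, 900, tzinfo=timezone.utc)),
]


def step_latencies(steps):
    """Return the elapsed time between each pair of consecutive steps."""
    return {
        f"{a_name}->{b_name}": b_time - a_time
        for (a_name, a_time), (b_name, b_time) in zip(steps, steps[1:])
    }


latencies = step_latencies(steps)
```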

A piece of knowledge may have been captured from multiple sources, so we need to represent the distinct sources and their distinct timings as well. Collectively they may still represent a single logical observation. An example might be a 3-D camera which is really multiple cameras.

One could also link a number of discrete but simultaneous observations, such as all cameras in a given area, so that collectively they can be considered a single super observation. That overall super observation can have its own timestamps, but there also needs to be a way to drill down to get all of the component timestamps.
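
A minimal sketch of such a super observation, assuming it simply holds its component observations: the overall timing is summarized as the span from the earliest to the latest capture, and the drill-down returns each component's own timestamp. The names SuperObservation, time_span, and component_timestamps are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class Observation:
    source: str            # e.g. an identifier for one camera
    captured_at: datetime


@dataclass
class SuperObservation:
    label: str
    components: List[Observation] = field(default_factory=list)

    def time_span(self) -> Tuple[datetime, datetime]:
        """Earliest and latest capture times across all components."""
        times = [c.captured_at for c in self.components]
        return min(times), max(times)

    def component_timestamps(self) -> Dict[str, datetime]:
        """Drill down: the capture time of each component source."""
        return {c.source: c.captured_at for c in self.components}


lobby = SuperObservation("lobby cameras", [
    Observation("cam-1", datetime(2009, 8, 4, 12, 0, 0, tzinfo=timezone.utc)),
    Observation("cam-2", datetime(2009, 8, 4, 12, 0, 1, tzinfo=timezone.utc)),
])
```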

The timing of capture by multiple sources may be close enough to be considered the same time, or enough time may have elapsed to suggest that they were different observations. Actually, they are different observations in any case, but the issue is whether they are equivalent, or more precisely, equivalent in some particular sense. This concept of sense of equivalence needs to be explored more fully.
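
One possible sense of equivalence, and only one, is "captured within some tolerance of each other." The sketch below assumes that definition; the function name and the tolerance values are illustrative, and other senses of equivalence would need different tests.

```python
from datetime import datetime, timedelta, timezone


def equivalent_in_time(a: datetime, b: datetime, tolerance: timedelta) -> bool:
    """True if the two capture times differ by no more than the tolerance."""
    return abs(a - b) <= tolerance


t1 = datetime(2009, 8, 4, 12, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2009, 8, 4, 12, 0, 2, tzinfo=timezone.utc)

same_for_weather = equivalent_in_time(t1, t2, timedelta(minutes=1))
same_for_video = equivalent_in_time(t1, t2, timedelta(milliseconds=40))
```

The same pair of timestamps can be equivalent for one purpose and not for another, which is why the tolerance is a parameter rather than a constant.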

Each observation station may have its own timepiece, and the timepieces may not be synchronized. One solution might be to make timepiece synchronization a standard part of the protocol whenever two or more devices exchange time-sensitive information. Alternatively, the local time could be recorded along with a delta time for any data that is transferred between two devices.
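
The delta-time idea can be sketched as follows, under a loud simplifying assumption: transit delay between the devices is ignored, so the delta is simply the receiver's clock reading minus the sender's clock reading at the moment of exchange. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def clock_delta(sender_now: datetime, receiver_now: datetime) -> timedelta:
    """Offset between the two clocks, ignoring transit delay (an assumption)."""
    return receiver_now - sender_now


def to_receiver_time(sender_timestamp: datetime, delta: timedelta) -> datetime:
    """Re-express a sender-local timestamp on the receiver's clock."""
    return sender_timestamp + delta


# At the moment of exchange the sender's clock is 3 seconds behind.
sender_now = datetime(2009, 8, 4, 12, 0, 0, tzinfo=timezone.utc)
receiver_now = datetime(2009, 8, 4, 12, 0, 3, tzinfo=timezone.utc)
delta = clock_delta(sender_now, receiver_now)

# The original local time and the delta are both kept with the data.
observed_local = datetime(2009, 8, 4, 11, 59, 50, tzinfo=timezone.utc)
observed_on_receiver_clock = to_receiver_time(observed_local, delta)
```

Keeping the original local time alongside the delta means the adjustment can be redone later if a better estimate of the offset becomes available.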

Calculated data is especially problematic because each of the elements of data used in the calculation may have its own timestamps. The implication is that each piece of calculated data should have an element trail that references each of those pieces of knowledge used in the calculation so that they can be examined later if the data needs to be audited.
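
A minimal sketch of such an element trail, assuming each piece of calculated knowledge simply keeps references to the pieces it was calculated from. The Knowledge class, the trail method, and the average helper are hypothetical names, and the readings are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Knowledge:
    name: str
    value: float
    timestamp: datetime
    derived_from: List["Knowledge"] = field(default_factory=list)

    def trail(self) -> List["Knowledge"]:
        """Every piece of knowledge this one depends on, directly or not."""
        found = []
        for source in self.derived_from:
            found.append(source)
            found.extend(source.trail())
        return found


def average(name: str, inputs: List[Knowledge], calculated_at: datetime) -> Knowledge:
    """Calculate a mean and record which inputs were used."""
    value = sum(k.value for k in inputs) / len(inputs)
    return Knowledge(name, value, calculated_at, list(inputs))


r1 = Knowledge("reading-1", 20.0, datetime(2009, 8, 4, 9, 0, tzinfo=timezone.utc))
r2 = Knowledge("reading-2", 22.0, datetime(2009, 8, 4, 10, 0, tzinfo=timezone.utc))
mean = average("mean-temp", [r1, r2], datetime(2009, 8, 4, 11, 0, tzinfo=timezone.utc))
```

An auditor can walk mean.trail() to see each input along with its own timestamp, rather than seeing only the time of the calculation.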

Now, how all of these timestamps would be represented and stored in the Semantic Web is another matter entirely and left for further contemplation.

-- Jack Krupansky
