Network Working Group                                      L. Masinter
Internet-Draft                                                   Adobe
Intended status: Informational                        October 22, 2010
Expires: April 25, 2011 


The 'tdb' and 'duri' URI schemes, based on dated URIs
draft-masinter-dated-uri-07

Abstract

This document defines two URI schemes. The first, 'duri' (standing for "dated URI"), allows indicating a URI as of a particular date (and time). This allows explicit reference to the "time of retrieval", similar to the way in which bibliographic references containing URIs are used.

The second scheme, 'tdb' (standing for "Thing Described By"), provides a way of minting URIs for anything that can be described, with the ability to fix the description to a given date or time. The 'tdb' URI scheme may reduce the need to define new URN namespaces merely for the purpose of creating stable identifiers for concepts or abstractions: it provides a ready means for identifying "non-information resources" by semantic indirection (a way of creating a URI for anything).

Note

This document is not a product of any working group. Many of the ideas here have been discussed since 2001. This document has been discussed on the mailing list <uri@w3.org>. Previous versions have couched 'duri' and 'tdb' as URN namespaces.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on April 25, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



Table of Contents

1.  Overview and Requirements
    1.1.  Persistent identifiers
    1.2.  URIs for abstractions
2.  Syntax
    2.1.  'duri' Syntax
    2.2.  tdb Syntax
    2.3.  encoded-URI encoding
    2.4.  Timestamp syntax
3.  Semantics
    3.1.  'duri' Semantics
    3.2.  'tdb' Semantics
    3.3.  Timestamp Semantics
4.  Use as a Locator
5.  Hierarchy
6.  Additional Considerations
    6.1.  Embedded URI schemes
    6.2.  Useful timestamps
    6.3.  Free assignment
    6.4.  Resolution
    6.5.  Why Names with Semantics?
    6.6.  Avoiding MetaData
    6.7.  Avoiding 'duri' and 'tdb'
    6.8.  'tdb' and levels of indirection
7.  URI Specification Templates
    7.1.  'duri' Scheme Template
    7.2.  tdb Scheme Template
8.  IANA considerations
9.  Security Considerations
10.  Acknowledgements
11.  References
    11.1.  Normative References
    11.2.  Informative References
Author's Address





1.  Overview and Requirements

The URI schemes defined here address several related problems:




1.1.  Persistent identifiers

[RFC1737] defines several requirements for Uniform Resource Names. In particular, it requires "persistence":

Persistence: It is intended that the lifetime of a URN be permanent. That is, the URN will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name.

Many people have wondered how to create globally unique and persistent identifiers. There are a number of URI schemes and URN namespaces already registered. However, an absolute guarantee of both uniqueness and persistence is very difficult.

In some cases, the guarantee of persistence comes through a promise of good management practice, such as is encouraged in "Cool URIs don't change" [COOL]. However, relying on a promise of good management practice is not the same as having a design that guarantees reliability independent of actual administrative practice.

A primary design goal for URIs is that they are intended to mean the same thing, no matter in what context they appear: a "Uniform" way to Identify a Resource. However, even when URIs have Uniform meaning from the point of view of the source of the reference, they don't guarantee stability over time. Despite best efforts and intentions, identifying information can change in unpredictable ways: domain names can disappear or be reassigned, and name-assigning organizations can change structure or responsibility, merge, or disappear.

The interpretation of many URNs depends significantly on the concept of "naming authority". The authority is presumably some individual or organization responsible both for ensuring uniqueness of assignment and for helping with understanding the meaning of the link between the name and the named.

However, authorities, whether individuals or organizations, have a lifetime, and must be consulted at some point to understand the bindings. The functioning of names as unique identifiers and holders of meaning depends on having a reliable infrastructure for consulting the authority or the authority's records to determine the thing referenced.




1.2.  URIs for abstractions

The description of URIs [RFC3986] describes a range for 'Resource' that is quite broad:

This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

One might use a URI such as a "mailto:" email address to identify a person, or an "http:" URI to identify an abstract comment. However, this leaves open the question of how one might identify, within the same context, both the system mailbox and the person to whom it is assigned, or the web page at an http URI and the concept it describes. The 'tdb' URI scheme allows ready assignment of URIs for abstractions that are distinguished from the media content that describes them.

The goal, then, of the 'tdb' URI scheme is to provide a mechanism which is, at the same time:

permanent: The identity of the resource identified is not subject to reinterpretation over time.
explicitly bound: The mechanism by which the identified resource can be determined is explicitly included in the URI.
useful for non-networked items: Allows identification of resources outside the network: people, organizations, abstract concepts.
no administration: The mechanism does not depend on reliable administrative processes of authorities for either assignment or interpretation.




2.  Syntax




2.1.  'duri' Syntax

A 'duri' URI takes the form:

     duri:<timestamp>:<encoded-URI>

where <timestamp> is a date, optionally followed by a time, in the format given in Section 2.4, and <encoded-URI> is an absolute URI-reference [RFC3986] in which any reserved characters other than "/" have been percent-encoded (Section 2.3). Note that the URI which has been encoded MAY include a fragment identifier.




2.2.  tdb Syntax

A 'tdb' URI takes a similar form:

     tdb:<timestamp>:<encoded-URI>

with the same syntax.
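As an illustration, the following Python sketch splits a 'duri' or 'tdb' URI into its scheme, <timestamp>, and <encoded-URI> parts. It is a minimal sketch, not a reference parser: the names `parse_dated_uri` and `DatedURI` are hypothetical, the timestamp pattern is transcribed from the grammar in Section 2.4 without range checks, and only the scheme name is matched case-insensitively, as RFC 3986 requires. Because any time component must end in "Z", the colons inside a time cannot be confused with the ":" that follows the timestamp.

```python
import re
from typing import NamedTuple

# Section 2.4 timestamp grammar as a regular expression
# (field ranges such as month 01-12 are not checked here).
_TIMESTAMP = (
    r"[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?"
    r"(?:T[0-9]{2}(?::[0-9]{2}(?::[0-9]{2}(?:\.[0-9]+)?)?)?Z)?"
)

# <scheme>:<timestamp>:<encoded-URI>; only the scheme is case-insensitive.
_DATED_URI = re.compile(
    r"(?P<scheme>(?i:duri|tdb)):"
    r"(?P<timestamp>" + _TIMESTAMP + r"):"
    r"(?P<encoded_uri>.+)"
)


class DatedURI(NamedTuple):
    scheme: str       # "duri" or "tdb", lower-cased
    timestamp: str    # e.g. "2001" or "2010-10-22T14:23:27Z"
    encoded_uri: str  # still percent-encoded


def parse_dated_uri(uri: str) -> DatedURI:
    """Split a 'duri' or 'tdb' URI into its three components."""
    match = _DATED_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a 'duri' or 'tdb' URI: {uri!r}")
    return DatedURI(
        match["scheme"].lower(), match["timestamp"], match["encoded_uri"]
    )


print(parse_dated_uri("tdb:2008:http://www.ietf.org"))
# DatedURI(scheme='tdb', timestamp='2008', encoded_uri='http://www.ietf.org')
```

The <encoded-URI> is returned undecoded; percent-decoding is left to whatever step interprets the embedded URI.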




2.3.  encoded-URI encoding

The following characters must be encoded within <encoded-URI>:




2.4.  Timestamp syntax

A timestamp in these URI schemes consists of a restricted subset of date-times, as per [RFC3339]. The goal is to allow relatively short expressions with no ambiguity, but also with arbitrary precision.

  timestamp  = date [ "T" time "Z" ]
  date       = date-fullyear [ "-" date-month [ "-" date-mday ]]
  time       = time-hour [ ":" time-minute
               [ ":" time-second [ time-secfrac ]]]

where the non-terminals "date-fullyear", "date-month", "date-mday", "time-hour", "time-minute", "time-second", and "time-secfrac" are taken from [RFC3339]. The goal was to minimize the amount of precision needed, while retaining the possibility of generating timestamps that are exactly compatible with the [RFC3339] "date-time" non-terminal.
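The grammar can be checked mechanically. The Python sketch below is one possible transcription; the function name `is_valid_timestamp` is hypothetical. It applies the [RFC3339] digit counts and value ranges to each field, including the per-month day limit and the RFC 3339 allowance for a leap second "60".

```python
import calendar
import re

# Section 2.4 grammar; digit counts follow RFC 3339 (4-digit year, 2-digit fields).
TIMESTAMP_RE = re.compile(
    r"(?P<year>[0-9]{4})"
    r"(?:-(?P<month>[0-9]{2})(?:-(?P<day>[0-9]{2}))?)?"
    r"(?:T(?P<hour>[0-9]{2})"
    r"(?::(?P<minute>[0-9]{2})"
    r"(?::(?P<second>[0-9]{2})(?P<frac>\.[0-9]+)?)?)?Z)?"
)


def is_valid_timestamp(ts: str) -> bool:
    """Check ts against the grammar and the RFC 3339 field ranges."""
    match = TIMESTAMP_RE.fullmatch(ts)
    if match is None:
        return False
    f = match.groupdict()
    year = int(f["year"])
    if f["month"] is not None:
        month = int(f["month"])
        if not 1 <= month <= 12:
            return False
        # monthrange() returns (weekday of the 1st, number of days in month).
        if f["day"] is not None and not (
            1 <= int(f["day"]) <= calendar.monthrange(year, month)[1]
        ):
            return False
    limits = {"hour": 23, "minute": 59, "second": 60}  # 60 = leap second
    return all(
        f[name] is None or int(f[name]) <= limit
        for name, limit in limits.items()
    )


print(is_valid_timestamp("2010-02-30"))  # False
```

Note that, read literally, the grammar also admits a time after an incomplete date (for example "2001T12Z"); this sketch accepts that form because the grammar does.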




3.  Semantics




3.1.  'duri' Semantics

It is traditional in conventional references and citations in printed works to include the date of publication; this practice serves the important purpose of allowing the context of the naming to be determined.

The meaning of a 'duri' URI is "the resource that was identified by the <encoded-URI> (after percent-decoding) at the date (and time) given".

For example, "duri:2001:http://www.ietf.org" is a persistent identifier for "http://www.ietf.org" as of 2001. A 'duri' URI may not be usable as a resource locator in a practical sense: the indicated time has either passed or not yet arrived.




3.2.  'tdb' Semantics

The 'tdb' URI scheme is intended to be useful for describing entities, concepts, abstractions, and other items which may not themselves be network accessible resources, but have been at some point described by network accessible resources.

A 'tdb' URI is intended to be used where the <encoded-URI> identifies a 'document' (something a person could read, peruse, understand) or a fragment thereof, where the document describes some thing or concept. The 'tdb' URI itself then identifies the subject of that document. It is common practice to give a reference for a concept by including a pointer to a document, segment, or phrase that defines the concept; 'tdb' attempts to capture this practice in URI space.

For example, one might use "tdb:2008:http://www.ietf.org" as a persistent identifier for the Internet Engineering Task Force, as described by the resource at "http://www.ietf.org" in 2008.

The 'tdb' URI scheme differs from other URI or URN methods for identifying abstractions because the designation of what is actually identified by the 'tdb' doesn't depend on knowing the intention of the "assigner" of the identifier. Unlike "tag", "info", "cid", "mid" or related schemes, the identification is not dependent on the context of use. The 'tdb' URI scheme can be thought of as giving a way to invoke a level of semantic indirection to URI resolution.

While one could imagine using 'tdb' without a date, it would leave the possibility that a reference that is unambiguous at one time might become ambiguous at some other time. There are two ways that the date is useful for 'tdb' URIs: it fixes the time of access of the resource, for variable descriptions, and it fixes the time of interpretation, for descriptions whose meaning (in natural language) might vary.




3.3.  Timestamp Semantics


While one could imagine using 'tdb' without a timestamp, it would leave the possibility that a reference that is unambiguous at one time might become ambiguous at some other time. There are two ways that the date is useful for 'tdb': it fixes the time of access of the resource, for variable descriptions, and it fixes the time of interpretation, for descriptions whose meaning (in natural language) might vary. Normally, when a literary work in natural language refers to another work, both the reference itself and the referenced work are dated; e.g., a footnote in an article written in 1967 might cite a "private communication" that itself had a date. The difference between a URI and a conventional literary reference is the desire to be able to extract the URI from its context and still retain its meaning.

The meaning of a timestamp is the interval specified by the granularity of the time range indicated, in the UTC time zone, as described in [RFC3339]. If necessary, timestamps can include times and even fractional seconds, so that a generator of 'duri' or 'tdb' URIs can be arbitrarily precise.

If there is any ambiguity of the resource within the range of time indicated (for example, if the timestamp consists only of a year, and the resource changes over the course of the year), then the resource state as of the very last instant of the range indicated should be used.

Timestamps need only be as precise as required; this keeps most 'duri' and 'tdb' URIs relatively short.
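The interval rule and the "very last instant" rule can be made concrete. The following Python sketch is illustrative only, and its function names are hypothetical. It maps a timestamp to the half-open UTC interval [start, end) that its granularity covers, so "2001" covers all of 2001 and "2010-10-22T14Z" covers one hour. Because Python's datetime has microsecond resolution, "the very last instant" is approximated as one microsecond before end. The sketch also makes assumptions not stated in this document: it rejects a time that is not preceded by a full date, and it rejects leap seconds and fractions finer than a microsecond.

```python
import re
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Section 2.4 timestamp grammar with named fields.
_TIMESTAMP_RE = re.compile(
    r"(?P<year>[0-9]{4})"
    r"(?:-(?P<month>[0-9]{2})(?:-(?P<day>[0-9]{2}))?)?"
    r"(?:T(?P<hour>[0-9]{2})"
    r"(?::(?P<minute>[0-9]{2})"
    r"(?::(?P<second>[0-9]{2})(?P<frac>\.[0-9]+)?)?)?Z)?"
)


def timestamp_interval(ts: str) -> tuple[datetime, datetime]:
    """Return the half-open UTC interval [start, end) that ts denotes."""
    match = _TIMESTAMP_RE.fullmatch(ts)
    if match is None:
        raise ValueError(f"not a timestamp: {ts!r}")
    f = match.groupdict()
    if f["hour"] is not None and f["day"] is None:
        # Assumption of this sketch: a time needs a full date before it.
        raise ValueError(f"time without a full date: {ts!r}")
    year = int(f["year"])
    if f["month"] is None:  # year granularity
        return (datetime(year, 1, 1, tzinfo=UTC),
                datetime(year + 1, 1, 1, tzinfo=UTC))
    month = int(f["month"])
    if f["day"] is None:  # month granularity; December rolls over the year
        return (datetime(year, month, 1, tzinfo=UTC),
                datetime(year + month // 12, month % 12 + 1, 1, tzinfo=UTC))
    start = datetime(year, month, int(f["day"]), tzinfo=UTC)
    if f["hour"] is None:  # day granularity
        return start, start + timedelta(days=1)
    start = start.replace(hour=int(f["hour"]))
    if f["minute"] is None:
        return start, start + timedelta(hours=1)
    start = start.replace(minute=int(f["minute"]))
    if f["second"] is None:
        return start, start + timedelta(minutes=1)
    # datetime has no leap second, so "60" makes replace() raise ValueError.
    start = start.replace(second=int(f["second"]))
    if f["frac"] is None:
        return start, start + timedelta(seconds=1)
    digits = f["frac"][1:]  # drop the leading "."
    if len(digits) > 6:
        raise ValueError("this sketch supports at most microsecond precision")
    start = start.replace(microsecond=int(digits.ljust(6, "0")))
    # n fractional digits denote an interval 10**-n seconds wide.
    return start, start + timedelta(microseconds=10 ** (6 - len(digits)))


def resolution_instant(ts: str) -> datetime:
    """'The very last instant' of the range, at microsecond resolution."""
    return timestamp_interval(ts)[1] - timedelta(microseconds=1)


print(resolution_instant("2001"))  # 2001-12-31 23:59:59.999999+00:00
```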




4.  Use as a Locator

A 'duri' URI is not directly useful as a resource locator, since many resources vary their content over time.

A 'tdb' URI is not a resource locator in a practical sense, since it explicitly requires human interpretation. However, it allows one to know that a resource was described at some point in time; whether the description is still available, or whether that description is still meaningful, is not guaranteed.




5.  Hierarchy

For 'tdb', the "thing described by" a resource may bear little relationship to the "thing described by" a relative pointer, so the 'tdb' URI scheme seems to have no use cases for using "/" as a hierarchical delimiter.

'duri' URIs, on the other hand, can often be used with relative URI references with some reliability. Note, however, that double-encoding of previously encoded URI characters will cause some problems.
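Generic RFC 3986 reference resolution applied directly to a 'duri' URI handles simple relative paths: resolving "c.html" against "duri:2001:http://www.ietf.org/a/b.html" yields "duri:2001:http://www.ietf.org/a/c.html". An absolute-path reference such as "/c.html", however, would yield "duri:/c.html", losing both the timestamp and the embedded URI. One possible approach, sketched below in Python with a hypothetical helper name, is to resolve the reference against the embedded URI and re-wrap the result with the original timestamp. It works on the still-encoded form rather than decoding and re-encoding, which sidesteps one source of double-encoding. It is an illustration, not a normative algorithm.

```python
import re
from urllib.parse import urljoin

# 'duri' URI: scheme, Section 2.4 timestamp, then the embedded URI.
_DURI = re.compile(
    r"(?i:duri):"
    r"(?P<timestamp>[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?"
    r"(?:T[0-9]{2}(?::[0-9]{2}(?::[0-9]{2}(?:\.[0-9]+)?)?)?Z)?):"
    r"(?P<embedded>.+)"
)


def resolve_in_duri(base: str, reference: str) -> str:
    """Resolve a relative reference against the URI embedded in a 'duri' base.

    Hypothetical helper: the timestamp of the base is carried over unchanged.
    """
    match = _DURI.fullmatch(base)
    if match is None:
        raise ValueError(f"not a 'duri' URI: {base!r}")
    # urljoin only resolves against bases whose scheme is listed in
    # urllib.parse.uses_relative (http, https, file, ...); for other
    # schemes it returns the reference unchanged.
    target = urljoin(match["embedded"], reference)
    return f"duri:{match['timestamp']}:{target}"


print(resolve_in_duri("duri:2001:http://www.ietf.org/a/b.html", "/c.html"))
# duri:2001:http://www.ietf.org/c.html
```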




6.  Additional Considerations




6.1.  Embedded URI schemes

The intent of 'duri' and 'tdb' is to use them with embedded URI references that identify documents or document fragments. That is, they are most useful for "information resources".

For example, 'duri' or 'tdb' with an "http" URI can be used to refer to a web page, or to the subject of a web page, as it was described at the given time. This can be a way of referring to a web site at some time in the past, or to an organization that has changed, merged, split, or disappeared.

Local systems that have known-to-be unique host names can use "file" URIs with 'tdb', for example,

    tdb:20010814142327:file://this.example.com/c|/temp/test.txt

since this use is primarily focused on providing a unique way of identifying an abstraction, even if the referent of the abstraction is not widely known. (Using 'file:' URIs in this way without a fully qualified domain name would not be appropriate, because the interpretation is not uniform.)

One might consider using 'tdb' with a "data" URI to designate concepts that can be described briefly and uniquely inline. For example,

     tdb:2001:data:,The%20US%20president

names the concept described by the (text/plain) string "The US president" at the very last instant of 2001. Of course, this practice is only useful if the referent of the data is (or was at the time) completely unique. Since "data" does not contain a way to designate content-language, the string in question would have to be unambiguous as to its language. In the case of 'data', there is no assigning authority at all; the interpretation of the 'tdb' depends on the interpreting community.

Using 'tdb' or 'duri' with an embedded 'urn:' might not seem useful, but it can be useful where the assignment of names in a URN namespace is not, in practice, permanent, or where one wants to refer to the assignment as of a given date. In this case, it is possible to use a "urn" within a 'duri', e.g.,

      duri:2000:urn:ietf:std:50

might be used to refer to "the document that the IETF considered to be STD 50, as of the last instant of 2000".

For 'tdb', many URIs identify resources which do not clearly describe anything at all. The "home page" for an organization isn't nearly as good a resource to use to describe an organization as the organization's "about" page. But it is up to the minter of the 'tdb' URI to choose wisely.




6.2.  Useful timestamps

Timestamps far in the future are suspect, because the future content of a description resource cannot usually be reliably predicted. Timestamps that precede the availability of the description resource should not be used either; for example, using an http URI with a timestamp earlier than the time the resource at that URI became available is not recommended.

However, although these practices are not recommended, there is no assurance that they haven't been used; by itself, a 'tdb' URI does not constitute an assertion that the description resource was available or assigned at the date specified.

Note that the use of the "very last instant" allows for the bibliographic convention that a work published in 2009 can use "2009" as the date string to refer to the work in the year of publication.
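The guidance in this section can be applied as a simple lint check when minting or accepting URIs. The Python sketch below is a heuristic, not a validity test, since a 'tdb' URI asserts nothing about availability. It assumes the caller has already computed the [start, end) interval of the timestamp as described in Section 3.3, and, as a policy assumption, treats an interval that begins after the current time as "in the future". An interval that merely ends in the future, such as "2009" used during 2009, is not flagged, consistent with the bibliographic convention just described. The function name and warning strings are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional


def timestamp_warnings(
    start: datetime,
    end: datetime,
    now: datetime,
    available_since: Optional[datetime] = None,
) -> List[str]:
    """Return human-readable warnings for a timestamp interval [start, end)."""
    warnings = []
    if start > now:
        # The future content of the description resource cannot be predicted.
        warnings.append("timestamp lies entirely in the future")
    if available_since is not None and end <= available_since:
        # The whole interval precedes the description resource's availability.
        warnings.append("timestamp precedes availability of the resource")
    return warnings


utc = timezone.utc
now = datetime(2010, 10, 22, tzinfo=utc)
# "2010" used during 2010 is allowed by the "very last instant" convention.
print(timestamp_warnings(datetime(2010, 1, 1, tzinfo=utc),
                         datetime(2011, 1, 1, tzinfo=utc), now))  # []
```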




6.3.  Free assignment

Because of the many possible schemes that can be used in the <encoded-URI> portion, almost any computational process should be able to assign 'duri' or 'tdb' URIs at will. Of course, there must be some resource that is available at some point in time, and a clock that is accurate to the granularity of the frequency of assignment.
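A minimal Python sketch of such assignment follows. It is illustrative, and its names (`mint` and the granularity labels) are hypothetical. It assumes <encoded-URI> has already been encoded as described in Section 2.3, and that any time the caller passes in is timezone-aware (astimezone() would interpret a naive datetime as local time).

```python
from datetime import datetime, timezone
from typing import Optional

# strftime patterns producing Section 2.4 timestamps at each granularity.
_GRANULARITY = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%HZ",
    "minute": "%Y-%m-%dT%H:%MZ",
    "second": "%Y-%m-%dT%H:%M:%SZ",
}


def mint(scheme: str, encoded_uri: str, granularity: str = "day",
         now: Optional[datetime] = None) -> str:
    """Mint a 'duri' or 'tdb' URI for encoded_uri as of now (UTC)."""
    if scheme not in ("duri", "tdb"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if now is None:
        # The local clock must be accurate to the chosen granularity.
        now = datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime(_GRANULARITY[granularity])
    return f"{scheme}:{stamp}:{encoded_uri}"


print(mint("tdb", "http://www.ietf.org", "year",
           datetime(2010, 10, 22, 14, 23, 27, tzinfo=timezone.utc)))
# tdb:2010:http://www.ietf.org
```

Coarser granularities give shorter URIs; a process that assigns URIs for the same embedded URI in quick succession would need a finer granularity to keep them distinct.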




6.4.  Resolution

There are no direct resolution servers or processes for 'duri' or 'tdb' URIs. However, a 'duri' URI might be "resolvable" in the sense that a resource that was accessed at a point in time might have the result of that access cached or archived in an Internet archive service. See, for example, the "Internet Archive" project [archive]. And a 'tdb' URI is "resolvable" in the sense that the description resource can be accessed and interpreted.

Clients without access to an Internet archive service might take the decoded <encoded-URI> of a 'duri' and attempt resolution of *that* identifier. This gives an approximation whose reliability depends on what has happened since the date indicated.
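A client-side sketch of this fallback, in Python, is shown below; the function name is hypothetical and no archive lookup is attempted. It extracts and percent-decodes <encoded-URI> and leaves the actual retrieval (for example with urllib.request.urlopen) to the caller, who should treat the result as only an approximation of the resource as of the timestamp.

```python
import re
from urllib.parse import unquote

# 'duri' URI: scheme, Section 2.4 timestamp, then the embedded URI.
_DURI = re.compile(
    r"(?i:duri):"
    r"(?P<timestamp>[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?"
    r"(?:T[0-9]{2}(?::[0-9]{2}(?::[0-9]{2}(?:\.[0-9]+)?)?)?Z)?):"
    r"(?P<embedded>.+)"
)


def fallback_target(duri: str) -> str:
    """Return the decoded embedded URI of a 'duri' URI.

    Retrieving the result yields the resource as it is now, which only
    approximates the resource as of the timestamp.
    """
    match = _DURI.fullmatch(duri)
    if match is None:
        raise ValueError(f"not a 'duri' URI: {duri!r}")
    # Percent-decoding of <encoded-URI>.
    return unquote(match["embedded"])


print(fallback_target("duri:2001:http://www.ietf.org"))  # http://www.ietf.org
```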




6.5.  Why Names with Semantics?

There are a number of URI and URN schemes that create otherwise unbound "names", where the scheme only provides for uniqueness, with some other agent, process, or context providing the authority to interpret the meaning of the identifier at some point in the future. 'duri' and 'tdb' are different: the describer (the agent creating the URI) and the receiver of the URI (the agent interpreting it) can agree upon the semantics without any reference to a third party.




6.6.  Avoiding MetaData

One might consider the timestamp in a 'duri' or 'tdb' URI to be just one piece of additional metadata about the URI, and consider adding other pieces of metadata as annotation.

However, the use of the timestamp is intended primarily as a mechanism of accomplishing uniqueness over time. No other bit of metadata or description readily fills that purpose. Further, the date is not descriptive (an assertion about the URI) but merely refining.




6.7.  Avoiding 'duri' and 'tdb'

Many applications of URIs already provide a context of timestamp. For example, one could imagine a hypertext system where the URIs contained within a document were intended to refer to the resources as of the date of the enclosing document. This would be a reasonable interpretation of URIs within an Internet archive system, for example.

Some applications of URIs already implicitly use the level of interpretive indirection that is explicit with 'tdb'. For example, within an ontology language definition, the URIs used for abstract concepts, individuals, and so forth are generally considered to identify the "thing described by" the URI.

In addition, the 'application/rdf+xml' Media Type [RFC3870] uses fragment identifier resolution as an explicit way of identifying abstract concepts that are described by an RDF document.




6.8.  'tdb' and levels of indirection

The 'tdb' scheme introduces a level of semantic indirection. The puzzles and confusions about use and mention, name and reference, and levels of indirection have been puzzling and amusing for quite a while.

"It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it--either it brings tears into their eyes, or else--"
"Or else what?" said Alice, for the Knight had made a sudden pause.
"Or else it doesn't, you know. The name of the song is called 'Haddock's Eyes.'"
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'"
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention." [LOOK]




7.  URI Specification Templates




7.1.  'duri' Scheme Template

URI scheme name:
duri
Status:
permanent
URI scheme syntax:
Briefly, the syntax is duri:<timestamp>:<encoded-URI>
The syntax is described in this document.
URI scheme semantics:
A URI as of a particular time. Semantics are described in detail in this document.
Encoding considerations:
'duri' URIs consist of a prefix followed by another URI, and should have the same encoding considerations as others. Note discussion of double-encoding.
Applications/protocols that use this URI scheme name:
Limited: this scheme was originally developed as a "thought experiment", although there is some discussion of using it with Memento [MEMENTO].
Interoperability considerations:
The actual interoperability with Internet archiving services needs further exploration.
Security considerations:
See Section 9 (Security Considerations) of this document.
Contact:
Larry Masinter tdb:2010:http://larry.masinter.net
Author/Change controller:
as above
References:
See References of this document.




7.2.  tdb Scheme Template

URI scheme name:
tdb
Status:
permanent
URI scheme syntax:
Briefly, the syntax is tdb:<timestamp>:<encoded-URI>
The syntax is described in this document.
URI scheme semantics:
Semantic indirection at indicated date. Semantics are described in detail in this document.
Encoding considerations:
'tdb' URIs consist of a prefix followed by another URI, and should have the same encoding considerations as others. Note discussion of double-encoding possibilities.
Applications/protocols that use this URI scheme name:
Limited: This scheme was originally designed as a "thought experiment", as a way to resolve some of the use/mention ambiguities in semantic web applications that wish to "denote" concepts and other ideas and not just access resources over the Internet.
Interoperability considerations:
Existing semantic web applications may have other means of semantic indirection or of fixing meaning at a particular time, and generally do not fix descriptions by time.
Security considerations:
See Section 9 (Security Considerations) of this document.
Contact:
Larry Masinter tdb:2010:http://larry.masinter.net
Author/Change controller:
as above
References:
See References of this document.




8.  IANA considerations

This document includes two URI scheme registrations (Section 7) that should be entered into the IANA registry of URI schemes as permanent registrations (once approved).




9.  Security Considerations

Including a date does not make 'tdb' identifiers any more reliable. URIs do not contain enough information to supply the authority for deciding what was or wasn't at a given URI at a given date.




10.  Acknowledgements

There have been many discussions over several years on the relationship of URLs, URNs, URIs, resources and resource identifiers, with many contributions. Particular thanks to Alfred Hines, Herbert Van de Sompel, Al Gilman, Aaron Swartz, Brian McBride, Stuart Williams, Michael Mealling, Ray Denenberg and Pat Hayes.




11.  References




11.1. Normative References

[RFC3339] Klyne, G. and C. Newman, “Date and Time on the Internet: Timestamps,” RFC 3339, July 2002.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax,” RFC 3986, January 2005.
[namespaces] Bray, T., Hollander, D., and A. Layman, “Namespaces in XML,” W3C Recommendation REC-xml-names, January 1999.



11.2. Informative References

[COOL] Berners-Lee, T., “Cool URIs don't change,” 1998.
[LOOK] Carroll, L., “Through the Looking Glass,” 1872.
[MEMENTO] Memento Development Group, “Memento: Adding Time to the Web,” 2010.
[RFC1737] Sollins, K., “Functional Requirements for Uniform Resource Names,” RFC 1737, December 1994.
[RFC3870] Swartz, A., “application/rdf+xml Media Type Registration,” RFC 3870, September 2004.
[archive] Kahle, B., “Preserving the Internet,” Scientific American, March 1997.



Author's Address

  Larry Masinter
  Adobe
  345 Park Ave
  San Jose, CA 95110
  US
Phone:  +1 408 536 3024
Email:  LMM@acm.org
URI:  http://larry.masinter.net