TOC 
Network Working GroupL. Masinter
Internet-DraftAdobe Systems
Intended status: InformationalFebruary 04, 2009
Expires: August 8, 2009 


"duri" and "tdb" URN namespaces based on dated URIs
draft-masinter-dated-uri-05

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 8, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This document defines two namespaces of URNs, based on using a timestamp with an (encoded) URI. The results are namespaces in which names are readily assigned, offer the persistence of reference that is required by URNs, but do not require a stable authority to assign the name. The first namespace ("duri") is used to refer to URI-identified resources as they appeared at a particular time. The second namespace ("tdb") is useful as a way of creating URNs that refer to physical objects or even abstractions that are not themselves networked resources.

The definition of these namespaces may reduce the need to define new URN namespaces merely for the purpose of creating stable identifiers. In addition, they provide a ready means for identifying "non-information resources" by semantic indirection.

Note

This document is not a product of any working group. Many of the ideas here have been discussed since 2001. This document has been discussed on the mailing list <uri@w3.org>.



Table of Contents

1.  Overview and Requirements
    1.1.  Easy URN assignment
    1.2.  Persistent identifiers
    1.3.  URIs for abstractions
2.  Namespace definitions
    2.1.  "duri" namespace
    2.2.  "tdb" namespace
3.  Encoding URIs
    3.1.  Characters that must be encoded
    3.2.  Hierarchy and unencoded /
4.  Dates
5.  Additional Considerations
    5.1.  Embedded URI schemes
    5.2.  Why dates for both tdb and duri?
    5.3.  Useful dates
    5.4.  Free assignment
    5.5.  Resolution
    5.6.  Why Names with Semantics?
    5.7.  Avoiding MetaData
    5.8.  Avoiding duri and tdb
    5.9.  tdb and levels of indirection
6.  URN Specification Templates
    6.1.  duri Specification Template
    6.2.  tdb Specification Template
7.  IANA considerations
8.  Security Considerations
9.  Acknowledgements
10.  References
    10.1.  Normative References
    10.2.  Informative References
§  Author's Address




 TOC 

1.  Overview and Requirements

The URN namespaces defined here solve several related problems.



 TOC 

1.1.  Easy URN assignment

The URN specification [7] (Sollins, K., “Functional Requirements for Uniform Resource Names,” December 1994.) allows for many URN namespaces, and many have been registered. However, obtaining an appropriate URN in any of the currently defined URN namespaces may be difficult: a number of URN namespace registrations have been accompanied by comments that no other URN namespace was available for the class of documents for which identifiers were wanted.



 TOC 

1.2.  Persistent identifiers

RFC 1737 (Sollins, K., “Functional Requirements for Uniform Resource Names,” December 1994.) [7] defines several requirements for Uniform Resource Names. In particular, it requires "persistence":

Persistence: It is intended that the lifetime of a URN be permanent. That is, the URN will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name.

Many people have wondered how to create globally unique and persistent identifiers. There are a number of URI schemes and URN namespaces already registered. However, an absolute guarantee of both uniqueness and persistence is very difficult.

In some cases, the guarantee of persistence comes through a promise of good management practice, such as is encouraged in "Cool URLs don't change" (Berners-Lee, T., “Cool URIs don't change,” 1998.) [6]. However, relying on promise of good management practice is not the same as having a design that guarantees reliability independent of actual administrative practice.

A primary design goal for URIs is that they are intended to mean the same thing, no matter in what context they appear: a "Uniform" way to Identify a Resource. However, even when URIs have Uniform meaning from the point of view of the source of the reference, they don't guarantee stability over time. Despite best efforts and intentions, identifying information can change in unpredictable ways: domain names can disappear or be reassigned, name assigning organizations can change structure, responsibility, disappear, merge, or change in unpredictable ways.

There is a significant dependence in the interpretation of many URNs with the concept of "naming authority". The authority is presumably some individual or organization both to insure uniqueness of assignment and also to help with understanding the meaning of the link between the name and the named.

However, authorities, whether individuals or organizations, have a lifetime, and must be consulted at some point to understand the bindings. The functioning of names as unique identifiers and holders of meaning depends on having a reliable infrastructure of consulting the authority or the authorities records to determine the thing referenced.



 TOC 

1.3.  URIs for abstractions

The description of URIs [3] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax,” January 2005.) describes a range for 'Resource' that is quite broad:

This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

One might use a URI such as "mailto:" email address to identify a person, or a "http:" URI to identify an abstract comment. However, this leaves the question of how one might identify, within the same context, both the system mailbox and the person to which it is assigned, or the web page at a http URI and the person or concept it describes. The "tdb" URN scheme allows ready assignment of URIs for abstractions that are distinguished from the media content that describes them.

The goal, then, of the "tdb" URN scheme proposed below is to provide a mechanism which is, at the same time:

permanent: The identity of the resource identified is not subject to reinterpretation over time.
explicitly bound: The mechanism by which the identified resource can be determined is explicitly included in the URI.
useful for non-networked items: Allows identification of resources outside the network: people, organizations, abstract concepts.
no administration: The mechanism does not depend on reliable administrative processes of authorities for either assignment or interpretation.



 TOC 

2.  Namespace definitions



 TOC 

2.1.  "duri" namespace

It is traditional in convention references and citations in printed works to include the date of publication; this practice serves the important purpose that the context of the naming can be determined.

The "duri" URN namespace takes the form:

     urn:duri:<date>:<encoded-URI>

where <date> is a digit string corresponding to a date (Section 4 (Dates)), and an <encoded-URI> is an absolute URI-reference [3] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax,” January 2005.) in which any character excluded from URN syntax has been escaped (Section 3 (Encoding URIs)).

The meaning of a duri is "the resource (or fragment) that was identified by the <encoded-URI> (after hex decoding) at the very last instant of the date(time) given".

For example, 'urn:duri:2001:http://www.ietf.org' is a persistent identifier to 'http://www.ietf.org' as of the very last instant of the year 2001. A duri may not be a resource locator in a practical sense if the time of location has not yet arrived or has passed. However, is an acceptable resource identifier, and fulfills all of the requirements for URNs [7] (Sollins, K., “Functional Requirements for Uniform Resource Names,” December 1994.).



 TOC 

2.2.  "tdb" namespace

The second URN namespace defined is a parallel space which is useful for describing entities, concepts, abstractions, and other items which are not themselves network accessible resources, but have been at some point described by network accessible resources.

The "tdb" namespace designates the "thing described by" a resource at a given URI at the given time. This URN namespace is described by 'tdb', e.g.,

        urn:tdb:<date>:<encoded-URI>

with the same syntactic rules as 'duri'.

The intent is to use the inversion of "is a document about". It is common practice to give a reference for a concept by including a pointer to a document, segment, phrase that defines the concept. "tdb" attempts to capture this practice in URI space.

For example, "urn:tdb:2001:http://www.ietf.org" can be used to designate the Internet Engineering Task Force organization, at least as it was described by or referenced by its home page at the last instant of 2001.

The "tdb" namespace differs from most other URN methods for identifying abstractions because the designation of what is actually identified by the tdb doesn't depend on knowing the intention of the "assigner" of the identifier. Unlike many of the alternatives proposed, the identification is not dependent on the context of use.

The "tdb" namespace can be thought of as adding a level of semantic indirection to URI resolution. While one could imagine using 'tdb' without a date, it would leave the possibility that a reference that is unambiguous at one time might become ambiguous at some other time. There are two ways that the date is useful for "tdb" URNs: it fixes the time of access of the resource, for variable descriptions, and it fixes the time of interpretation, for descriptions whose meaning (in natural language) might vary.



 TOC 

3.  Encoding URIs

Both "duri" and "tdb" URN namespaces require that some characters in the URI references be encoded.



 TOC 

3.1.  Characters that must be encoded

The characters that must be encoded are:



 TOC 

3.2.  Hierarchy and unencoded /

The URN specification [7] (Sollins, K., “Functional Requirements for Uniform Resource Names,” December 1994.) discourages the use of "/" in URNs because, in general, there is no good interpretation of hierarchy and relative URIs for assigned names. However, for the particular case of duris (at least), there seems to be no good reason to avoid the "/" because it corresponds fairly naturally (in many cases) to the hierarchy of the original space.

Note that because of this, "duri" URNs can actually be used with relative URI references with some amount of reliability. (The double encoding of previously encoded URI characters causes some problems.)



 TOC 

4.  Dates

A <date> is a simple expression of date, optional time, with arbitrary precision. The goal is to allow relatively short expressions of dates with no ambiguity, and with arbitrary precision.

  date = year [ month [ day [ hour [ minute [ second [ fraction ]]]]]]

   year     = 4digit
   month    = 2digit
   day      = 2digit
   hour     = 2digit
   minute   = 2digit
   second   = 2digit
   fraction = *digit

The representation of a date or time refers to the very last instant of the given date/time range at the resolution supplied, so that 1999 and 1999123123595999999 are equivalent. If necessary, "dates" can include times and even fractional times, so that a generator of duris can be arbitrarily precise.

Dates are interpreted relative to International Atomic Time (TAI) [4] (Bureau International des Poids et Mesures, “International Atomic Time,” .). The syntax and semantics are similar to those in RFC 2550 (Glassman, S., Manasse, M., and J. Mogul, “Y10K and Beyond,” April 1 1999.) [8]; in particular, using TAI avoids ambiguity about time zones and difficulties with leap seconds.



 TOC 

5.  Additional Considerations



 TOC 

5.1.  Embedded URI schemes

The intent of "duri" and "tdb" is to use them with embedded URI references that identify documents or document fragments. That is, they are most useful for information resources.

For example, use with a "http" URI can be used to refer to a web page or the subject of a web site at a given time. This can be a way of referring to a web site at some time in the past, or an organization that has changed or merged.

Local systems that have known-to-be unique host names can use "file" URIs with "tdb", for example,

    urn:tdb:20010814142327:file://this.example.com/c|/temp/test.txt

since this use is primarily focused on providing a unique way of identifying an abstraction, even if the referent of the abstraction is not widely known. (Using 'file:' URIs in this way without a fully qualified domain name is not recommended, because the interpretation is not uniform.)

Some URI schemes are more problematic. For example, using tdb or duri with an embedded 'urn:' might not seem to be too useful. But it might be useful where the assignment of names in a URN namespace are not, in practice, permanent, or that one might want to refer to the assignment as of a given date. In this case, it is possible to use a "urn" within a "duri", e.g.,

      urn:duri:2000:urn:ietf:std:50

might be used to refer to "the document that was STD 50 in effect as of the last instant of 2000". [2] (Moats, R., “A URN Namespace for IETF Documents,” August 1999.)

One might consider using "tdb" with "data" to designate concepts that can be described uniquely briefly inline. For example,

     urn:tdb:2001:data:,The%2520US%2520president

names the concept described by the (text/plain) string "The US president" at the very last instant of 2001. (Note the awkward double quoting of space as "%20" and then the "%" as "%25".) Of course, this practice is only useful if the referent of the data is (or was at the time) completely unique. Since "data" does not contain a way to designate content-language, the string in question would have to not be ambiguous as to its language. In the case of 'data', there is no assigning authority at all; the interpretation of the 'tdb' URN depend on the interpreting community.



 TOC 

5.2.  Why dates for both tdb and duri?

Some have suggested that perhaps only one of "tdb" or "duri" might have a date parameter, e.g., allow "tdb:<encoded-URI>" as a URI scheme, and "urn:duri:<date>:tdb:<encoded-URI>" if one really wanted to fix the date of interpretation. While doing this might be possible, it seems that the usage pattern which allows the date to be omitted when needed satisfies the instances where the resource doing the description is permanent to stand.

There are actually two dates to consider, with "tdb". There is the date that the resource is obtained, and there is the date that the description it makes is read, understood, and used to denote. Normally in a literary work in natural language which makes a reference to another work, both the reference itself and the work referenced are dated, e.g., a footnote in an article written in 1967 might talk about a "private communication" which itself had a date. The difference between a URI and a conventional literary reference is the desire to be able to extract the URI from its context and still retain its meaning.



 TOC 

5.3.  Useful dates

Dates far in the future are suspect, because the meaning of the duri or tdb cannot readily be determined in advance reliably. Dates whose range ends before assignment of the resource to the embedded URI SHOULD NOT be used, because the meaning of the reference is left in question. For example, using http URIs with a date range which ends before a web service was available at the given URI doesn't make much sense.

However, although these practices are NOT RECOMMENDED, there is no assurance that they haven't been used; by itself, a duri/tdb does not constitute an assertion that the encoded-URI was available or assigned at the date specified.

Note that the use of the "very last instant" allows for the conventional bibliographic convention that a work published in 2009 can use "2009" as the date string, to refer to the work in the year of publication.



 TOC 

5.4.  Free assignment

Because of the many possible schemes that can be used in the <encoded- URI> portion, there should be no difficulty in almost any computational process being able to assign duris or tdbs at will. Of course, it is necessary for there to be some resource which is available at some point in time, and to have a clock which is accurate to the granularity of the frequency of assignment.



 TOC 

5.5.  Resolution

There no accurate resolution servers for duri or tdb URNs. However, duri might be "resolvable" in the sense that a resource that was accessed at a point in time might have the result of that access cached or archived in an Internet archive service. See, for example, the "Internet Archive" project [9] (Kahle, B., “Preserving the Internet,” March 1997.). A "tdb" is only tesolvable in the sense that if the corresponding duri can be esolved, it may be possible that the result can be accessed and interpreted.

Clients without access to an Internet archive service might take the decoded <encoded-URI> of a duri and attempt resolution of *that* identifier. This will give an approximation whose reliability depends on the amount of time elapsed since the date indicated.



 TOC 

5.6.  Why Names with Semantics?

There are a number of proposals for URN schemes that create otherwise unbound "names", where the URN scheme only provides for uniqueness. Neither "duri" nor "tdb" intrinsically have the property that the names assigned are without any resolution semantics. This is intentional; it's difficult to create names that carry no semantics whatsoever about the authority that assigned the name and the intention of the authority for what the name should designate.



 TOC 

5.7.  Avoiding MetaData

One might consider the date in a duri/tdb to be just one piece of additional metadata about the encoded-URI, and consider adding other pieces of metadata as annotation.

However, the use of the date in a duri/tdb is intended primarily as a mechanism of accomplishing uniqueness over time. No other bit of metadata or description readily fills that purpose. Further, the date is not descriptive (an assertion about the encoded-URI) but merely refining.



 TOC 

5.8.  Avoiding duri and tdb

Many applications of URIs already provide a context of date. For example, one could imagine a hypertext system where the URIs contained within a document were intended to refer to the resources as of the date of the enclosing document. This would be a reasonable interpretation of URIs within an Internet archive system, for example.

And some applications of URIs arguably already contain the level of interpretive indirection that is explicit with "tdb". For example, one might consider the use of URIs as namespace names within XML [5] (Bray, T., Hollander, D., and A. Layman, “Namespaces in XML,” January 1999.) as a reference to the "thing described by" the URI used.



 TOC 

5.9.  tdb and levels of indirection

The "tdb" scheme introduces a level of semantic indirection. The puzzles and confusions about use and mention, name and reference, and levels of indirection have been puzzling and amusing for quite a while.

"It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it--either it brings tears into their eyes, or else--"
"Or else what?" said Alice, for the Knight had made a sudden pause.
"Or else it doesn't, you know. The name of the song is called 'Haddock's Eyes.'"
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'"
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention." [10] (Carroll, L., “Through the Looking Glass,” 1872.)



 TOC 

6.  URN Specification Templates



 TOC 

6.1.  duri Specification Template

Namespace ID:
"duri" requested.
Registration Information:
Registration Version: 1 Registration Date: 2001-08-19
Declared registrant of the namespace:
Larry Masinter
Declaration of syntactic structure:
Briefly, the syntax is urn:duri:<date>:<encoded-URI>
The syntax is described in this document.
Relevant ancillary documentation:
(See References of this document)
Identifier uniqueness considerations:
Uniqueness is guaranteed by the structure of adding a designation of a specific instant to a URI. However, URIs with ambiguous interpretation at any given instant (e.g., "file" URIs without a given host name) will not be unique.
Identifier persistence considerations:
The designation of a dated URI is completely persistent for all time.
Process of identifier assignment:
Any date can be used with any URI independently by anyone.
Process of identifier resolution:
Identifiers can only be resolved approximately. See Section 5.5 (Resolution).
Conformance with URN Syntax:
Note that the use of "/" for hierarchy, while discouraged in the URN specification, is allowed in duris.
Rules for Lexical Equivalent:
For dates, YYYY is equivalent to YYYY01, YYYYMM is equivalent to YYYYMM01, while YYYYMMDD is equivalent to YYYYMMDD0... followed by any number of 0's. In considering equivalence of the encoded URI, if two duris with equivalent dates contain lexically equivalent URIs, the duris are equivalent.
Validation mechanism:
Dates should be reasonable and meet the syntactic requirements. The URI encoded within should meet the syntactic requirements of the URI scheme used.
Scope:
Global.



 TOC 

6.2.  tdb Specification Template

Namespace ID:
"tdb" requested.
Registration Information:
Registration Version: 1 Registration Date: 2002-04-01
Declared registrant of the namespace:
Larry Masinter
Declaration of syntactic structure:
Briefly, the syntax is urn:tdb:<date>:<encoded-URI>
The syntax is described in Section 2.1 ("duri" namespace).
Relevant ancillary documentation:
(See References of this document)
Identifier uniqueness considerations:
Uniqueness is guaranteed by the structure of adding a designation of a specific instant to a URI. However, URIs with ambiguous interpretation at any given instant (e.g., "file" URIs without a given host name) will not be unique.
Identifier persistence considerations:
The designation of a dated URI is completely persistent for all time, although the intent of a resource that is no longer available will be hard to discern.
Process of identifier assignment:
Any date can be used with any URI independently by anyone. However, assigning an identifier to a non-networked resource such as a person or abstract concept requires (at least conceptually) first creating a networked resource that uniquely describes the target, and then constructing the duri using the URI of the description.
Process of identifier resolution:
Resolution of "tdb" identifiers requires interpreting the resource identified by the corresponding "duri". See Section 2.2 ("tdb" namespace) of this document.
Rules for Lexical Equivalent:
As with "duri"; see Section 6.1 (duri Specification Template).
Conformance with URN Syntax:
As with "duri"; see Section 6.1 (duri Specification Template).
Validation mechanism:
As with "duri", see Section 6.1 (duri Specification Template).
Scope:
Global.


 TOC 

7.  IANA considerations

This document includes two URN NID registrations (Section 6.1 (duri Specification Template) and Section 6.2 (tdb Specification Template)) that should be entered into the IANA registry of URN NIDs.



 TOC 

8.  Security Considerations

"duri" and "tdb" identifiers are not any more reliable because they have dates. URIs don't contain enough information to supply the authority for deciding what was or wasn't at a given URI at a given date.



 TOC 

9.  Acknowledgements

There have been many discussions over several years on the relationship of URLs, URNs, URIs, resources and resource identifiers, with many contributions. Particular thanks to Aaron Swartz, Brian McBride and Stuart Williams, Michael Mealling, and many others.



 TOC 

10.  References



 TOC 

10.1. Normative References

[1] Moats, R., “URN Syntax,” RFC 2141, May 1997.
[2] Moats, R., “A URN Namespace for IETF Documents,” RFC 2648, August 1999.
[3] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax,” RFC 3986, January 2005.
[4] Bureau International des Poids et Mesures, “International Atomic Time.”
[5] Bray, T., Hollander, D., and A. Layman, “Namespaces in XML,” W3C Recommendation REC-xml-names, January 1999.


 TOC 

10.2. Informative References

[6] Berners-Lee, T., “Cool URIs don't change,” 1998.
[7] Sollins, K., “Functional Requirements for Uniform Resource Names,” RFC 1737, December 1994.
[8] Glassman, S., Manasse, M., and J. Mogul, “Y10K and Beyond,” RFC 2550, April 1 1999.
[9] Kahle, B., “Preserving the Internet,” Scientific American , March 1997.
[10] Carroll, L., “Through the Looking Glass,” 1872.


 TOC 

Author's Address

  Larry Masinter
  Adobe Systems
  345 Park Ave
  San Jose, CA 95110
  US
Phone:  +1 408 536 3024
Email:  LMM@acm.org
URI:  http://larry.masinter.net