< draft-ietf-urn-syntax-04.txt   draft-ietf-urn-syntax-05.txt >
Internet-Draft Ryan Moats Internet-Draft Ryan Moats
draft-ietf-urn-syntax-04.txt AT&T draft-ietf-urn-syntax-05.txt AT&T
Expires in six months March 1997 Expires in six months March 1997
URN Syntax URN Syntax
Filename: draft-ietf-urn-syntax-04.txt Filename: draft-ietf-urn-syntax-05.txt
Status of This Memo Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts. distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other months and may be updated, replaced, or obsoleted by other
skipping to change at page 2, line 10 skipping to change at page 2, line 10
it easy to map other namespaces (which share the properties of URNs) it easy to map other namespaces (which share the properties of URNs)
into URN-space. Therefore, the URN syntax provides a means to encode into URN-space. Therefore, the URN syntax provides a means to encode
character data in a form that can be sent in existing protocols, character data in a form that can be sent in existing protocols,
transcribed on most keyboards, etc. transcribed on most keyboards, etc.
2. Syntax 2. Syntax
All URNs have the following syntax (phrases enclosed in quotes are All URNs have the following syntax (phrases enclosed in quotes are
REQUIRED): REQUIRED):
<URN> ::= "urn:" <NID> ":" <NSS> URN ::= "urn:" NID ":" NSS
where <NID> is the Namespace Identifier, and <NSS> is the Namespace where NID is the Namespace Identifier, and NSS is the Namespace
Specific String. The leading "urn:" sequence is case-insensitive. Specific String. The leading "urn:" sequence is case-insensitive.
The Namespace ID determines the _syntactic_ interpretation of the The Namespace ID determines the _syntactic_ interpretation of the
Namespace Specific String (as discussed in [1]). Namespace Specific String (as discussed in [1]).
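The top-level rule can be illustrated with a minimal Python sketch. The function name split_urn is hypothetical. It assumes the input is a single URN that has already been isolated from surrounding text. It matches the "urn:" prefix case-insensitively and splits at the first colon after it. This works because the NID cannot contain ":" while the NSS may. The NSS is returned untouched, because its interpretation belongs to the namespace.

```python
from typing import Tuple


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a URN of the form "urn:" NID ":" NSS into (NID, NSS).

    Hypothetical helper: the leading "urn:" is matched case-insensitively,
    and the NSS is returned as-is for the namespace to interpret.
    """
    prefix = "urn:"
    if urn[:len(prefix)].lower() != prefix:
        raise ValueError("URN must start with 'urn:' (any case)")
    rest = urn[len(prefix):]
    # The NID cannot contain ":", so the first colon ends it; any later
    # colons belong to the NSS, where ":" is an allowed character.
    nid, sep, nss = rest.partition(":")
    if not sep or not nid or not nss:
        raise ValueError("expected NID ':' NSS after 'urn:'")
    return nid, nss


print(split_urn("URN:isbn:0-395-36341-1"))  # ('isbn', '0-395-36341-1')
```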
RFC 1630 [2] and RFC 1737 [3] each present additional considerations RFC 1630 [2] and RFC 1737 [3] each present additional considerations
for URN encoding that tend to restrict the syntax. for URN encoding that tend to restrict the syntax.
On the other hand, the requirement to support existing legacy naming On the other hand, the requirement to support existing legacy naming
systems has the effect of broadening syntax. Thus, we discuss the systems has the effect of broadening syntax. Thus, we discuss the
acceptable syntax for both the Namespace Identifier and the Namespace acceptable syntax for both the Namespace Identifier and the Namespace
Specific String separately. Specific String separately.
2.1 Namespace Identifier Syntax 2.1 Namespace Identifier Syntax
The following is the syntax for the Namespace Identifier. To (a) be The following is the syntax for the Namespace Identifier. To (a) be
consistent with all potential resolution schemes and (b) not put any consistent with all potential resolution schemes and (b) not put any
undue constraints on any potential resolution scheme, the syntax for undue constraints on any potential resolution scheme, the syntax for
the Namespace Identifier is: the Namespace Identifier is:
<NID> ::= <let-num> [ 1,31<let-num-hyp> ] NID ::= let-num [ 1*31let-num-hyp ]
<let-num-hyp> ::= <upper> | <lower> | <number> | "-"
<let-num> ::= <upper> | <lower> | <number> let-num-hyp ::= letter / number / "-"
<upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | let-num ::= letter / number
"I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
"Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
"Y" | "Z"
<lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | letter ::= %x41..5A / %x61..7A
"i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
"q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
"y" | "z"
<number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | number ::= %x30..39
"8" | "9"
This is slightly more restrictive than what is stated in [4] (which This is slightly more restrictive than what is stated in [4] (which
allows the characters "." and "+"). Further, the Namespace allows the characters "." and "+"). Further, the Namespace
Identifier is case insensitive, so that "ISBN" and "isbn" refer to Identifier is case insensitive, so that "ISBN" and "isbn" refer to
the same namespace. the same namespace.
To avoid confusion with the "urn:" identifier, the NID "urn" is To avoid confusion with the "urn:" identifier, the NID "urn" is
reserved and MUST NOT be used. reserved and MUST NOT be used.
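The NID rules above can be checked with a short Python sketch. The helper names is_valid_nid and same_namespace are hypothetical. The grammar allows a let-num followed by at most 31 let-num-hyp characters, using ASCII letters and digits only. An NID is therefore 1 to 32 characters long and cannot begin with "-". Because NIDs are case-insensitive, the reserved check and the comparison both fold to lower case.

```python
import re

# let-num followed by 0..31 let-num-hyp characters (ASCII only),
# which is the optional "1 to 31" repetition in the grammar.
_NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


def is_valid_nid(nid: str) -> bool:
    """Check an NID against the grammar and the reserved NID "urn"."""
    if _NID_RE.fullmatch(nid) is None:
        return False
    # NIDs are case-insensitive, so "URN" and "Urn" are reserved too.
    return nid.lower() != "urn"


def same_namespace(a: str, b: str) -> bool:
    """NIDs compare case-insensitively: "ISBN" and "isbn" match."""
    return a.lower() == b.lower()
```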
2.2 Namespace Specific String Syntax 2.2 Namespace Specific String Syntax
As required by RFC 1737, there is a single canonical representation As required by RFC 1737, there is a single canonical representation
of the NSS portion of an URN. The format of this single canonical of the NSS portion of an URN. The format of this single canonical
form follows: form follows:
<NSS> ::= 1*<URN chars> NSS ::= 1*URN_chars
<URN chars> ::= <trans> | "%" <hex> <hex> URN_chars ::= trans / ("%" hex hex)
<trans> ::= <upper> | <lower> | <number> | <other> | <reserved> trans ::= letter / number / other / reserved
<hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" | hex ::= number / %x41..46 / %x61..66
"a" | "b" | "c" | "d" | "e" | "f"
<other> ::= "(" | ")" | "+" | "," | "-" | "." | other ::= "(" / ")" / "+" / "," / "-" / "." /
":" | "=" | "@" | ";" | "$" | ":" / "=" / "@" / ";" / "$" / "_" /
"_" | "!" | "*" | "'" "!" / "*" / "'"
Depending on the rules governing a namespace, valid identifiers in a Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are not members of the URN namespace might contain characters that are not members of the URN
character set above (<URN chars>). Such strings MUST be translated character set above (URN_chars). Such strings MUST be translated
into canonical NSS format before using them as protocol elements or into canonical NSS format before using them as protocol elements or
otherwise passing them on to other applications. Translation is done otherwise passing them on to other applications. Translation is done
by encoding each character outside the URN character set as a by encoding each character outside the URN character set as a
sequence of one to six octets using UTF-8 encoding [5], and the sequence of one to six octets using normalized UTF-8 [5], and the
encoding of each of those octets as "%" followed by two characters encoding of each of those octets as "%" followed by two characters
from the <hex> character set above. The two characters give the from the hex character set above. The two characters give the
hexadecimal representation of that octet. hexadecimal representation of that octet.
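The following minimal Python sketch shows this translation. The function name to_canonical_nss is hypothetical. The sketch assumes the input is the raw identifier as the namespace defines it, so any "%" in it is literal. For that reason it %-encodes the reserved characters as well: section 2.3.1 requires a literal "%" to become "%25", and section 2.3 asks namespace developers to %-encode the other reserved characters. It writes upper-case hex digits, although the hex rule allows either case. Python's UTF-8 encoder produces at most four octets per character, which is within the one-to-six range stated above.

```python
import string

# Characters that may appear literally in a canonical NSS: letters,
# digits, and the "other" set. Reserved characters are deliberately
# excluded here, so this sketch always %-encodes them.
_URN_OTHER = "()+,-.:=@;$_!*'"
_LITERAL = frozenset(string.ascii_letters + string.digits + _URN_OTHER)


def to_canonical_nss(raw: str) -> str:
    """Translate a raw namespace-specific string into canonical NSS form."""
    if "\x00" in raw:
        raise ValueError("octet 0 must never appear in a URN")
    out = []
    for ch in raw:
        if ch in _LITERAL:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then write each octet as
            # "%" followed by two (upper-case) hex digits.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)
```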
2.3 Reserved characters 2.3 Reserved characters
The last character set referenced in the grammar above is the The last character set referenced in the grammar above is the
reserved character set, which contains various characters reserved reserved character set, which contains various characters reserved
from normal use. The reserved character set follows, with a from normal use. The reserved character set follows, with a
discussion on the specifics of why each character is reserved. discussion on the specifics of why each character is reserved.
The reserved character set is: The reserved character set is:
<reserved> ::= "%" | "/" | "?" | "#" reserved ::= "%" / "/" / "?" / "#"
2.3.1 The "%" character 2.3.1 The "%" character
The "%" character is reserved in the URN syntax for introducing the The "%" character is reserved in the URN syntax for introducing the
escape sequence for an octet. Literal use of the "%" character in a escape sequence for an octet. Literal use of the "%" character in a
namespace must be encoded using "%25" in URNs for that namespace. namespace must be encoded using "%25" in URNs for that namespace.
The presence of an "%" character in an URN MUST be followed by two The presence of an "%" character in an URN MUST be followed by two
characters from the <hex> character set. characters from the hex character set.
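The rule that every "%" must introduce a complete escape can be checked with a small Python sketch. The function name escapes_well_formed is hypothetical. The sketch also rejects "%00", because section 2.4 forbids octet 0 even in %-encoded form. It checks only that the escapes are well formed and does not decode them.

```python
_HEX = frozenset("0123456789ABCDEFabcdef")


def escapes_well_formed(nss: str) -> bool:
    """Check that every "%" in an NSS starts a "%" hex hex escape.

    "%00" is rejected as well, since octet 0 must never be used,
    not even in %-encoded form.
    """
    i = nss.find("%")
    while i != -1:
        pair = nss[i + 1:i + 3]
        if len(pair) != 2 or not all(c in _HEX for c in pair):
            return False  # "%" not followed by two hex characters
        if pair == "00":
            return False  # encoded octet 0 is never allowed
        i = nss.find("%", i + 3)
    return True
```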
Namespaces MAY designate one or more characters from the URN Namespaces MAY designate one or more characters from the URN
skipping to change at page 4, line 38 skipping to change at page 4, line 25
these characters are RESERVED for future developments. Namespace these characters are RESERVED for future developments. Namespace
developers SHOULD NOT use these characters in unencoded form, but developers SHOULD NOT use these characters in unencoded form, but
rather use the appropriate %-encoding for each character. rather use the appropriate %-encoding for each character.
2.4 Excluded characters 2.4 Excluded characters
The following list is included only for the sake of completeness. The following list is included only for the sake of completeness.
Any octets/characters on this list are explicitly NOT part of the URN Any octets/characters on this list are explicitly NOT part of the URN
character set, and if used in an URN, MUST be %-encoded: character set, and if used in an URN, MUST be %-encoded:
<excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<" excluded ::= octets 1-32 (1-20 hex) / "\" / """ /
| ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | "~" "&" / "<" / ">" / "[" / "]" / "^" /
| octets 127-255 (7F-FF hex) "`" / "{" / "|" / "}" / "~" /
octets 127-255 (7F-FF hex)
In addition, octet 0 (0 hex) should NEVER be used, in either In addition, octet 0 (0 hex) should NEVER be used, in either
unencoded or %-encoded form. unencoded or %-encoded form.
An URN ends when an octet/character from the excluded character set An URN ends when an octet/character from the excluded character set
(<excluded>) is encountered. The character from the excluded (excluded) is encountered. The character from the excluded character
character set is NOT part of the URN. set is NOT part of the URN.
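The rule that a URN ends at the first excluded character can be sketched in Python as a scan over raw octets. The helper names is_excluded and extract_urn are hypothetical. The sketch assumes the caller has already located the start of a URN in a byte string. Octet 0 is treated as a terminator too, since it must never appear in a URN. The scan only finds where the URN ends and does not validate what precedes that point.

```python
# Excluded octets: 1-32 (control characters and space), 127-255, and a
# set of ASCII punctuation. Octet 0 is never valid, so it also ends a URN.
_EXCLUDED_PUNCT = frozenset(b'\\"&<>[]^`{|}~')


def is_excluded(octet: int) -> bool:
    return octet <= 0x20 or octet >= 0x7F or octet in _EXCLUDED_PUNCT


def extract_urn(data: bytes, start: int = 0) -> bytes:
    """Return the URN beginning at data[start], stopping before the first
    excluded octet, which is not part of the URN."""
    end = start
    while end < len(data) and not is_excluded(data[end]):
        end += 1
    return data[start:end]


text = b"see <urn:isbn:0-395-36341-1> for details"
print(extract_urn(text, start=5))  # b'urn:isbn:0-395-36341-1'
```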
3. Support of existing legacy naming systems and new naming systems 3. Support of existing legacy naming systems and new naming systems
Any namespace (existing or newly-devised) that is proposed as an Any namespace (existing or newly-devised) that is proposed as an
URN-namespace and fulfills the criteria of URN-namespaces MUST be URN-namespace and fulfills the criteria of URN-namespaces MUST be
expressed in this syntax. If names in these namespaces contain expressed in this syntax. If names in these namespaces contain
characters other than those defined for the URN character set, they characters other than those defined for the URN character set, they
MUST be translated into canonical form as discussed in section 2.2. MUST be translated into canonical form as discussed in section 2.2.
4. URN presentation and transport 4. URN presentation and transport
End of changes. 20 change blocks.
35 lines changed or deleted 26 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/