idnits 2.17.1 

draft-iab-identifier-comparison-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 421: '...dentity of an Internet host, it SHOULD...'
     RFC 2119 keyword, line 423: '...#.#.#.#") form.  The host SHOULD check...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 15, 2012) is 4148 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC5890' is mentioned on line 550, but not defined

  == Outdated reference: A later version (-09) exists of
     draft-iab-privacy-considerations-03

  == Outdated reference: A later version (-09) exists of
     draft-ietf-precis-problem-statement-08

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)


     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                     D. Thaler, Ed.
3	Internet-Draft                                                 Microsoft
4	Intended status: Informational                         December 15, 2012
5	Expires: June 18, 2013

7	         Issues in Identifier Comparison for Security Purposes
8	                 draft-iab-identifier-comparison-07.txt

10	Abstract

12	   Identifiers such as hostnames, URIs, and email addresses are often
13	   used in security contexts to identify security principals and
14	   resources.  In such contexts, an identifier supplied via some
15	   protocol is often compared against some policy to make security
16	   decisions such as whether the principal may access the resource, what
17	   level of authentication or encryption is required, etc.  If the
18	   parties involved in a security decision use different algorithms to
19	   compare identifiers, then failure scenarios ranging from denial of
20	   service to elevation of privilege can result.  This document provides
21	   a discussion of these issues that designers should consider when
22	   defining identifiers and protocols, and when constructing
23	   architectures that use multiple protocols.

25	Status of this Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on June 18, 2013.

42	Copyright Notice

44	   Copyright (c) 2012 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	     1.1.  Canonicalization
61	   2.  Security Uses
62	     2.1.  Types of Identifiers
63	     2.2.  False Positives and Negatives
64	     2.3.  Hypothetical Example
65	   3.  Common Identifiers
66	     3.1.  Hostnames
67	       3.1.1.  IPv4 Literals
68	       3.1.2.  IPv6 Literals
69	       3.1.3.  Internationalization
70	       3.1.4.  Resolution for comparison
71	     3.2.  Ports and Service Names
72	     3.3.  URIs
73	       3.3.1.  Scheme component
74	       3.3.2.  Authority component
75	       3.3.3.  Path component
76	       3.3.4.  Query component
77	       3.3.5.  Fragment component
78	       3.3.6.  Resolution for comparison
79	     3.4.  Email Address-like Identifiers
80	   4.  General Issues
81	     4.1.  Conflation
82	     4.2.  Internationalization
83	     4.3.  Scope
84	     4.4.  Temporality
85	   5.  Security Considerations
86	   6.  Acknowledgements
87	   7.  IANA Considerations
88	   8.  Informative References
89	   Author's Address

91	1.  Introduction

93	   In computing and the Internet, various types of "identifiers" are
94	   used to identify humans, devices, content, etc.  Before discussing
95	   security issues, we first give some background on some typical
96	   processes involving identifiers.

98	   As depicted in Figure 1, there are multiple processes relevant to our
99	   discussion.
100	   1.  An identifier must first be generated.  If the identifier is
101	       intended to be unique, the generation process includes some
102	       mechanism, such as allocation by a central authority, to help
103	       ensure uniqueness.  However, the notion of "unique" involves
104	       determining whether a putative identifier matches any other
105	       already-allocated identifier.  As we will see, for many types of
106	       identifiers, this is not simply an exact binary match.

108	       As a result of generating the identifier, it is often stored in
109	       two locations: with the requester or "holder" of the identifier,
110	       and with some repository of identifiers (e.g., DNS).  For
111	       example, if the identifier was allocated by a central authority,
112	       the repository might be that authority.  If the identifier
113	       identifies a device or content on a device, the repository might
114	       be that device.
115	   2.  The identifier must be distributed, either by the holder of the
116	       identifier or by a repository of identifiers, to others who could
117	       use the identifier.  This distribution might be electronic, but
118	       sometimes it is via other channels such as voice, business card,
119	       billboard, or other form of advertisement.  The identifier itself
120	       might be distributed directly, or it might be used to generate a
121	       portion of another type of identifier that is then distributed.
122	       For example, a URI or email address might include a server name,
123	       and hence distributing the URI or email address also inherently
124	       distributes the server name.
125	   3.  The identifier must be used by some party.  Generally the user
126	       supplies the identifier which is (directly or indirectly) sent to
127	       the repository of identifiers.  For example, using an email
128	       address to send email to the holder of an identifier may result
129	       in the email arriving at the holder's email server which has
130	       access to the mail stores.

132	       The repository of identifiers must then attempt to match the
133	       user-supplied identifier with an identifier in its repository.

135	                            +------------+
136	                            |  Holder of |     1. Generation
137	                            | identifier +<---------+
138	                            +----+-------+          |
139	                                 |                  | Match
140	                                 |                  v/
141	                                 |          +-------+-------+
142	                                 +----------+ Repository of |
143	                                 |          |  identifiers  |
144	                                 |          +-------+-------+
145	                 2. Distribution |                  ^\
146	                                 |                  | Match
147	                                 v                  |
148	                       +---------+-------+          |
149	                       |      User of    |          |
150	                       |    identifier   +----------+
151	                       +-----------------+    3. Use

153	                       Typical Identifier Processes

155	                                 Figure 1

157	   One key aspect is that the identifier values passed in generation,
158	   distribution, and use may all be in different forms.  For example,
159	   generation might be exchanged in printed form, distribution done via
160	   voice, and use done electronically.  As such, the match process can
161	   be complicated.

163	   Furthermore, in many uses, the relationship between holder,
164	   repositories, and users may be more involved.  For example, when a
165	   hierarchy of web caches exists, each cache is itself a repository of a
166	   sort, and the match process is usually intended to be the same as on
167	   the origin server.

169	1.1.  Canonicalization

171	   Perhaps the most common algorithm for comparison involves first
172	   converting each identifier to a canonical form (a process known as
173	   "canonicalization" or "normalization"), and then testing the
174	   resulting canonical representations for bitwise equality.  In so
175	   doing, it is thus critical that all entities involved agree on the
176	   same canonical form and use the same canonicalization algorithm so
177	   that the overall comparison process is also the same.

179	   Note that in some contexts, such as in internationalization, the
180	   terms "canonicalization" and "normalization" have a precise meaning.
181	   In this document, however, we use these terms synonymously in their
182	   more generic form, to mean conversion to some standard form.

184	   While the most common method of comparison includes canonicalization,
185	   comparison can also be done by defining an equivalence algorithm,
186	   where no single form is canonical.  However, in most cases, a
187	   canonical form is useful for other purposes, such as output, and so
188	   in such cases defining a canonical form suffices to define a
189	   comparison method.
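
   As a minimal sketch of the canonicalize-then-compare approach
   described above (in Python; both canonicalizer functions are
   hypothetical and are not taken from any specification), the following
   shows how two parties that do not share a canonicalization algorithm
   can disagree about the same pair of identifiers:

```python
def canonicalize_ascii_lower(identifier: str) -> str:
    """Hypothetical canonical form: ASCII letters lowercased, rest untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in identifier
    )


def canonicalize_verbatim(identifier: str) -> str:
    """A second, deliberately different canonical form: the string as received."""
    return identifier


def identifiers_equal(a: str, b: str, canonicalize) -> bool:
    """Canonicalize both identifiers, then compare the encoded bytes."""
    return canonicalize(a).encode("utf-8") == canonicalize(b).encode("utf-8")


# The same pair of identifiers, judged by two parties that do not agree
# on a canonicalization algorithm.
party_one = identifiers_equal("Example", "example", canonicalize_ascii_lower)
party_two = identifiers_equal("Example", "example", canonicalize_verbatim)
print(party_one, party_two)  # True False
```

   The comparison step itself is identical for both parties; only the
   canonicalizer differs, which is why agreement on the canonical form
   matters as much as agreement on the equality test.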

191	2.  Security Uses

193	   Identifiers such as hostnames, URIs, and email addresses are used in
194	   security contexts to identify principals and resources as well as
195	   other security parameters such as types and values of claims.  Those
196	   identifiers are then used to make security decisions based on an
197	   identifier supplied via some protocol.  For example:
198	   o  Authentication: a protocol might match a security principal
199	      identifier to look up expected keying material, and then match
200	      keying material.
201	   o  Authorization: a protocol might match a resource name to look up
202	      an access control list (ACL), and then look up the security
203	      principal identifier (or a surrogate for it) in that ACL.
204	   o  Accounting: a system might create an accounting record for a
205	      security principal identifier or resource name, and then might
206	      later need to match a supplied identifier to (for example) add new
207	      filtering rules based on the records in order to stop an attack.

209	   If the parties involved in a security decision use different matching
210	   algorithms for the same identifiers, then failure scenarios ranging
211	   from denial of service to elevation of privilege can result, as we
212	   will see.

214	   This is especially complicated in cases involving multiple parties
215	   and multiple protocols.  For example, there are many scenarios where
216	   some form of "security token service" is used to grant to a requester
217	   permission to access a resource, where the resource is held by a
218	   third party that relies on the security token service (see Figure 2).
219	   The protocol used to request permission (e.g., Kerberos or OAuth) may
220	   be different from the protocol used to access the resource (e.g.,
221	   HTTP).  Opportunities for security problems arise when two protocols
222	   define different comparison algorithms for the same type of
223	   identifier, or when a protocol is ambiguously specified and two
224	   endpoints (e.g., a security token service and a resource holder)
225	   implement different algorithms within the same protocol.

227	        +----------+
228	        | security |
229	        |  token   |
230	        | service  |
231	        +----------+
232	             ^
233	             | 1. supply credentials and
234	             | get token for resource
235	             |                                             +--------+
236	        +----------+  2. supply token and access resource  |resource|
237	        |requester |=------------------------------------->| holder |
238	        +----------+                                       +--------+

240	                         Simple Security Exchange

242	                                 Figure 2

244	   In many cases the situation is more complex.  With certificates, the
245	   name in a certificate gets compared against names in ACLs or other
246	   things.  In the case of web site security, the name in the
247	   certificate gets compared to a portion of the URI that a user may
248	   have typed into a browser.  The fact that many different people are
249	   doing the typing, on many different types of systems, complicates the
250	   problem.

252	   Add to this the certificate enrollment step and the certificate
253	   issuance step, and two more parties have an opportunity to adjust the
254	   encoding or, worse, the software that supports them might make
255	   changes of which the parties are unaware.

257	2.1.  Types of Identifiers

259	   In this document we will refer to the following types of identifiers:

261	   o  Absolute: identifiers that can be compared byte-by-byte for
262	      equality.  Two identifiers that have different bytes are defined
263	      to be different.  For example, binary IP addresses are in this
264	      class.
265	   o  Definite: identifiers that have a well-defined comparison
266	      algorithm on which all parties agree.  For example, URI scheme
267	      names are required to be ASCII and are defined to match in a case-
268	      insensitive way; the comparison is thus definite since all parties
269	      agree on how to do a case-insensitive match among ASCII strings.
270	   o  Indefinite: identifiers that have no single comparison algorithm
271	      on which all parties agree.  For example, human names are in this
272	      class.  Everyone might want the comparison to be tailored for
273	      their locale, for some definition of locale.  In some cases, there
274	      may be limited subsets of parties that might be able to agree
275	      (e.g., ASCII users might all agree on a common comparison
276	      algorithm whereas users of other Latin scripts, such as Turkish,
277	      may not), but identifiers often tend to leak out of such limited
278	      environments.
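
   The Absolute and Definite classes can be illustrated with a short
   Python sketch.  The function names are assumptions made for
   illustration only.  No function is given for the Indefinite class,
   since by definition no single algorithm exists for it.

```python
import ipaddress


def absolute_equal(a: bytes, b: bytes) -> bool:
    """Absolute: byte-by-byte equality, e.g. for binary IP addresses."""
    return a == b


def scheme_equal(a: str, b: str) -> bool:
    """Definite: URI scheme names are ASCII and match case-insensitively."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("URI scheme names are required to be ASCII")
    # For ASCII-only input, lower() does not depend on any locale.
    return a.lower() == b.lower()


binary_a = ipaddress.IPv4Address("192.0.2.1").packed
binary_b = bytes([192, 0, 2, 1])

print(absolute_equal(binary_a, binary_b))  # True
print(scheme_equal("HTTP", "http"))        # True
print(scheme_equal("http", "https"))       # False
```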

280	2.2.  False Positives and Negatives

282	   It is first worth discussing in more detail the effects of errors in
283	   the comparison algorithm.  A "false positive" results when two
284	   identifiers compare as if they were equal, but in reality refer to
285	   two different objects (e.g., security principals or resources).  When
286	   privilege is granted on a match, a false positive thus results in an
287	   elevation of privilege, for example allowing execution of an
288	   operation that should not have been permitted otherwise.  When
289	   privilege is denied on a match (e.g., matching an entry in a block/
290	   deny list or a revocation list), a false positive causes a
291	   permissible operation to be denied.  At best, this can cause worse
292	   performance (e.g., a cache miss, or forcing redundant
293	   authentication), and at worst can result in a denial of service.

295	   A "false negative" results when two identifiers that in reality refer
296	   to the same thing compare as if they were different, and the effects
297	   are the reverse of those for false positives.  That is, when
298	   privilege is granted on a match, the result is at best worse
299	   performance and at worst a denial of service; when privilege is
300	   denied on a match, elevation of privilege results.

302	   Figure 3 summarizes these effects.

304	                  | "Grant on match"       | "Deny on match"
305	   ---------------+------------------------+-----------------------
306	   False positive | Elevation of privilege | Denial of service
307	   ---------------+------------------------+-----------------------
308	   False negative | Denial of service      | Elevation of privilege
309	   ---------------+------------------------+-----------------------

311	                Worst Effects of False Positives/Negatives

313	                                 Figure 3

315	   Elevation of privilege is almost always seen as far worse than denial
316	   of service.  Hence, for URIs for example, Section 6.1 of [RFC3986]
317	   states: "comparison methods are designed to minimize false negatives
318	   while strictly avoiding false positives".

320	   Thus URIs were defined with a "grant privilege on match" paradigm in
321	   mind, where it is critical to prevent elevation of privilege while
322	   minimizing denial of service.  Using URIs in a "deny privilege on
323	   match" system can thus be problematic.
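
   The following Python sketch illustrates the false negative row of
   Figure 3.  It assumes a hypothetical policy engine that compares
   identifiers by exact string match, and a supplied hostname that
   differs from the policy entry only in case (so that both refer to the
   same host).  The names used here are illustrative assumptions.

```python
def matches(identifier: str, policy_entries: set) -> bool:
    """Exact string match, standing in for a too-strict comparison."""
    return identifier in policy_entries


def allowed_when_granting_on_match(identifier: str, allow_list: set) -> bool:
    """"Grant on match": access is allowed only if the identifier matches."""
    return matches(identifier, allow_list)


def allowed_when_denying_on_match(identifier: str, deny_list: set) -> bool:
    """"Deny on match": access is allowed unless the identifier matches."""
    return not matches(identifier, deny_list)


policy = {"host.example"}
supplied = "HOST.EXAMPLE"  # same host, but the exact match fails

# False negative under "grant on match": a legitimate request is refused.
print(allowed_when_granting_on_match(supplied, policy))  # False

# False negative under "deny on match": a blocked host gets through.
print(allowed_when_denying_on_match(supplied, policy))   # True
```

   The same comparison error thus produces a denial of service in one
   paradigm and an elevation of privilege in the other.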

325	2.3.  Hypothetical Example

327	   In this example, both security principals and resources are
328	   identified using URIs.  Foo Corp has paid example.com for access to
329	   the Stuff service.  Foo Corp allows its employees to create accounts
330	   on the Stuff service.  Alice gets the account
331	   "http://example.com/Stuff/FooCorp/alice" and Bob gets
332	   "http://example.com/Stuff/FooCorp/bob".  It turns out, however, that
333	   Foo Corp's URI canonicalizer includes URI fragment components in
334	   comparisons whereas example.com's does not, and Foo Corp does not
335	   disallow the # character in the account name.  So Chuck, who is a
336	   malicious employee of Foo Corp, asks to create an account at
337	   example.com with the name alice#stuff.  Foo Corp's URI logic checks
338	   its records for accounts it has created with Stuff and sees that
339	   there is no account with the name alice#stuff.  Hence, in its
340	   records, it associates the account alice#stuff with Chuck and will
341	   only issue tokens good for use with
342	   "http://example.com/Stuff/FooCorp/alice#stuff" to Chuck.

344	   Chuck, the attacker, goes to a security token service at Foo Corp and
345	   asks for a security token good for
346	   "http://example.com/Stuff/FooCorp/alice#stuff".  Foo Corp issues the
347	   token since Chuck is the legitimate owner (in Foo Corp's view) of the
348	   alice#stuff account.  Chuck then submits the security token in a
349	   request to "http://example.com/Stuff/FooCorp/alice".

351	   But example.com uses a URI canonicalizer that, for the purposes of
352	   checking equality, ignores fragments.  So when example.com looks in
353	   the security token to see if the requester has permission from Foo
354	   Corp to access the given account it successfully matches the URI in
355	   the security token, "http://example.com/Stuff/FooCorp/alice#stuff",
356	   with the requested resource name
357	   "http://example.com/Stuff/FooCorp/alice".

359	   Leveraging the inconsistencies in the canonicalizers used by Foo Corp
360	   and example.com, Chuck is able to successfully launch an elevation of
361	   privilege attack and access Alice's resource.
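
   The mismatch in this example can be sketched in Python as follows.
   The two comparison functions are hypothetical stand-ins for the
   canonicalizers of Foo Corp and example.com; urldefrag() from the
   Python standard library is used only to drop the fragment component.

```python
from urllib.parse import urldefrag

ALICE = "http://example.com/Stuff/FooCorp/alice"
CHUCK = "http://example.com/Stuff/FooCorp/alice#stuff"


def foo_corp_equal(a: str, b: str) -> bool:
    """Stand-in for Foo Corp: fragments take part in the comparison."""
    return a == b


def example_com_equal(a: str, b: str) -> bool:
    """Stand-in for example.com: fragments are ignored when comparing."""
    return urldefrag(a).url == urldefrag(b).url


# Foo Corp sees two distinct accounts, so it issues Chuck a token.
print(foo_corp_equal(ALICE, CHUCK))     # False

# example.com sees the token's URI as matching Alice's resource.
print(example_com_equal(ALICE, CHUCK))  # True
```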

363	   Furthermore, consider an attacker using a similar corporation name
364	   such as "foocorp" (or any variation containing a non-ASCII character
365	   that some humans might expect to represent the same corporation).  If
366	   the resource holder treats them as different, but the security token
367	   service treats them as the same, then again elevation of privilege
368	   can occur.

370	3.  Common Identifiers

372	   In this section, we walk through a number of common types of
373	   identifiers and discuss various issues related to comparison that may
374	   affect security whenever they are used to identify security
375	   principals or resources.  These examples illustrate common patterns
376	   that may arise with other types of identifiers.

378	3.1.  Hostnames

380	   Hostnames (composed of dot-separated labels) are commonly used either
381	   directly as identifiers, or as components in identifiers such as in
382	   URIs and email addresses.  Another example is in [RFC5280], sections
383	   7.2 and 7.3 (and updated in section 3 of
384	   [I-D.ietf-pkix-rfc5280-clarifications]), which specify use in X.509
385	   Public Key Infrastructure certificates.

387	   In this section we discuss a number of issues in comparing strings
388	   that appear to be some form of hostname.

390	   It is first worth pointing out that the term "hostname" itself is
391	   often ambiguous, and hence it is important that any use clarify which
392	   definition is intended.  Some examples of definitions include:
393	   a.  A Fully-Qualified Domain Name (FQDN),
394	   b.  An FQDN that is associated with address records in the DNS,
395	   c.  The leftmost label in an FQDN, or
396	   d.  The leftmost label in an FQDN that is associated with address
397	       records.

399	   The use of different definitions in different places results in
400	   questions such as whether "example" and "example.com" are considered
401	   equal or not.
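
   As a small sketch of how the choice of definition changes the answer
   (in Python, with hypothetical function names, and ignoring whether
   address records exist), definition (a) and definition (c) above can
   give opposite results for the same pair of strings:

```python
def equal_as_fqdn(a: str, b: str) -> bool:
    """Definition (a): compare the whole name, ASCII case-insensitively."""
    return a.lower() == b.lower()


def equal_as_leftmost_label(a: str, b: str) -> bool:
    """Definition (c): compare only the leftmost label of each name."""
    return a.split(".")[0].lower() == b.split(".")[0].lower()


print(equal_as_fqdn("example", "example.com"))            # False
print(equal_as_leftmost_label("example", "example.com"))  # True
```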

403	   Section 3 of [RFC6055] discusses the differences between a "hostname"
404	   vs. a "DNS name", where the former is a subset of the latter by using
405	   a restricted set of characters.  If one canonicalizer uses the "DNS
406	   name" definition whereas another uses a "hostname" definition, a name
407	   might be valid in the former but invalid in the latter.  As long as
408	   invalid identifiers are denied privilege, this difference will not
409	   result in elevation of privilege.

411	   [IAB1123] briefly discusses issues with the ambiguity around whether
412	   a label will be "alphabetic", including among other issues, how
413	   "alphabetic" should be interpreted in an internationalized
414	   environment, and whether a hostname can be interpreted as an IP
415	   address.  We explore this last issue in more detail below.

417	3.1.1.  IPv4 Literals

419	   [RFC1123] section 2.1 states:

421	      Whenever a user inputs the identity of an Internet host, it SHOULD
422	      be possible to enter either (1) a host domain name or (2) an IP
423	      address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
424	      the string syntactically for a dotted-decimal number before
425	      looking it up in the Domain Name System.

427	   and

429	      This last requirement is not intended to specify the complete
430	      syntactic form for entering a dotted-decimal host number; that is
431	      considered to be a user-interface issue.

433	   In specifying the inet_addr() API, the POSIX standard [IEEE-1003.1]
434	   defines "IPv4 dotted decimal notation" as allowing not only strings
435	   of the form "10.0.1.2", but also octal and hexadecimal numbers, and
436	   addresses with fewer than four parts.  For example, "10.0.258",
437	   "0xA000102", and "012.0x102" all represent the same IPv4 address in
438	   standard "IPv4 dotted decimal" notation.  We will refer to this as
439	   the "loose" syntax of an IPv4 address literal.

441	   In section 6.1 of [RFC3493] getaddrinfo() is defined to support the
442	   same (loose) syntax as inet_addr():

444	      If the specified address family is AF_INET or AF_UNSPEC, address
445	      strings using Internet standard dot notation as specified in
446	      inet_addr() are valid.

448	   In contrast, section 6.3 of the same RFC states, specifying
449	   inet_pton():

451	      If the af argument of inet_pton() is AF_INET, the src string shall
452	      be in the standard IPv4 dotted-decimal form: ddd.ddd.ddd.ddd where
453	      "ddd" is a one to three digit decimal number between 0 and 255.
454	      The inet_pton() function does not accept other formats (such as
455	      the octal numbers, hexadecimal numbers, and fewer than four
456	      numbers that inet_addr() accepts).

458	   As shown above, inet_pton() uses what we will refer to as the
459	   "strict" form of an IPv4 address literal.  Some platforms also use
460	   the strict form with getaddrinfo() when the AI_NUMERICHOST flag is
461	   passed to it.
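
   The difference between the two forms can be sketched in Python.  The
   parse_loose() function below is a hypothetical, simplified model of
   the inet_addr() rules quoted above (it is not the POSIX
   implementation), and Python's ipaddress module is used as a stand-in
   for the strict form.

```python
import ipaddress
import re

# One part of a loose literal: hexadecimal, octal (leading 0), or decimal.
_PART = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def parse_loose(text: str):
    """Simplified model of the loose syntax; returns IPv4Address or None."""
    parts = text.split(".")
    if len(parts) > 4:
        return None
    values = []
    for part in parts:
        if not _PART.fullmatch(part):
            return None
        if part[:2].lower() == "0x":
            values.append(int(part, 16))
        elif part.startswith("0"):
            values.append(int(part, 8))
        else:
            values.append(int(part, 10))
    *leading, last = values
    # Each leading part is one byte; the last part fills the remaining bytes.
    if any(v > 255 for v in leading) or last >= 256 ** (4 - len(leading)):
        return None
    number = last
    for index, value in enumerate(leading):
        number |= value << (8 * (3 - index))
    return ipaddress.IPv4Address(number)


def parse_strict(text: str):
    """Stand-in for the strict form; returns IPv4Address or None."""
    try:
        return ipaddress.IPv4Address(text)
    except ValueError:
        return None


for literal in ("10.0.1.2", "10.0.258", "012.0x102"):
    print(literal, parse_loose(literal), parse_strict(literal))
```

   Under these assumptions, all three literals parse to 10.0.1.2 with
   the loose parser, while only the first is accepted by the strict one.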

463	   Both the strict and loose forms are standard forms, and hence a
464	   protocol specification is still ambiguous if it simply defines a
465	   string to be in the "standard IPv4 dotted decimal form".  And, as a
466	   result of these differences, names such as "10.11.12" are ambiguous
467	   as to whether they are an IP address or a hostname, and even
468	   "10.11.12.13" can be ambiguous because of the "SHOULD" in RFC 1123
469	   above making it optional whether to treat it as an address or a name.

471	   Protocols and data formats that can use addresses in string form for
472	   security purposes need to resolve these ambiguities.  For example,
473	   for the host component of URIs, section 3.2.2 of [RFC3986] resolves
474	   the first ambiguity by only allowing the strict form, and the second
475	   ambiguity by specifying that it is considered an IPv4 address
476	   literal.  New protocols and data formats should similarly consider
477	   using the strict form rather than the loose form in order to better
478	   match user expectations.

480	   A string might be valid under the "loose" definition, but invalid
481	   under the "strict" definition.  As long as invalid identifiers are
482	   denied privilege, this difference will not result in elevation of
483	   privilege.  Some protocols, however, use strings that can be either
484	   an IP address literal or a hostname.  Such strings are at best
485	   Definite identifiers, and often turn out to be Indefinite
486	   identifiers.  (See Section 4.1 for more discussion.)

488	   Furthermore, when strings can contain non-ASCII characters, they can
489	   contain other characters that may look like dots or digits to a human
490	   viewing and/or entering the identifier, especially to one who might
491	   expect digits to appear in his or her native script.

493	3.1.2.  IPv6 Literals

495	   IPv6 addresses similarly have a wide variety of alternate but
496	   semantically identical string representations, as defined in section
497	   2.2 of [RFC4291] and section 2 of [I-D.ietf-6man-uri-zoneid].  As
498	   discussed in section 3.2.5 of [RFC5952], this fact causes problems in
499	   security contexts if comparison (such as in X.509 certificates) is
500	   done between strings rather than between the binary representations
501	   of addresses.

503	   [RFC5952] recently specified a recommended canonical string format as
504	   an attempt to solve this problem, but it may not be ubiquitously
505	   supported at present.  And, when strings can contain non-ASCII
506	   characters, the same issues (and more, since hexadecimal and colons
507	   are allowed) arise as with IPv4 literals.

509	   Whereas (binary) IPv6 addresses are Absolute identifiers, IPv6
510	   address literals are Definite identifiers, since string-to-address
511	   conversion for IPv6 address literals is unambiguous.
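
   A brief Python sketch of the point above: two alternate string
   representations of one IPv6 address differ as strings but are
   identical once converted to binary.  The example address is taken
   from the 2001:db8::/32 documentation prefix, and the variable names
   are illustrative.

```python
import ipaddress

TEXT_A = "2001:db8::1"
TEXT_B = "2001:0DB8:0:0:0:0:0:1"

# Comparing the literals as strings treats them as different.
string_equal = TEXT_A == TEXT_B

# Comparing the 16-byte binary forms treats them as the same address.
binary_equal = (
    ipaddress.IPv6Address(TEXT_A).packed == ipaddress.IPv6Address(TEXT_B).packed
)

print(string_equal, binary_equal)  # False True
```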

513	3.1.3.  Internationalization

515	   The IETF policy on character sets and languages [RFC2277] requires
516	   support for UTF-8 in protocols, and as a result many protocols now do
517	   support non-ASCII characters.  When a hostname is sent in a UTF-8
518	   field, there are a number of ways it may be encoded.  For example,
519	   hostname labels might be encoded directly in UTF-8, or might first be
520	   Punycode-encoded [RFC3492] or even percent-encoded from UTF-8.

522	   For example, in URIs, [RFC3986] section 3.2.2 specifically allows for
523	   the use of percent-encoded UTF-8 characters in the hostname, as well
524	   as the use of IDNA encoding [RFC3490] using the Punycode algorithm.

526	   Percent-encoding is unambiguous for hostnames since the percent
527	   character cannot appear in the strict definition of a "hostname",
528	   though it can appear in a DNS name.

530	   Punycode-encoded labels (or "A-labels") on the other hand can be
531	   ambiguous if hosts are actually allowed to be named with a name
532	   starting with "xn--", and false positives can result.  While this may
533	   be extremely unlikely for normal scenarios, it nevertheless provides
534	   a possible vector for an attacker.

536	   A hostname comparator thus needs to decide whether a Punycode-encoded
537	   label should or should not be considered a valid hostname label, and
538	   if so, then whether it should match a label encoded in some other
539	   form such as a percent-encoded Unicode label (U-label).

541	   For example, Section 3 of "Transport Layer Security (TLS) Extensions"
542	   [RFC6066], states:

544	      "HostName" contains the fully qualified DNS hostname of the
545	      server, as understood by the client.  The hostname is represented
546	      as a byte string using ASCII encoding without a trailing dot.
547	      This allows the support of internationalized domain names through
548	      the use of A-labels defined in [RFC5890].  DNS hostnames are case-
549	      insensitive.  The algorithm to compare hostnames is described in
550	      [RFC5890], Section 2.3.2.4.

552	   For some additional discussion of security issues that arise with
553	   internationalization, see [TR36].

555	3.1.4.  Resolution for comparison

557	   Some systems (specifically Java URLs [JAVAURL]) use the rule that if
558	   two hostnames resolve to the same IP address(es) then the hostnames
559	   are considered equal.  That is, the canonicalization algorithm
560	   involves name resolution with an IP address being the canonical form.

562	   For example, if resolution was done via DNS, and DNS contained:

564	   example.com.  IN A 10.0.0.6
565	   example.net.  IN CNAME example.com.
566	   example.org.  IN A 10.0.0.6

568	   then the algorithm might treat all three names as equal, even though
569	   the third name might refer to a different entity.
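
   The false positive can be illustrated with a small Python sketch.
   It models the three records above as an in-memory table rather than
   querying real DNS; the ZONE table and the function names are
   hypothetical and exist only for this illustration.

```python
# In-memory stand-in for the three example records (no real DNS is used).
ZONE = {
    "example.com.": ("A", "10.0.0.6"),
    "example.net.": ("CNAME", "example.com."),
    "example.org.": ("A", "10.0.0.6"),
}


def resolve(name, zone=ZONE, max_hops=8):
    """Follow CNAMEs until an A record is found and return its address."""
    for _ in range(max_hops):
        rtype, value = zone[name]
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise RuntimeError("CNAME chain too long")


def equal_by_resolution(a, b):
    """The comparison rule under discussion: equal if addresses match."""
    return resolve(a) == resolve(b)


alias_match = equal_by_resolution("example.com.", "example.net.")
unrelated_match = equal_by_resolution("example.com.", "example.org.")
```

   Both comparisons report a match, although only the first pair is
   related by an alias; the second match rests solely on a shared
   private address.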

571	   With the introduction of dynamic IP addresses, private IP addresses,
572	   multiple IP addresses per name, multiple address families (e.g., IPv4
573	   vs. IPv6), devices that roam to new locations, commonly deployed DNS
574	   tricks that result in the answer depending on factors such as the
575	   requester's location and the load on the server whose address is
576	   returned, etc., this method of comparison cannot be relied upon.
577	   There is no guarantee that two names for the same host will resolve
578	   to the same IP addresses, nor that the addresses resolved
579	   refer to the same entity such as when the names resolve to private IP
580	   addresses, nor even that the system has connectivity (and the
581	   willingness to wait for the delay) to resolve names at the time the
582	   answer is needed.  The lifetime of the identifier, and of any cached
583	   state from a previous resolution, also affects security (see
584	   Section 4.4).

586	   In addition, a comparison mechanism that relies on the ability to
587	   resolve identifiers such as hostnames to other identifiers such as IP
588	   addresses leaks information about security decisions to outsiders if
589	   these queries are publicly observable.  (See
590	   [I-D.iab-privacy-considerations] for a deeper discussion of
591	   information disclosure.)

593	   Finally, it is worth noting that resolving two identifiers to
594	   determine if they refer to the same entity can be thought of as a use
595	   of such identifiers, as opposed to actually comparing the identifiers
596	   themselves, which is the focus of this document.

598	3.2.  Ports and Service Names

600	   Port numbers and service names are discussed in depth in [RFC6335].
601	   Historically, there were port numbers, service names used in SRV
602	   records, and mnemonic identifiers for assigned port numbers (known as
603	   port "keywords" at [IANA-PORT]).  The latter two are now unified, and
604	   various protocols use one or more of these types in strings.  For
605	   example, the common syntax used by many URI schemes allows port
606	   numbers but not service names.  Some implementations of the
607	   getaddrinfo() API support strings that can be either port numbers or
608	   port keywords (but not service names).

610	   For protocols that use service names that must be resolved, the
611	   issues are the same as those for resolution of addresses in
612	   Section 3.1.4.  In addition, Section 5.1 of [RFC6335] clarifies that
613	   service names/port keywords must contain at least one letter.  This
614	   prevents confusion with port numbers in strings where both are
615	   allowed.
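
   The following Python sketch shows how the "at least one letter" rule
   lets a parser tell the two apart.  The remaining syntax checks
   (length of 1 to 15 characters, letters, digits, and non-adjacent
   interior hyphens) reflect our reading of Section 5.1 of [RFC6335]
   and should be verified against that document; the function name and
   return values are hypothetical.

```python
import re

# Letters/digits separated by single interior hyphens.
_NAME_SHAPE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def classify_port_string(s: str) -> str:
    """Classify a string as a port number, a service name, or invalid."""
    if s.isascii() and s.isdigit():
        # All digits: can only be a port number, never a service name.
        return "port-number" if int(s) <= 65535 else "invalid"
    if (
        1 <= len(s) <= 15
        and _NAME_SHAPE.fullmatch(s)
        and any(c.isalpha() for c in s)  # at least one letter
    ):
        return "service-name"
    return "invalid"
```

   Because an all-digit string never satisfies the letter requirement,
   the two syntaxes are disjoint and no precedence rule is needed.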

617	3.3.  URIs

619	   This section looks at issues related to using URIs for security
620	   purposes.  For example, [RFC5280], section 7.4, specifies comparison
621	   of URIs in certificates.  Examples of URIs in security token-based
622	   access control systems include WS-*, SAML-P and OAuth WRAP.  In such
623	   systems, a variety of participants in the security infrastructure are
624	   identified by URIs.  For example, requesters of security tokens are
625	   sometimes identified with URIs.  The issuers of security tokens and
626	   the relying parties who are intended to consume security tokens are
627	   frequently identified by URIs.  Claims in security tokens often have
628	   their types defined using URIs and the values of the claims can also
629	   be URIs.

631	   Also, when a URI is embedded in plain text (e.g., an email message),
632	   there is an additional concern because there is no termination
633	   criterion for a URI.  For example, consider
634	   http://unicode.org/cldr/utility/list-unicodeset.jsp?a=a&amp;g=gc.
635	   Some applications that detect URIs will stop before the first '.' in
636	   the path, while others go to the last '.', and yet others may stop at
637	   the ';'.  As another point of comparison, Section 2.37 of [EE] (a
638	   standard for history citations) specifies the use of a space after a
639	   URI and before the punctuation.

641	   URIs are defined with multiple components, each of which has its own
642	   rules.  We cover each in turn below.  However, it is also important
643	   to note that there exist multiple comparison algorithms.  [RFC3986]
644	   section 6.2 states:

646	      A variety of methods are used in practice to test URI equivalence.
647	      These methods fall into a range, distinguished by the amount of
648	      processing required and the degree to which the probability of
649	      false negatives is reduced.  As noted above, false negatives
650	      cannot be eliminated.  In practice, their probability can be
651	      reduced, but this reduction requires more processing and is not
652	      cost-effective for all applications.
653	      If this range of comparison practices is considered as a ladder,
654	      the following discussion will climb the ladder, starting with
655	      practices that are cheap but have a relatively higher chance of
656	      producing false negatives, and proceeding to those that have
657	      higher computational cost and lower risk of false negatives.

659	   The ladder approach has both pros and cons.  On the pro side, it
660	   allows some uses to optimize for security, and other uses to optimize
661	   for cost, thus allowing URIs to be applicable to a wide range of
662	   uses.  A disadvantage is that when different approaches are taken by
663	   different components in the same system using the same identifiers,
664	   the inconsistencies can result in security issues.

666	3.3.1.  Scheme component

668	   [RFC3986] defines URI schemes as being case-insensitive ASCII and in
669	   section 6.2.2.1 specifies that scheme names should be normalized to
670	   lower-case characters.

672	   New schemes can be defined over time.  In general, however, two URIs
673	   with an unrecognized scheme cannot be safely compared.  This is
674	   because the canonicalization and comparison rules for the other
675	   components may vary by scheme.  For example, a new URI scheme might
676	   have a default port of X, and without that knowledge, a comparison
677	   algorithm cannot know whether "example.com" and "example.com:X"
678	   should be considered to match in the authority component.  Hence for
679	   security purposes, it is safest for unrecognized schemes to be
680	   treated as invalid identifiers.  However, if the URIs are only used
681	   with a "grant access on match" paradigm then unrecognized schemes can
682	   be supported by doing a generic case-sensitive comparison, at the
683	   expense of some false negatives.
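
   A minimal Python sketch of the conservative policy follows.  It
   treats URIs with unrecognized schemes as invalid (never matching)
   and applies a default port only for schemes it knows.  The
   DEFAULT_PORTS table, the scheme name "newscheme", and the function
   names are assumptions made for this illustration.

```python
from urllib.parse import urlsplit

# Illustrative table of recognized schemes and their default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def authority_key(uri):
    """Return (scheme, host, port) for a recognized scheme, else None."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()  # schemes are case-insensitive
    if scheme not in DEFAULT_PORTS:
        return None  # unrecognized scheme: treat as invalid identifier
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return (scheme, parts.hostname, port)


def same_authority(a, b):
    """Compare scheme, host, and effective port; unknown schemes never match."""
    key_a, key_b = authority_key(a), authority_key(b)
    if key_a is None or key_b is None:
        return False
    return key_a == key_b
```

   With this policy, "HTTP://example.com/" and "http://example.com:80/"
   match, while two URIs of an unknown scheme never do, which trades
   false negatives for the absence of false positives.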

685	3.3.2.  Authority component

687	   The authority component is scheme-specific, but many schemes follow a
688	   common syntax that allows for userinfo, host, and port.

690	3.3.2.1.  Host

692	   Section 3.1 discussed issues with hostnames in general.  In addition,
693	   [RFC3986] section 3.2.2 allows future changes using the IPvFuture
694	   production.  As with IPv4 and IPv6 literals, IPvFuture formats may
695	   have issues with multiple semantically identical string
696	   representations, and may also be semantically identical to an IPv4 or
697	   IPv6 address.  As such, false negatives may be common if IPvFuture is
698	   used.

700	3.3.2.2.  Port

702	   See discussion in Section 3.2.

704	3.3.2.3.  Userinfo

706	   [RFC3986] defines the userinfo production that allows arbitrary data
707	   about the user of the URI to be placed before '@' signs in URIs.  For
708	   example: "http://alice:bob:chuck@example.com/bar" has the value
709	   "alice:bob:chuck" as its userinfo.  When comparing URIs in a security
710	   context, one must decide whether to treat the userinfo as being
711	   significant or not.  Some URI comparison services, for example, treat
712	   "http://alice:ick@example.com" and "http://example.com" as being
713	   equal.
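
   The two policies can be sketched in Python as follows.  The helper
   names are hypothetical, and the userinfo is taken to be everything
   before the last '@' in the authority, which is one possible reading
   rather than a definitive parser.

```python
from urllib.parse import urlsplit


def split_userinfo(uri):
    """Return (userinfo or None, host[:port]) from the URI's authority."""
    netloc = urlsplit(uri).netloc
    userinfo, at, hostport = netloc.rpartition("@")
    return (userinfo if at else None), hostport


def equal_ignoring_userinfo(a, b):
    """Policy 1: userinfo is not significant."""
    pa, pb = urlsplit(a), urlsplit(b)
    pa = pa._replace(netloc=split_userinfo(a)[1])
    pb = pb._replace(netloc=split_userinfo(b)[1])
    return pa == pb


def equal_with_userinfo(a, b):
    """Policy 2: userinfo is significant (compared exactly here)."""
    return urlsplit(a) == urlsplit(b)
```

   Components that apply different policies to the same pair of URIs
   will disagree about whether the identifiers match.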

715	   When the userinfo is treated as being significant, it has additional
716	   considerations (e.g., whether it is case-sensitive or not) which we
717	   cover in Section 3.4.

719	3.3.3.  Path component

721	   [RFC3986] supports the use of path segment values such as "./" or
722	   "../" for relative URIs.  Strictly speaking, including such path
723	   segment values in a fully qualified URI is syntactically illegal but
724	   [RFC3986] section 5.2.4 nevertheless defines an algorithm to remove
725	   them.
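
   The dot-segment removal algorithm of [RFC3986] (named
   remove_dot_segments in Section 5.2.4 of that document) can be
   sketched in Python as below.  This is an illustrative transcription
   of the steps as we understand them, not a reference implementation.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URI path."""
    output = []  # completed segments, each with its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./" -> "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../" -> "/", and drop the last segment
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # to the output, up to but not including the next "/".
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)
```

   A comparator that applies this step and one that does not will
   disagree on whether "/a/b/../c" and "/a/c" identify the same path.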

727	   Unless a scheme states otherwise, the path component is defined to be
728	   case-sensitive.  However, if the resource is stored and accessed
729	   using a filesystem using case-insensitive paths, there will be many
730	   paths that refer to the same resource.  As such, false negatives can
731	   be common in this case.

733	3.3.4.  Query component

735	   There is the question as to whether "http://example.com/foo",
736	   "http://example.com/foo?", and "http://example.com/foo?bar" are each
737	   considered equal or different.

739	   Similarly, it is unspecified whether the order of values matters.
740	   For example, should "http://example.com/blah?ick=bick&foo=bar" be
741	   considered equal to "http://example.com/blah?foo=bar&ick=bick"?  And
742	   if a domain name is permitted to appear in a query component (e.g.,
743	   in a reference to another URI), the same issues in Section 3.1 apply.
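
   The following Python sketch contrasts an order-sensitive and an
   order-insensitive query comparison.  Neither is mandated by any
   specification cited here; the function names are hypothetical.  It
   also shows that a parser (here, Python's urlsplit) may itself
   discard the difference between an absent and an empty query.

```python
from urllib.parse import parse_qsl, urlsplit


def query_equal_ordered(a: str, b: str) -> bool:
    """Compare the raw query strings exactly."""
    return urlsplit(a).query == urlsplit(b).query


def query_equal_unordered(a: str, b: str) -> bool:
    """Compare the name/value pairs without regard to order."""
    pairs_a = parse_qsl(urlsplit(a).query, keep_blank_values=True)
    pairs_b = parse_qsl(urlsplit(b).query, keep_blank_values=True)
    return sorted(pairs_a) == sorted(pairs_b)


u1 = "http://example.com/blah?ick=bick&foo=bar"
u2 = "http://example.com/blah?foo=bar&ick=bick"
ordered_result = query_equal_ordered(u1, u2)
unordered_result = query_equal_unordered(u1, u2)

# An absent query and an empty query parse to the same value here.
empty_vs_absent = query_equal_ordered(
    "http://example.com/foo", "http://example.com/foo?"
)
```

   The two functions give opposite answers for the same pair of URIs,
   which is exactly the kind of inconsistency that can arise between
   components of one system.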

745	3.3.5.  Fragment component

747	   Some URI formats include fragment identifiers.  These are typically
748	   handles to locations within a resource and are used for local
749	   reference.  A classic example is the use of fragments in HTTP URIs
750	   where a URI of the form "http://example.com/blah.html#ick" means
751	   retrieve the resource "http://example.com/blah.html" and, once it has
752	   arrived locally, find the HTML anchor named ick and display that.

754	   So, for example, when a user clicks on the link
755	   "http://example.com/blah.html#baz" a browser will check its cache by
756	   doing a URI comparison for "http://example.com/blah.html" and, if the
757	   resource is present in the cache, a match is declared.

759	   Hence comparisons for security purposes typically ignore the fragment
760	   component and treat all fragments as equal to the full resource.
761	   However, if one were actually trying to compare the piece of a
762	   resource that was identified by the fragment identifier, ignoring it
763	   would result in potential false positives.
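
   A minimal Python sketch of a fragment-ignoring comparison follows,
   using the standard library's urldefrag.  The function name is
   hypothetical, and the comparison of the remaining URI is a plain
   string match chosen only to keep the example short.

```python
from urllib.parse import urldefrag


def equal_ignoring_fragment(a: str, b: str) -> bool:
    """Strip any fragment, then compare the rest as exact strings."""
    return urldefrag(a).url == urldefrag(b).url


same_resource = equal_ignoring_fragment(
    "http://example.com/blah.html#ick",
    "http://example.com/blah.html#baz",
)
```

   Here the two URIs match as resources even though they name different
   pieces of that resource.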

765	3.3.6.  Resolution for comparison

767	   As with Section 3.1.4 for hostnames, it may be tempting to define a
768	   URI comparison algorithm based on whether they resolve to the same
769	   content.  Similar problems exist, however, including content that
770	   dynamically changes over time or based on factors such as the
771	   requester's location, potential lack of external connectivity at the
772	   time/place comparison is done, potentially undesirable delay
773	   introduced, etc.

775	   In addition, as noted in Section 3.1.4, resolution leaks information
776	   about security decisions to outsiders if the queries are publicly
777	   observable.

779	3.4.  Email Address-like Identifiers

781	   Section 3.4.1 of [RFC5322] defines the syntax of an email address-
782	   like identifier, and Section 3.2 of [RFC6532] updates it to support
783	   internationalization.  [RFC5280], section 7.5, further discusses the
784	   use of internationalized email addresses in certificates.

786	   For use in certificates, [RFC6532] points to [RFC6530]; Section 13
787	   of that document contains a discussion of many issues resulting from
788	   internationalization.

790	   Email address-like identifiers have a local part and a domain part.
791	   The issues with the domain part are essentially the same as with
792	   hostnames, covered earlier.

794	   The local part is left for each domain to define.  People quite
795	   commonly use email addresses as usernames with web sites such as
796	   banks or shopping sites, but the site doesn't know whether
797	   foo@example.com is the same person as FOO@example.com.  Thus email
798	   address-like identifiers are typically Indefinite identifiers.

800	   To avoid false positives, some security mechanisms (such as
801	   [RFC5280]) compare the local part using an exact match.  Hence, like
802	   URIs, email address-like identifiers are designed for use in grant-
803	   on-match security schemes, not in deny-on-match schemes.
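
   An exact-match comparison of the local part can be sketched in
   Python as follows.  The function names are hypothetical, and the
   domain part is compared using simple ASCII-oriented lower-casing,
   which sidesteps the hostname issues covered earlier.

```python
def split_address(addr: str):
    """Split at the last '@' into (local part, domain part)."""
    local, at, domain = addr.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("not an email address-like identifier")
    return local, domain


def grant_on_match(presented: str, authorized: str) -> bool:
    """Exact match on the local part, case-insensitive on the domain."""
    p_local, p_domain = split_address(presented)
    a_local, a_domain = split_address(authorized)
    return p_local == a_local and p_domain.lower() == a_domain.lower()
```

   With this rule, FOO@example.com does not match foo@example.com.
   That is a harmless false negative when access is granted on a match,
   but it would let FOO@example.com slip past a deny-on-match rule
   written for foo@example.com.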

805	   Furthermore, if a mailbox is stored and accessed using a filesystem
806	   using case-insensitive paths, there may be many paths that refer to
807	   the same mailbox.  As such, false negatives can be common in this
808	   case.

810	4.  General Issues

812	4.1.  Conflation

814	   There are a number of examples (some in the preceding sections) of
815	   strings that conflate two types of identifiers, using some heuristic
816	   to try to determine which type of identifier is given.  Similarly,
817	   two ways of encoding the same type of identifier might be conflated
818	   within the same string.

820	   Some examples include:
821	   1.  A string that might be an IPv4 address literal or an IPv6 address
822	       literal
823	   2.  A string that might be an IP address literal or a hostname
824	   3.  A string that might be a port number or a service name
825	   4.  A DNS label that might be literal or be Punycode-encoded

827	   Strings that allow such conflation can only be considered Definite if
828	   there exists a well-defined rule to determine which identifier type
829	   is meant.  One way to do so is to ensure that the valid syntax for
830	   the two is disjoint (e.g., distinguishing IPv4 vs. IPv6 address
831	   literals by the use of colons in the latter).  A second way to do so
832	   is to define a precedence rule that results in some identifiers being
833	   inaccessible via a conflated string (e.g., a host literally named
834	   "xn--de-jg4avhby1noc0d" may be inaccessible due to the "xn--" prefix
835	   denoting the use of Punycode encoding).  In some cases, such
836	   inaccessible space may be reserved so that the actual set of
837	   identifiers in use is unambiguous.  For example, Section 2.5.5.2 of
838	   [RFC4291] defines a range of the IPv6 address space for representing
839	   IPv4 addresses.
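
   Both techniques can be sketched in Python using the standard
   ipaddress module.  The classification labels and the function name
   are hypothetical, and the rule that a string which parses as an IPv4
   address literal is never treated as a hostname is one possible
   precedence rule, not the only one.

```python
import ipaddress


def classify_host_string(s: str) -> str:
    """Decide which identifier type a conflated host string denotes."""
    # Disjoint syntax: only IPv6 address literals contain a colon.
    if ":" in s:
        try:
            ipaddress.IPv6Address(s)
            return "ipv6-literal"
        except ValueError:
            return "invalid"
    # Precedence rule: anything that parses as IPv4 is an address,
    # so a host with such a name is inaccessible via this string.
    try:
        ipaddress.IPv4Address(s)
        return "ipv4-literal"
    except ValueError:
        pass
    # Precedence rule: the "xn--" prefix always denotes Punycode.
    labels = s.rstrip(".").split(".")
    if any(label.lower().startswith("xn--") for label in labels):
        return "hostname-with-a-label"
    return "hostname"
```

   Note that ipaddress accepts only the strict four-part dotted-decimal
   form, so other historical IPv4 spellings fall through to the
   hostname branch under this sketch.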

841	4.2.  Internationalization

843	   In addition to the issues with hostnames discussed in Section 3.1.3,
844	   there are a number of internationalization issues that apply to many
845	   types of Definite and Indefinite identifiers.

847	   First, there is no DNS mechanism for identifying whether non-
848	   identical strings would be seen by a human as being equivalent.
849	   There are problematic examples even with ASCII (Basic Latin) strings
850	   including regional spelling variations such as "color" and "colour"
851	   and many non-English cases including partially-numeric strings in
852	   Arabic script contexts, Chinese strings in Simplified and Traditional
853	   forms, and so on.  Attempts to produce such alternate forms
854	   algorithmically could produce false positives and hence have an
855	   adverse effect on security.

857	   Second, some strings are visually confusable with others, and hence
858	   if a security decision is made by a user based on visual inspection,
859	   many opportunities for false positives exist.  As such, using visual
860	   inspection for security is unreliable.  In addition to the security
861	   issues, visual confusability also adversely affects the usability of
862	   identifiers distributed via visual mediums.  Similar issues can arise
863	   with audible confusability when using audio (e.g., for radio
864	   distribution, accessibility to the blind, etc.) in place of a visual
865	   medium.

867	   Determining whether a string is a valid identifier should typically
868	   be done after, or as part of, canonicalization.  Otherwise an
869	   attacker might use the canonicalization algorithm to inject (e.g.,
870	   via percent encoding, NFKC, or non-shortest-form UTF-8) delimiters
871	   such as '@' in an email address-like identifier, or a '.' in a
872	   hostname.
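
   The ordering problem can be illustrated with a short Python sketch
   in which NFKC normalization is the canonicalization step.  The
   function names and the sample string are hypothetical; the string
   contains U+FF0E (FULLWIDTH FULL STOP), which NFKC maps to '.'.

```python
import unicodedata


def has_delimiter(s: str) -> bool:
    """Validity check: a single label must not contain '.' or '@'."""
    return "." in s or "@" in s


def canonicalize(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def validate_then_canonicalize(s: str) -> str:
    """Unsafe order: the check runs before delimiters can appear."""
    if has_delimiter(s):
        raise ValueError("delimiter in label")
    return canonicalize(s)


def canonicalize_then_validate(s: str) -> str:
    """Safer order: the check sees the string that will be used."""
    c = canonicalize(s)
    if has_delimiter(c):
        raise ValueError("delimiter in label")
    return c


raw = "evil\uff0eexample"  # no ASCII '.' before canonicalization
injected = validate_then_canonicalize(raw)
```

   The first function returns a string containing an injected '.',
   while the second rejects the same input.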

874	   Any case-insensitive comparisons need to define how comparison is
875	   done, since such comparisons may vary by locale of the endpoint.  As
876	   such, using case-insensitive comparisons in general often results in
877	   identifiers being either Indefinite or, if the legal character set is
878	   restricted (e.g., to ASCII), then Definite.
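
   As a small Python illustration of why the comparison must be
   defined, the sketch below applies three different notions of
   "case-insensitive" to the same pair of strings.  The helper names
   are hypothetical; lower() and casefold() are Python's
   locale-independent Unicode operations and are only two of several
   possible definitions.

```python
def ascii_lower(s: str) -> str:
    """Fold only the ASCII letters A-Z; leave everything else alone."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s
    )


def equal_ascii(a: str, b: str) -> bool:
    return ascii_lower(a) == ascii_lower(b)


def equal_lower(a: str, b: str) -> bool:
    return a.lower() == b.lower()


def equal_casefold(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


# "stra\u00dfe" contains U+00DF (LATIN SMALL LETTER SHARP S).
pair = ("stra\u00dfe", "STRASSE")
results = (equal_ascii(*pair), equal_lower(*pair), equal_casefold(*pair))
```

   The three definitions do not agree on this pair, whereas for strings
   restricted to ASCII, such as "EXAMPLE.com" and "example.COM", all
   three give the same answer.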

880	   See also [WEBER] for a more visual discussion of many of these
881	   issues.

883	   Finally, the set of permitted characters and the canonical form of
884	   the characters (and hence the canonicalization algorithm) sometimes
885	   varies by protocol today, even when the intent is to use the same
886	   identifier, such as when one protocol passes identifiers to the
887	   other.  See [I-D.ietf-precis-problem-statement] for further
888	   discussion.

890	4.3.  Scope

892	   Another issue arises when an identifier (e.g., "localhost",
893	   "10.11.12.13", etc.) is not globally unique.  [RFC3986] Section 1.1
894	   states:

896	      URIs have a global scope and are interpreted consistently
897	      regardless of context, though the result of that interpretation
898	      may be in relation to the end-user's context.  For example,
899	      "http://localhost/" has the same interpretation for every user of
900	      that reference, even though the network interface corresponding to
901	      "localhost" may be different for each end-user: interpretation is
902	      independent of access.

904	   Whenever a non-globally-unique identifier is passed to another entity
905	   outside of the scope of uniqueness, it will refer to a different
906	   resource, and can result in a false positive.  This problem is often
907	   addressed by using the identifier together with some other unique
908	   identifier of the context.  For example "alice" may uniquely identify
909	   a user within a system, but must be used with "example.com" (as in
910	   "alice@example.com") to uniquely identify the context outside of that
911	   system.

913	   It is also worth noting that non-globally-scoped IPv6 addresses can
914	   be written with, or otherwise associated with, a "zone ID" to
915	   identify the context (see [RFC4007] for more information).  However,
916	   zone IDs are only unique within a host, so they typically narrow,
917	   rather than expand, the scope of uniqueness of the resulting
918	   identifier.

920	4.4.  Temporality

922	   Often identifiers are not unique across all time, but have some
923	   lifetime associated with them after which they may be reassigned to
924	   another entity.  For example, bob@example.com might be assigned to an
925	   employee of the Example company, but if he leaves and another Bob is
926	   later hired, the same identifier might be reused.  As another
927	   example, IP address 203.0.113.1 might be assigned to one subscriber,
928	   and then later reassigned to another subscriber.  Security issues can
929	   arise if updates are not made in all entities that store the
930	   identifier (e.g., in an access control list as discussed in
931	   Section 2, or in a resolution cache as discussed in Section 3.1.4).
932	   This issue is similar to the issue of scope discussed in Section 4.3,
933	   except that the scope of uniqueness is temporal rather than
934	   topological.

936	5.  Security Considerations

938	   This entire document is about security considerations.

940	   To minimize elevation of privilege issues, any system that requires
941	   the ability to use both deny and allow operations within the same
942	   identifier space should avoid the use of Indefinite identifiers in
943	   security comparisons.

945	   To minimize future security risks, any new identifiers being designed
946	   should specify an Absolute or Definite comparison algorithm, and if
947	   extensibility is allowed (e.g., as new schemes in URIs allow) then
948	   the comparison algorithm should remain invariant so that unrecognized
949	   extensions can be compared.  That is, security risks can be reduced
950	   by specifying the comparison algorithm, making sure to resolve any
951	   ambiguities pointed out in this document (e.g., "standard dotted
952	   decimal").

954	   Some issues (such as unrecognized extensions) can be mitigated by
955	   treating such identifiers as invalid.  Validity checking of
956	   identifiers is further discussed in [RFC3696].

958	   Perhaps the hardest issues arise when multiple protocols are used
959	   together, such as in the figure in Section 2, where the two protocols
960	   are defined or implemented using different comparison algorithms.
961	   When constructing an architecture that uses multiple such protocols,
962	   designers should pay attention to any differences in comparison
963	   algorithms among the protocols, in order to fully understand the
964	   security risks.  An area for future work is how to deal with such
965	   security risks in current systems.

967	6.  Acknowledgements

969	   Yaron Goland contributed to the discussion on URIs.  Patrik Faltstrom
970	   contributed to the background on identifiers.  John Klensin
971	   contributed text in a number of different sections.  Additional
972	   helpful feedback and suggestions came from Bernard Aboba, Leslie
973	   Daigle, Mark Davis, Russ Housley, Christian Huitema, Magnus Nystrom,
974	   and Chris Weber.

976	7.  IANA Considerations

978	   This document requires no actions by the IANA.

980	8.  Informative References

982	   [EE]       Mills, E., "Evidence Explained: Citing History Sources
983	              from Artifacts to Cyberspace", 2007.

985	   [I-D.iab-privacy-considerations]
986	              Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
987	              Morris, J., Hansen, M., and R. Smith, "Privacy
988	              Considerations for Internet Protocols",
989	              draft-iab-privacy-considerations-03 (work in progress),
990	              July 2012.

992	   [I-D.ietf-6man-uri-zoneid]
993	              Carpenter, B., Cheshire, S., and R. Hinden, "Representing
994	              IPv6 Zone Identifiers in Address Literals and Uniform
995	              Resource Identifiers", draft-ietf-6man-uri-zoneid-06 (work
996	              in progress), December 2012.

998	   [I-D.ietf-pkix-rfc5280-clarifications]
999	              Yee, P., "Updates to the Internet X.509 Public Key
1000	              Infrastructure Certificate and Certificate Revocation List
1001	              (CRL) Profile", draft-ietf-pkix-rfc5280-clarifications-11
1002	              (work in progress), November 2012.

1004	   [I-D.ietf-precis-problem-statement]
1005	              Blanchet, M. and A. Sullivan, "Stringprep Revision and
1006	              PRECIS Problem Statement",
1007	              draft-ietf-precis-problem-statement-08 (work in progress),
1008	              September 2012.

1010	   [IAB1123]  IAB, "The interpretation of rules in the ICANN gTLD
1011	              Applicant Guidebook", February 2012, <http://www.iab.org/
1012	              documents/correspondence-reports-documents/2012-2/
1013	              iab-statement-the-interpretation-of-rules-in-the-icann-
1014	              gtld-applicant-guidebook>.

1016	   [IANA-PORT]
1017	              IANA, "PORT NUMBERS", June 2011,
1018	              <http://www.iana.org/assignments/port-numbers>.

1020	   [IEEE-1003.1]
1021	              IEEE and The Open Group, "The Open Group Base
1022	              Specifications, Issue 6 IEEE Std 1003.1, 2004 Edition",
1023	              IEEE Std 1003.1, 2004.

1025	   [JAVAURL]  Oracle, "Class URL, Java(TM) Platform, Standard Ed. 7",
1026	              2011, <http://docs.oracle.com/javase/7/docs/api/java/net/
1027	              URL.html>.

1029	   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
1030	              and Support", STD 3, RFC 1123, October 1989.

1032	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
1033	              Languages", BCP 18, RFC 2277, January 1998.

1035	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
1036	              "Internationalizing Domain Names in Applications (IDNA)",
1037	              RFC 3490, March 2003.

1039	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
1040	              for Internationalized Domain Names in Applications
1041	              (IDNA)", RFC 3492, March 2003.

1043	   [RFC3493]  Gilligan, R., Thomson, S., Bound, J., McCann, J., and W.
1044	              Stevens, "Basic Socket Interface Extensions for IPv6",
1045	              RFC 3493, February 2003.

1047	   [RFC3696]  Klensin, J., "Application Techniques for Checking and
1048	              Transformation of Names", RFC 3696, February 2004.

1050	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1051	              Resource Identifier (URI): Generic Syntax", STD 66,
1052	              RFC 3986, January 2005.

1054	   [RFC4007]  Deering, S., Haberman, B., Jinmei, T., Nordmark, E., and
1055	              B. Zill, "IPv6 Scoped Address Architecture", RFC 4007,
1056	              March 2005.

1058	   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
1059	              Architecture", RFC 4291, February 2006.

1061	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
1062	              Housley, R., and W. Polk, "Internet X.509 Public Key
1063	              Infrastructure Certificate and Certificate Revocation List
1064	              (CRL) Profile", RFC 5280, May 2008.

1066	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
1067	              October 2008.

1069	   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
1070	              Address Text Representation", RFC 5952, August 2010.

1072	   [RFC6055]  Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
1073	              Encodings for Internationalized Domain Names", RFC 6055,
1074	              February 2011.

1076	   [RFC6066]  Eastlake, D., "Transport Layer Security (TLS) Extensions:
1077	              Extension Definitions", RFC 6066, January 2011.

1079	   [RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
1080	              Cheshire, "Internet Assigned Numbers Authority (IANA)
1081	              Procedures for the Management of the Service Name and
1082	              Transport Protocol Port Number Registry", BCP 165,
1083	              RFC 6335, August 2011.

1085	   [RFC6530]  Klensin, J. and Y. Ko, "Overview and Framework for
1086	              Internationalized Email", RFC 6530, February 2012.

1088	   [RFC6532]  Yang, A., Steele, S., and N. Freed, "Internationalized
1089	              Email Headers", RFC 6532, February 2012.

1091	   [TR36]     Unicode Consortium, "Unicode Security Considerations",
1092	              Unicode Technical Report 36, August 2004.

1094	   [WEBER]    Weber, C., "Attacking Software Globalization", March 2010,
1095	              <http://www.lookout.net/files/
1096	              Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf>.

1098	Author's Address

1100	   Dave Thaler (editor)
1101	   Microsoft Corporation
1102	   One Microsoft Way
1103	   Redmond, WA  98052
1104	   USA

1106	   Phone: +1 425 703 8835
1107	   Email: dthaler@microsoft.com