idnits 2.17.1 

draft-iab-identifier-comparison-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 417: '...dentity of an Internet host, it SHOULD...'
     RFC 2119 keyword, line 419: '...#.#.#.#") form.  The host SHOULD check...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 20, 2012) is 4205 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC5890' is mentioned on line 546, but not defined

  == Outdated reference: A later version (-06) exists of
     draft-ietf-6man-uri-zoneid-04

  == Outdated reference: A later version (-11) exists of
     draft-ietf-pkix-rfc5280-clarifications-10

  == Outdated reference: A later version (-09) exists of
     draft-ietf-precis-problem-statement-08

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)


     Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                     D. Thaler, Ed.
3	Internet-Draft                                                 Microsoft
4	Intended status: Informational                          October 20, 2012
5	Expires: April 23, 2013

7	         Issues in Identifier Comparison for Security Purposes
8	                 draft-iab-identifier-comparison-05.txt

10	Abstract

12	   Identifiers such as hostnames, URIs, and email addresses are often
13	   used in security contexts to identify security principals and
14	   resources.  In such contexts, an identifier supplied via some
15	   protocol is often compared against some policy to make security
16	   decisions such as whether the principal may access the resource, what
17	   level of authentication or encryption is required, etc.  If the
18	   parties involved in a security decision use different algorithms to
19	   compare identifiers, then failure scenarios ranging from denial of
20	   service to elevation of privilege can result.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on April 23, 2013.

39	Copyright Notice

41	   Copyright (c) 2012 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	     1.1.  Canonicalization
58	   2.  Security Uses
59	     2.1.  Types of Identifiers
60	     2.2.  False Positives and Negatives
61	     2.3.  Hypothetical Example
62	   3.  Common Identifiers
63	     3.1.  Hostnames
64	       3.1.1.  IPv4 Literals
65	       3.1.2.  IPv6 Literals
66	       3.1.3.  Internationalization
67	       3.1.4.  Resolution for comparison
68	     3.2.  Ports and Service Names
69	     3.3.  URIs
70	       3.3.1.  Scheme component
71	       3.3.2.  Authority component
72	       3.3.3.  Path component
73	       3.3.4.  Query component
74	       3.3.5.  Fragment component
75	       3.3.6.  Resolution for comparison
76	     3.4.  Email Address-like Identifiers
77	   4.  General Conflation Issues
78	   5.  General Internationalization Issues
79	   6.  General Scope Issues
80	   7.  Security Considerations
81	   8.  Acknowledgements
82	   9.  IANA Considerations
83	   10. Informative References
84	   Author's Address

86	1.  Introduction

88	   In computing and the Internet, various types of "identifiers" are
89	   used to identify humans, devices, content, etc.  Before discussing
90	   security issues, we first give some background on some typical
91	   processes involving identifiers.

93	   As depicted in Figure 1, there are multiple processes relevant to our
94	   discussion.
95	   1.  An identifier must first be generated.  If the identifier is
96	       intended to be unique, the generation process includes some
97	       mechanism, such as allocation by a central authority, to help
98	       ensure uniqueness.  However, the notion of "unique" involves
99	       determining whether a putative identifier matches any other
100	       already-allocated identifier.  As we will see, for many types of
101	       identifiers, this is not simply an exact binary match.

103	       As a result of generating the identifier, it is often stored in
104	       two locations: with the requester or "holder" of the identifier,
105	       and with some repository of identifiers (e.g., DNS).  For
106	       example, if the identifier was allocated by a central authority,
107	       the repository might be that authority.  If the identifier
108	       identifies a device or content on a device, the repository might
109	       be that device.
110	   2.  The identifier must be distributed, either by the holder of the
111	       identifier or by a repository of identifiers, to others who could
112	       use the identifier.  This distribution might be electronic, but
113	       sometimes it is via other channels such as voice, business card,
114	       billboard, or other form of advertisement.  The identifier itself
115	       might be distributed directly, or it might be used to generate a
116	       portion of another type of identifier that is then distributed.
117	       For example, a URI or email address might include a server name,
118	       and hence distributing the URI or email address also inherently
119	       distributes the server name.
120	   3.  The identifier must be used by some party.  Generally the user
121	       supplies the identifier which is (directly or indirectly) sent to
122	       the repository of identifiers.  For example, using an email
123	       address to send email to the holder of an identifier may result
124	       in the email arriving at the holder's email server which has
125	       access to the mail stores.

127	       The repository of identifiers must then attempt to match the
128	       user-supplied identifier with an identifier in its repository.

130	                            +------------+
131	                            |  Holder of |     1. Generation
132	                            | identifier +<---------+
133	                            +----+-------+          |
134	                                 |                  | Match
135	                                 |                  v/
136	                                 |          +-------+-------+
137	                                 +----------+ Repository of |
138	                                 |          |  identifiers  |
139	                                 |          +-------+-------+
140	                 2. Distribution |                  ^\
141	                                 |                  | Match
142	                                 v                  |
143	                       +---------+-------+          |
144	                       |      User of    |          |
145	                       |    identifier   +----------+
146	                       +-----------------+    3. Use

148	                       Typical Identifier Processes

150	                                 Figure 1

152	   One key aspect is that the identifier values passed in generation,
153	   distribution, and use may all be in different forms.  For example,
154	   generation might be exchanged in printed form, distribution done via
155	   voice, and use done electronically.  As such, the match process can
156	   be complicated.

158	   Furthermore, in many uses, the relationship between holder,
159	   repositories, and users may be more involved.  For example, when a
160	   hierarchy of web caches exists, each cache is itself a repository of
161	   a sort, and the match process is usually intended to be the same as
162	   on the origin server.

164	1.1.  Canonicalization

166	   Perhaps the most common algorithm for comparison involves first
167	   converting each identifier to a canonical form (a process known as
168	   "canonicalization" or "normalization"), and then testing the
169	   resulting canonical representations for bitwise equality.  In so
170	   doing, it is thus critical that all entities involved agree on the
171	   same canonical form and use the same canonicalization algorithm so
172	   that the overall comparison process is also the same.

174	   Note that in some contexts, such as in internationalization, the
175	   terms "canonicalization" and "normalization" have a precise meaning.
176	   In this document, however, we use these terms synonymously in their
177	   more generic form, to mean conversion to some standard form.

179	   While the most common method of comparison includes canonicalization,
180	   comparison can also be done by defining an equivalence algorithm,
181	   where no single form is canonical.  However, in most cases, a
182	   canonical form is useful for other purposes, such as output, and so
183	   in such cases defining a canonical form suffices to define a
184	   comparison method.
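
   As a minimal sketch of the canonicalize-then-compare approach
   described above, the following Python fragment converts a
   hostname-like string to a canonical byte string and tests for bitwise
   equality.  The canonical form used here (ASCII only, lowercase, no
   trailing dot) and all function names are assumptions made for
   illustration; they are not defined by this document.  The second
   comparator stands in for a party that chose a different form.

```python
def canonicalize_hostname(name: str) -> bytes:
    """Hypothetical canonical form: ASCII, lowercase, no trailing dot."""
    ascii_name = name.encode("ascii")  # non-ASCII input raises an error
    lowered = ascii_name.lower()       # bytes.lower() only touches A-Z
    if lowered.endswith(b"."):
        lowered = lowered[:-1]
    return lowered


def same_identifier(a: str, b: str) -> bool:
    """Compare by bitwise equality of the canonical forms."""
    return canonicalize_hostname(a) == canonicalize_hostname(b)


def other_party_same_identifier(a: str, b: str) -> bool:
    """Assumed second party: lowercases but keeps the trailing dot."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    pair = ("Example.COM.", "example.com")
    print(same_identifier(*pair))              # True
    print(other_party_same_identifier(*pair))  # False
```

   The two parties disagree on the same pair of inputs, which is the
   kind of mismatch the rest of this document is concerned with.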

186	2.  Security Uses

188	   Identifiers such as hostnames, URIs, and email addresses are used in
189	   security contexts to identify principals and resources as well as
190	   other security parameters such as types and values of claims.  Those
191	   identifiers are then used to make security decisions based on an
192	   identifier supplied via some protocol.  For example:
193	   o  Authentication: a protocol might match a security principal
194	      identifier to look up expected keying material, and then match
195	      keying material.
196	   o  Authorization: a protocol might match a resource name to look up
197	      an access control list (ACL), and then look up the security
198	      principal identifier (or a surrogate for it) in that ACL.
199	   o  Accounting: a system might create an accounting record for a
200	      security principal identifier or resource name, and then might
201	      later need to match a supplied identifier to allow (for example)
202	      law enforcement to follow up based on the records, or add new
203	      filtering rules based on the records in order to stop an attack.

205	   If the parties involved in a security decision use different matching
206	   algorithms for the same identifiers, then failure scenarios ranging
207	   from denial of service to elevation of privilege can result, as we
208	   will see.

210	   This is especially complicated in cases involving multiple parties
211	   and multiple protocols.  For example, there are many scenarios where
212	   some form of "security token service" is used to grant to a requester
213	   permission to access a resource, where the resource is held by a
214	   third party that relies on the security token service (see Figure 2).
215	   The protocol used to request permission (e.g., Kerberos or OAuth) may
216	   be different from the protocol used to access the resource (e.g.,
217	   HTTP).  Opportunities for security problems arise when two protocols
218	   define different comparison algorithms for the same type of
219	   identifier, or when a protocol is ambiguously specified and two
220	   endpoints (e.g., a security token service and a resource holder)
221	   implement different algorithms within the same protocol.

223	        +----------+
224	        | security |
225	        |  token   |
226	        | service  |
227	        +----------+
228	             ^
229	             | 1. supply credentials and
230	             | get token for resource
231	             |                                             +--------+
232	        +----------+  2. supply token and access resource  |resource|
233	        |requester |=------------------------------------->| holder |
234	        +----------+                                       +--------+

236	                         Simple Security Exchange

238	                                 Figure 2

240	   In many cases the situation is more complex.  With certificates, the
241	   name in a certificate gets compared against names in ACLs or other
242	   things.  In the case of web site security, the name in the
243	   certificate gets compared to a portion of the URI that a user may
244	   have typed into a browser.  The fact that many different people are
245	   doing the typing, on many different types of systems, complicates the
246	   problem.

248	   Add to this the certificate enrollment step, and the certificate
249	   issuance step, and two more parties have an opportunity to adjust the
250	   encoding, or worse, the software that supports them might make
251	   changes that the parties are unaware are happening.

253	2.1.  Types of Identifiers

255	   In this document we will refer to the following types of identifiers:

257	   o  Absolute: identifiers that can be compared byte-by-byte for
258	      equality.  Two identifiers that have different bytes are defined
259	      to be different.  For example, binary IP addresses are in this
260	      class.
261	   o  Definite: identifiers that have a well-defined comparison
262	      algorithm on which all parties agree.  For example, URI scheme
263	      names are required to be ASCII and are defined to match in a case-
264	      insensitive way; the comparison is thus definite since all parties
265	      agree on how to do a case-insensitive match among ASCII strings.
266	   o  Indefinite: identifiers that have no single comparison algorithm
267	      on which all parties agree.  For example, human names are in this
268	      class.  Everyone might want the comparison to be tailored for
269	      their locale, for some definition of locale.  In some cases, there
270	      may be limited subsets of parties that might be able to agree
271	      (e.g., ASCII users might all agree on a common comparison
272	      algorithm whereas users of other Latin scripts, such as Turkish,
273	      may not), but identifiers often tend to leak out of such limited
274	      environments.
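
   The difference between Definite and Indefinite identifiers can be
   sketched in Python as follows.  The scheme comparison is restricted
   to ASCII, so all parties get the same answer.  The two name
   comparators are hypothetical: one uses the default Unicode uppercase
   mapping, the other imitates a Turkish-tailored mapping by treating
   "i" as the lowercase of dotted capital I (U+0130).  Neither is
   claimed to be what any real system does.

```python
def scheme_equal(a: str, b: str) -> bool:
    """Definite: ASCII-only, case-insensitive match."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("scheme names are required to be ASCII")
    return a.lower() == b.lower()


def default_name_equal(a: str, b: str) -> bool:
    """Assumed comparator using the untailored uppercase mapping."""
    return a.upper() == b.upper()


def turkish_style_name_equal(a: str, b: str) -> bool:
    """Assumed comparator that maps 'i' to U+0130 before uppercasing."""
    def upper_tr(s: str) -> str:
        return s.replace("i", "\u0130").upper()
    return upper_tr(a) == upper_tr(b)


if __name__ == "__main__":
    print(scheme_equal("HTTP", "http"))
    # "k\u0131r" contains dotless i (U+0131); "kir" contains ASCII i.
    print(default_name_equal("k\u0131r", "kir"))
    print(turkish_style_name_equal("k\u0131r", "kir"))
```

   The two name comparators give different answers for the same pair of
   strings, so an identifier that leaves one environment for the other
   no longer compares the same way.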

276	2.2.  False Positives and Negatives

278	   It is first worth discussing in more detail the effects of errors in
279	   the comparison algorithm.  A "false positive" results when two
280	   identifiers compare as if they were equal, but in reality refer to
281	   two different objects (e.g., security principals or resources).  When
282	   privilege is granted on a match, a false positive thus results in an
283	   elevation of privilege, for example allowing execution of an
284	   operation that should not have been permitted otherwise.  When
285	   privilege is denied on a match (e.g., matching an entry in a block/
286	   deny list or a revocation list), a permissible operation is denied.
287	   At best, this can cause worse performance (e.g., a cache miss, or
288	   forcing redundant authentication), and at worst can result in a
289	   denial of service.

291	   A "false negative" results when two identifiers that in reality refer
292	   to the same thing compare as if they were different, and the effects
293	   are the reverse of those for false positives.  That is, when
294	   privilege is granted on a match, the result is at best worse
295	   performance and at worst a denial of service; when privilege is
296	   denied on a match, elevation of privilege results.

298	   Figure 3 summarizes these effects.

300	                  | "Grant on match"       | "Deny on match"
301	   ---------------+------------------------+-----------------------
302	   False positive | Elevation of privilege | Denial of service
303	   ---------------+------------------------+-----------------------
304	   False negative | Denial of service      | Elevation of privilege
305	   ---------------+------------------------+-----------------------

307	                    Effect of False Positives/Negatives

309	                                 Figure 3

311	   Elevation of privilege is almost always seen as far worse than denial
312	   of service.  Hence, for URIs for example, Section 6.1 of [RFC3986]
313	   states: "comparison methods are designed to minimize false negatives
314	   while strictly avoiding false positives".

316	   Thus URIs were defined with a "grant privilege on match" paradigm in
317	   mind, where it is critical to prevent elevation of privilege while
318	   minimizing denial of service.  Using URIs in a "deny privilege on
319	   match" system can thus be problematic.
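
   The effects summarized in Figure 3 can be derived mechanically, as in
   the following Python sketch.  The enumeration and function names are
   illustrative assumptions.  The idea is to compute the decision that
   the true relationship would produce, compute the decision that the
   (possibly wrong) comparison produces, and classify the difference.

```python
from enum import Enum


class Policy(Enum):
    GRANT_ON_MATCH = "grant on match"
    DENY_ON_MATCH = "deny on match"


def access_allowed(policy: Policy, matched: bool) -> bool:
    """Decision taken for a given comparison outcome."""
    if policy is Policy.GRANT_ON_MATCH:
        return matched
    return not matched


def effect_of_comparison_error(policy: Policy,
                               truly_same: bool,
                               compared_equal: bool) -> str:
    """Classify the effect of a comparison result against the truth."""
    should_allow = access_allowed(policy, truly_same)
    does_allow = access_allowed(policy, compared_equal)
    if should_allow == does_allow:
        return "correct"
    if does_allow:
        return "elevation of privilege"
    return "denial of service"


if __name__ == "__main__":
    for policy in Policy:
        # False positive: different objects that compare as equal.
        print(policy.value, "/ false positive:",
              effect_of_comparison_error(policy, False, True))
        # False negative: same object that compares as different.
        print(policy.value, "/ false negative:",
              effect_of_comparison_error(policy, True, False))
```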

321	2.3.  Hypothetical Example

323	   In this example, both security principals and resources are
324	   identified using URIs.  Foo Corp has paid example.com for access to
325	   the Stuff service.  Foo Corp allows its employees to create accounts
326	   on the Stuff service.  Alice gets the account
327	   "http://example.com/Stuff/FooCorp/alice" and Bob gets
328	   "http://example.com/Stuff/FooCorp/bob".  It turns out, however, that
329	   Foo Corp's URI canonicalizer includes URI fragment components in
330	   comparisons whereas example.com's does not, and Foo Corp does not
331	   disallow the # character in the account name.  So Chuck, who is a
332	   malicious employee of Foo Corp, asks to create an account at
333	   example.com with the name alice#stuff.  Foo Corp's URI logic checks
334	   its records for accounts it has created with Stuff and sees that
335	   there is no account with the name alice#stuff.  Hence, in its
336	   records, it associates the account alice#stuff with Chuck and will
337	   only issue tokens good for use with
338	   "http://example.com/Stuff/FooCorp/alice#stuff" to Chuck.

340	   Chuck, the attacker, goes to a security token service at Foo Corp and
341	   asks for a security token good for
342	   "http://example.com/Stuff/FooCorp/alice#stuff".  Foo Corp issues the
343	   token since Chuck is the legitimate owner (in Foo Corp's view) of the
344	   alice#stuff account.  Chuck then submits the security token in a
345	   request to "http://example.com/Stuff/FooCorp/alice".

347	   But example.com uses a URI canonicalizer that, for the purposes of
348	   checking equality, ignores fragments.  So when example.com looks in
349	   the security token to see if the requester has permission from Foo
350	   Corp to access the given account it successfully matches the URI in
351	   the security token, "http://example.com/Stuff/FooCorp/alice#stuff",
352	   with the requested resource name
353	   "http://example.com/Stuff/FooCorp/alice".

355	   Leveraging the inconsistencies in the canonicalizers used by Foo Corp
356	   and example.com, Chuck is able to successfully launch an elevation of
357	   privilege attack and access Alice's resource.

359	   Furthermore, consider an attacker using a similar corporation name
360	   such as "foocorp" (or any variation containing a non-ASCII character
361	   that some humans might expect to represent the same corporation).
362	   If the resource holder treats them as different, but the security
363	   token service treats them as the same, then again elevation of
364	   privilege can occur.
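
   The fragment-handling mismatch in this example can be reproduced with
   a short Python sketch.  The two "key" functions are hypothetical
   stand-ins for the canonicalizers of Foo Corp and example.com; only
   urllib.parse.urldefrag() is a real library call.  The account table
   is likewise invented for illustration.

```python
from urllib.parse import urldefrag

ALICE = "http://example.com/Stuff/FooCorp/alice"
CHUCK = "http://example.com/Stuff/FooCorp/alice#stuff"


def foo_corp_key(uri: str) -> str:
    """Assumed Foo Corp canonicalizer: the fragment is significant."""
    return uri


def example_com_key(uri: str) -> str:
    """Assumed example.com canonicalizer: the fragment is ignored."""
    return urldefrag(uri).url


# Foo Corp's records: canonical account URI -> owner.
accounts = {foo_corp_key(ALICE): "alice"}


def foo_corp_create_account(uri: str, owner: str) -> bool:
    """Create the account only if Foo Corp sees no existing match."""
    key = foo_corp_key(uri)
    if key in accounts:
        return False
    accounts[key] = owner
    return True


def resource_holder_accepts(token_uri: str, requested_uri: str) -> bool:
    """example.com grants access when the token matches the request."""
    return example_com_key(token_uri) == example_com_key(requested_uri)


if __name__ == "__main__":
    # Foo Corp sees no clash, so Chuck becomes owner of alice#stuff.
    print(foo_corp_create_account(CHUCK, "chuck"))
    # example.com matches Chuck's token against Alice's resource.
    print(resource_holder_accepts(CHUCK, ALICE))
```

   Each comparator is self-consistent; the elevation of privilege comes
   only from the two parties using different ones.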

366	3.  Common Identifiers

368	   In this section, we walk through a number of common types of
369	   identifiers and discuss various issues related to comparison that may
370	   affect security whenever they are used to identify security
371	   principals or resources.  These examples illustrate common patterns
372	   that may arise with other types of identifiers.

374	3.1.  Hostnames

376	   Hostnames (composed of dot-separated labels) are commonly used either
377	   directly as identifiers, or as components in identifiers such as in
378	   URIs and email addresses.  Another example is in [RFC5280], sections
379	   7.2 and 7.3 (and updated in section 3 of
380	   [I-D.ietf-pkix-rfc5280-clarifications]), which specify use in
381	   certificates.

383	   In this section we discuss a number of issues in comparing strings
384	   that appear to be some form of hostname.

386	   It is first worth pointing out that the term itself is often
387	   ambiguous, and hence it is important that any use clarify which
388	   definition is intended.  Some examples of definitions include:
389	   a.  A Fully-Qualified Domain Name (FQDN),
390	   b.  An FQDN that is associated with address records,
391	   c.  The leftmost label in an FQDN, or
392	   d.  The leftmost label in an FQDN that is associated with address
393	       records.

395	   The use of different definitions in different places results in
396	   questions such as whether "example" and "example.com" are considered
397	   equal or not.

399	   Section 3 of [RFC6055] discusses the differences between a "hostname"
400	   and a "DNS name", where the former is a subset of the latter by using
401	   a restricted set of characters.  If one canonicalizer uses the "DNS
402	   name" definition whereas another uses a "hostname" definition, a name
403	   might be valid in the former but invalid in the latter.  As long as
404	   invalid identifiers are denied privilege, this difference will not
405	   result in elevation of privilege.

407	   [IAB1123] briefly discusses issues with the ambiguity around whether
408	   a label will be "alphabetic", including among other issues, how
409	   "alphabetic" should be interpreted in an internationalized
410	   environment, and whether a hostname can be interpreted as an IP
411	   address.  We explore this last issue in more detail below.

413	3.1.1.  IPv4 Literals

415	   [RFC1123] section 2.1 states:

417	      Whenever a user inputs the identity of an Internet host, it SHOULD
418	      be possible to enter either (1) a host domain name or (2) an IP
419	      address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
420	      the string syntactically for a dotted-decimal number before
421	      looking it up in the Domain Name System.

423	   and

425	      This last requirement is not intended to specify the complete
426	      syntactic form for entering a dotted-decimal host number; that is
427	      considered to be a user-interface issue.

429	   In specifying the inet_addr() API, the POSIX standard [IEEE-1003.1]
430	   defines "IPv4 dotted decimal notation" as allowing not only strings
431	   of the form "10.0.1.2", but also octal and hexadecimal parts, and
432	   addresses with fewer than four parts.  For example, "10.0.258",
433	   "0xA000102", and "012.0x102" all represent the same IPv4 address in
434	   standard "IPv4 dotted decimal" notation.  We will refer to this as
435	   the "loose" syntax of an IPv4 address literal.
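
   To make the "loose" syntax concrete, the following Python sketch
   parses it in pure Python rather than calling a platform's
   inet_addr(), whose behavior may vary.  The function name and the
   exact validation rules are assumptions based on the description
   above: each part may be decimal, octal (leading "0"), or hexadecimal
   (leading "0x"), and the last part fills all remaining bytes.

```python
import ipaddress
import re

# One part: hexadecimal, octal (or a lone zero), or decimal.
_PART = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def parse_loose_ipv4(text: str) -> ipaddress.IPv4Address:
    """Parse a one- to four-part IPv4 literal in the loose syntax."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected one to four parts")
    values = []
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid part: {part!r}")
        if part[:2].lower() == "0x":
            values.append(int(part[2:], 16))
        elif part.startswith("0"):
            values.append(int(part, 8))
        else:
            values.append(int(part, 10))
    *leading, last = values
    if any(v > 0xFF for v in leading):
        raise ValueError("leading parts must each fit in one byte")
    last_len = 4 - len(leading)  # bytes covered by the final part
    if last >= 1 << (8 * last_len):
        raise ValueError("final part is too large")
    packed = bytes(leading) + last.to_bytes(last_len, "big")
    return ipaddress.IPv4Address(packed)


if __name__ == "__main__":
    for literal in ("10.0.1.2", "10.0.258", "0xA000102", "012.0x102"):
        print(literal, "->", parse_loose_ipv4(literal))
```

   All four literals above yield 10.0.1.2 under these rules.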

437	   In section 6.1 of [RFC3493] getaddrinfo() is defined to support the
438	   same (loose) syntax as inet_addr():

440	      If the specified address family is AF_INET or AF_UNSPEC, address
441	      strings using Internet standard dot notation as specified in
442	      inet_addr() are valid.

444	   In contrast, section 6.3 of the same RFC states, specifying
445	   inet_pton():

447	      If the af argument of inet_pton() is AF_INET, the src string shall
448	      be in the standard IPv4 dotted-decimal form: ddd.ddd.ddd.ddd where
449	      "ddd" is a one to three digit decimal number between 0 and 255.
450	      The inet_pton() function does not accept other formats (such as
451	      the octal numbers, hexadecimal numbers, and fewer than four
452	      numbers that inet_addr() accepts).

454	   As shown above, inet_pton() uses what we will refer to as the
455	   "strict" form of an IPv4 address literal.  Some platforms also use
456	   the strict form with getaddrinfo() when the AI_NUMERICHOST flag is
457	   passed to it.
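
   By contrast, a parser for the "strict" form can be sketched as below.
   This is an illustrative reading of the quoted text, not the behavior
   of any particular inet_pton() implementation; in particular, real
   implementations may differ on whether leading zeros are accepted.
   The function name is an assumption.

```python
import re

# Exactly four groups of one to three ASCII digits.  [0-9] is used
# instead of \d because \d also matches non-ASCII digits in Python.
_STRICT = re.compile(
    r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})"
)


def parse_strict_ipv4(text: str) -> bytes:
    """Parse ddd.ddd.ddd.ddd and return the four address bytes."""
    match = _STRICT.fullmatch(text)
    if match is None:
        raise ValueError("not in strict dotted-decimal form")
    octets = [int(group, 10) for group in match.groups()]
    if any(octet > 255 for octet in octets):
        raise ValueError("each number must be between 0 and 255")
    return bytes(octets)


def is_strict_ipv4(text: str) -> bool:
    """Return True if the string is a strict-form IPv4 literal."""
    try:
        parse_strict_ipv4(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for literal in ("10.0.1.2", "10.0.258", "0xA000102", "012.0x102"):
        print(literal, is_strict_ipv4(literal))
```

   Only the first literal is accepted; the other three are valid only
   under the loose syntax.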

459	   Both the strict and loose forms are standard forms, and hence a
460	   protocol specification is still ambiguous if it simply defines a
461	   string to be in the "standard IPv4 dotted decimal form".  And, as a
462	   result of these differences, names such as "10.11.12" are ambiguous
463	   as to whether they are an IP address or a hostname, and even
464	   "10.11.12.13" can be ambiguous because of the "SHOULD" in RFC 1123
465	   above making it optional whether to treat it as an address or a name.

467	   Protocols and data formats that can use addresses in string form for
468	   security purposes need to resolve these ambiguities.  For example,
469	   for the host component of URIs, section 3.2.2 of [RFC3986] resolves
470	   the first ambiguity by only allowing the strict form, and the second
471	   ambiguity by specifying that it is considered an IPv4 address
472	   literal.  New protocols and data formats should similarly consider
473	   using the strict form rather than the loose form in order to better
474	   match user expectations.

476	   A string might be valid under the "loose" definition, but invalid
477	   under the "strict" definition.  As long as invalid identifiers are
478	   denied privilege, this difference will not result in elevation of
479	   privilege.  Some protocols, however, use strings that can be either
480	   an IP address literal or a hostname.  Such strings are at best
481	   Definite identifiers, and often turn out to be Indefinite
482	   identifiers.  (See Section 4 for more discussion.)

484	   Furthermore, when strings can contain non-ASCII characters, they can
485	   contain other characters that may look like dots or digits to a human
486	   viewing and/or entering the identifier, especially to one who might
487	   expect digits to appear in his or her native script.

489	3.1.2.  IPv6 Literals

491	   IPv6 addresses similarly have a wide variety of alternate but
492	   semantically identical string representations, as defined in section
493	   2.2 of [RFC4291] and section 2 of [I-D.ietf-6man-uri-zoneid].  As
494	   discussed in section 3.2.5 of [RFC5952], this fact causes problems in
495	   security contexts if comparison (such as in X.509 certificates) is
496	   done between strings rather than between the binary representations
497	   of addresses.

499	   [RFC5952] recently specified a recommended canonical string format as
500	   an attempt to solve this problem, but it may not be ubiquitously
501	   supported at present.  And, when strings can contain non-ASCII
502	   characters, the same issues (and more, since hexadecimal and colons
503	   are allowed) arise as with IPv4 literals.

505	   Whereas (binary) IPv6 addresses are Absolute identifiers, IPv6
506	   address literals are Definite identifiers, since string-to-address
507	   conversion for IPv6 address literals is unambiguous.
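
   The following Python sketch contrasts string comparison with
   comparison of the binary addresses, using the standard ipaddress
   module.  The two literals (taken from the 2001:db8::/32 documentation
   prefix) and the function name are chosen for illustration only.

```python
import ipaddress

LITERAL_A = "2001:db8::1"
LITERAL_B = "2001:0DB8:0:0:0:0:0:1"


def ipv6_literals_equal(a: str, b: str) -> bool:
    """Compare two IPv6 literals by their 16-byte binary form."""
    return ipaddress.IPv6Address(a).packed == ipaddress.IPv6Address(b).packed


if __name__ == "__main__":
    print(LITERAL_A == LITERAL_B)                     # False
    print(ipv6_literals_equal(LITERAL_A, LITERAL_B))  # True
```

   Comparing the strings reports a difference that does not exist
   between the addresses, which is a false negative in the sense of
   Section 2.2.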

509	3.1.3.  Internationalization

511	   The IETF policy on character sets and languages [RFC2277] requires
512	   support for UTF-8 in protocols, and as a result many protocols now do
513	   support non-ASCII characters.  When a hostname is sent in a UTF-8
514	   field, there are a number of ways it may be encoded.  For example,
515	   hostname labels might be encoded directly in UTF-8, or might first be
516	   Punycode-encoded [RFC3492] or even percent-encoded from UTF-8.

518	   For example, in URIs, [RFC3986] section 3.2.2 specifically allows for
519	   the use of percent-encoded UTF-8 characters in the hostname, as well
520	   as the use of IDNA encoding [RFC3490] using the Punycode algorithm.

522	   Percent-encoding is unambiguous for hostnames since the percent
523	   character cannot appear in the strict definition of a "hostname",
524	   though it can appear in a DNS name.

526	   Punycode-encoded labels (or "A-labels") on the other hand can be
527	   ambiguous if hosts are actually allowed to be named with a name
528	   starting with "xn--", and false positives can result.  While this may
529	   be extremely unlikely for normal scenarios, it nevertheless provides
530	   a possible vector for an attacker.

532	   A hostname comparator thus needs to decide whether a Punycode-encoded
533	   label should or should not be considered a valid hostname label, and
534	   if so, then whether it should match a label encoded in some other
535	   form such as a percent-encoded Unicode label (U-label).
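
   One possible policy is sketched below in Python: every form is
   converted to an ASCII A-label form before comparison.  This is only
   one of the choices a comparator could make, and the function names
   are assumptions.  The sketch relies on Python's built-in "idna"
   codec, which implements the older IDNA specification [RFC3490], and
   on urllib.parse.unquote() for percent-decoding as UTF-8.  A label
   that already starts with "xn--" is passed through unchanged, which is
   exactly the policy decision discussed above.

```python
from urllib.parse import unquote


def to_a_label_form(hostname: str) -> str:
    """Convert a hostname in any of three forms to A-label form."""
    decoded = unquote(hostname)      # undo percent-encoding, if any
    ascii_bytes = decoded.encode("idna")
    return ascii_bytes.decode("ascii").lower()


def hostnames_equal(a: str, b: str) -> bool:
    """Compare hostnames after conversion to A-label form."""
    return to_a_label_form(a) == to_a_label_form(b)


if __name__ == "__main__":
    direct_utf8 = "b\u00fccher.example"
    percent_encoded = "b%C3%BCcher.example"
    punycode = "xn--bcher-kva.example"
    print(to_a_label_form(direct_utf8))
    print(hostnames_equal(direct_utf8, percent_encoded))
    print(hostnames_equal(percent_encoded, punycode))
```

   A comparator that instead treated "xn--bcher-kva" as an ordinary
   label distinct from its Unicode form would disagree with this one,
   which is the inconsistency an attacker could exploit.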

537	   For example, Section 3 of "Transport Layer Security (TLS) Extensions"
538	   [RFC6066] states:

540	      "HostName" contains the fully qualified DNS hostname of the
541	      server, as understood by the client.  The hostname is represented
542	      as a byte string using ASCII encoding without a trailing dot.
543	      This allows the support of internationalized domain names through
544	      the use of A-labels defined in [RFC5890].  DNS hostnames are case-
545	      insensitive.  The algorithm to compare hostnames is described in
546	      [RFC5890], Section 2.3.2.4.

548	   For some additional discussion of security issues that arise with
549	   internationalization, see [TR36].

551	3.1.4.  Resolution for comparison

553	   Some systems (specifically Java URLs [JAVAURL]) use the rule that if
554	   two hostnames resolve to the same IP address(es) then the hostnames
555	   are considered equal.  That is, the canonicalization algorithm
556	   involves name resolution with an IP address being the canonical form.

558	   For example, if resolution was done via DNS, and DNS contained:

560	   example.com.  IN A 10.0.0.6
561	   example.net.  CNAME example.com.
562	   example.org.  IN A 10.0.0.6

564	   then the algorithm might treat all three names as equal, even though
565	   the third name might refer to a different entity.
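
   The effect can be sketched with a small simulation.  The table
   below is a hypothetical stand-in for the DNS data shown above, and
   no real lookups are performed; the names RECORDS, resolve, and
   equal_by_resolution are illustrative only.

```python
# Hypothetical stand-in for the DNS data above; no network access.
RECORDS = {
    "example.com.": ("A", "10.0.0.6"),
    "example.net.": ("CNAME", "example.com."),
    "example.org.": ("A", "10.0.0.6"),
}


def resolve(name: str) -> str:
    """Follow CNAME records in RECORDS until an address is found."""
    rtype, value = RECORDS[name]
    while rtype == "CNAME":
        rtype, value = RECORDS[value]
    return value


def equal_by_resolution(a: str, b: str) -> bool:
    """Treat two names as equal if they resolve to the same address."""
    return resolve(a) == resolve(b)


# All three names compare equal, including the unrelated third one.
print(equal_by_resolution("example.com.", "example.net."))
print(equal_by_resolution("example.com.", "example.org."))
```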

567	   With the introduction of dynamic IP addresses, private IP addresses,
568	   multiple IP addresses per name, multiple address families (e.g., IPv4
569	   vs. IPv6), devices that roam to new locations, commonly deployed DNS
570	   tricks that result in the answer depending on factors such as the
571	   requester's location and the load on the server whose address is
572	   returned, etc., this method of comparison cannot be relied upon.
573	   There is no guarantee that two names for the same host will resolve
574	   to the same IP addresses, nor that the addresses resolved refer to
575	   the same entity (for example, when the names resolve to private IP
576	   addresses), nor even that the system has connectivity (and the
577	   willingness to wait for the delay) to resolve names at the time the
578	   answer is needed.

580	   In addition, a comparison mechanism that relies on the ability to
581	   resolve identifiers such as hostnames to other identifiers such as IP
582	   addresses leaks information about security decisions to outsiders if
583	   these queries are publicly observable.

585	   Finally, it is worth noting that resolving two identifiers to
586	   determine if they refer to the same entity can be thought of as a use
587	   of such identifiers, as opposed to actually comparing the identifiers
588	   themselves, which is the focus of this document.

590	3.2.  Ports and Service Names

592	   Port numbers and service names are discussed in depth in [RFC6335].
593	   Historically, there were port numbers, service names used in SRV
594	   records, and mnemonic identifiers for assigned port numbers (known as
595	   port "keywords" at [IANA-PORT]).  The latter two are now unified, and
596	   various protocols use one or more of these types in strings.  For
597	   example, the common syntax used by many URI schemes allows port
598	   numbers but not service names.  Some implementations of the
599	   getaddrinfo() API support strings that can be either port numbers or
600	   port keywords (but not service names).

602	   For protocols that use service names that must be resolved, the
603	   issues are the same as those for resolution of addresses in
604	   Section 3.1.4.  In addition, Section 5.1 of [RFC6335] clarifies that
605	   service names/port keywords must contain at least one letter.  This
606	   prevents confusion with port numbers in strings where both are
607	   allowed.
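
   A minimal sketch of how a parser might use this rule to tell the
   two apart follows.  The function name classify_port_string is
   hypothetical, and the length and hyphen checks reflect our reading
   of Section 5.1 of [RFC6335] rather than text quoted in this
   document.

```python
import re

# Letters, digits, and single hyphens; no leading or trailing hyphen.
_SERVICE_SHAPE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def classify_port_string(s: str) -> str:
    """Classify s as 'port-number', 'service-name', or 'invalid'."""
    if s.isascii() and s.isdigit():
        # All digits: can only be a port number, never a service name.
        return "port-number" if int(s) <= 65535 else "invalid"
    if (1 <= len(s) <= 15
            and _SERVICE_SHAPE.fullmatch(s)
            and any(c.isalpha() for c in s)):
        return "service-name"
    return "invalid"


for candidate in ("80", "http", "8080x", "-http", "99999"):
    print(candidate, classify_port_string(candidate))
```

   Because an all-digit string can never be a service name, the two
   syntaxes are disjoint and no precedence rule is needed.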

609	3.3.  URIs

611	   This section looks at issues related to using URIs for security
612	   purposes.  For example, [RFC5280], section 7.4, specifies comparison
613	   of URIs in certificates.  Examples of URIs in security token-based
614	   access control systems include WS-*, SAML-P and OAuth WRAP.  In such
615	   systems, a variety of participants in the security infrastructure are
616	   identified by URIs.  For example, requesters of security tokens are
617	   sometimes identified with URIs.  The issuers of security tokens and
618	   the relying parties who are intended to consume security tokens are
619	   frequently identified by URIs.  Claims in security tokens often have
620	   their types defined using URIs and the values of the claims can also
621	   be URIs.

623	   Also, when a URI is embedded in plain text (e.g., an email message),
624	   there is an additional concern because there is no termination
625	   criterion for a URI.  For example, consider
626	   http://unicode.org/cldr/utility/list-unicodeset.jsp?a=a&amp;g=gc.
627	   Some applications that detect URIs will stop before the first '.' in
628	   the path, while others go to the last '.', and yet others may stop
629	   at the ';'.  As another point of comparison, Section 2.37 of [EE] (a
630	   standard for history citations) specifies the use of a space after a
631	   URI and before the punctuation.

633	   URIs are defined with multiple components, each of which has its own
634	   rules.  We cover each in turn below.  However, it is also important
635	   to note that there exist multiple comparison algorithms.  [RFC3986]
636	   section 6.2 states:

638	      A variety of methods are used in practice to test URI equivalence.
639	      These methods fall into a range, distinguished by the amount of
640	      processing required and the degree to which the probability of
641	      false negatives is reduced.  As noted above, false negatives
642	      cannot be eliminated.  In practice, their probability can be
643	      reduced, but this reduction requires more processing and is not
644	      cost-effective for all applications.
645	      If this range of comparison practices is considered as a ladder,
646	      the following discussion will climb the ladder, starting with
647	      practices that are cheap but have a relatively higher chance of
648	      producing false negatives, and proceeding to those that have
649	      higher computational cost and lower risk of false negatives.

651	   The ladder approach has both pros and cons.  On the pro side, it
652	   allows some uses to optimize for security, and other uses to optimize
653	   for cost, thus allowing URIs to be applicable to a wide range of
654	   uses.  A disadvantage is that when different approaches are taken by
655	   different components in the same system using the same identifiers,
656	   the inconsistencies can result in security issues.
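
   The inconsistency can be illustrated with two rungs of the ladder.
   This is only a sketch: the function names are hypothetical, and the
   second rung applies just the case normalization of scheme and host,
   not the full set of normalizations that [RFC3986] describes.

```python
from urllib.parse import urlsplit, urlunsplit


def simple_equal(a: str, b: str) -> bool:
    """Lowest rung: plain string comparison."""
    return a == b


def _normalize_case(uri: str) -> str:
    """Lower-case the scheme and host, leaving userinfo and path alone."""
    parts = urlsplit(uri)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme.lower(), netloc, parts.path,
                       parts.query, parts.fragment))


def case_normalized_equal(a: str, b: str) -> bool:
    """Higher rung: compare after scheme and host case normalization."""
    return _normalize_case(a) == _normalize_case(b)


a, b = "HTTP://Example.COM/a", "http://example.com/a"
print(simple_equal(a, b), case_normalized_equal(a, b))
```

   If one component of a system uses the first function and another
   uses the second, the same pair of identifiers is judged differently
   by each.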

658	3.3.1.  Scheme component

660	   [RFC3986] defines URI schemes as being case-insensitive ASCII and in
661	   section 6.2.2.1 specifies that scheme names should be normalized to
662	   lower-case characters.

664	   New schemes can be defined over time.  In general, however, two URIs
665	   with an unrecognized scheme cannot be safely compared.  This is
666	   because the canonicalization and comparison rules for the other
667	   components may vary by scheme.  For example, a new URI scheme might
668	   have a default port of X, and without that knowledge, a comparison
669	   algorithm cannot know whether "example.com" and "example.com:X"
670	   should be considered to match in the authority component.  Hence for
671	   security purposes, it is safest for unrecognized schemes to be
672	   treated as invalid identifiers.  However, if the URIs are only used
673	   with a "grant access on match" paradigm then unrecognized schemes can
674	   be supported by doing a generic case-sensitive comparison, at the
675	   expense of some false negatives.
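
   The following sketch shows the "treat unrecognized schemes as
   invalid" option.  The table DEFAULT_PORTS and the function
   authority_key are hypothetical, the table is deliberately not
   exhaustive, and userinfo is ignored for simplicity.

```python
from urllib.parse import urlsplit

# Illustrative only; a real deployment would list the schemes it supports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def authority_key(uri: str):
    """Return (scheme, host, port) for comparison, or raise ValueError."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        # Without the scheme's rules, the default port is unknown.
        raise ValueError("unrecognized scheme: %r" % parts.scheme)
    port = parts.port
    if port is None:
        port = DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


print(authority_key("http://example.com/")
      == authority_key("HTTP://EXAMPLE.com:80/"))
```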

677	3.3.2.  Authority component

679	   The authority component is scheme-specific, but many schemes follow a
680	   common syntax that allows for userinfo, host, and port.

682	3.3.2.1.  Host

684	   Section 3.1 discussed issues with hostnames in general.  In addition,
685	   [RFC3986] section 3.2.2 allows future changes using the IPvFuture
686	   production.  As with IPv4 and IPv6 literals, IPvFuture formats may
687	   have issues with multiple semantically identical string
688	   representations, and may also be semantically identical to an IPv4 or
689	   IPv6 address.  As such, false negatives may be common if IPvFuture is
690	   used.

692	3.3.2.2.  Port

694	   See discussion in Section 3.2.

696	3.3.2.3.  Userinfo

698	   [RFC3986] defines the userinfo production that allows arbitrary data
699	   about the user of the URI to be placed before '@' signs in URIs.  For
700	   example: "http://alice:bob:chuck@example.com/bar" has the value
701	   "alice:bob:chuck" as its userinfo.  When comparing URIs in a security
702	   context, one must decide whether to treat the userinfo as being
703	   significant or not.  Some URI comparison services, for example, treat
704	   "http://alice:ick@example.com" and "http://example.com" as being
705	   equal.

707	   When the userinfo is treated as being significant, it has additional
708	   considerations (e.g., whether it is case-sensitive or not) which we
709	   cover in Section 3.4.
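
   As a sketch of the decision involved, the fragment below separates
   the userinfo from the rest of the authority and makes its
   significance an explicit parameter.  The function names are
   hypothetical, and the host comparison is a simple ASCII lower-case
   match that ignores the hostname issues of Section 3.1.

```python
from urllib.parse import urlsplit


def split_userinfo(uri: str):
    """Return (userinfo or None, host[:port]) from a URI's authority."""
    netloc = urlsplit(uri).netloc
    userinfo, at, hostport = netloc.rpartition("@")
    return (userinfo if at else None, hostport)


def equal_authority(a: str, b: str, userinfo_significant: bool) -> bool:
    """Compare authorities, optionally ignoring the userinfo."""
    user_a, host_a = split_userinfo(a)
    user_b, host_b = split_userinfo(b)
    if host_a.lower() != host_b.lower():
        return False
    return user_a == user_b if userinfo_significant else True


print(split_userinfo("http://alice:bob:chuck@example.com/bar"))
```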

711	3.3.3.  Path component

713	   [RFC3986] supports the use of path segment values such as "./" or
714	   "../" for relative URIs.  Strictly speaking, including such path
715	   segment values in a fully qualified URI is syntactically illegal but
716	   [RFC3986] section 5.2.4 nevertheless defines an algorithm to remove
717	   them.
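
   The dot-segment removal algorithm that [RFC3986] defines can be
   sketched as follows.  This is our transcription of that procedure
   into Python, not normative text, and it operates on the path
   component only.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URI path (sketch)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../" becomes "/"
            if output:
                output.pop()             # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            start = 1 if path.startswith("/") else 0
            cut = path.find("/", start)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))   # prints /a/g
```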

719	   Unless a scheme states otherwise, the path component is defined to be
720	   case-sensitive.  However, if the resource is stored and accessed
721	   using a filesystem using case-insensitive paths, there will be many
722	   paths that refer to the same resource.  As such, false negatives can
723	   be common in this case.

725	3.3.4.  Query component

727	   There is the question as to whether "http://example.com/foo",
728	   "http://example.com/foo?", and "http://example.com/foo?bar" are each
729	   considered equal or different.

731	   Similarly, it is unspecified whether the order of values matters.
732	   For example, should "http://example.com/blah?ick=bick&foo=bar" be
733	   considered equal to "http://example.com/blah?foo=bar&ick=bick"?  And
734	   if a domain name is permitted to appear in a query component (e.g.,
735	   in a reference to another URI), the same issues in Section 3.1 apply.
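
   Both answers to the ordering question can be expressed in a few
   lines.  The sketch below is illustrative only; the function names
   are hypothetical, and the "name=value" pair structure it assumes is
   a common convention rather than something the generic URI syntax
   requires.

```python
from urllib.parse import parse_qsl, urlsplit


def queries_equal_exact(a: str, b: str) -> bool:
    """Compare the query components as opaque strings."""
    return urlsplit(a).query == urlsplit(b).query


def queries_equal_unordered(a: str, b: str) -> bool:
    """Compare the query components as unordered name/value pairs."""
    pairs_a = parse_qsl(urlsplit(a).query, keep_blank_values=True)
    pairs_b = parse_qsl(urlsplit(b).query, keep_blank_values=True)
    return sorted(pairs_a) == sorted(pairs_b)


a = "http://example.com/blah?ick=bick&foo=bar"
b = "http://example.com/blah?foo=bar&ick=bick"
print(queries_equal_exact(a, b), queries_equal_unordered(a, b))
```

   Which of the two is appropriate depends on how the resource
   interprets its query, which a generic comparator cannot know.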

737	3.3.5.  Fragment component

739	   Some URI formats include fragment identifiers.  These are typically
740	   handles to locations within a resource and are used for local
741	   reference.  A classic example is the use of fragments in HTTP URIs
742	   where a URI of the form "http://example.com/blah.html#ick" means
743	   retrieve the resource "http://example.com/blah.html" and, once it has
744	   arrived locally, find the HTML anchor named ick and display that.

746	   So, for example, when a user clicks on the link
747	   "http://example.com/blah.html#baz" a browser will check its cache by
748	   doing a URI comparison for "http://example.com/blah.html" and, if the
749	   resource is present in the cache, a match is declared.

751	   Hence comparisons for security purposes typically ignore the fragment
752	   component and treat all fragments as equal to the full resource.
753	   However, if one were actually trying to compare the piece of a
754	   resource that was identified by the fragment identifier, ignoring it
755	   would result in potential false positives.  For example, there is at
756	   least one well known site today (Twitter) that requires the fragment
757	   component in order to uniquely identify a user profile.
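
   The two behaviors can be sketched with the urldefrag function from
   Python's standard urllib.parse module.  The wrapper names below are
   hypothetical, and the comparison of the remaining URI is a plain
   string match.

```python
from urllib.parse import urldefrag


def equal_ignoring_fragment(a: str, b: str) -> bool:
    """Compare two URIs after discarding their fragment components."""
    return urldefrag(a).url == urldefrag(b).url


def equal_including_fragment(a: str, b: str) -> bool:
    """Compare two URIs with the fragment treated as significant."""
    return urldefrag(a) == urldefrag(b)


a = "http://example.com/blah.html#ick"
b = "http://example.com/blah.html#baz"
print(equal_ignoring_fragment(a, b), equal_including_fragment(a, b))
```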

759	3.3.6.  Resolution for comparison

761	   As with Section 3.1.4 for hostnames, it may be tempting to define a
762	   URI comparison algorithm based on whether they resolve to the same
763	   content.  Similar problems exist, however, including content that
764	   dynamically changes over time or based on factors such as the
765	   requester's location, potential lack of external connectivity at the
766	   time/place comparison is done, potentially undesirable delay
767	   introduced, etc.

769	   In addition, as noted in Section 3.1.4, resolution leaks information
770	   about security decisions to outsiders if the queries are publicly
771	   observable.

773	3.4.  Email Address-like Identifiers

775	   Section 3.4.1 of [RFC5322] defines the syntax of an email address-
776	   like identifier, and Section 3.2 of [RFC6532] updates it to support
777	   internationalization.  [RFC5280], section 7.5, further discusses the
778	   use of internationalized email addresses in certificates.

780	   For security considerations, [RFC6532] points to [RFC6530]; Section 13
781	   of that document contains a discussion of many issues resulting from
782	   internationalization.

784	   Email address-like identifiers have a local part and a domain part.
785	   The issues with the domain part are essentially the same as with
786	   hostnames, covered earlier.

788	   The local part is left for each domain to define.  People quite
789	   commonly use email addresses as usernames with web sites such as
790	   banks or shopping sites, but the site doesn't know whether
791	   foo@example.com is the same person as FOO@example.com.  Thus email
792	   address-like identifiers are typically Indefinite identifiers.

794	   To avoid false positives, some security mechanisms (such as
795	   [RFC5280]) compare the local part using an exact match.  Hence, like
796	   URIs, email address-like identifiers are designed for use in grant-
797	   on-match security schemes, not in deny-on-match schemes.
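
   A comparison in this style might look like the sketch below.  The
   function names are hypothetical, the domain part is assumed to be
   ASCII, and the domain comparison ignores the hostname issues
   covered earlier.

```python
def split_address(addr: str):
    """Split an address-like identifier at its last '@'."""
    local, at, domain = addr.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("not an email address-like identifier: %r" % addr)
    return local, domain


def addresses_match(a: str, b: str) -> bool:
    """Exact match on the local part, ASCII caseless on the domain."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return local_a == local_b and domain_a.lower() == domain_b.lower()


print(addresses_match("foo@example.com", "foo@EXAMPLE.COM"))
print(addresses_match("foo@example.com", "FOO@example.com"))
```

   The second comparison fails even if the domain happens to deliver
   both spellings to the same mailbox, which is the false negative
   that a grant-on-match scheme can tolerate.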

799	   Furthermore, if a mailbox is stored and accessed using a filesystem
800	   using case-insensitive paths, there may be many paths that refer to
801	   the same mailbox.  As such, false negatives can be common in this
802	   case.

804	4.  General Conflation Issues

806	   There are a number of examples (some in the preceding sections) of
807	   strings that conflate two types of identifiers, using some heuristic
808	   to try to determine which type of identifier is given.  Similarly,
809	   two ways of encoding the same type of identifier might be conflated
810	   within the same string.

812	   Some examples include:
813	   1.  A string that might be an IPv4 address literal or an IPv6 address
814	       literal
815	   2.  A string that might be an IP address literal or a hostname
816	   3.  A string that might be a port number or a service name
817	   4.  A DNS label that might be literal or be Punycode-encoded

819	   Strings that allow such conflation can only be considered Definite if
820	   there exists a well-defined rule to determine which identifier type
821	   is meant.  One way to do so is to ensure that the valid syntax for
822	   the two is disjoint (e.g., distinguishing IPv4 vs. IPv6 address
823	   literals by the use of colons in the latter).  A second way to do so
824	   is to define a precedence rule that results in some identifiers being
825	   inaccessible via a conflated string (e.g., a host literally named
826	   "xn--de-jg4avhby1noc0d" may be inaccessible due to the "xn--" prefix
827	   denoting the use of Punycode encoding).  In some cases, such
828	   inaccessible space may be reserved so that the actual set of
829	   identifiers in use are unambiguous.  For example, Section 2.5.5.2 of
830	   [RFC4291] defines a range of the IPv6 address space for representing
831	   IPv4 addresses.
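
   Both techniques can be combined in one classifier, sketched below.
   The function name and the returned labels are hypothetical, and the
   standard "ipaddress" module stands in for whatever literal parser an
   implementation actually uses.

```python
import ipaddress


def classify_host_string(s: str) -> str:
    """Decide which identifier type a conflated host string holds."""
    if ":" in s:
        # Disjoint syntax: only IPv6 literals may contain a colon.
        ipaddress.IPv6Address(s)         # raises ValueError if malformed
        return "ipv6-literal"
    try:
        ipaddress.IPv4Address(s)
        return "ipv4-literal"
    except ValueError:
        pass
    # Precedence rule: an "xn--" prefix always means Punycode.
    if any(label.lower().startswith("xn--") for label in s.split(".")):
        return "hostname-with-a-label"
    return "hostname"


for s in ("2001:db8::1", "192.0.2.1", "xn--de-jg4avhby1noc0d",
          "example.com"):
    print(s, classify_host_string(s))

# The reserved IPv6 range for representing IPv4 addresses.
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1").ipv4_mapped
print(mapped == ipaddress.IPv4Address("192.0.2.1"))
```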

833	5.  General Internationalization Issues

835	   In addition to the issues with hostnames discussed in Section 3.1.3,
836	   there are a number of internationalization issues that apply to many
837	   types of Definite and Indefinite identifiers.

839	   First, there is no DNS mechanism for identifying whether non-
840	   identical strings would be seen by a human as being equivalent.
841	   There are problematic examples even with ASCII (Basic Latin) strings
842	   including regional spelling variations such as "color" and "colour"
843	   and many non-English cases including partially-numeric strings in
844	   Arabic script contexts, Chinese strings in Simplified and Traditional
845	   forms, and so on.  Attempts to produce such alternate forms
846	   algorithmically could produce false positives and hence have an
847	   adverse effect on security.

849	   Second, some strings are visually confusable with others, and hence
850	   if a security decision is made by a user based on visual inspection,
851	   many opportunities for false positives exist.  As such, using visual
852	   inspection for security is unreliable.  In addition to the security
853	   issues, visual confusability also adversely affects the usability of
854	   identifiers distributed via visual mediums.  Similar issues can arise
855	   with audible confusability when using audio (e.g., for radio
856	   distribution, accessibility to the blind, etc.) in place of a visual
857	   medium.

859	   Determining whether a string is a valid identifier should typically
860	   be done after, or as part of, canonicalization.  Otherwise an
861	   attacker might use the canonicalization algorithm to inject (e.g.,
862	   via percent encoding, NFKC, or non-shortest-form UTF-8) delimiters
863	   such as '@' in an email address-like identifier, or a '.' in a
864	   hostname.
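
   The ordering problem can be demonstrated with a short sketch.  The
   canonicalization shown (percent-decoding followed by NFKC) is only
   an assumed example of what some protocol might specify, and the
   function names are hypothetical.

```python
import unicodedata
from urllib.parse import unquote


def canonicalize(s: str) -> str:
    """Percent-decode, then apply NFKC (an assumed canonical form)."""
    return unicodedata.normalize("NFKC", unquote(s))


def contains_delimiter(s: str, delimiter: str) -> bool:
    """A stand-in for one step of validity checking."""
    return delimiter in s


# U+FF20 FULLWIDTH COMMERCIAL AT is not '@' until NFKC is applied.
raw = "alice\uff20evil.example"
print(contains_delimiter(raw, "@"))                 # check before
print(contains_delimiter(canonicalize(raw), "@"))   # check after
```

   A validity check run only on the raw string would accept it, and
   the delimiter would then appear once the string is canonicalized.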

866	   Any case-insensitive comparisons need to define how comparison is
867	   done, since such comparisons may vary by locale of the endpoint.  As
868	   such, using case-insensitive comparisons in general often results in
869	   identifiers being either Indefinite or, if the legal character set is
870	   restricted (e.g., to ASCII), then Definite.
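
   A sketch of a comparison restricted to ASCII case mapping follows.
   The function names are hypothetical.  Only the letters "A" through
   "Z" are mapped, so the result does not depend on locale; characters
   outside ASCII are compared exactly.

```python
def ascii_lower(s: str) -> str:
    """Lower-case only the ASCII letters A-Z; leave all else unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison that is well defined for any input."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_caseless_equal("Example.COM", "example.com"))

# U+0130 (capital I with dot above) is left alone by ascii_lower, whereas
# a general lower-casing routine maps it to something other than "i".
print(ascii_lower("\u0130") == "\u0130", "\u0130".lower() == "i")
```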

872	   See also [WEBER] for a more visual discussion of many of these
873	   issues.

875	   Finally, the set of permitted characters and the canonical form of
876	   the characters (and hence the canonicalization algorithm) sometimes
877	   varies by protocol today, even when the intent is to use the same
878	   identifier, such as when one protocol passes identifiers to the
879	   other.  See [I-D.ietf-precis-problem-statement] for further
880	   discussion.

882	6.  General Scope Issues

884	   Another issue arises when an identifier (e.g., "localhost",
885	   "10.11.12.13", etc.) is not globally unique.  [RFC3986] Section 1.1
886	   states:

888	      URIs have a global scope and are interpreted consistently
889	      regardless of context, though the result of that interpretation
890	      may be in relation to the end-user's context.  For example,
891	      "http://localhost/" has the same interpretation for every user of
892	      that reference, even though the network interface corresponding to
893	      "localhost" may be different for each end-user: interpretation is
894	      independent of access.

896	   Whenever a non-globally-unique identifier is passed to another entity
897	   outside of the scope of uniqueness, it will refer to a different
898	   resource, and can result in a false positive.  This problem is often
899	   addressed by using the identifier together with some other unique
900	   identifier of the context.  For example "alice" may uniquely identify
901	   a user within a system, but must be used with "example.com" (as in
902	   "alice@example.com") to uniquely identify the context outside of that
903	   system.

905	   It is also worth noting that non-globally-scoped IPv6 addresses can
906	   be written with, or otherwise associated with, a "zone ID" to
907	   identify the context (see [RFC4007] for more information).  However,
908	   zone IDs are only unique within a host, so they typically narrow,
909	   rather than expand, the scope of uniqueness of the resulting
910	   identifier.

912	7.  Security Considerations

914	   This entire document is about security considerations.

916	   To minimize elevation of privilege issues, any system that requires
917	   the ability to use both deny and allow operations within the same
918	   identifier space should avoid the use of Indefinite identifiers in
919	   security comparisons.

921	   To minimize future security risks, any new identifiers being designed
922	   should specify an Absolute or Definite comparison algorithm, and if
923	   extensibility is allowed (e.g., as new schemes in URIs allow) then
924	   the comparison algorithm should remain invariant so that unrecognized
925	   extensions can be compared.  That is, security risks can be reduced
926	   by specifying the comparison algorithm, making sure to resolve any
927	   ambiguities pointed out in this document (e.g., "standard dotted
928	   decimal").

930	   Some issues (such as unrecognized extensions) can be mitigated by
931	   treating such identifiers as invalid.  Validity checking of
932	   identifiers is further discussed in [RFC3696].

934	   Perhaps the hardest issues arise when multiple protocols are used
935	   together, such as in the figure in Section 2, where the two protocols
936	   are defined or implemented using different comparison algorithms.
937	   When constructing an architecture that uses multiple such protocols,
938	   designers should pay attention to any differences in comparison
939	   algorithms among the protocols, in order to fully understand the
940	   security risks.  An area for future work is how to deal with such
941	   security risks in current systems.

943	8.  Acknowledgements

945	   Yaron Goland contributed to the discussion on URIs.  Patrik Faltstrom
946	   contributed to the background on identifiers.  John Klensin
947	   contributed text in a number of different sections.  Additional
948	   helpful feedback and suggestions came from Bernard Aboba, Leslie
949	   Daigle, Mark Davis, Russ Housley, Magnus Nystrom, and Chris Weber.

951	9.  IANA Considerations

953	   This document requires no actions by the IANA.

955	10.  Informative References

957	   [EE]       Mills, E., "Evidence Explained: Citing History Sources
958	              from Artifacts to Cyberspace", 2007.

960	   [I-D.ietf-6man-uri-zoneid]
961	              Carpenter, B., Cheshire, S., and R. Hinden, "Representing
962	              IPv6 Zone Identifiers in Address Literals and Uniform
963	              Resource Identifiers", draft-ietf-6man-uri-zoneid-04 (work
964	              in progress), September 2012.

966	   [I-D.ietf-pkix-rfc5280-clarifications]
967	              Yee, P., "Updates to the Internet X.509 Public Key
968	              Infrastructure Certificate and Certificate Revocation List
969	              (CRL) Profile", draft-ietf-pkix-rfc5280-clarifications-10
970	              (work in progress), October 2012.

972	   [I-D.ietf-precis-problem-statement]
973	              Blanchet, M. and A. Sullivan, "Stringprep Revision and
974	              PRECIS Problem Statement",
975	              draft-ietf-precis-problem-statement-08 (work in progress),
976	              September 2012.

978	   [IAB1123]  IAB, "The interpretation of rules in the ICANN gTLD
979	              Applicant Guidebook", February 2012, <http://www.iab.org/
980	              documents/correspondence-reports-documents/2012-2/
981	              iab-statement-the-interpretation-of-rules-in-the-icann-
982	              gtld-applicant-guidebook>.

984	   [IANA-PORT]
985	              IANA, "PORT NUMBERS", June 2011,
986	              <http://www.iana.org/assignments/port-numbers>.

988	   [IEEE-1003.1]
989	              IEEE and The Open Group, "The Open Group Base
990	              Specifications, Issue 6 IEEE Std 1003.1, 2004 Edition",
991	              IEEE Std 1003.1, 2004.

993	   [JAVAURL]  Oracle, "Class URL, Java(TM) Platform, Standard Ed. 7",
994	              2011, <http://docs.oracle.com/javase/7/docs/api/java/net/
995	              URL.html>.

997	   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
998	              and Support", STD 3, RFC 1123, October 1989.

1000	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
1001	              Languages", BCP 18, RFC 2277, January 1998.

1003	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
1004	              "Internationalizing Domain Names in Applications (IDNA)",
1005	              RFC 3490, March 2003.

1007	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
1008	              for Internationalized Domain Names in Applications
1009	              (IDNA)", RFC 3492, March 2003.

1011	   [RFC3493]  Gilligan, R., Thomson, S., Bound, J., McCann, J., and W.
1012	              Stevens, "Basic Socket Interface Extensions for IPv6",
1013	              RFC 3493, February 2003.

1015	   [RFC3696]  Klensin, J., "Application Techniques for Checking and
1016	              Transformation of Names", RFC 3696, February 2004.

1018	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1019	              Resource Identifier (URI): Generic Syntax", STD 66,
1020	              RFC 3986, January 2005.

1022	   [RFC4007]  Deering, S., Haberman, B., Jinmei, T., Nordmark, E., and
1023	              B. Zill, "IPv6 Scoped Address Architecture", RFC 4007,
1024	              March 2005.

1026	   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
1027	              Architecture", RFC 4291, February 2006.

1029	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
1030	              Housley, R., and W. Polk, "Internet X.509 Public Key
1031	              Infrastructure Certificate and Certificate Revocation List
1032	              (CRL) Profile", RFC 5280, May 2008.

1034	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
1035	              October 2008.

1037	   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
1038	              Address Text Representation", RFC 5952, August 2010.

1040	   [RFC6055]  Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
1041	              Encodings for Internationalized Domain Names", RFC 6055,
1042	              February 2011.

1044	   [RFC6066]  Eastlake, D., "Transport Layer Security (TLS) Extensions:
1045	              Extension Definitions", RFC 6066, January 2011.

1047	   [RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
1048	              Cheshire, "Internet Assigned Numbers Authority (IANA)
1049	              Procedures for the Management of the Service Name and
1050	              Transport Protocol Port Number Registry", BCP 165,
1051	              RFC 6335, August 2011.

1053	   [RFC6530]  Klensin, J. and Y. Ko, "Overview and Framework for
1054	              Internationalized Email", RFC 6530, February 2012.

1056	   [RFC6532]  Yang, A., Steele, S., and N. Freed, "Internationalized
1057	              Email Headers", RFC 6532, February 2012.

1059	   [TR36]     Unicode Consortium, "Unicode Security Considerations",
1060	              Unicode Technical Report 36, August 2004.

1062	   [WEBER]    Weber, C., "Attacking Software Globalization", March 2010,
1063	              <http://www.lookout.net/files/
1064	              Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf>.

1066	Author's Address

1068	   Dave Thaler (editor)
1069	   Microsoft Corporation
1070	   One Microsoft Way
1071	   Redmond, WA  98052
1072	   USA

1074	   Phone: +1 425 703 8835
1075	   Email: dthaler@microsoft.com