idnits 2.17.1 

draft-iab-identifier-comparison-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 457: '...dentity of an Internet host, it SHOULD...'
     RFC 2119 keyword, line 459: '...#.#.#.#") form.  The host SHOULD check...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 23, 2013) is 4079 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFC5890' is mentioned on line 586, but not defined

  == Outdated reference: A later version (-09) exists of
     draft-iab-privacy-considerations-03

  -- Obsolete informational reference (is this intentional?): RFC 3490
     (Obsoleted by RFC 5890, RFC 5891)

  -- Obsolete informational reference (is this intentional?): RFC 6125
     (Obsoleted by RFC 9525)


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                     D. Thaler, Ed.
3	Internet-Draft                                                 Microsoft
4	Intended status: Informational                         February 23, 2013
5	Expires: August 27, 2013

7	         Issues in Identifier Comparison for Security Purposes
8	                 draft-iab-identifier-comparison-08.txt

10	Abstract

12	   Identifiers such as hostnames, URIs, IP addresses, and email
13	   addresses are often used in security contexts to identify security
14	   principals and resources.  In such contexts, an identifier supplied
15	   via some protocol is often compared using some policy to make
16	   security decisions such as whether the security principal may access
17	   the resource, what level of authentication or encryption is required,
18	   etc.  If the parties involved in a security decision use different
19	   algorithms to compare identifiers, then failure scenarios ranging
20	   from denial of service to elevation of privilege can result.  This
21	   document provides a discussion of these issues that designers should
22	   consider when defining identifiers and protocols, and when
23	   constructing architectures that use multiple protocols.

25	Status of this Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on August 27, 2013.

42	Copyright Notice

44	   Copyright (c) 2013 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	     1.1.  Canonicalization
61	   2.  Security Uses
62	     2.1.  Types of Identifiers
63	     2.2.  False Positives and Negatives
64	     2.3.  Hypothetical Example
65	   3.  Common Identifiers
66	     3.1.  Hostnames
67	       3.1.1.  IPv4 Literals
68	       3.1.2.  IPv6 Literals
69	       3.1.3.  Internationalization
70	       3.1.4.  Resolution for comparison
71	     3.2.  Ports and Service Names
72	     3.3.  URIs
73	       3.3.1.  Scheme component
74	       3.3.2.  Authority component
75	       3.3.3.  Path component
76	       3.3.4.  Query component
77	       3.3.5.  Fragment component
78	       3.3.6.  Resolution for comparison
79	     3.4.  Email Address-like Identifiers
80	   4.  General Issues
81	     4.1.  Conflation
82	     4.2.  Internationalization
83	     4.3.  Scope
84	     4.4.  Temporality
85	   5.  Security Considerations
86	   6.  Acknowledgements
87	   7.  IANA Considerations
88	   8.  Informative References
89	   Author's Address

91	1.  Introduction

93	   In computing and the Internet, various types of "identifiers" are
94	   used to identify humans, devices, content, etc.  Before discussing
95	   security issues, we first give some background on some typical
96	   processes involving identifiers.

98	   As depicted in Figure 1, there are multiple processes relevant to our
99	   discussion.
100	   1.  An identifier is first generated.  If the identifier is intended
101	       to be unique, the generation process must include some mechanism,
102	       such as allocation by a central authority or verification among
103	       the members of a distributed authority, to help ensure
104	       uniqueness.  However, the notion of "unique" involves determining
105	       whether a putative identifier matches any other already-allocated
106	       identifier.  As we will see, for many types of identifiers, this
107	       is not simply an exact binary match.

109	       After the identifier is generated, it is often stored in two
110	       locations: with the requester or "holder" of the identifier, and
111	       with some repository of identifiers (e.g., DNS).  For example, if
112	       the identifier was allocated by a central authority, the
113	       repository might be that authority.  If the identifier identifies
114	       a device or content on a device, the repository might be that
115	       device.
116	   2.  The identifier is distributed, either by the holder of the
117	       identifier or by a repository of identifiers, to others who could
118	       use the identifier.  This distribution might be electronic, but
119	       sometimes it is via other channels such as voice, business card,
120	       billboard, or other form of advertisement.  The identifier itself
121	       might be distributed directly, or it might be used to generate a
122	       portion of another type of identifier that is then distributed.
123	       For example, a URI or email address might include a server name,
124	       and hence distributing the URI or email address also inherently
125	       distributes the server name.
126	   3.  The identifier is used by some party.  Generally, the user
127	       supplies the identifier, which is (directly or indirectly) sent
128	       to the repository of identifiers.  For example, using an email
129	       address to send email to the holder of an identifier may result
130	       in the email arriving at the holder's email server, which has
131	       access to the mail stores.

133	       The repository of identifiers must then attempt to match the
134	       user-supplied identifier with an identifier in its repository.

136	                            +------------+
137	                            |  Holder of |     1. Generation
138	                            | identifier +<---------+
139	                            +----+-------+          |
140	                                 |                  | Match
141	                                 |                  v/
142	                                 |          +-------+-------+
143	                                 +----------+ Repository of |
144	                                 |          |  identifiers  |
145	                                 |          +-------+-------+
146	                 2. Distribution |                  ^\
147	                                 |                  | Match
148	                                 v                  |
149	                       +---------+-------+          |
150	                       |      User of    |          |
151	                       |    identifier   +----------+
152	                       +-----------------+    3. Use

154	                       Typical Identifier Processes

156	                                 Figure 1

158	   Another variation is where a user is given the identifier of a
159	   resource (e.g., a web site) to access securely, sometimes known as a
160	   "reference identifier" [RFC6125], and the server connected to then
161	   presents its identity at the time of use.  In this case the user
162	   application attempts to match the presented identity against the
163	   reference identifier.

165	   One key aspect is that the identifier values passed in generation,
166	   distribution, and use, may all be in different forms.  For example,
167	   an identifier might be exchanged in printed form at generation time,
168	   distributed to a user via voice, and then used electronically.  As
169	   such, the match process can be complicated.

171	   Furthermore, in many uses, the relationship between holder,
172	   repositories, and users may be more involved.  For example, when a
173	   hierarchy of web caches exists, each cache is itself a repository of a
174	   sort, and the match process is usually intended to be the same as on
175	   the origin server.

177	   Another aspect to keep in mind is that there can be multiple
178	   identifiers that refer to the same object (i.e., resource, human,
179	   device, etc.).  For example, a human might have a passport number and
180	   a driver's license number, and an RFC might be available at multiple
181	   locations (rfc-editor.org and ietf.org).  In this document we focus
182	   on comparing two identifiers to see whether they are the same
183	   identifier, rather than comparing two different identifiers to see
184	   whether they refer to the same entity (although a few issues with the
185	   latter are touched on in several places such as Section 3.1.4 and
186	   Section 3.3.6).

188	1.1.  Canonicalization

190	   Perhaps the most common algorithm for comparison involves first
191	   converting each identifier to a canonical form (a process known as
192	   "canonicalization" or "normalization"), and then testing the
193	   resulting canonical representations for bitwise equality.  In so
194	   doing, it is thus critical that all entities involved agree on the
195	   same canonical form and use the same canonicalization algorithm so
196	   that the overall comparison process is also the same.
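
   As a minimal sketch of this pattern, the following Python fragment
   canonicalizes two ASCII hostnames and then tests the results for
   exact equality.  The canonical form chosen here (ASCII, lowercase,
   one trailing dot removed) and the function names are assumptions
   made for illustration; they are not defined by this document.

```python
def canonicalize(name: str) -> bytes:
    """Hypothetical canonical form: ASCII only, lowercase, no trailing dot."""
    octets = name.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    octets = octets.lower()        # bytes.lower() only affects ASCII letters
    if octets.endswith(b"."):
        octets = octets[:-1]       # drop a single trailing dot
    return octets


def same_identifier(a: str, b: str) -> bool:
    # Compare the canonical representations for exact (bitwise) equality.
    return canonicalize(a) == canonicalize(b)
```

   Two parties obtain consistent answers only if both run the same
   canonicalize() step; a party that skipped, say, the trailing-dot
   rule would disagree on some inputs.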

198	   Note that in some contexts, such as in internationalization, the
199	   terms "canonicalization" and "normalization" have a precise meaning.
200	   In this document, however, we use these terms synonymously in their
201	   more generic form, to mean conversion to some standard form.

203	   While the most common method of comparison includes canonicalization,
204	   comparison can also be done by defining an equivalence algorithm,
205	   where no single form is canonical.  However, in most cases, a
206	   canonical form is useful for other purposes, such as output, and so
207	   in such cases defining a canonical form suffices to define a
208	   comparison method.

210	2.  Security Uses

212	   Identifiers such as hostnames, URIs, and email addresses are used in
213	   security contexts to identify security principals (i.e., entities
214	   that can be authenticated) and resources as well as other security
215	   parameters such as types and values of claims.  Those identifiers are
216	   then used to make security decisions based on an identifier supplied
217	   via some protocol.  For example:
218	   o  Authentication: a protocol might match a security principal's
219	      identifier to look up expected keying material, and then match
220	      keying material.
221	   o  Authorization: a protocol might match a resource name against some
222	      policy.  For example, it might look up an access control list
223	      (ACL), and then look up the security principal's identifier (or a
224	      surrogate for it) in that ACL.
225	   o  Accounting: a system might create an accounting record for a
226	      security principal's identifier or resource name, and then might
227	      later need to match a supplied identifier to (for example) add new
228	      filtering rules based on the records in order to stop an attack.

230	   If the parties involved in a security decision use different matching
231	   algorithms for the same identifiers, then failure scenarios ranging
232	   from denial of service to elevation of privilege can result, as we
233	   will see.

235	   This is especially complicated in cases involving multiple parties
236	   and multiple protocols.  For example, there are many scenarios where
237	   some form of "security token service" is used to grant to a requester
238	   permission to access a resource, where the resource is held by a
239	   third party that relies on the security token service (see Figure 2).
240	   The protocol used to request permission (e.g., Kerberos or OAuth) may
241	   be different from the protocol used to access the resource (e.g.,
242	   HTTP).  Opportunities for security problems arise when two protocols
243	   define different comparison algorithms for the same type of
244	   identifier, or when a protocol is ambiguously specified and two
245	   endpoints (e.g., a security token service and a resource holder)
246	   implement different algorithms within the same protocol.

248	        +----------+
249	        | security |
250	        |  token   |
251	        | service  |
252	        +----------+
253	             ^
254	             | 1. supply credentials and
255	             | get token for resource
256	             |                                             +--------+
257	        +----------+  2. supply token and access resource  |resource|
258	        |requester |=------------------------------------->| holder |
259	        +----------+                                       +--------+

261	                         Simple Security Exchange

263	                                 Figure 2

265	   In many cases the situation is more complex.  With certificates, the
266	   name in a certificate gets compared against names in ACLs or other
267	   things.  In the case of web site security, the name in the
268	   certificate gets compared to a portion of the URI that a user may
269	   have typed into a browser.  The fact that many different people are
270	   doing the typing, on many different types of systems, complicates the
271	   problem.

273	   Add to this the certificate enrollment step and the certificate
274	   issuance step, and two more parties have an opportunity to adjust the
275	   encoding or, worse, the software that supports them might make
276	   changes of which the parties are unaware.

278	2.1.  Types of Identifiers

280	   In this document we will refer to the following types of identifiers:

282	   o  Absolute: identifiers that can be compared byte-by-byte for
283	      equality.  Two identifiers that have different bytes are defined
284	      to be different.  For example, binary IP addresses are in this
285	      class.
286	   o  Definite: identifiers that have a well-defined comparison
287	      algorithm on which all parties agree.  For example, URI scheme
288	      names are required to be ASCII and are defined to match in a case-
289	      insensitive way; the comparison is thus definite since all parties
290	      agree on how to do a case-insensitive match among ASCII strings.
291	   o  Indefinite: identifiers that have no single comparison algorithm
292	      on which all parties agree.  For example, human names are in this
293	      class.  Everyone might want the comparison to be tailored for
294	      their locale, for some definition of locale.  In some cases, there
295	      may be limited subsets of parties that might be able to agree
296	      (e.g., ASCII users might all agree on a common comparison
297	      algorithm whereas users of other Latin scripts, such as Turkish,
298	      may not), but identifiers often tend to leak out of such limited
299	      environments.
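
   The gap between Definite and Indefinite identifiers can be
   illustrated with case-insensitive matching.  In the Python sketch
   below (both helper names are hypothetical), the two helpers agree on
   ASCII strings such as URI scheme names, but they disagree once the
   Turkish dotless "i" (U+0131) is involved: default upper-casing maps
   it to "I", while lower-casing leaves it unchanged.

```python
def equal_by_lower(a: str, b: str) -> bool:
    # "Case-insensitive" match implemented by lower-casing both sides.
    return a.lower() == b.lower()


def equal_by_upper(a: str, b: str) -> bool:
    # "Case-insensitive" match implemented by upper-casing both sides.
    return a.upper() == b.upper()


# ASCII-only input: both implementations agree, so the match is definite.
ascii_agree = equal_by_lower("HTTP", "http") and equal_by_upper("HTTP", "http")

# Dotless i (U+0131) versus ASCII "i": the two implementations disagree.
dotless_lower = equal_by_lower("\u0131", "i")  # False
dotless_upper = equal_by_upper("\u0131", "i")  # True
```

   Two parties that each believe they perform "a case-insensitive
   comparison" can therefore reach different answers for the same pair
   of non-ASCII identifiers.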

301	2.2.  False Positives and Negatives

303	   It is first worth discussing in more detail the effects of errors in
304	   the comparison algorithm.  A "false positive" results when two
305	   identifiers compare as if they were equal, but in reality refer to
306	   two different objects (e.g., security principals or resources).  When
307	   privilege is granted on a match, a false positive thus results in an
308	   elevation of privilege, for example allowing execution of an
309	   operation that should not have been permitted otherwise.  When
310	   privilege is denied on a match (e.g., matching an entry in a block/
311	   deny list or a revocation list), a permissible operation is denied.
312	   At best, this can cause worse performance (e.g., a cache miss, or
313	   forcing redundant authentication), and at worst can result in a
314	   denial of service.

316	   A "false negative" results when two identifiers that in reality refer
317	   to the same thing compare as if they were different, and the effects
318	   are the reverse of those for false positives.  That is, when
319	   privilege is granted on a match, the result is at best worse
320	   performance and at worst a denial of service; when privilege is
321	   denied on a match, elevation of privilege results.

323	   Figure 3 summarizes these effects.

325	                  | "Grant on match"       | "Deny on match"
326	   ---------------+------------------------+-----------------------
327	   False positive | Elevation of privilege | Denial of service
328	   ---------------+------------------------+-----------------------
329	   False negative | Denial of service      | Elevation of privilege
330	   ---------------+------------------------+-----------------------

332	                Worst Effects of False Positives/Negatives

334	                                 Figure 3

336	   When designing a comparison algorithm, one can typically modify it to
337	   increase the likelihood of false positives and decrease the
338	   likelihood of false negatives, or vice versa.  Which outcome is
339	   better depends on the context.

341	   Elevation of privilege is almost always seen as far worse than denial
342	   of service.  Hence, for URIs for example, Section 6.1 of [RFC3986]
343	   states: "comparison methods are designed to minimize false negatives
344	   while strictly avoiding false positives".

346	   Thus URIs were defined with a "grant privilege on match" paradigm in
347	   mind, where it is critical to prevent elevation of privilege while
348	   minimizing denial of service.  Using URIs in a "deny privilege on
349	   match" system can thus be problematic.

351	2.3.  Hypothetical Example

353	   In this example, both security principals and resources are
354	   identified using URIs.  Foo Corp has paid example.com for access to
355	   the Stuff service.  Foo Corp allows its employees to create accounts
356	   on the Stuff service.  Alice gets the account
357	   "http://example.com/Stuff/FooCorp/alice" and Bob gets
358	   "http://example.com/Stuff/FooCorp/bob".  It turns out, however, that
359	   Foo Corp's URI canonicalizer includes URI fragment components in
360	   comparisons whereas example.com's does not, and Foo Corp does not
361	   disallow the # character in the account name.  So Chuck, who is a
362	   malicious employee of Foo Corp, asks to create an account at
363	   example.com with the name alice#stuff.  Foo Corp's URI logic checks
364	   its records for accounts it has created with Stuff and sees that
365	   there is no account with the name alice#stuff.  Hence, in its
366	   records, it associates the account alice#stuff with Chuck and will
367	   only issue tokens good for use with
368	   "http://example.com/Stuff/FooCorp/alice#stuff" to Chuck.

370	   Chuck, the attacker, goes to a security token service at Foo Corp and
371	   asks for a security token good for
372	   "http://example.com/Stuff/FooCorp/alice#stuff".  Foo Corp issues the
373	   token since Chuck is the legitimate owner (in Foo Corp's view) of the
374	   alice#stuff account.  Chuck then submits the security token in a
375	   request to "http://example.com/Stuff/FooCorp/alice".

377	   But example.com uses a URI canonicalizer that, for the purposes of
378	   checking equality, ignores fragments.  So when example.com looks in
379	   the security token to see if the requester has permission from Foo
380	   Corp to access the given account it successfully matches the URI in
381	   the security token, "http://example.com/Stuff/FooCorp/alice#stuff",
382	   with the requested resource name
383	   "http://example.com/Stuff/FooCorp/alice".

385	   Leveraging the inconsistencies in the canonicalizers used by Foo Corp
386	   and example.com, Chuck is able to successfully launch an elevation of
387	   privilege attack and access Alice's resource.
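
   The mismatch in this example can be reproduced in a few lines of
   Python.  The two functions below are hypothetical stand-ins for the
   canonicalizers of Foo Corp and example.com; the only behavior taken
   from the example is that one includes the fragment in comparisons
   and the other ignores it.

```python
from urllib.parse import urldefrag


def foo_corp_key(uri: str) -> str:
    # Stand-in for Foo Corp's canonicalizer: the fragment is part of the key.
    return uri


def example_com_key(uri: str) -> str:
    # Stand-in for example.com's canonicalizer: the fragment is ignored.
    return urldefrag(uri).url


ALICE = "http://example.com/Stuff/FooCorp/alice"
CHUCK = "http://example.com/Stuff/FooCorp/alice#stuff"

# Foo Corp sees two distinct accounts, so it issues Chuck a token for CHUCK.
foo_corp_sees_distinct = foo_corp_key(ALICE) != foo_corp_key(CHUCK)

# example.com matches the URI in Chuck's token against Alice's resource.
example_com_sees_match = example_com_key(CHUCK) == example_com_key(ALICE)
```

   Each comparison is reasonable in isolation; the false positive only
   appears when a token issued under one rule is checked under the
   other.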

389	   Furthermore, consider an attacker using a similar corporation name
390	   such as "foocorp" (or any variation containing a non-ASCII character
391	   that some humans might expect to represent the same corporation).  If
392	   the resource holder treats them as different, but the security token
393	   service treats them as the same, then again elevation of privilege
394	   can occur.

396	3.  Common Identifiers

398	   In this section, we walk through a number of common types of
399	   identifiers and discuss various issues related to comparison that may
400	   affect security whenever they are used to identify security
401	   principals or resources.  These examples illustrate common patterns
402	   that may arise with other types of identifiers.

404	3.1.  Hostnames

406	   Hostnames (composed of dot-separated labels) are commonly used either
407	   directly as identifiers, or as components in identifiers such as in
408	   URIs and email addresses.  Another example is in [RFC5280], sections
409	   7.2 and 7.3 (and updated in section 3 of
410	   [I-D.ietf-pkix-rfc5280-clarifications]), which specify use in X.509
411	   Public Key Infrastructure certificates.

413	   In this section we discuss a number of issues in comparing strings
414	   that appear to be some form of hostname.

416	   It is first worth pointing out that the term "hostname" itself is
417	   often ambiguous, and hence it is important that any use clarify which
418	   definition is intended.  Some examples of definitions include:

420	   a.  A Fully-Qualified Domain Name (FQDN),
421	   b.  An FQDN that is associated with address records in the DNS,
422	   c.  The leftmost label in an FQDN, or
423	   d.  The leftmost label in an FQDN that is associated with address
424	       records.

426	   The use of different definitions in different places results in
427	   questions such as whether "example" and "example.com" are considered
428	   equal or not, and hence it is important when writing new
429	   specifications to be clear about what definition is meant.

431	   Section 3 of [RFC6055] discusses the differences between a "hostname"
432	   and a "DNS name", where the former is a subset of the latter by using
433	   a restricted set of characters.  If one canonicalizer uses the "DNS
434	   name" definition whereas another uses a "hostname" definition, a name
435	   might be valid in the former but invalid in the latter.  As long as
436	   invalid identifiers are denied privilege, this difference will not
437	   result in elevation of privilege.

439	   Section 3.1 of [RFC1034] discusses the difference between a
440	   "complete" domain name, which ends with a dot (such as
441	   "example.com."), and a multi-label relative name such as
442	   "example.com" that assumes the root (".") is in the suffix search
443	   list.  In most contexts these are considered equal, but there may be
444	   issues if different entities in a security architecture have
445	   different interpretations of a relative domain name.

447	   [IAB1123] briefly discusses issues with the ambiguity around whether
448	   a label will be "alphabetic", including among other issues, how
449	   "alphabetic" should be interpreted in an internationalized
450	   environment, and whether a hostname can be interpreted as an IP
451	   address.  We explore this last issue in more detail below.

453	3.1.1.  IPv4 Literals

455	   [RFC1123] section 2.1 states:

457	      Whenever a user inputs the identity of an Internet host, it SHOULD
458	      be possible to enter either (1) a host domain name or (2) an IP
459	      address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
460	      the string syntactically for a dotted-decimal number before
461	      looking it up in the Domain Name System.

463	   and

465	      This last requirement is not intended to specify the complete
466	      syntactic form for entering a dotted-decimal host number; that is
467	      considered to be a user-interface issue.

469	   In specifying the inet_addr() API, the POSIX standard [IEEE-1003.1]
470	   defines "IPv4 dotted decimal notation" as allowing not only strings
471	   of the form "10.0.1.2", but also octal and hexadecimal values, and
472	   addresses with fewer than four parts.  For example, "10.0.258",
473	   "0xA000102", and "012.0x102" all represent the same IPv4 address in
474	   standard "IPv4 dotted decimal" notation.  We will refer to this as
475	   the "loose" syntax of an IPv4 address literal.

477	   In section 6.1 of [RFC3493] getaddrinfo() is defined to support the
478	   same (loose) syntax as inet_addr():

480	      If the specified address family is AF_INET or AF_UNSPEC, address
481	      strings using Internet standard dot notation as specified in
482	      inet_addr() are valid.

484	   In contrast, section 6.3 of the same RFC states, specifying
485	   inet_pton():

487	      If the af argument of inet_pton() is AF_INET, the src string shall
488	      be in the standard IPv4 dotted-decimal form: ddd.ddd.ddd.ddd where
489	      "ddd" is a one to three digit decimal number between 0 and 255.
490	      The inet_pton() function does not accept other formats (such as
491	      the octal numbers, hexadecimal numbers, and fewer than four
492	      numbers that inet_addr() accepts).

494	   As shown above, inet_pton() uses what we will refer to as the
495	   "strict" form of an IPv4 address literal.  Some platforms also use
496	   the strict form with getaddrinfo() when the AI_NUMERICHOST flag is
497	   passed to it.
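
   To make the difference between the two forms concrete, the sketch
   below implements a loose parser in pure Python and uses the standard
   ipaddress module as a stand-in for a strict parser.  The function
   names are hypothetical, and the loose parser is an approximation of
   the inet_addr() rules described above (hexadecimal, octal, and one to
   four parts), not a copy of any platform's implementation.

```python
import ipaddress


def parse_part(text: str) -> int:
    """Parse one part as hexadecimal (0x...), octal (0...), or decimal."""
    if text[:2].lower() == "0x":
        digits, base = text[2:].lower(), 16
    elif len(text) > 1 and text.startswith("0"):
        digits, base = text[1:], 8
    else:
        digits, base = text, 10
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"bad address part: {text!r}")
    return int(digits, base)


def parse_loose(text: str) -> int:
    """Loose form: one to four parts; the last part fills the remaining octets."""
    parts = [parse_part(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected one to four parts")
    *leading, last = parts
    remaining_octets = 4 - len(leading)
    if any(v > 255 for v in leading) or last >= 256 ** remaining_octets:
        raise ValueError("address part out of range")
    value = 0
    for v in leading:
        value = (value << 8) | v
    return (value << (8 * remaining_octets)) | last


def parse_strict(text: str) -> int:
    """Strict form: exactly four decimal parts, as checked by ipaddress."""
    return int(ipaddress.IPv4Address(text))
```

   Under these assumptions, parse_loose() maps "10.0.258", "0xA000102",
   and "012.0x102" to the same 32-bit value that parse_strict() returns
   for "10.0.1.2", while parse_strict() rejects all three with a
   ValueError.  Two components that disagree on which parser to use
   will therefore disagree on whether such a string is an address at
   all.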

499	   Both the strict and loose forms are standard forms, and hence a
500	   protocol specification is still ambiguous if it simply defines a
501	   string to be in the "standard IPv4 dotted decimal form".  And, as a
502	   result of these differences, names such as "10.11.12" are ambiguous
503	   as to whether they are an IP address or a hostname, and even
504	   "10.11.12.13" can be ambiguous because of the "SHOULD" in RFC 1123
505	   above making it optional whether to treat it as an address or a name.

507	   Protocols and data formats that can use addresses in string form for
508	   security purposes need to resolve these ambiguities.  For example,
509	   for the host component of URIs, section 3.2.2 of [RFC3986] resolves
510	   the first ambiguity by only allowing the strict form, and the second
511	   ambiguity by specifying that it is considered an IPv4 address
512	   literal.  New protocols and data formats should similarly consider
513	   using the strict form rather than the loose form in order to better
514	   match user expectations.

516	   A string might be valid under the "loose" definition, but invalid
517	   under the "strict" definition.  As long as invalid identifiers are
518	   denied privilege, this difference will not result in elevation of
519	   privilege.  Some protocols, however, use strings that can be either
520	   an IP address literal or a hostname.  Such strings are at best
521	   Definite identifiers, and often turn out to be Indefinite
522	   identifiers.  (See Section 4.1 for more discussion.)

524	   Furthermore, when strings can contain non-ASCII characters, they can
525	   contain other characters that may look like dots or digits to a human
526	   viewing and/or entering the identifier, especially to one who might
527	   expect digits to appear in his or her native script.

529	3.1.2.  IPv6 Literals

531	   IPv6 addresses similarly have a wide variety of alternate but
532	   semantically identical string representations, as defined in section
533	   2.2 of [RFC4291] and section 2 of [I-D.ietf-6man-uri-zoneid].  As
534	   discussed in section 3.2.5 of [RFC5952], this fact causes problems in
535	   security contexts if comparison (such as in X.509 certificates) is
536	   done between strings rather than between the binary representations
537	   of addresses.

539	   [RFC5952] recently specified a recommended canonical string format as
540	   an attempt to solve this problem, but it may not be ubiquitously
541	   supported at present.  And, when strings can contain non-ASCII
542	   characters, the same issues (and more, since hexadecimal and colons
543	   are allowed) arise as with IPv4 literals.

545	   Whereas (binary) IPv6 addresses are Absolute identifiers, IPv6
546	   address literals are Definite identifiers, since string-to-address
547	   conversion for IPv6 address literals is unambiguous.

549	3.1.3.  Internationalization

551	   The IETF policy on character sets and languages [RFC2277] requires
552	   support for UTF-8 in protocols, and as a result many protocols now do
553	   support non-ASCII characters.  When a hostname is sent in a UTF-8
554	   field, there are a number of ways it may be encoded.  For example,
555	   hostname labels might be encoded directly in UTF-8, or might first be
556	   Punycode-encoded [RFC3492] or even percent-encoded from UTF-8.

558	   For example, in URIs, [RFC3986] section 3.2.2 specifically allows for
559	   the use of percent-encoded UTF-8 characters in the hostname, as well
560	   as the use of IDNA encoding [RFC3490] using the Punycode algorithm.

562	   Percent-encoding is unambiguous for hostnames since the percent
563	   character cannot appear in the strict definition of a "hostname",
564	   though it can appear in a DNS name.

566	   Punycode-encoded labels (or "A-labels") on the other hand can be
567	   ambiguous if hosts are actually allowed to be named with a name
568	   starting with "xn--", and false positives can result.  While this may
569	   be extremely unlikely for normal scenarios, it nevertheless provides
570	   a possible vector for an attacker.

572	   A hostname comparator thus needs to decide whether a Punycode-encoded
573	   label should or should not be considered a valid hostname label, and
574	   if so, then whether it should match a label encoded in some other
575	   form such as a percent-encoded Unicode label (U-label).
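
   One possible answer to these questions is sketched below in Python:
   every form is reduced to lower-case A-labels before comparison.
   This is only one policy among several, and it relies on Python's
   built-in "idna" codec, which implements the older IDNA2003 rules of
   [RFC3490]; both the policy and the function names are assumptions of
   this example.

```python
from urllib.parse import unquote


def to_a_label_form(host: str) -> str:
    """Reduce a hostname to lower-case ASCII (A-label) form for comparison."""
    decoded = unquote(host)               # undo percent-encoded UTF-8, if any
    ascii_form = decoded.encode("idna")   # U-labels -> A-labels (IDNA2003 codec)
    return ascii_form.decode("ascii").lower()


def hostnames_match(a: str, b: str) -> bool:
    """True if both names have the same A-label form under this policy."""
    return to_a_label_form(a) == to_a_label_form(b)
```

   Under this policy "b%C3%BCcher.example", "bücher.example", and
   "xn--bcher-kva.example" all match.  A string that cannot be encoded
   raises UnicodeError, which a caller would normally treat as an
   invalid identifier.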

577	   For example, Section 3 of "Transport Layer Security (TLS) Extensions"
578	   [RFC6066] states:

580	      "HostName" contains the fully qualified DNS hostname of the
581	      server, as understood by the client.  The hostname is represented
582	      as a byte string using ASCII encoding without a trailing dot.
583	      This allows the support of internationalized domain names through
584	      the use of A-labels defined in [RFC5890].  DNS hostnames are case-
585	      insensitive.  The algorithm to compare hostnames is described in
586	      [RFC5890], Section 2.3.2.4.

588	   For some additional discussion of security issues that arise with
589	   internationalization, see [TR36].

591	3.1.4.  Resolution for comparison

593	   Some systems (specifically Java URLs [JAVAURL]) use the rule that if
594	   two hostnames resolve to the same IP address(es) then the hostnames
595	   are considered equal.  That is, the canonicalization algorithm
596	   involves name resolution with an IP address being the canonical form.

598	   For example, if resolution was done via DNS, and DNS contained:

600	   example.com.  IN A 10.0.0.6
601	   example.net.  CNAME example.com.
602	   example.org.  IN A 10.0.0.6

604	   then the algorithm might treat all three names as equal, even though
605	   the third name might refer to a different entity.
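
   The effect can be reproduced without any network access.  The
   Python sketch below uses a static table that mirrors the records
   above (the table and function names are hypothetical, and no real
   DNS query is made) and compares names by the address they resolve
   to.

```python
# Hypothetical zone data mirroring the example records; no DNS is queried.
RECORDS = {
    "example.com.": ("A", "10.0.0.6"),
    "example.net.": ("CNAME", "example.com."),
    "example.org.": ("A", "10.0.0.6"),
}


def resolve(name: str) -> str:
    """Follow CNAME records in the static table until an address is found."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError("CNAME loop")
        seen.add(name)
        rtype, value = RECORDS[name]
        if rtype == "A":
            return value
        name = value


def equal_by_resolution(a: str, b: str) -> bool:
    """Treat two names as equal when they resolve to the same address."""
    return resolve(a) == resolve(b)
```

   Here "example.com." and "example.org." compare as equal solely
   because both map to 10.0.0.6, which is the potential false positive
   described above.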

607	   With the introduction of dynamic IP addresses, private IP addresses,
608	   multiple IP addresses per name, multiple address families (e.g., IPv4
609	   vs. IPv6), devices that roam to new locations, commonly deployed DNS
610	   tricks that result in the answer depending on factors such as the
611	   requester's location and the load on the server whose address is
612	   returned, etc., this method of comparison cannot be relied upon.
613	   There is no guarantee that two names for the same host will resolve
614	   to the same IP addresses, nor that the addresses resolved
615	   refer to the same entity such as when the names resolve to private IP
616	   addresses, nor even that the system has connectivity (and the
617	   willingness to wait for the delay) to resolve names at the time the
618	   answer is needed.  The lifetime of the identifier, and of any cached
619	   state from a previous resolution, also affects security (see
620	   Section 4.4).

622	   In addition, a comparison mechanism that relies on the ability to
623	   resolve identifiers such as hostnames to other identifiers such as IP
624	   addresses leaks information about security decisions to outsiders if
625	   these queries are publicly observable.  (See
626	   [I-D.iab-privacy-considerations] for a deeper discussion of
627	   information disclosure.)

629	   Finally, it is worth noting that resolving two identifiers to
630	   determine if they refer to the same entity can be thought of as a use
631	   of such identifiers, as opposed to actually comparing the identifiers
632	   themselves, which is the focus of this document.

634	3.2.  Ports and Service Names

636	   Port numbers and service names are discussed in depth in [RFC6335].
637	   Historically, there were port numbers, service names used in SRV
638	   records, and mnemonic identifiers for assigned port numbers (known as
639	   port "keywords" at [IANA-PORT]).  The latter two are now unified, and
640	   various protocols use one or more of these types in strings.  For
641	   example, the common syntax used by many URI schemes allows port
642	   numbers but not service names.  Some implementations of the
643	   getaddrinfo() API support strings that can be either port numbers or
644	   port keywords (but not service names).

646	   For protocols that use service names that must be resolved, the
647	   issues are the same as those for resolution of addresses in
648	   Section 3.1.4.  In addition, Section 5.1 of [RFC6335] clarifies that
649	   service names/port keywords must contain at least one letter.  This
650	   prevents confusion with port numbers in strings where both are
651	   allowed.
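
   A parser for strings in which both are allowed can rely on that
   rule, as in the following Python sketch.  Besides the "at least one
   letter" rule mentioned above, the syntax checks (1 to 15 characters;
   letters, digits, and hyphens; no leading, trailing, or adjacent
   hyphens) reflect our understanding of [RFC6335] and should be
   verified against that document; the function name is hypothetical.

```python
import re

# Letters and digits, optionally joined by single interior hyphens.
_SERVICE_SYNTAX = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def classify_port_string(text: str) -> str:
    """Return "port", "service", or "invalid" for a port-or-service string."""
    if text.isascii() and text.isdigit():
        # All digits: this can only be a port number.
        return "port" if int(text) <= 65535 else "invalid"
    if (len(text) <= 15
            and _SERVICE_SYNTAX.fullmatch(text)
            and re.search(r"[A-Za-z]", text)):
        return "service"
    return "invalid"
```

   Since an all-digit string can never be a service name, the two
   cases are disjoint and no precedence rule is needed.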

653	3.3.  URIs

655	   This section looks at issues related to using URIs for security
656	   purposes.  For example, [RFC5280], section 7.4, specifies comparison
657	   of URIs in certificates.  Examples of URIs in security token-based
658	   access control systems include WS-*, SAML-P and OAuth WRAP.  In such
659	   systems, a variety of participants in the security infrastructure are
660	   identified by URIs.  For example, requesters of security tokens are
661	   sometimes identified with URIs.  The issuers of security tokens and
662	   the relying parties who are intended to consume security tokens are
663	   frequently identified by URIs.  Claims in security tokens often have
664	   their types defined using URIs and the values of the claims can also
665	   be URIs.

667	   URIs are defined with multiple components, each of which has its own
668	   rules.  We cover each in turn below.  However, it is also important
669	   to note that there exist multiple comparison algorithms.  [RFC3986]
670	   section 6.2 states:

672	      A variety of methods are used in practice to test URI equivalence.
673	      These methods fall into a range, distinguished by the amount of
674	      processing required and the degree to which the probability of
675	      false negatives is reduced.  As noted above, false negatives
676	      cannot be eliminated.  In practice, their probability can be
677	      reduced, but this reduction requires more processing and is not
678	      cost-effective for all applications.
679	      If this range of comparison practices is considered as a ladder,
680	      the following discussion will climb the ladder, starting with
681	      practices that are cheap but have a relatively higher chance of
682	      producing false negatives, and proceeding to those that have
683	      higher computational cost and lower risk of false negatives.

685	   The ladder approach has both pros and cons.  On the pro side, it
686	   allows some uses to optimize for security, and other uses to optimize
687	   for cost, thus allowing URIs to be applicable to a wide range of
688	   uses.  A disadvantage is that when different approaches are taken by
689	   different components in the same system using the same identifiers,
690	   the inconsistencies can result in security issues.

692	3.3.1.  Scheme component

694	   [RFC3986] defines URI schemes as being case-insensitive ASCII and in
695	   section 6.2.2.1 specifies that scheme names should be normalized to
696	   lower-case characters.

698	   New schemes can be defined over time.  In general, however, two URIs
699	   with an unrecognized scheme cannot be safely compared.  This is
700	   because the canonicalization and comparison rules for the other
701	   components may vary by scheme.  For example, a new URI scheme might
702	   have a default port of X, and without that knowledge, a comparison
703	   algorithm cannot know whether "example.com" and "example.com:X"
704	   should be considered to match in the authority component.  Hence for
705	   security purposes, it is safest for unrecognized schemes to be
706	   treated as invalid identifiers.  However, if the URIs are only used
707	   with a "grant access on match" paradigm then unrecognized schemes can
708	   be supported by doing a generic case-sensitive comparison, at the
709	   expense of some false negatives.
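
   The two behaviors can be sketched in Python as follows.  The table
   of recognized schemes and the function name are hypothetical; for
   recognized schemes the sketch compares host, effective port, path,
   and query (ignoring userinfo and fragment, which is itself a policy
   choice), and for unrecognized schemes it falls back to a generic
   case-sensitive comparison suitable only for "grant access on match".

```python
from urllib.parse import urlsplit

# Hypothetical table of schemes whose default port this comparator knows.
DEFAULT_PORTS = {"http": 80, "https": 443}


def uris_match(a: str, b: str) -> bool:
    """Grant-on-match comparison that never guesses about unknown schemes."""
    pa, pb = urlsplit(a), urlsplit(b)
    if pa.scheme != pb.scheme:            # urlsplit lower-cases the scheme
        return False
    if pa.scheme not in DEFAULT_PORTS:
        return a == b                     # generic case-sensitive comparison
    default = DEFAULT_PORTS[pa.scheme]
    port_a = pa.port if pa.port is not None else default
    port_b = pb.port if pb.port is not None else default
    return (
        (pa.hostname, port_a, pa.path, pa.query)
        == (pb.hostname, port_b, pb.path, pb.query)
    )
```

   With a recognized scheme, "HTTP://Example.COM/a" matches
   "http://example.com:80/a".  With an unrecognized scheme, a URI
   matches only an identical string, so an explicit default port
   produces a false negative rather than a false positive.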

711	3.3.2.  Authority component

713	   The authority component is scheme-specific, but many schemes follow a
714	   common syntax that allows for userinfo, host, and port.

716	3.3.2.1.  Host

718	   Section 3.1 discussed issues with hostnames in general.  In addition,
719	   [RFC3986] section 3.2.2 allows future changes using the IPvFuture
720	   production.  As with IPv4 and IPv6 literals, IPvFuture formats may
721	   have issues with multiple semantically identical string
722	   representations, and may also be semantically identical to an IPv4 or
723	   IPv6 address.  As such, false negatives may be common if IPvFuture is
724	   used.

726	3.3.2.2.  Port

728	   See discussion in Section 3.2.

730	3.3.2.3.  Userinfo

732	   [RFC3986] defines the userinfo production that allows arbitrary data
733	   about the user of the URI to be placed before '@' signs in URIs.  For
734	   example: "ftp://alice:bob@example.com/bar" has the value "alice:bob"
735	   as its userinfo.  When comparing URIs in a security context, one must
736	   decide whether to treat the userinfo as being significant or not.
737	   Some URI comparison services, for example, treat
738	   "ftp://alice:ick@example.com" and "ftp://example.com" as being equal.

740	   When the userinfo is treated as being significant, it has additional
741	   considerations (e.g., whether it is case-sensitive or not) which we
742	   cover in Section 3.4.
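
   The effect of that decision can be made explicit in code.  The
   Python sketch below builds a comparison key for the authority
   component with or without the userinfo; the function name and the
   flag are hypothetical, and the userinfo is compared exactly when it
   is included.

```python
from urllib.parse import urlsplit


def authority_key(uri: str, userinfo_significant: bool):
    """Build a comparison key for the authority, with or without userinfo."""
    parts = urlsplit(uri)
    key = (parts.hostname, parts.port)
    if userinfo_significant:
        key = (parts.username, parts.password) + key
    return key
```

   With userinfo_significant set to False, "ftp://alice:ick@example.com"
   and "ftp://example.com" yield the same key; with it set to True,
   they do not.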

744	3.3.3.  Path component

746	   [RFC3986] supports the use of path segment values such as "./" or
747	   "../" for relative URIs.  As discussed in section 6.2.2.3 of
748	   [RFC3986], they are intended only for use within a reference relative
749	   to some other base URI, but [RFC3986] section 5.2.4 nevertheless
750	   defines an algorithm to remove them as part of URI normalization.
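
   The following Python function is a sketch of that dot-segment
   removal, written from our reading of [RFC3986] section 5.2.4; it
   should be checked against that text before being relied upon.  The
   function name follows the one used in [RFC3986], and the list-based
   output buffer is an implementation choice of this example.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URI path."""
    output = []                               # completed segments, in order
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]                   # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]                   # "/../x" becomes "/x"
            if output:
                output.pop()                  # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)
```

   For example, "/a/b/c/./../../g" is reduced to "/a/g", so two paths
   that differ only in such segments compare equal after normalization.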

752	   Unless a scheme states otherwise, the path component is defined to be
753	   case-sensitive.  However, if the resource is stored and accessed
754	   using a filesystem with case-insensitive paths, there will be many
755	   paths that refer to the same resource.  As such, false negatives can
756	   be common in this case.

758	3.3.4.  Query component

760	   It is unspecified whether "http://example.com/foo",
761	   "http://example.com/foo?", and "http://example.com/foo?bar" should be
762	   considered equal to one another or different.

764	   Similarly, it is unspecified whether the order of values matters.
765	   For example, should "http://example.com/blah?ick=bick&foo=bar" be
766	   considered equal to "http://example.com/blah?foo=bar&ick=bick"?  And
767	   if a domain name is permitted to appear in a query component (e.g.,
768	   in a reference to another URI), the same issues in Section 3.1 apply.
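
   Because the answer is unspecified, two comparators can legitimately
   disagree.  The Python sketch below shows two hypothetical policies
   side by side: an exact comparison of the query string, and a
   comparison that treats the query as an unordered collection of
   name/value pairs.  Neither is presented as the correct one.

```python
from urllib.parse import parse_qsl, urlsplit


def query_equal_exact(a: str, b: str) -> bool:
    """Compare the query components as opaque strings."""
    return urlsplit(a).query == urlsplit(b).query


def query_equal_unordered(a: str, b: str) -> bool:
    """Compare the query components as unordered name/value pairs."""
    pairs_a = sorted(parse_qsl(urlsplit(a).query, keep_blank_values=True))
    pairs_b = sorted(parse_qsl(urlsplit(b).query, keep_blank_values=True))
    return pairs_a == pairs_b
```

   The two example URIs above differ under the first policy and match
   under the second.  Note also that urlsplit() reports an empty query
   for both "http://example.com/foo" and "http://example.com/foo?", so
   a library may already have made one of these choices on the
   caller's behalf.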

770	3.3.5.  Fragment component

772	   Some URI formats include fragment identifiers.  These are typically
773	   handles to locations within a resource and are used for local
774	   reference.  A classic example is the use of fragments in HTTP URIs
775	   where a URI of the form "http://example.com/blah.html#ick" means
776	   retrieve the resource "http://example.com/blah.html" and, once it has
777	   arrived locally, find the HTML anchor named ick and display that.

779	   So, for example, when a user clicks on the link
780	   "http://example.com/blah.html#baz" a browser will check its cache by
781	   doing a URI comparison for "http://example.com/blah.html" and, if the
782	   resource is present in the cache, a match is declared.

784	   Hence comparisons for security purposes typically ignore the fragment
785	   component and treat all fragments as equal to the full resource.
786	   However, if one were actually trying to compare the piece of a
787	   resource that was identified by the fragment identifier, ignoring it
788	   would result in potential false positives.
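
   A comparison that ignores the fragment can be sketched in Python
   with the standard urldefrag() function; the helper name is
   hypothetical, and no other normalization is applied here.

```python
from urllib.parse import urldefrag


def same_resource_ignoring_fragment(a: str, b: str) -> bool:
    """Compare two URIs after discarding their fragment components."""
    return urldefrag(a).url == urldefrag(b).url
```

   With this helper, "http://example.com/blah.html#ick" and
   "http://example.com/blah.html#baz" match, which is appropriate for
   the cache lookup above but not when the fragment itself identifies
   the thing being protected.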

790	3.3.6.  Resolution for comparison

792	   As with Section 3.1.4 for hostnames, it may be tempting to define a
793	   URI comparison algorithm based on whether they resolve to the same
794	   content.  Similar problems exist, however, including content that
795	   dynamically changes over time or based on factors such as the
796	   requester's location, potential lack of external connectivity at the
797	   time/place comparison is done, potentially undesirable delay
798	   introduced, etc.

800	   In addition, as noted in Section 3.1.4, resolution leaks information
801	   about security decisions to outsiders if the queries are publicly
802	   observable.

804	3.4.  Email Address-like Identifiers

806	   Section 3.4.1 of [RFC5322] defines the syntax of an email address-
807	   like identifier, and Section 3.2 of [RFC6532] updates it to support
808	   internationalization.  [RFC5280], section 7.5, further discusses the
809	   use of internationalized email addresses in certificates.

811	   [RFC6532] in turn points to [RFC6530]; Section 13 of that document
812	   contains a discussion of many issues resulting from
813	   internationalization.

815	   Email address-like identifiers have a local part and a domain part.
816	   The issues with the domain part are essentially the same as with
817	   hostnames, covered earlier in Section 3.1.

819	   The local part is left for each domain to define.  People quite
820	   commonly use email addresses as usernames with web sites such as
821	   banks or shopping sites, but the site doesn't know whether
822	   foo@example.com is the same person as FOO@example.com.  Thus email
823	   address-like identifiers are typically Indefinite identifiers.

825	   To avoid false positives, some security mechanisms (such as
826	   [RFC5280]) compare the local part using an exact match.  Hence, like
827	   URIs, email address-like identifiers are designed for use in grant-
828	   on-match security schemes, not in deny-on-match schemes.
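
   A minimal Python sketch of such a comparison follows.  It compares
   the local part exactly and, as an assumption of this example, treats
   the domain part as ASCII and compares it case-insensitively; the
   function name is hypothetical, and internationalized domains would
   need the handling discussed in Section 3.1.3.

```python
def email_ids_match(a: str, b: str) -> bool:
    """Exact match on the local part; ASCII case-insensitive on the domain."""
    local_a, sep_a, domain_a = a.rpartition("@")
    local_b, sep_b, domain_b = b.rpartition("@")
    if not (sep_a and sep_b):
        return False                      # not address-like: never grant
    return local_a == local_b and domain_a.lower() == domain_b.lower()
```

   Here "foo@example.com" matches "foo@EXAMPLE.COM" but not
   "FOO@example.com", which may be a false negative but cannot be a
   false positive.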

830	   Furthermore, when such identifiers are actually used as email
831	   addresses, Section 2.4 of [RFC5321] states that the local part of a
832	   mailbox must be treated as case sensitive, but if a mailbox is stored
833	   and accessed using a filesystem with case-insensitive paths, there
834	   may be many paths that refer to the same mailbox.  As such, false
835	   negatives can be common in this case.

837	4.  General Issues

839	4.1.  Conflation

841	   There are a number of examples (some in the preceding sections) of
842	   strings that conflate two types of identifiers, using some heuristic
843	   to try to determine which type of identifier is given.  Similarly,
844	   two ways of encoding the same type of identifier might be conflated
845	   within the same string.

847	   Some examples include:

848	   1.  A string that might be an IPv4 address literal or an IPv6 address
849	       literal
851	   2.  A string that might be an IP address literal or a hostname
852	   3.  A string that might be a port number or a service name
853	   4.  A DNS label that might be literal or be Punycode-encoded

855	   Strings that allow such conflation can only be considered Definite if
856	   there exists a well-defined rule to determine which identifier type
857	   is meant.  One way to do so is to ensure that the valid syntax for
858	   the two is disjoint (e.g., distinguishing IPv4 vs. IPv6 address
859	   literals by the use of colons in the latter).  A second way to do so
860	   is to define a precedence rule that results in some identifiers being
861	   inaccessible via a conflated string (e.g., a host literally named
862	   "xn--de-jg4avhby1noc0d" may be inaccessible due to the "xn--" prefix
863	   denoting the use of Punycode encoding).  In some cases, such
864	   inaccessible space may be reserved so that the actual set of
865	   identifiers in use is unambiguous.  For example, Section 2.5.5.2 of
866	   [RFC4291] defines a range of the IPv6 address space for representing
867	   IPv4 addresses.
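
   Both techniques can be seen in the following Python sketch of a
   classifier for host strings.  It uses the presence of a colon as the
   disjoint-syntax test for IPv6 literals and then applies a precedence
   rule: anything in strict dotted-decimal form is taken to be an IPv4
   literal, and everything else is left as a candidate hostname that
   still needs its own validation.  The function name and return values
   are hypothetical.

```python
import ipaddress
import re

# Decimal octet 0-255 without leading zeros.
_DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
_IPV4 = re.compile(r"(?:%s\.){3}%s" % (_DEC_OCTET, _DEC_OCTET))


def classify_host_string(text: str) -> str:
    """Return "ipv6", "ipv4", "hostname", or "invalid"."""
    if ":" in text:                       # colons occur only in IPv6 literals
        try:
            ipaddress.IPv6Address(text)
        except ValueError:
            return "invalid"
        return "ipv6"
    if _IPV4.fullmatch(text):             # precedence: IPv4 before hostname
        return "ipv4"
    return "hostname"                     # candidate only; validate separately
```

   A consequence of the precedence rule is that a host literally named
   "192.0.2.1" could never be reached through this classifier.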

869	4.2.  Internationalization

871	   In addition to the issues with hostnames discussed in Section 3.1.3,
872	   there are a number of internationalization issues that apply to many
873	   types of Definite and Indefinite identifiers.

875	   First, there is no DNS mechanism for identifying whether non-
876	   identical strings would be seen by a human as being equivalent.
877	   There are problematic examples even with ASCII (Basic Latin) strings
878	   including regional spelling variations such as "color" and "colour"
879	   and many non-English cases including partially-numeric strings in
880	   Arabic script contexts, Chinese strings in Simplified and Traditional
881	   forms, and so on.  Attempts to produce such alternate forms
882	   algorithmically could produce false positives and hence have an
883	   adverse effect on security.

885	   Second, some strings are visually confusable with others, and hence
886	   if a security decision is made by a user based on visual inspection,
887	   many opportunities for false positives exist.  As such, using visual
888	   inspection for security is unreliable.  In addition to the security
889	   issues, visual confusability also adversely affects the usability of
890	   identifiers distributed via visual mediums.  Similar issues can arise
891	   with audible confusability when using audio (e.g., for radio
892	   distribution, accessibility to the blind, etc.) in place of a visual
893	   medium.

895	   Determining whether a string is a valid identifier should typically
896	   be done after, or as part of, canonicalization.  Otherwise an
897	   attacker might use the canonicalization algorithm to inject (e.g.,
898	   via percent encoding, NFKC, or non-shortest-form UTF-8) delimiters
899	   such as '@' in an email address-like identifier, or a '.' in a
900	   hostname.
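
   The ordering problem can be demonstrated with a toy example in
   Python.  The canonicalization (percent-decoding followed by NFKC)
   and the validity rule (exactly one '@') are both hypothetical
   stand-ins chosen for brevity; the point is only that the check is
   applied to the canonical form.

```python
import unicodedata
from urllib.parse import unquote


def canonicalize(identifier: str) -> str:
    """Hypothetical canonicalization: percent-decode, then apply NFKC."""
    return unicodedata.normalize("NFKC", unquote(identifier))


def is_valid(identifier: str) -> bool:
    """Toy validity rule: exactly one '@' with text on both sides."""
    local, sep, domain = identifier.partition("@")
    return bool(local and sep and domain) and "@" not in domain


def accept(identifier: str):
    """Validate the canonical form; return it, or None if it is invalid."""
    canonical = canonicalize(identifier)
    return canonical if is_valid(canonical) else None
```

   The string "alice%40evil.example@example.com" passes is_valid() in
   its raw form but contains two '@' characters once canonicalized, so
   accept() rejects it.  The same happens when the extra delimiter is
   supplied as the fullwidth character U+FF20, which NFKC maps to '@'.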

902	   Any case-insensitive comparisons need to define how comparison is
903	   done, since such comparisons may vary by locale of the endpoint.  As
904	   such, using case-insensitive comparisons in general often results in
905	   identifiers being either Indefinite or, if the legal character set is
906	   restricted (e.g., to ASCII), then Definite.
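
   As an illustration of a comparison whose definition does not depend
   on locale, the Python sketch below folds only the ASCII letters A-Z
   and leaves every other character untouched.  The function names are
   hypothetical, and restricting folding to ASCII is the assumption
   that makes the result well defined.

```python
def ascii_lower(text: str) -> str:
    """Lower-case only A-Z; every other character is left untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in text
    )


def equal_ascii_caseless(a: str, b: str) -> bool:
    """Case-insensitive comparison defined over ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)
```

   By contrast, general-purpose case operations can disagree with one
   another: in Python, "STRASSE" and "straße" are equal after
   str.casefold() but not after str.lower(), so a specification that
   says only "case-insensitive" leaves the outcome open.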

908	   See also [WEBER] for a more visual discussion of many of these
909	   issues.

911	   Finally, the set of permitted characters and the canonical form of
912	   the characters (and hence the canonicalization algorithm) sometimes
913	   varies by protocol today, even when the intent is to use the same
914	   identifier, such as when one protocol passes identifiers to the
915	   other.  See [I-D.ietf-precis-problem-statement] for further
916	   discussion.

918	4.3.  Scope

920	   Another issue arises when an identifier (e.g., "localhost",
921	   "10.11.12.13", etc.) is not globally unique.  [RFC3986] Section 1.1
922	   states:

924	      URIs have a global scope and are interpreted consistently
925	      regardless of context, though the result of that interpretation
926	      may be in relation to the end-user's context.  For example,
927	      "http://localhost/" has the same interpretation for every user of
928	      that reference, even though the network interface corresponding to
929	      "localhost" may be different for each end-user: interpretation is
930	      independent of access.

932	   Whenever a non-globally-unique identifier is passed to another entity
933	   outside of the scope of uniqueness, it will refer to a different
934	   resource, and can result in a false positive.  This problem is often
935	   addressed by using the identifier together with some other unique
936	   identifier of the context.  For example "alice" may uniquely identify
937	   a user within a system, but must be used with "example.com" (as in
938	   "alice@example.com") to uniquely identify the context outside of that
939	   system.

941	   It is also worth noting that non-globally-scoped IPv6 addresses can
942	   be written with, or otherwise associated with, a "zone ID" to
943	   identify the context (see [RFC4007] for more information).  However,
944	   zone IDs are only unique within a host, so they typically narrow,
945	   rather than expand, the scope of uniqueness of the resulting
946	   identifier.

948	4.4.  Temporality

950	   Often identifiers are not unique across all time, but have some
951	   lifetime associated with them after which they may be reassigned to
952	   another entity.  For example, bob@example.com might be assigned to an
953	   employee of the Example company, but if he leaves and another Bob is
954	   later hired, the same identifier might be reused.  As another
955	   example, IP address 203.0.113.1 might be assigned to one subscriber,
956	   and then later reassigned to another subscriber.  Security issues can
957	   arise if updates are not made in all entities that store the
958	   identifier (e.g., in an access control list as discussed in
959	   Section 2, or in a resolution cache as discussed in Section 3.1.4).
960	   This issue is similar to the issue of scope discussed in Section 4.3,
961	   except that the scope of uniqueness is temporal rather than
962	   topological.

964	5.  Security Considerations

966	   This entire document is about security considerations.

968	   To minimize elevation of privilege issues, any system that requires
969	   the ability to use both deny and allow operations within the same
970	   identifier space should avoid the use of Indefinite identifiers in
971	   security comparisons.

973	   To minimize future security risks, any new identifiers being designed
974	   should specify an Absolute or Definite comparison algorithm, and if
975	   extensibility is allowed (e.g., as new schemes in URIs allow) then
976	   the comparison algorithm should remain invariant so that unrecognized
977	   extensions can be compared.  That is, security risks can be reduced
978	   by specifying the comparison algorithm, making sure to resolve any
979	   ambiguities pointed out in this document (e.g., "standard dotted
980	   decimal").

982	   Some issues (such as unrecognized extensions) can be mitigated by
983	   treating such identifiers as invalid.  Validity checking of
984	   identifiers is further discussed in [RFC3696].

986	   Perhaps the hardest issues arise when multiple protocols are used
987	   together, such as in the figure in Section 2, where the two protocols
988	   are defined or implemented using different comparison algorithms.
989	   When constructing an architecture that uses multiple such protocols,
990	   designers should pay attention to any differences in comparison
991	   algorithms among the protocols, in order to fully understand the
992	   security risks.  An area for future work is how to deal with such
993	   security risks in current systems.

995	6.  Acknowledgements

997	   Yaron Goland contributed to the discussion on URIs.  Patrik Faltstrom
998	   contributed to the background on identifiers.  John Klensin
999	   contributed text in a number of different sections.  Additional
1000	   helpful feedback and suggestions came from Bernard Aboba, Fred Baker,
1001	   Leslie Daigle, Mark Davis, Jeff Hodges, Russ Housley, Christian
1002	   Huitema, Magnus Nystrom, and Chris Weber.

1004	7.  IANA Considerations

1006	   This document requires no actions by the IANA.

1008	8.  Informative References

1010	   [I-D.iab-privacy-considerations]
1011	              Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
1012	              Morris, J., Hansen, M., and R. Smith, "Privacy
1013	              Considerations for Internet Protocols",
1014	              draft-iab-privacy-considerations-03 (work in progress),
1015	              July 2012.

1017	   [I-D.ietf-6man-uri-zoneid]
1018	              Carpenter, B., Cheshire, S., and R. Hinden, "Representing
1019	              IPv6 Zone Identifiers in Address Literals and Uniform
1020	              Resource Identifiers", draft-ietf-6man-uri-zoneid-06 (work
1021	              in progress), December 2012.

1023	   [I-D.ietf-pkix-rfc5280-clarifications]
1024	              Yee, P., "Updates to the Internet X.509 Public Key
1025	              Infrastructure Certificate and Certificate Revocation List
1026	              (CRL) Profile", draft-ietf-pkix-rfc5280-clarifications-11
1027	              (work in progress), November 2012.

1029	   [I-D.ietf-precis-problem-statement]
1030	              Blanchet, M. and A. Sullivan, "Stringprep Revision and
1031	              PRECIS Problem Statement",
1032	              draft-ietf-precis-problem-statement-09 (work in progress),
1033	              January 2013.

1035	   [IAB1123]  IAB, "The interpretation of rules in the ICANN gTLD
1036	              Applicant Guidebook", February 2012, <http://www.iab.org/
1037	              documents/correspondence-reports-documents/2012-2/
1038	              iab-statement-the-interpretation-of-rules-in-the-icann-
1039	              gtld-applicant-guidebook>.

1041	   [IANA-PORT]
1042	              IANA, "PORT NUMBERS", June 2011,
1043	              <http://www.iana.org/assignments/port-numbers>.

1045	   [IEEE-1003.1]
1046	              IEEE and The Open Group, "The Open Group Base
1047	              Specifications, Issue 6 IEEE Std 1003.1, 2004 Edition",
1048	              IEEE Std 1003.1, 2004.

1050	   [JAVAURL]  Oracle, "Class URL, Java(TM) Platform, Standard Ed. 7",
1051	              2011, <http://docs.oracle.com/javase/7/docs/api/java/net/
1052	              URL.html>.

1054	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
1055	              STD 13, RFC 1034, November 1987.

1057	   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
1058	              and Support", STD 3, RFC 1123, October 1989.

1060	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
1061	              Languages", BCP 18, RFC 2277, January 1998.

1063	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
1064	              "Internationalizing Domain Names in Applications (IDNA)",
1065	              RFC 3490, March 2003.

1067	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
1068	              for Internationalized Domain Names in Applications
1069	              (IDNA)", RFC 3492, March 2003.

1071	   [RFC3493]  Gilligan, R., Thomson, S., Bound, J., McCann, J., and W.
1072	              Stevens, "Basic Socket Interface Extensions for IPv6",
1073	              RFC 3493, February 2003.

1075	   [RFC3696]  Klensin, J., "Application Techniques for Checking and
1076	              Transformation of Names", RFC 3696, February 2004.

1078	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1079	              Resource Identifier (URI): Generic Syntax", STD 66,
1080	              RFC 3986, January 2005.

1082	   [RFC4007]  Deering, S., Haberman, B., Jinmei, T., Nordmark, E., and
1083	              B. Zill, "IPv6 Scoped Address Architecture", RFC 4007,
1084	              March 2005.

1086	   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
1087	              Architecture", RFC 4291, February 2006.

1089	   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
1090	              Housley, R., and W. Polk, "Internet X.509 Public Key
1091	              Infrastructure Certificate and Certificate Revocation List
1092	              (CRL) Profile", RFC 5280, May 2008.

1094	   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
1095	              October 2008.

1097	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
1098	              October 2008.

1100	   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
1101	              Address Text Representation", RFC 5952, August 2010.

1103	   [RFC6055]  Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
1104	              Encodings for Internationalized Domain Names", RFC 6055,
1105	              February 2011.

1107	   [RFC6066]  Eastlake, D., "Transport Layer Security (TLS) Extensions:
1108	              Extension Definitions", RFC 6066, January 2011.

1110	   [RFC6125]  Saint-Andre, P. and J. Hodges, "Representation and
1111	              Verification of Domain-Based Application Service Identity
1112	              within Internet Public Key Infrastructure Using X.509
1113	              (PKIX) Certificates in the Context of Transport Layer
1114	              Security (TLS)", RFC 6125, March 2011.

1116	   [RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
1117	              Cheshire, "Internet Assigned Numbers Authority (IANA)
1118	              Procedures for the Management of the Service Name and
1119	              Transport Protocol Port Number Registry", BCP 165,
1120	              RFC 6335, August 2011.

1122	   [RFC6530]  Klensin, J. and Y. Ko, "Overview and Framework for
1123	              Internationalized Email", RFC 6530, February 2012.

1125	   [RFC6532]  Yang, A., Steele, S., and N. Freed, "Internationalized
1126	              Email Headers", RFC 6532, February 2012.

1128	   [TR36]     Unicode Consortium, "Unicode Security Considerations",
1129	              Unicode Technical Report 36, August 2004,
1130	              <http://www.unicode.org/reports/tr36/>.

1132	   [WEBER]    Weber, C., "Attacking Software Globalization", March 2010,
1133	              <http://www.lookout.net/files/
1134	              Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf>.

1136	Author's Address

1138	   Dave Thaler (editor)
1139	   Microsoft Corporation
1140	   One Microsoft Way
1141	   Redmond, WA  98052
1142	   USA

1144	   Phone: +1 425 703 8835
1145	   Email: dthaler@microsoft.com