Network Working Group                                       Y. Pettersen
Internet-Draft                                        Opera Software ASA
Updates: RFC 2965                                           July 9, 2007
(if approved)
Intended status: Standards Track
Expires: January 10, 2008

The TLD Subdomain Structure Protocol and its use for Cookie domain validation
draft-pettersen-subtld-structure-02

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 10, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Abstract

This document defines a protocol and specification format that can be used by a client to discover how a Top Level Domain (TLD) is organized in terms of what subdomains are used to place closely related but independent domains, e.g. commercial domains in country code TLDs (ccTLD) like .uk are placed in the .co.uk subTLD domain.

Pettersen                Expires January 10, 2008               [Page 1]
Internet-Draft            SubTLD Structure Protocol            July 2007

This information is then used to limit which domains an Internet service can set cookies for, strengthening the rules already defined by the cookie specifications.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.  Introduction

The Domain Name System [RFC1034], which is used to name Internet hosts, allows a wide range of hierarchical names to indicate what a given host is. Some of these hierarchies are created by the owners of a domain, e.g. subdomains for certain tasks or functions; others are created by the Top Level Domain registry owner to indicate what kind of service the domain offers (e.g. commercial, educational, government) or its geographic location (e.g. city or state).

While this system makes it relatively easy for TLD administrators to organize online services, and for the user to locate and recognize relevant services, this flexibility causes various security and privacy related problems when services located at different hosts are allowed to share data through functionality administered by the client, e.g. HTTP state management cookies [RFC2965] [NETSC].

Most information sharing mechanisms make the process of sharing easy, perhaps too easy, since in many cases there is no mechanism to ensure that the servers receiving the information really want it, and it is often difficult to determine the source of the information being shared.

To some extent [RFC2965] addresses these concerns for cookies, in that clients that support [RFC2965] style cookies send the target domain for the cookie along with the cookie so that the recipient can verify that the cookie has the correct domain. Unfortunately, [RFC2965] is not widely deployed in clients, or on servers.

The recipient(s) can make inappropriate information sharing more difficult by requiring the information to contain data identifying the source and assuring the integrity of the data, e.g. by use of cryptographic technologies. These techniques tend, however, to be computationally costly.

There are two problem areas:

o  Incorrect sharing of information between non-associated services, e.g. example1.com and example2.com or example1.co.uk and example2.co.uk. That is, the information may be distributed to all services within a given Top Level Domain.

o  Undesirable information sharing within a single service. This is, in particular, a problem for services that sell hosting services to many different customers, such as webhotels, where the service itself has little or no control of the customers' actions.

While both these problems are in some ways similar, they call for different solutions. This specification will only propose a solution for the first problem area. The second problem area must be handled separately.

This specification will first define a TLD Subdomain Structure Protocol that can be used to discover the actual structure of a Top Level Domain, e.g. that the TLD has several subTLDs co.tld, ac.tld, org.tld. It will then show how this information can be used to determine when information sharing through cookies is not desirable.

2.  The TLD Subdomain Structure Protocol

The TLD Subdomain Structure Protocol is an HTTP service, managed by the TLD owner, and located at a well known URI location that, when queried, returns information about a TLD's domain structure. The client can then use this information to decide what actions are permitted for the protocol data the client is processing.

Procedure for use:

o  The client should retrieve the domain list for the Top Level Domain "tld" from https://www.subdomains.tld/tld/domainlist . [The actual location must be decided by IANA; this section contains the author's suggestion. Due to security considerations it should be considered whether an https URL, or at least a signed file, should be used.]

o  If the client is not able to retrieve a list from the "subdomains" server it MAY attempt to retrieve a list from a vendor specified location. Alternatively, the vendor MAY mirror the list from the "subdomains" server(s), and only retrieve the lists from the vendor specified location.

o  The Content-Type of the returned list MUST be application/subdomain-structure.

o  The retrieved specification SHOULD be cached for at least 30 days.

o  The TLD owner SHOULD update the list at least 90 days before a new sub-domain becomes active.

o  If no specification can be retrieved the user agent MAY fall back to alternative methods, depending on the profile.

2.1.  Securing the domain information

Individuals with malicious intent may wish to modify the domain list served by the service location to either classify a domain incorrectly as a subTLD or to hide a subTLD's classification. Besides securing the hosting locations, this means that the content served will have to be secured. Possible methods are:

1.  Digitally sign the specification, using one of the available message signature methods, e.g. S/MIME [RFC2311]. This will secure the content during storage both at the client and the server, as well as during transit. The drawback is that the client must implement decoding and verification of a message format which it may not already support, which may be problematic for clients having limited resources.

2.  Use an encrypted connection, such as HTTP over TLS [RFC2818], which is supported by many clients already. Unfortunately, this method does not protect the content when stored by the client.

3.  Use XML Signatures [RFC3275] to create a signature over the specification. This method is currently not defined.
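
The client-side rules from the procedure in section 2, together with the HTTP over TLS option above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the protocol: the URL template is the author's suggested location (treating both occurrences of "tld" as placeholders is an assumption), the 30-day refresh policy is one possible reading of the caching recommendation, and all function names are hypothetical. The sketch performs no network access.

```python
import ssl
from datetime import datetime, timedelta, timezone

# Assumption: the suggested location from section 2; the final URI is
# still an open issue and must be decided by IANA.
URL_TEMPLATE = "https://www.subdomains.{tld}/{tld}/domainlist"
EXPECTED_TYPE = "application/subdomain-structure"
# Assumption: refresh a cached copy once it is 30 days old.
CACHE_LIFETIME = timedelta(days=30)


def domainlist_url(tld: str) -> str:
    """Build the suggested retrieval URL for a single TLD label."""
    label = tld.strip(".").lower()
    if not label or "." in label:
        raise ValueError(f"not a single TLD label: {tld!r}")
    return URL_TEMPLATE.format(tld=label)


def content_type_ok(header_value: str) -> bool:
    """Check that a Content-Type header names the required media type."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == EXPECTED_TYPE


def cache_is_fresh(fetched_at: datetime, now: datetime) -> bool:
    """True while the cached list is younger than CACHE_LIFETIME."""
    return now - fetched_at < CACHE_LIFETIME


def tls_context() -> ssl.SSLContext:
    """TLS settings that verify the certificate chain and the hostname."""
    return ssl.create_default_context()


if __name__ == "__main__":
    print(domainlist_url("uk"))
    print(content_type_ok("application/subdomain-structure"))
    fetched = datetime(2007, 7, 9, tzinfo=timezone.utc)
    print(cache_is_fresh(fetched, fetched + timedelta(days=10)))
```

A real client would pass the context returned by tls_context() to its HTTP library and reject any response for which content_type_ok() returns False.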

This specification recommends using HTTP over TLS to secure the transport of the specification, and the client MUST use non-anonymous cipher suites. The client MUST ensure that the hostname in the certificate matches the hostname used in the request.

2.2.  Domainlist format

The domain list file can contain a list of subdomains that are considered top level domains, as well as a special list of names that are not top level domains.

None of the domain lists need specify the TLD name, since that is implicit from the request URI. The domain names listed MUST be encoded in punycode, according to [RFC3490].

2.2.1.  Domainlist schema

The domain list is an XML file that conforms to the following schema:

   default namespace = "http://xmlns.opera.com/tlds"

   start =
     element tld {
       attribute levels { xsd:integer }?,
       attribute name { xsd:NCName }?,
       (domain | registry)*
     }

   registry =
     element registry {
       attribute levels { xsd:integer }?,
       attribute name { xsd:NCName },
       (domain | registry)*
     }

   domain =
     element domain {
       attribute name { xsd:NCName }
     }

The domainlist file contains a single tld block, which may contain multiple registry and domain blocks, and a registry block may also contain multiple registry and domain blocks.

Both domain and registry tags MUST contain a name attribute identifying the domain or registry. The tld block MAY have a name attribute, but this name MUST be ignored by clients, which must instead use the name of the TLD used to request the file. All names MUST be punycode encoded to make it possible for clients not supporting IDNA to use the document.

The tld and registry blocks MAY contain an attribute, levels, specifying how many levels below the current domain are registry-like. The default is none, meaning that labels directly below the current domain are by default ordinary domains and not registry-like.
If the levels attribute is 1 (one) it means that by default all next-level labels within the registry/tld are registry-like and not normal domains.

Implementations MUST ignore attributes and syntax they do not recognize.

2.2.2.  Domainlist interpretation

For each registry or domain block within a tld or registry block, the effective domain name the block applies to is the name of the block followed by "." and the effective domain name of the containing block.

For the tld block the effective domain name is the name of the TLD the client is evaluating, and for a registry block named "example" directly inside it the effective name becomes example.tld.

In the above example, the specification is for the TLD "tld". By default any second level domain "x.tld" is a registry-like domain, although parliament.tld is not a registry-like domain. In the example TLD, however, the co.tld registry has a sub-registry "state.co.tld", while all other domains in the co.tld domain are ordinary domains. Also, the registry example.tld has defined all domains y.example.tld as registry-like, with no exceptions.

3.  A TLD Subdomain Structure Protocol profile for Cookies

HTTP State management cookies are one area where it is important, both for security and privacy reasons, to ensure that unauthorized services cannot set cookies for another service. Inappropriate cookies can affect the functionality of a service, but may also be used to track the users across services in an undesirable fashion. Neither the original Netscape cookie specification [NETSC] nor [RFC2965] is adequate in many cases.

The [NETSC] rules require only that the target domain must have one internal dot (e.g. example.com) if the TLD belongs to a list of generic TLDs (gTLD), while for all other TLDs the domain must contain two internal dots (e.g. example.co.uk).
The latter rule was never properly implemented, in particular due to the many flat ccTLD domain structures that are in use.

[RFC2965] set the requirement that cookies can only be set for the server's parent domain.

Unfortunately, both policies still leave open the possibility of setting cookies for a subTLD by setting the cookie from a host name example.subtld.tld to the domain subtld.tld. This is by itself legal, but not desirable, because it means that the cookie can be sent to numerous websites, either revealing sensitive information or interfering with those other websites without authorization.

As can be seen, these rules do not work satisfactorily, especially when applied to ccTLDs, which may have a flat domain structure similar to the one used by the generic .com TLD, a hierarchical subTLD structure like the one used by the .uk ccTLD (e.g. .co.uk), or a combination of both. But there are also gTLDs, such as .name, for which cookies should not be allowed for the second level domains, as these are generally family names shared between many different users, not service names.

A partially effective method for distinguishing service names from subTLDs by using DNS has been defined in [DNSCOOKIE]. However, this method is not immune to TLD registries that use subTLDs as directories, or to services that do not define an IP address for the domain name.

Using the TLD Subdomain Structure Protocol to retrieve a list of all subTLDs in a given TLD will solve both those problems.

3.1.  Procedure for using the TLD Subdomain Structure Protocol for cookies

When receiving a cookie the client must first perform all the checks required by the relevant specification. Upon completion of these checks the client then performs the following additional verification checks if the cookie is being set for the server's parent or grandparent domain (or higher):

1.  If the domain structure of the TLD is not known already, or the structure information has expired, the client should retrieve or validate the structure specification from the server hosting the specification, according to section 2. If retrieval is unsuccessful, and no copy of the specification is known, the client MAY use alternative methods to decide the domain's status, e.g. those described in [DNSCOOKIE], or other heuristics. Evaluate the specification as specified in section 2. If the target domain is part of the subTLD structure the cookie MUST be discarded.

2.  If the target domain is not a subTLD, the cookie is accepted.

3.2.  Unverifiable transactions

Use of HTTP Cookies, combined with HTTP requests to resources that are located in domains other than the one the user actually wants to visit, has caused widespread privacy concerns. The reason is that multiple websites can link to the same independent website, e.g. an advertiser, who may then use cookies to build a profile of the visitor, which can be used to select advertisements that are of interest to the user.

[RFC2965] specified that if the name of the host of an included resource does not domain match the domain reach (defined as the parent domain of the host) of the URL of the document the user started loading, loading the resource is considered an unverifiable transaction, and in such third party transactions cookies should not be sent or accepted. The latter point is not widely implemented, except when selected by especially interested users.

This means that server1.example.com and server2.example.com can share cookies, and either can be referenced automatically (e.g. by including an image) by the other without being considered an unverifiable transaction, while requests to server3.example2.com would be considered an unverifiable transaction.
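
The reach rule described above can be illustrated with a small Python sketch. It is not taken from [RFC2965]: it follows this document's summary of reach as the parent domain of the host, and it uses a simplified suffix comparison in place of the full domain-match definition. The function names reach, domain_matches and is_unverifiable are hypothetical.

```python
def reach(host: str) -> str:
    """Reach as summarized in this document: the host's parent domain."""
    labels = host.lower().rstrip(".").split(".")
    # A single-label host has no parent, so it is returned unchanged.
    return ".".join(labels[1:]) if len(labels) > 1 else labels[0]


def domain_matches(host: str, domain: str) -> bool:
    """Simplified domain match: host equals domain or is below it."""
    host = host.lower().rstrip(".")
    domain = domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


def is_unverifiable(origin_host: str, resource_host: str) -> bool:
    """True if the resource host falls outside the origin host's reach."""
    return not domain_matches(resource_host, reach(origin_host))


if __name__ == "__main__":
    origin = "server1.example.com"
    print(reach(origin))                                    # example.com
    print(is_unverifiable(origin, "server2.example.com"))   # False
    print(is_unverifiable(origin, "server3.example2.com"))  # True
```

The comparison is done on whole labels (note the "." prepended before the suffix test), so example2.com is not mistaken for a subdomain of example.com.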

However, like the normal domain matching rule for cookies, this rule opens up some holes. If the host example.co.uk requests a resource from server4.example3.co.uk, the request to the example3.co.uk server would not be considered an unverifiable transaction because example.co.uk's reach is co.uk, which domain matches server4.example3.co.uk. To a human with some knowledge of the .uk domain structure, this conclusion is obviously incorrect.

To avoid such misclassifications clients SHOULD apply the procedure specified in 3.1 for the reach domain used to decide if a request is unverifiable. If the reach domain is a subTLD, the reach of the original host must be changed to become the same as the name of the host itself, and requests that do not domain match the original host's name must be considered unverifiable transactions. That is, the reach for example.co.uk becomes example.co.uk, not co.uk, and example3.co.uk will therefore not domain match the resulting reach.

4.  Examples

The following examples demonstrate how the TLD Subdomain Structure Protocol can be used to decide cookie domain permissions.

4.1.  Example 1

<?xml version="1.0" encoding="UTF-8"?>

This specification means that all names at the top level are subTLDs, except "example.tld" for which cookies are allowed. Cookies are also implicitly allowed for any y.x.tld domains.

4.2.  Example 2

<?xml version="1.0" encoding="UTF-8"?>

This specification means that example1.tld and example2.tld and any domains foo.example1.tld and bar.example2.tld are registry-like domains for which cookies are not allowed; for any other domains cookies are allowed.

4.3.  Example 3

This example has the same meaning as Example 2, but with the exception that the domain example3.example2.tld is a regular domain for which cookies are allowed.

5.  IANA Considerations

This specification requires that the domain list is retrievable from a well-known location. This means that a hostname or group of hostnames must be assigned to serve the domain list. Suggestions for where to locate the service are described in section 5.1.

The specification also requires that responses are served with a specific media type. Section 5.2 provides the registration of this media type.

5.1.  Location of the TLD Subdomain Structure specification

The domain list must be located at a location that can easily be deduced by the client from the name of the TLD. Several possibilities exist:

1.  A reserved domain name in the TLD's name space, e.g. https://www.subdomains.tld/domainlist or https://subdomains.nictld.tld/domainlist .

2.  A common repository, e.g. https://subdomains.example.org/tld/domainlist, managed by the IANA or another Internet governance body.

The benefit of the first alternative is that the data are not located at a single repository, which makes it more difficult to shut down the system completely. On the other hand the TLD registries may find the overhead of maintaining such a service burdensome, and therefore avoid implementing it, or let the service lapse.

The second alternative creates a common repository, which may increase adoption. On the other hand, a single location makes it more susceptible to denial of service attacks.

5.2.  Registration of the application/subdomain-structure Media Type

Type name: application
Subtype name: subdomain-structure
Required parameters: none
Optional parameters: none
Encoding considerations: The content of this media type is always transmitted in binary form.
Security considerations: See Section 6
Interoperability considerations: none
Published specification: This document
Additional information:
   Magic number(s): none
   File extension(s):
   Macintosh file type code(s):
Person & email address to contact for further information: Yngve N. Pettersen, Email: yngve@opera.com
Intended usage: common
Restrictions on usage: none
Author/Change controller: Yngve N. Pettersen, Email: yngve@opera.com

6.  Security Considerations

Retrieval of the specifications is vulnerable to denial of service attacks or loss of network connection. Hosting the specifications at a single location can increase this vulnerability, although the exposure can be reduced by using mirrors with the same name, but hosted at different network locations.

This protocol is as vulnerable to DNS security problems as any other [RFC2616] HTTP based service. Requiring the specifications to be digitally signed or transmitted over an authenticated TLS connection reduces this vulnerability.

Section 3 of this document describes using the domain list defined in section 2 as a method of increasing security. The effectiveness of the domain list for this purpose, and the resulting security for the client, depend both on the integrity of the list and on its correctness. The integrity of the list depends on how securely it is stored at the server, and how securely it is transmitted. This specification recommends downloading the domain list using HTTP over TLS, which makes the transmission as secure as the message authentication mechanism used (encryption is not required), and the servers should be configured to use the strongest available key lengths and authentication mechanisms. An alternative approach would be to digitally sign the files.

The correctness of the list depends on how well the TLD registry defined it.
A list that does not include some subTLDs may expose the client to potential privacy and security problems, but these are no worse than the situation would be without this protocol and profile, while a subdomain incorrectly classified as a subTLD can lead to denial of service for the affected services. Both problems can be prevented by careful construction and auditing of the lists, both by the TLD registry and by interested third parties.

7.  Acknowledgements

Anne van Kesteren assisted with defining the XML format in Section 2.2.1.

8.  References

8.1.  Normative References

[NETSC]    "Persistent Client State HTTP Cookies".

[RFC1034]  Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, November 1987.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

[RFC2965]  Kristol, D. and L. Montulli, "HTTP State Management Mechanism", RFC 2965, October 2000.

[RFC3275]  Eastlake, D., Reagle, J., and D. Solo, "(Extensible Markup Language) XML-Signature Syntax and Processing", RFC 3275, March 2002.

[RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.

8.2.  References

[DNSCOOKIE]  Pettersen, Y., "Enhanced validation of domains for HTTP State Management Cookies using DNS. Work in progress.", July 2006.

[RFC2311]  Dusse, S., Hoffman, P., Ramsdell, B., Lundblade, L., and L. Repka, "S/MIME Version 2 Message Specification", RFC 2311, March 1998.

Appendix A.  Open issues

o  Download location URI for the original domain lists

o  Should digital signatures be used on the files, instead of using TLS?

Author's Address

Yngve N. Pettersen
Opera Software ASA
Waldemar Thranes gate 98
N-0175 OSLO, Norway

Email: yngve@opera.com

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).