Network Working Group Y. Pettersen Internet-Draft Vivaldi Technologies AS Updates: 6265 (if approved) February 13, 2014 Intended status: Experimental Expires: August 17, 2014 The Public Suffix Structure file format and its use for Cookie domain validation draft-pettersen-subtld-structure-10 Abstract This document defines the term "Public Suffix domain" as meaning a domain under which multiple parties that are unaffiliated with the owner of the Public Suffix domain may register subdomains. Examples of Public Suffix domains include "org", "co.uk", "k12.wa.us" and "uk.com". It also defines a file format that can be used to distribute information about such Public Suffix domains to relying parties. As an example, this information is then used to limit which domains an Internet service can set HTTP cookies for, strengthening the rules already defined by the cookie specification. This specification updates RFC 6265 [RFC6265] by defining the term "Public Suffix domain". Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 17, 2014. Pettersen Expires August 17, 2014 [Page 1] Internet-Draft Public Suffix February 2014 Copyright Notice Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. 1. Introduction The Domain Name System (DNS) [RFC1034] used to name Internet hosts allows a wide range of hierarchical names to be used to indicate what a kind of business a given host is engaged in. Some are implemented by the owners of a domain, such as by creating subdomains for certain tasks or functions, while others, called Public Suffixes (or registry-like domains), are created by the Top Level Domain (TLD) registry owner or individual domain owners to indicate what kind of service the hosts under the domain provides, e.g., commercial, educational, governmental or geographical location, such as city or state. While this system makes it relatively easy for TLD administrators to organize online services, and for the user to locate and recognize relevant services, this flexibility causes various security and privacy-related problems when services located at different hosts are allowed to share data through functionality administrated by the client, e.g., HTTP state management cookies [RFC6265] and cross- document information sharing in ECMAScript DOM. Most of these information-sharing mechanisms make the process of sharing easy, perhaps too easy, since, in many cases, there is no mechanism to ensure that the servers receiving the information really want it. It is also often difficult to determine the original source of the Pettersen Expires August 17, 2014 [Page 2] Internet-Draft Public Suffix February 2014 information being shared. To some extent, [RFC2965] tried to address some of these concerns for cookies, in that clients that send [RFC2965]-style cookies also send the target domain for the cookie along with the cookie so that the recipient can verify that the cookie has the correct domain. Unfortunately, [RFC2965] was never widely deployed in clients or on servers. The recipient server(s) can make inappropriate information sharing more detectable by requiring the information to contain data identifying the source, as well as by assuring the integrity of the data, e.g., by using cryptographic technologies. However, these techniques tend to be computationally costly. There are two problem areas: o Incorrect sharing of information between non-associated services e.g., example1.com and example2.com or example1.co.uk and example2.co.uk -- That is, the information may be distributed to all services within a given Top Level Domain or Public Suffix domain. Sharing within a TLD is usually prevented by a simple rule that does not permit it, but that is more difficult for general Public Suffix domains, since they do not have a well defined pattern. o Undesirable information sharing within a single service -- This is, in particular, a problem for services that sell hosting services to many different customers, such as web hotels, where the service itself has little or no control of the customers' actions. While both these problems are in some ways similar, they call for different solutions. This specification will only propose a solution for the first problem area. The second problem area must be handled separately. This specification will first define what Public Suffixes are; then it will propose a file format that can be used to distribute information about the Public Suffixes within a Top Level Domain, e.g., that the TLD have several Public Suffix domains, such as co.tld, ac.tld, org.tld. Finally it will show how this information can be used to determine when information sharing through cookies is not desirable. 2. Public Suffix domains A Public Suffix domain is used very much like a Top Level Domain. The owner or operator of the domain allow unaffiliated third parties to register domain names in the Public Suffix domain and to control all activity inside the registered domain. Pettersen Expires August 17, 2014 [Page 3] Internet-Draft Public Suffix February 2014 While most such domains are open to registration by the general public, the owner of the Public Suffix domain may have defined restrictions on which third parties may register a domain, such as only private persons, schools, or government agencies, to mention a few. Such limitations do not change the fact that each domain registrant is nominally independent of all other domain registrants in the Public Suffix domain, as well as of the owner of the Public Suffix domain. Just like a domain under a TLD may be a Public Suffix domain, a domain registered under a Public Suffix domain may also be a Public Suffix domain. Examples of this are the state.us and city.state.us Public Suffix domains in the dot-US ccTLD. There are various categories of Public Suffix domains. The most common category is the second-level domain, used by many ccTLDs to group content, such as co.tld for commercial, ac.tld for academic institution, and gov.tld for government. Another common category is Public Suffixes dedicated to geographical locations, such as as states, provinces, and cities, such as the city.state.us domain organization used by the US ccTLD, possibly with more Public Suffix domains within these domains. A third category is ISP shared hosting, and "vanity" domain names, e.g., country.com. In addition, a number of social websites provide their users with direct access names under their domains (e.g., http://example-user.example.com, rather than using http://www.example.com/example-user URLs). Information about Public Suffix domains can be used in several security features in clients: o Blocking websites' ability to set cookies to the Public Suffix domain o Limiting the ability of active content, such as EcmaScript, from affecting content in independent domains that happen to share the same Public Suffix domain as the source domain o Highlighting the actual domain in the displayed URL in the client's UI, to reduce the potential of a malicious site misleading the user with a URL that identifies the host as www .well-known-site.com.example.pubsuffix-domain.tld, which might lead users to think they are visiting www.well-known-site.com rather than a site in the domain example.pubsuffix-domain.tld As there is currently no reliable method in DNS or other protocols that allows clients to automatically recognize a Public Suffix domain, the owner of such a domain must self-declare the domain's status as a Public Suffix domain and register it with each of the Pettersen Expires August 17, 2014 [Page 4] Internet-Draft Public Suffix February 2014 repositories that track such information. Such self-registration may lead to inconsistencies between the various repositories, which could cause security problems to develop. A more reliable method might be that the TLD registrar collect such information from their registered domains, and make it available to relying parties. Section 3 presents a XML-based format for how this information can be published and shared by the TLD registrar. 3. The Public Suffix Structure file format The Public Suffix Structure file format specifies how to encode information about Public Suffix domains inside a TLD. It is based on XML and is able to specify Public Suffixes and exceptions to any level of the domain hierarchy that is desirable. 3.1. Domain list format The domain list file can contain a list of subdomains that are considered Public Suffix domains, as well as a special list of names that are not top level domains. None of the domain lists need specify the TLD name, since that is implied either by the file that is parsed or by the content of the tag. The domain names listed MUST be encoded in punycode, as specified by [RFC5891]. 3.1.1. Domain list schema The domain list is an XML file that follows the following schema: default namespace = "http://xmlns.opera.com/tlds" start = element tld { attribute levels { xsd:nonNegativeInteger | "all"}, attribute name { xsd:NCName }, (domain | registry)* } registry = element registry { attribute levels { xsd:nonNegativeInteger }, attribute name { xsd:NCName }, attribute all { string "true" | string "false" }, (domain | registry)* } domain = element domain { attribute name { xsd:NCName } } Pettersen Expires August 17, 2014 [Page 5] Internet-Draft Public Suffix February 2014 The domain list file usually contains a single -block (but may contain multiple entries), which may contain multiple registry and domain blocks, and a registry block, which define a Public Suffix domain, may also contain multiple registry and domain blocks. When used alone, the -block MAY contain a name field identifying the TLD name; if there are multiple -blocks, each block MUST specify the name field. Both and tags MUST contain a name attribute identifying the domain or registry. The -block MAY have a name attribute, but in files with a single -block this name MUST be ignored by clients, which must instead use the name of the TLD used to request the file. All names SHOULD be punycode encoded [RFC5891] to make it possible for clients unaware of either Unicode or IDNA to use the document. The - and -blocks MAY contain an attribute, "levels", specifying how many levels below the current domain are Public Suffixes. The default is "none", meaning that the default inside the current domain level is that labels are ordinary domains and not Public Suffix domains. If the value of the "levels" attribute is 1 (one) by default all next-level labels within the registry/TLD are Public Suffix domains, not normal domains. If the value of the "levels" attribute is the case-insensitive token "all", then all subdomains domains below the current domain are Public Suffix domains, by default. A -block with the attribute "all" set to "true" inside the declaration for the registry domain example.tld indicates that all domains x.example.tld are also Public Suffix domains, by default, unless a domain is specified differently by a different declaration. The registry-all block may contain additional - or -blocks, which then apply to domains foo.x.example.tld, for all domains x, except those that have separate entries. This allows specification of wildcard structures, where the structure for lower domains are similar for all domains. Implementations MUST ignore attributes and syntax they do not recognize. 3.1.2. Domainlist interpretation For each new - or -block within the - or -block, the effective domain name to which the block applies is the name of the block prepended to the ".parentdomain" of the effective domain name of the containing block. Pettersen Expires August 17, 2014 [Page 6] Internet-Draft Public Suffix February 2014 For the -block the effective domain name is the name of the TLD the client is evaluating, and for the -block named "example" the effective name becomes example.tld. In the above example, the specification is for the TLD "tld". By default any second level domain "x.tld" is a Public Suffix domain; although, parliament.tld is not a Public Suffix domain, but a normal domain. In the example TLD, however, the co.tld registry has a sub registry "state.co.tld", while all other domains in the co.tld domains are ordinary domains. Additionally, all domains x.province.tld are Public Suffixes, and all school.x.province.tld are normal domains for all domains x in province.tld. Also, the registry example.tld has defined all domains y.example.tld as Public Suffixes, with no exceptions. 3.2. Public Suffix Structure as a web service The Public Suffix structure file can be provided as an HTTP service, managed by either the application vendor, the TLD owners, or some other trusted organization, and it can be located at a URI location that, when queried, returns information about a TLD's domain structure. The client can then use this information to decide what actions are permitted for the protocol data the client is processing. The procedure for use as a service is as follows: o The client retrieves the domain list for the Top Level Domain "tld" from the vendor specified URI https://tld- Pettersen Expires August 17, 2014 [Page 7] Internet-Draft Public Suffix February 2014 structure.example.com/tld/domainlist . Multiple alternative URIs for a fallback procedure may be specified. o The Content-Type of the returned list MUST be application/ subdomain-structure. o The retrieved specification SHOULD be cached by the client for at least 30 days. o The TLD owner SHOULD update the list at least 90 days before a new sub-domain becomes active. o If no specification can be retrieved the user agent MAY fall back to alternative, undefined methods, depending on the profile. 3.3. Securing the domain information Individuals with malicious intent may wish to modify the domain list served by the service location to either classify a domain incorrectly as a Public Suffix domain or to hide a Public Suffix domain's classification. Besides obviously securing the hosting locations, this also means that the content served will have to be secured. 1. Digitally sign the specification, using one of the available message signature methods, e.g., S/MIME [RFC2311]. This will secure the content during storage both at the client and the server, as well as during transit. The drawback is that the client must implement decoding and verification of the message format that it may not already support, which may be problematic for clients having limited resources. 2. Use an encrypted connection, such as HTTP over TLS [RFC2818], which is supported by many clients already. Unfortunately, this method does not protect the content when stored by the client. 3. Use XML Signatures [RFC3275] to create a signature over the specification. This method is currently not defined. This specification recommends using HTTP over TLS, and the client MUST use the non-anonymous cipher suites, to secure the transport of the specification. The client MUST ensure that the hostname in the certificate matches the hostname used in the request. Pettersen Expires August 17, 2014 [Page 8] Internet-Draft Public Suffix February 2014 4. A Public Suffix Structure file format profile for HTTP Cookies The HTTP State management cookies area is one where it is important, both for security and privacy reasons, to ensure that unauthorized services cannot set cookies for another service. Inappropriate cookies can affect the functionality of a service, but they may also be used to track the users across services in an undesirable fashion. Neither the original Netscape cookie specification[NETSC], [RFC2965] or [RFC6265] are adequate in many cases. The original Netscape specification's rules required only that the target domain must have one internal dot (e.g., example.com) if the TLD belongs to a list of generic TLDs (gTLD), while for all other TLDs the domain must contain two internal dots (e.g., example.co.uk). The latter rule was never properly implemented, in particular due to the many flat ccTLD domain structures that are in use. (The successor [RFC6265] has since expanded this policy to exclude domains listed in client-specific lists of "public suffixes").[RFC2965] set the requirement that cookies can only be set for the server's parent domain. Unfortunately, both the [NETSC] and [RFC2965] policies still left open the possibility of setting cookies for a Public Suffix domain by setting the cookie from a host name example.pubsuf.tld to the domain pubsuf.tld, which is by itself legal, but not desirable, because that means that the cookie can be sent to numerous websites either revealing sensitive information, or interfering with those other websites without authorization. As can be seen, these rules do not work satisfactorily, especially when applied to ccTLDs, which may have a flat domain structure similar to the one used by the generic .com TLD, a hierarchical Public Suffix domain structure like the one used by the .uk ccTLD (e.g., .co.uk), or a combination of both. However, there are also gTLDs, such as .name, for which cookies should not be allowed for the second-level domains, as these are generally family names shared between many different users, not service names. A partially effective method for distinguishing service names from Public Suffix domains by using DNS was developed by Opera Software ASA. However, this method was not immune to TLD registries that use Public Suffix domains as directories or to services that do not define an IP address for the domain name. Using the Public Suffix Structure file format to retrieve a list of all Public Suffix domains in a given TLD will solve both those problems. Pettersen Expires August 17, 2014 [Page 9] Internet-Draft Public Suffix February 2014 4.1. Procedure for using the Public Suffix Structure file format for cookies When receiving a cookie, the client must first perform all the checks required by the relevant specification. Upon completion of these checks the client then performs the following additional verification checks if the cookie is being set for the server's parent, grand- parent domain (or higher): 1. If the Public Suffix domain structure of the TLD is not known already, or the structure information has expired, according to the client's policies, the client should retrieve or revalidate the structure specification from the server hosting the specification, according to Section 3. If retrieval is unsuccessful, and no copy of the specification is known, the client MAY use alternative information or heuristics to decide the domain's status. Upon successful retrieval the specification is evaluated as specified in Section 3. If the target domain is designated as a Public Suffix domain, then the cookie MUST be discarded or, alternatively, processed as if it had not specified a domain attribute. 2. If the target domain is not a Public Suffix domain, the cookie is accepted (unless other policies configured for the client prevent this). 4.2. Third party cookies Use of HTTP Cookies, combined with HTTP requests to resources that are located in domains other than the one the user actually wants to visit, have caused widespread privacy concerns. The reason is that multiple websites can link to the same independent website, e.g., an advertiser, who may then use cookies to build a profile of the visitor, which can be used to select advertisements that might be of interest to the user. Some clients have therefore implemented restrictions on what cookie related activities are accepted in relation to a third-party domain. Frequently, such restrictions are based on determining whether the two hosts share the same immediate parent domain as a domain suffix, or if the first domain of the two is a parent domain (suffix) of the other. This determination method might incorrectly classify a third-party server as a first party if the immediate parent domain of the first party server is a Public Suffix domain, and possibly break the user's privacy expectations. Pettersen Expires August 17, 2014 [Page 10] Internet-Draft Public Suffix February 2014 To avoid such misclassifications, when the two servers have the first party server's immediate parent domain as a shared suffix, clients SHOULD apply the procedure specified in Section 4.1 for this domain, and, if this domain is determined to be a Public Suffix domain, the second host must be considered a third party. That is, if the first party server example.co.uk causes a request for a resource at example3.co.uk the parent domain co.uk is determined to be a Public Suffix domain, and example3.co.uk is therefore a third-party server, even if they share the same immediate parent domain. 5. Examples The following examples demonstrate how the Public Suffix Structure file format can be used to decide cookie domain permissions. 5.1. Example 1 This specification means that all names at the top level are Public Suffix domains, except "example.tld" for which cookies are allowed. Cookies are also implicitly allowed for any y.x.tld domains. 5.2. Example 2 This specification means that example1.tld and example2.tld and any domains (foo.example1.tld and bar.example2.tld) are Public Suffix domains for which cookies are not allowed; for any other domains cookies are allowed. 5.3. Example 3 Pettersen Expires August 17, 2014 [Page 11] Internet-Draft Public Suffix February 2014 This example has the same meaning as Example 2, but with the exception that the domain example3.example2.tld is a regular domain for which cookies are allowed. 6. IANA Considerations This specification also requires that responses are served with a specific media type. Below is the registration information for this media type. 6.1. Registration of the application/subdomain-structure Media Type Type name : application Subtype name: subdomain-structure Required parameters: none Optional parameters: none Encoding considerations: The content of this media type is always transmitted in binary form. Security considerations: See Section 7. Interoperability considerations: none Published specification: This document Additional information: Magic number(s): none File extension(s): Macintosh file type code(s): Person & email address to contact for further information: Yngve N. Pettersen Pettersen Expires August 17, 2014 [Page 12] Internet-Draft Public Suffix February 2014 Email: yngve@vivaldi.com Intended usage: common Restrictions on usage: none Author/Change controller: Yngve N. Pettersen Email: yngve@vivaldi.com 7. Security Considerations Retrieval of the Public Suffix Structure specifications is vulnerable to denial of service attacks or loss of network connection. Hosting the specifications at a single location can increase this vulnerability, although the exposure can be reduced by using mirrors with the same name but hosted at different network locations. This protocol is as vulnerable to DNS security problems as any other [RFC2616] HTTP-based service. Requiring the specifications to be digitally signed or transmitted over a authenticated TLS connection reduces this vulnerability. Section 4 of this document describes using the domain list defined in Section 3 as a method of increasing security and privacy. The effectiveness of the domain list for this purpose, and the resulting security and privacy improvements for the user, depend both on the integrity of the list, and its correctness. The integrity of the list depends on how securely it is stored in the repository and how securely it is transmitted. This specification recommends downloading the domain list using HTTP over TLS [RFC2818], which makes the transmission as secure as the message authentication mechanism used (encryption is not required), and the servers should be configured to use the strongest available key lengths and authentication mechanisms. An alternative or complimentary approach would be to digitally sign the files. The correctness of the list depends on how well the TLD registry defined it, or how well the list maintainer have been able to collect correct information. A list that does not include some Public Suffix domains may expose the client to potential privacy and security problems, but the situation would not be any worse than it would be without this protocol and profile, while a subdomain incorrectly classified as a Public Suffix domain can lead to denial of service for the affected services. Both of the problems can be prevented by careful construction and auditing of the lists, both by the TLD regis try and by interested third parties. Pettersen Expires August 17, 2014 [Page 13] Internet-Draft Public Suffix February 2014 8. Acknowledgments Anne van Kesteren assisted with defining the XML format in Section 3.1.1. The Public Suffix List project [PUBSUFFIX] was initiated by members of the Mozilla Community, and members of the project, in particular Gervase Markham, have provided input to this document. 9. References 9.1. Normative References [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, November 1987. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2311] Dusse, S., Hoffman, P., Ramsdell, B., Lundblade, L., and L. Repka, "S/MIME Version 2 Message Specification", RFC 2311, March 1998. [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. [RFC3275] Eastlake, D., Reagle, J., and D. Solo, "(Extensible Markup Language) XML-Signature Syntax and Processing", RFC 3275, March 2002. [RFC5891] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, August 2010. [RFC6265] Barth, A., "HTTP State Management Mechanism", RFC 6265, April 2011. 9.2. Non-normative references [NETSC] "Persistent Client State HTTP Cookies", . Pettersen Expires August 17, 2014 [Page 14] Internet-Draft Public Suffix February 2014 [PUBSUFFIX] "The Homepage of the Public Suffix List, a list of registry-like domains gathered by volunteers.", . [RFC2965] Kristol, D. and L. Montulli, "HTTP State Management Mechanism", RFC 2965, October 2000. Appendix A. Collection of information for the TLD structure specification This document does not define how the information encoded in the TLD Structure Specification is gathered. There are several methods available for collecting the information encoded in the TLD Structure Specification, the two main ones being: Data provided by the TLD registry owner through a machine readable repository at well known locations Data gathered by one or more application vendors based on publicly available information, such as the Mozilla Project's Public Suffix List[PUBSUFFIX], Appendix B. Alternative solutions A possible alternative to the format specified in Section 3, encoding the information directly in the DNS records for the Public Suffix domain, using a DNS extension. Accessing this type of information requires that the client or its environment is able to directly access the DNS network. In many environments, e.g., firewalled systems, this may not be possible. Also, not all runtime environments can provide this information, which may lead to a DNS client embedded directly in the client. For some applications, it may be necessary, due to system limitations, to access this information through an online web service in order to provide the necessary information for each hostname or domain visited. A web service may, however, introduce unnecessary privacy problems, as well as delays each time a new domain is tested. Appendix C. Open issues o Download location URI for the original domain lists o Should Digital signatures be used on the files, instead of using TLS? Pettersen Expires August 17, 2014 [Page 15] Internet-Draft Public Suffix February 2014 Author's Address Yngve N. Pettersen Vivaldi Technologies AS Norway Email: yngve@vivaldi.com Pettersen Expires August 17, 2014 [Page 16]