idnits 2.17.1

draft-ohye-canonical-link-relation-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 6, 2012) is 4424 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 5988 (Obsoleted by RFC 8288)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                            M. Ohye
Internet-Draft                                                  J. Kupke
Intended status: Informational                             March 6, 2012
Expires: September 7, 2012

                      The Canonical Link Relation
                 draft-ohye-canonical-link-relation-05

Abstract

   RFC5988 specified a way to define relationships between links on the
   web.  This document describes a new type of such relationship,
   "canonical", to designate an IRI as preferred over resources with
   duplicative content.

Editorial Note (To be removed by RFC Editor)

   Distribution of this document is unlimited.
   Comments should be sent to the IETF Apps-Discuss mailing list.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 7, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   The canonical link relation specifies the preferred IRI from
   resources with duplicative content.  Common implementations of the
   canonical link relation are to specify the preferred version of an
   IRI from duplicate pages created with the addition of IRI parameters
   (e.g., session IDs), or to specify the single-page version as
   preferred over the same content separated on multiple component
   pages.
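
   As a non-normative illustration of the session-ID case, the
   following Python sketch shows one way an application might read a
   canonical designation from an HTML page and resolve it against the
   context IRI.  The names CanonicalLinkParser and canonical_target are
   hypothetical and not defined by this document, and returning no
   result when a page declares several different targets is an
   assumption of the sketch, not a requirement.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Collects href values of <link> elements whose rel includes "canonical"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel holds a space-separated, case-insensitive list of relation types.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def canonical_target(context_iri, html):
    """Return the resolved target IRI, or None if absent or ambiguous."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    # A relative reference is resolved against the context (referring) IRI.
    targets = {urljoin(context_iri, href) for href in parser.hrefs}
    # Assumption of this sketch: several distinct targets are treated as
    # an improper designation and ignored.
    return targets.pop() if len(targets) == 1 else None
```

   Called with the context IRI
   http://www.example.com/page.php?item=purse&category=bags&sid=1234 and
   a page whose link element carries the relative reference
   page.php?item=purse, the function resolves the target to the
   preferred IRI used in the example of Section 4.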

   In regard to the link relation type, "canonical" can be described
   informally as the author's preferred version of a resource.  More
   formally, the canonical link relation specifies the preferred IRI
   from a set of resources that return the context IRI's content in
   duplicated form.  Once specified, applications such as search engines
   can focus processing on the canonical, and references to the context
   (referring) IRI can be updated to reference the target (canonical)
   IRI.

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  The Canonical Link Relation

   The target (canonical) IRI MUST identify content that is either
   duplicative or a superset of the content at the context (referring)
   IRI.  Authors who declare the canonical link relation ought to
   anticipate that applications such as search engines can:

   o  Index content only from the target IRI (i.e., content from the
      context IRIs will likely be disregarded as duplicative)

   o  Consolidate IRI properties, such as link popularity, to the target
      IRI

   o  Display the target IRI as the representative IRI

   The target (canonical) IRI MAY:

   o  Specify a relative IRI (see [RFC3986] Section 4.2)

   o  Be self-referential (context IRI identical to target IRI)

   o  Exist on a different hostname or domain

   o  Have different scheme names, such as "http" to "https," or
      "gopher" to "ftp"

   o  Be a superset of the content at the context IRI

      *  As an example, each component page (e.g., page-1.html,
         page-2.html) of a multi-page article MAY specify the "view-all"
         version (e.g., page-all.html), the superset of their content,
         as the target IRI.  This is because the content from each
         component page is contained within the view-all version.
         Given this implementation, applications can mark page-1.html
         and page-2.html as duplicates of page-all.html, process content
         only from page-all.html, and disregard the component pages.
         All references can then be made to the view-all version
         (page-all.html, the target IRI), and no content will have been
         lost in this process.

      *  Using the same example above, page-2.html SHOULD NOT designate
         page-1.html as the target (canonical) IRI because this may
         cause a loss of data.  When page-2.html designates page-1.html
         as the canonical, only content from the target IRI,
         page-1.html, will be processed.  page-2.html may be marked as a
         duplicate of page-1.html and its content disregarded.

   o  Be the source IRI of a temporary redirect.  For HTTP, this refers
      to status codes 302, 303, or 307 (Sections 10.3.3, 10.3.4, and
      10.3.8, respectively, of [RFC2616]).

   To better ensure that applications properly handle the canonical link
   relation, administrators ought to consider the following guidelines:

   o  Specify only one canonical link relation for a resource.  (It
      would be confusing to consider/label/designate more than one IRI
      as authoritative.)

   o  Avoid designating the target (canonical) as:

      *  The source IRI of a permanent redirect (for HTTP, this refers
         to 300 and 301 response codes, defined in Sections 10.3.1 and
         10.3.2 of [RFC2616])

      *  An IRI that also specifies a canonical link relation to an IRI
         other than itself

      *  An IRI that returns an error code, such as a 4xx response in
         HTTP (Section 10.4 of [RFC2616])

      *  The first page of a multi-page article or multi-page listing of
         items (since the first page is not duplicative or a superset of
         the context IRI).  For example, page-2.html and page-3.html of
         an article SHOULD NOT specify page-1.html as the canonical.
         This may cause a loss of data from page-2.html and page-3.html
         as they will be marked duplicative of page-1.html with only
         content from page-1.html being processed.

   When the canonical link relation is declared improperly, such as
   creating chained canonicals (i.e., target IRI specifies the source
   IRI of a permanent redirect) or designating a target IRI which
   returns a 4xx response, applications can use their own heuristics
   when processing the resource.  For instance, an application can
   choose to ignore any improper canonical designation and continue to
   process the remaining content on a page.

4.  Examples

   The following example illustrates:

   o  Three IRIs that serve duplicate content

   o  One IRI which is the canonical or "preferred version"

   o  Two IRIs with additional query parameters, making them the
      non-preferred version of the content (duplicates).  The canonical
      link relation is therefore specified on these duplicates.

   If the preferred version of an IRI and its content exists at:
   http://www.example.com/page.php?item=purse

   Then duplicate content IRIs such as:
   http://www.example.com/page.php?item=purse&category=bags
   http://www.example.com/page.php?item=purse&category=bags&sid=1234

   may designate the canonical link relation in HTML as specified in
   [REC-html401-19991224]:
   <link rel="canonical"
         href="http://www.example.com/page.php?item=purse">

   or as a relative IRI:
   <link rel="canonical" href="page.php?item=purse">
   or alternatively, in the HTTP header field as specified in Section 5
   of [RFC5988]:
   Link: <http://www.example.com/page.php?item=purse>; rel="canonical"

   This signals to applications, such as search engines, that these are
   duplicates of the target (canonical) IRI:
   http://www.example.com/page.php?item=purse.

   Applications may then select the canonical value as the display IRI
   (such as in search results), and additional IRI properties, such as
   indexing and ranking signals, can be transferred as well.

5.
Recommendations

   Before adding the canonical link relation, verification of the
   following is RECOMMENDED:

   1.  The content of the context IRI is duplicated within the content
       of the target (canonical) IRI.

   2.  For HTTP, permanent redirects (Section 10.3.2 of [RFC2616]),
       the traditional strong indicator that an IRI's content has been
       permanently moved, could not be implemented in place of the
       canonical link relation.

   3.  In the case where the target (canonical) IRI is a superset of
       content from the context IRI (i.e., the case where page-1.html
       and page-2.html designate page-all.html as the canonical), the
       user experience is strongly taken into consideration, both in
       regard to possible increased load time and potential complexity
       in navigation.

6.  IANA Considerations

   IANA is asked to register the Canonical Link Relation below as per
   [RFC5988].

   Relation Name:

      CANONICAL

   Description:

      Designates the preferred version of a resource (the IRI and its
      contents).

   Reference:

      This specification.

   Notes:

      None.

   Application Data:

      None.

7.  Security Considerations

   When a site is compromised, the canonical link relation can be
   implemented with malicious intent to designate the attacker's IRI as
   the preferred version of the content.  While this technique is
   largely unnoticeable to humans, automated programs may cluster the
   compromised resource as duplicative of the attacker's target IRI,
   transferring properties such as link popularity away from the
   compromised resource to the attacker's designated canonical.
   (Naturally, even a site that is not compromised could provide
   inaccurate or misleading information about which URI is canonical.)

8.  Internationalisation Considerations

   In designating a canonical IRI, please see Section 8 of [RFC5988] for
   information on URI encoding.

9.
Normative References

   [REC-html401-19991224]  Le Hors, A., Raggett, D., and I. Jacobs,
                           "HTML 4.01 Specification", W3C
                           Recommendation REC-html401-19991224,
                           December 1999.

   [RFC2119]               Bradner, S., "Key words for use in RFCs to
                           Indicate Requirement Levels", BCP 14,
                           RFC 2119, March 1997.

   [RFC2616]               Fielding, R., Gettys, J., Mogul, J., Frystyk,
                           H., Masinter, L., Leach, P., and T.
                           Berners-Lee, "Hypertext Transfer Protocol --
                           HTTP/1.1", RFC 2616, June 1999.

   [RFC3986]               Berners-Lee, T., Fielding, R., and L.
                           Masinter, "Uniform Resource Identifier (URI):
                           Generic Syntax", STD 66, RFC 3986,
                           January 2005.

   [RFC5988]               Nottingham, M., "Web Linking", RFC 5988,
                           October 2010.

Appendix A.  Implementations

   Automated programs that implement functionality with regard for the
   canonical link relation include:

   o  Google, canonical link relation HTML and HTTP header support,
      within the same domain and across domains

   o  Yahoo, canonical link relation HTML support within the same
      domain

   o  Bing, canonical link relation HTML support within the same domain

Authors' Addresses

   Maile Ohye

   EMail: maileohye@gmail.com
   URI:   http://maileohye.com/

   Joachim Kupke

   EMail: joachim@kupke.za.net