<?xml version='1.0'?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt'?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc rfcedstyle="yes"?>
<!-- BELOW YOU MAY NEED TO MODIFY THE ATTRIBUTE; IF NOT SURE CONSULT WITH THE MAILING LIST-->
<rfc docName="draft-ohye-canonical-link-relation-04" category="info" ipr="trust200902">
  <front>
    <title>The Canonical Link Relation</title>
    <author fullname="Maile Ohye" surname="Ohye" initials="M.">
      <address>
        <email>maileohye@gmail.com</email>
        <uri>http://maileohye.com/</uri>
      </address>
    </author>
    <author fullname="Joachim Kupke" surname="Kupke" initials="J.">
      <address>
        <email>joachim@kupke.za.net</email>
      </address>
    </author>
    <date year="2011" month="October"/>
    <abstract>
      <t><xref target="RFC5988"/> specified a way to define relationships between links on the web. This document describes a new type of such relationship, "canonical," which designates the preferred URI from a set of identical or vastly similar ones.</t>
    </abstract>
    <note title="Editorial Note (To be removed by RFC Editor)">
    <t>Distribution of this document is unlimited. Comments should be sent to the IETF Apps-Discuss mailing list (see <eref target="https://www.ietf.org/mailman/listinfo/apps-discuss"/>).</t> 
    </note>
  </front>
  <middle>
    <section title="Introduction">
      <t>The canonical link relation specifies the preferred URI from a set of URIs that return identical or vastly similar content, making it possible for references to the context URI to be updated to reference the target URI.</t>
      
      <t>The most common application of the canonical link relation includes specifying the preferred version of a URI from duplicate content pages created with the addition of parameters (e.g. session IDs, tracking IDs, category, or sort information).</t>
    </section>
    <section title="Notational Conventions">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119"/>.</t>
    </section>
    <section title="The Canonical Link Relation">
      <t>The canonical (target) URI MUST identify content that duplicates, is extremely similar, or is a superset of the content at the context (referring) URI. Authors who declare the canonical link relation ought to anticipate that applications such as search engines can:</t>
      
      <t>
        <list style="symbols">
          <t>Index content only from the canonical target (i.e. content from the context URIs will be likely disregarded as duplicative)</t>
          
          <t>Consolidate URI properties, such as link popularity, to the canonical</t>
          
          <t>Display the canonical target as the representative URI</t>
        </list>
      </t>
      
      <t>A resource SHOULD NOT specify more than one canonical link relation.</t>
      
      <t>The target/canonical URI MAY:</t>
      
      <t>
        <list style="symbols">
          <t>Specify a relative URI (see <xref target="RFC3986" /> Section 4.2)</t>
        
          <t>Be self-referential (context URI identical to target URI)</t>
        
          <t>Exist on a different hostname or domain</t>
        
          <t>Have different scheme names, such as "http" to "https," or "gopher" to "ftp"</t>
        
          <t>Be a superset of the content of the context URI
            <list style="symbols">
              <t>For example, "page1" of a multi-page article may specify the canonical target as the "view-all" URI because "view-all" is a superset of page1's content. However, "page2" SHOULD NOT designate "page1" as the canonical because the content of page1 is not inclusive of page2.</t>
  	        </list>
  	      </t>
  	      
  	      <t>Be the source URI of a temporary redirect. For HTTP, this refers to status codes 302, 303, or 307 (Sections 10.3.3, 10.3.4, and 10.3.8, respectively, of <xref target="RFC2616"/>).</t>
  	    </list>
  	  </t>
  	    
  	  <t>The target/canonical URI SHOULD NOT designate:</t>
  	  
  	  <t>
  	    <list style="symbols">
  	      <t>The source URI of a permanent redirect (for HTTP, this refers to 300 and 301 response codes, defined in Sections 10.3.1 and 10.3.2 of <xref target="RFC2616"/>)</t>
  	      
  	      <t>A URI that also specifies a canonical link relation to a URI other than itself</t>
  	      
  	      <t>A URI that returns an error code, such as 4xx response in HTTP (Section 10.4 of <xref target="RFC2616"/>)</t>
  	      
  	      <t>The first page of a multi-page article or multi-page listing of items (since the first page is not a duplicate or a superset of the context URI). For example, page2 and page3 of an article SHOULD NOT specify page1 as the canonical.</t>
  	    </list>
  	  </t>
    </section>
    
    <section title="Examples">
    
      <t>The following example illustrates:</t>
      <t>
        <list style="symbols">
          <t>Three URIs that serve nearly the exact same content</t>
        
          <t>One URI which is the canonical or "preferred version"</t>
        
          <t>Two URIs with additional query parameters, making them the non-preferred version of the content (duplicates). The canonical link relation is therefore specified on these duplicates.</t>
        </list>
      </t>
    
      <t>If the preferred version of a URI and its content exists at:</t>
      
      <figure>
        <artwork type="example">http://www.example.com/page.php?item=purse</artwork>
      </figure>
      
      <t>Then duplicate content URIs such as:</t>
    
      <figure>
        <artwork type="example">http://www.example.com/page.php?item=purse&amp;category=bags</artwork>
      </figure>
    
      <figure>
        <artwork type="example">http://www.example.com/page.php?item=purse&amp;category=bags&amp;sid=1234</artwork>
      </figure>
    
      <t>may designate the canonical link relation in HTML as specified in <xref target="REC-html401-19991224"/>:</t>
    
      <figure>
        <artwork type="example">&lt;link rel="canonical"
        href="http://www.example.com/page.php?item=purse"&gt;</artwork>
      </figure>
    
      <t>or as a relative URI:</t>
    
      <figure>
        <artwork type="example">&lt;link rel="canonical" href="page.php?item=purse"&gt;</artwork>
      </figure>
    
      <t>or alternatively, in the HTTP header field as specified in Section 5 of <xref target="RFC5988"/>:</t>
    
      <figure>
        <artwork type="example">Link: &lt;http://www.example.com/page.php?item=purse&gt;; rel="canonical"</artwork>
      </figure>
    
      <t>This signals to automated programs, such as search engines, that these are duplicates of the canonical URI:
      http://www.example.com/page.php?item=purse.</t>
  
      <t>Automated programs may then select the canonical value as the display URI (such as in search results), and additional URI properties such as indexing and ranking signals, can be transferred as well.</t>
  
    </section>

    <section title="Recommendations">
    
      <t>Before adding the canonical link relation, verification of the following is recommended:</t>
    
      <t>
        <list style="numbers">
          <t>The content of the context URI is identical with, similar to, or a subset of the content of the canonical.</t>
          
          <t>For HTTP, Permanent HTTP redirects (Section 10.3.2 of <xref target="RFC2616"/>), the traditional strong indicator that a URI's content has been permanently moved, could not be implemented in place of the canonical link relation.</t>
          
          <t>In the case where the canonical target is a superset of content from the context URI (e.g. page1 or page2 to view-all), that the user experience is strongly taken into consideration, both in regard to possible increased load time and potential complexity in navigation.</t>
        </list>
      </t>
    </section>

    <section title="IANA Considerations">
      
      <t>IANA is asked to register the Canonical Link Relation below as per <xref target="RFC5988"/>.</t>
      
      <t>Relation Name:
        <list>
          <t>CANONICAL</t>
        </list>
      </t>
      <t>Description:
        <list>
          <t>Designates the preferred version of a resource (the URI and its contents).</t>
        </list>
      </t>
      <t>Reference:
        <list>
          <t>This specification.</t>
        </list>
      </t>
      
      <t>Notes:
        <list>
          <t>None.</t>
        </list>
      </t>
      
      <t>Application Data:
        <list>
          <t>None.</t>
        </list>
      </t>
      
    </section>
    
    <section title="Security Considerations">
      
      <t>When a site is compromised, the canonical link relation can be implemented with malicious intent to designate the attacker's URI as the preferred version of the content. While this technique is largely unnoticeable to humans, automated programs may cluster the compromised resource as duplicative of the attacker's designated canonical, transferring properties such as link popularity away from the resource to the attacker's URI.</t>
    
    </section>
    
    <section title="Internationalisation Considerations">
      
      <t>In designating a canonical URI, please see <xref target="RFC3986" /> for information on URI encoding.</t>
      
    </section>
  
  </middle>
  
  <back>
    <references title="Normative References">
    
      <reference anchor='REC-html401-19991224' target='http://www.w3.org/TR/1999/REC-html401-19991224'>
        <front>
          <title>HTML 4.01 Specification</title>
          <author fullname='Arnaud Le Hors' surname='Le Hors' initials='A.'/>
          <author fullname='David Raggett' surname='Raggett' initials='D.'/>
          <author fullname='Ian Jacobs' surname='Jacobs' initials='I.'/>
          <date year='1999' month='December' day='24'/>
        </front>
        <seriesInfo name='W3C Recommendation' value='REC-html401-19991224'/>
        <annotation>Latest version available at <eref target='http://www.w3.org/TR/html401'/>.</annotation>
      </reference>
      
      <reference anchor="RFC2119">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author initials="S." surname="Bradner" fullname="Scott Bradner">
            <organization>Harvard University</organization>
            <address>
              <email>sob at harvard.edu</email>
            </address>
          </author>
          <date month="March" year="1997"/>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="2119"/>
      </reference>
      
      <reference anchor="RFC2616">
        <front>
          <title>Hypertext Transfer Protocol -- HTTP/1.1</title>
          <author initials="R." surname="Fielding" fullname="R. Fielding">
            <organization>University of California, Irvine</organization>
            <address>
              <email>fielding@ics.uci.edu</email>
            </address>
          </author>
          
          <author initials="J." surname="Gettys" fullname="J. Gettys">
            <organization>W3C</organization>
            <address>
              <email>jg@w3.org</email>
            </address>
          </author>
          
          <author initials="J." surname="Mogul" fullname="J. Mogul">
            <organization>Compaq Computer Corporation</organization>
            <address>
              <email>mogul@wrl.dec.com</email>
            </address>
          </author>
          
          <author initials="H." surname="Frystyk" fullname="H. Frystyk">
            <organization>MIT Laboratory for Computer Science</organization>
            <address>
              <email>frystyk@w3.org</email>
            </address>
          </author>
          
          <author initials="L." surname="Masinter" fullname="L. Masinter">
            <organization>Xerox Corporation</organization>
            <address>
              <email>masinter@parc.xerox.com</email>
            </address>
          </author>
          
          <author initials="P." surname="Leach" fullname="P. Leach">
            <organization>Microsoft Corporation</organization>
            <address>
              <email>paulle@microsoft.com</email>
            </address>
          </author>
          
          <author initials="T." surname="Berners-Lee" fullname="T. Berners-Lee">
            <organization>W3C</organization>
            <address>
              <email>timbl@w3.org</email>
            </address>
          </author>
          <date month="June" year="1999"/>
        </front>
        <seriesInfo name="RFC" value="2616"/>
      </reference>
      
      <reference anchor="RFC3986">
      
        <front>
          <title abbrev='URI Generic Syntax'>Uniform Resource Identifier (URI): Generic Syntax</title>
          <author initials='T.' surname='Berners-Lee' fullname='Tim Berners-Lee'>
            <organization abbrev="W3C/MIT">World Wide Web Consortium</organization>
            <address>
              <email>timbl@w3.org</email>
              <uri>http://www.w3.org/People/Berners-Lee/</uri>
            </address>
          </author>
        
          <author initials='R.' surname='Fielding' fullname='Roy T. Fielding'>
            <organization abbrev="Day Software">Day Software</organization>
            <address>
              <email>fielding@gbiv.com</email>
              <uri>http://roy.gbiv.com/</uri>
            </address>
          </author>
        
          <author initials='L.' surname='Masinter' fullname='Larry Masinter'>
            <organization abbrev="Adobe Systems">Adobe Systems Incorporated</organization>
            <address>
              <email>LMM@acm.org</email>
              <uri>http://larry.masinter.net/</uri>
            </address>
          </author>
          <date month='January' year='2005'></date>
        </front>
        <seriesInfo name="STD" value="66"/>
        <seriesInfo name="RFC" value="3986"/>
      </reference>
      
      <reference anchor="RFC5988">
        <front>
          <title>Web Linking</title>
          <author fullname="Mark Nottingham" initials="M." surname="Nottingham">
            <address>
              <email>mnot at mnot.net</email>
            </address>
          </author>
          <date month="October" year="2010" />
        </front>
        <seriesInfo name="RFC" value="5988"/>
      </reference>
    
    </references>
    
    <section title="Implementations">
    
      <t>Automated programs that implement functionality with regard for the canonical link relation include:</t>
    
      <t>
        <list style="symbols">
          <t>Google, canonical link relation HTML and HTTP header support, within the same domain and across domains:</t>
          <list>
          	<t>
          	  <eref target="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html" />
          	</t>
          	<t>
          	  <eref target="http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html" />
          	</t>
          	<t>
          	  <eref target="http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html" />
          	</t>
           </list>
          <t>Yahoo, canonical link relation HTML support within the same domain:</t>
          <list>
            <t>
              <eref target="http://www.ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/" />
            </t>
           </list>
          <t>Bing, canonical link relation HTML support within the same domain:</t>
          <list>
            <t>
              <eref target="http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx" />
            </t>
           </list>
        </list>
      </t>
    </section>
  </back>
</rfc>