Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            T. Tsujikawa
Expires: September 5, 2011
                                                                P. Poeml
                                                             MirrorBrain
                                                            H. Nordstrom
                                                           March 4, 2011


                   Metalink/HTTP: Mirrors and Hashes
                      draft-bryan-metalinkhttp-22

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP header fields.  It is an alternative way to convey the
   information usually contained in the Metalink XML-based download
   description format.  Metalink/HTTP describes multiple download
   locations (mirrors), Peer-to-Peer information, cryptographic hashes,
   digital signatures, and other information using existing standards
   for HTTP header fields.  Metalink clients can use this information to
   make file transfers more robust and reliable.  This document
   describes the normative requirements for Metalink/HTTP clients and
   servers.

Editorial Note (To be removed by RFC Editor)

   Discussion of this draft should take place on the HTTPBIS working
   group mailing list (ietf-http-wg@w3.org), although this draft is not
   a WG item.

   The changes in this draft are summarized in Appendix C.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 5, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Example Metalink Server Response
     1.2.  Notational Conventions
     1.3.  Terminology
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  Signatures
     5.1.  OpenPGP Signatures
     5.2.  S/MIME Signatures
   6.  Cryptographic Hashes of Whole Documents
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  IANA Considerations
   9.  Security Considerations
     9.1.  URIs and IRIs
     9.2.  Spoofing
     9.3.  Cryptographic Hashes
     9.4.  Signing
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative and complementary representation of
   Metalink information, which is usually presented as an XML-based
   document format [RFC5854].  Metalink/HTTP attempts to provide as much
   functionality as the Metalink/XML format by using existing standards
   such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
   and Entity Tags (also known as ETags) [RFC2616].  Metalink/HTTP is
   used to list information about a file to be downloaded.  This can
   include lists of multiple URIs (mirrors), Peer-to-Peer information,
   cryptographic hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.
   This distributes the load across multiple servers and should also
   increase throughput and resilience.  At times, however, individual
   servers can be slow, outdated, or unreachable, and this cannot be
   determined until the download has been initiated.  Users rarely have
   enough information to choose the most appropriate server.  They often
   choose the first one in a list, which might not be optimal for their
   needs and gives that server a disproportionate share of the load.
   Using a suboptimal mirror can lead the user to cancel and restart the
   download while trying to find a better source manually.  During
   downloads, transmission errors can corrupt the file, and there is no
   easy way to repair such files.  For large downloads this can be
   extremely troublesome.  Any of the many problems that can occur
   during a download frustrates users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing.  They use it both to approximately balance load
   between servers and to direct clients to nearby servers in the hope
   that this improves throughput.  DNS load balancing can indeed balance
   long-term server load fairly effectively.  However, it is less
   effective at delivering the best throughput to users when the
   bottleneck is the network rather than the server.

   This document describes a mechanism by which the benefit of mirrors
   can be realized automatically and more effectively.  All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more, can be transferred in
   coordinated HTTP header fields, hereafter referred to as a Metalink.
   This Metalink transfers the knowledge of the download server (and
   mirror database) to the client.  Clients can fall back to other
   mirrors if the current one has an issue.
   With this knowledge, the client can work its way to a successful
   download even under adverse circumstances.  All this can be done
   without complicated user interaction, and the download can be much
   more reliable and efficient.  In contrast, a traditional HTTP
   redirect to a mirror conveys only minimal information (one link to
   one server), and the HTTP protocol has no provision for handling
   failures.  Furthermore, Metalink/HTTP facilitates multi-source
   downloads, where portions of a file are downloaded from multiple
   mirrors (and, optionally, Peer-to-Peer sources) simultaneously.  This
   provides better load distribution across servers and potentially
   faster downloads to users.

   Upon connecting to a Metalink/HTTP server, a client receives
   information about other sources of the same resource and a
   cryptographic hash of the whole resource.  The client can then
   request chunks of the file from the various sources, scheduling them
   appropriately in order to maximize the download rate.

1.1.  Example Metalink Server Response

   This example shows a brief Metalink server response with an ETag,
   mirrors, .torrent and .meta4 files, an OpenPGP signature, and a
   cryptographic hash of the whole file:

      ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel=duplicate
      Link: ; rel=duplicate
      Link: ; rel=describedby;
      type="application/x-bittorrent"
      Link: ; rel=describedby;
      type="application/metalink4+xml"
      Link: ; rel=describedby;
      type="application/pgp-signature"
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.2.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

1.3.  Terminology

   The following terms as used in this document are defined here:

   o  Metalink server: An HTTP server that provides a Metalink in HTTP
      response header fields.
   o  Metalink: A collection of HTTP response header fields that a
      Metalink server sends in reply to a GET or HEAD request from a
      client.  It includes Link header fields listing mirrors and
      Instance Digests listing a cryptographic hash.
   o  Link header field: An HTTP response header field defined in
      [RFC5988].  It can list mirrors, potentially other download
      methods for obtaining a file, and digital signatures.
   o  Instance Digest: An HTTP response header field defined in
      [RFC3230] that contains the cryptographic hash of a file.  The
      Metalink client uses it to verify the integrity of the file once
      the download has completed.
   o  Entity Tag or ETag: An HTTP response header field defined in
      [RFC2616].  If ETags are synchronized between the Metalink server
      and mirror servers, Metalink clients can provide advanced
      features.
   o  Mirror server: Typically an FTP or HTTP server that "mirrors" the
      Metalink server; that is, it provides identical copies of (at
      least some) files that are also on the mirrored server.
   o  Metalink client: An application that processes Metalinks and uses
      them to provide an improved download experience.  Metalink clients
      support HTTP and could also support other download protocols, such
      as FTP or various Peer-to-Peer methods.
   o  Metalink/XML: An XML file that can contain information similar to
      an HTTP response header Metalink, such as mirrors and
      cryptographic hashes.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP, which consists
   of mirrors and cryptographic hashes in HTTP header fields as
   described in this document.  "Metalink/XML" refers to the XML format
   described in [RFC5854].

   Metalink resources include Link header fields [RFC5988] to present a
   list of mirrors in the response to a client request for the resource.
   Metalink servers MUST include the cryptographic hash of a resource
   via Instance Digests in HTTP [RFC3230].  Algorithms used in the
   Instance Digest field are registered in the IANA registry named
   "Hypertext Transfer Protocol (HTTP) Digest Algorithm Values" at .
   This document restricts the use of these algorithms.  SHA-256 and
   SHA-512 were added to the registry by [RFC5843].  Metalinks contain
   whole file hashes as described in Section 6, and MUST include
   SHA-256, as specified in [FIPS-180-3].  They MAY also include other
   hashes.

   Metalink servers are HTTP servers with one or more Metalink
   resources.  Metalink servers MUST support the Link header fields for
   listing mirrors and MUST support Instance Digests in HTTP [RFC3230].
   Metalink servers MUST return the same Link header fields and Instance
   Digests in responses to HEAD requests as in responses to GET
   requests.  Metalink servers and their associated preferred mirror
   servers MUST all share the same ETag policy.  Metalink servers and
   their associated normal mirror servers SHOULD all share the same ETag
   policy.  (See Section 3.3 for the definition of "preferred" and
   "normal" mirror servers.)  It is up to the administrator of the
   Metalink server to communicate the details of the shared ETag policy
   to the administrators of the mirror servers.  The mirror servers can
   then be configured with the same ETag policy.  To have the same ETag
   policy means that ETags are synchronized across servers for resources
   that are mirrored, i.e.,
   byte-for-byte identical files will have the same ETag on mirrors that
   they have on the Metalink server.  For example, it would be better to
   derive an ETag from a cryptographic hash of the file contents than
   from server-unique filesystem metadata.  Metalink servers SHOULD
   offer Metalink/XML documents that contain cryptographic hashes of
   parts of the file (and other information) if error recovery is
   desirable.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers
   SHOULD support serving partial content.  HTTP mirror servers SHOULD
   share the same ETag policy as the originating Metalink server.  HTTP
   mirror servers SHOULD support Instance Digests in HTTP [RFC3230]
   using the same algorithm as the Metalink server.  Optimally, mirror
   servers will both share the same ETag policy and support Instance
   Digests in HTTP.  Mirror servers that share the same ETag policy
   and/or support Instance Digests in HTTP using the same algorithm as
   a Metalink server are known as preferred mirror servers.

   Metalink clients use the mirrors provided in Link header fields
   [RFC5988], but only those provided by the initial Metalink server
   they contacted.  Metalink clients MUST discard Link header fields
   [RFC5988] that list mirrors when they appear in responses from
   mirrors, and MUST discard those in which a Metalink server lists
   itself as a mirror.  This prevents a possible infinite loop.
   Metalink clients MUST support HTTP and SHOULD support FTP [RFC0959].
   Metalink clients MAY support BitTorrent [BITTORRENT] or other
   download methods.  Metalink clients SHOULD switch downloads from one
   mirror to another if a mirror becomes unreachable.
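   The loop-prevention rule above can be illustrated with a short
   Python sketch.  It does not fully implement Web Linking [RFC5988]:
   the parser assumes one link-value per Link header field and no ';'
   inside URIs or quoted parameter values.  The function names
   (parse_link, same_resource, usable_mirrors) are hypothetical.  So is
   the test for "the Metalink server listing itself", here taken as
   equal scheme, host, and path.

```python
from urllib.parse import urlsplit


def parse_link(value):
    """Split one Link header field value into (uri, params).

    Simplification: assumes one link-value per header field and no ';'
    inside the URI or inside quoted parameter values.
    """
    target, *parts = [piece.strip() for piece in value.split(";")]
    if not (target.startswith("<") and target.endswith(">")):
        raise ValueError(f"malformed link target: {target!r}")
    params = {}
    for part in parts:
        if not part:
            continue
        name, sep, val = part.partition("=")
        # Flag-style attributes such as "pref" carry no value.
        params[name.strip().lower()] = val.strip().strip('"') if sep else ""
    return target[1:-1], params


def same_resource(a, b):
    """Hypothetical notion of "the same resource": equal scheme,
    host (case-insensitive), and path."""
    x, y = urlsplit(a), urlsplit(b)
    return (x.scheme, x.netloc.lower(), x.path) == (
        y.scheme, y.netloc.lower(), y.path)


def usable_mirrors(metalink_url, response_url, link_values):
    """Return (uri, params) for the rel=duplicate links a client may use.

    Mirror links are accepted only from the initial Metalink server.
    A link pointing back at that server is dropped, so the client
    cannot loop between servers.
    """
    if not same_resource(metalink_url, response_url):
        return []  # Link header fields from a mirror's response are discarded
    mirrors = []
    for value in link_values:
        uri, params = parse_link(value)
        if "duplicate" not in params.get("rel", "").split():
            continue  # e.g. rel=describedby metainfo or signature links
        if same_resource(uri, metalink_url):
            continue  # the Metalink server listing itself as a mirror
        mirrors.append((uri, params))
    return mirrors
```

   For example, links parsed from a mirror's own response yield an
   empty list.  Links from the initial server yield every
   rel=duplicate entry except one that points back at that server.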
   Metalink clients MAY support multi-source, or parallel, downloads,
   where portions of a file can be downloaded from multiple mirrors
   simultaneously (and, optionally, from Peer-to-Peer sources).
   Metalink clients MUST support Instance Digests in HTTP [RFC3230] by
   requesting and verifying cryptographic hashes.  Metalink clients
   SHOULD support error recovery by using the cryptographic hashes of
   parts of the file listed in Metalink/XML files.  Metalink clients
   SHOULD support checking digital signatures.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header fields [RFC5988] and a
   relation type of "duplicate" as defined in Section 8.

   The following list contains OPTIONAL attributes, which are defined
   elsewhere in this document:

   o  "depth": mirror depth, in Section 3.4.
   o  "geo": mirror geographical location, in Section 3.2.
   o  "pref": a preferred mirror server, in Section 3.3.
   o  "pri": mirror priority, in Section 3.1.

   This example shows a brief Metalink server response with two mirrors
   only:

      Link: ; rel=duplicate;
      pri=1; pref
      Link: ; rel=duplicate;
      pri=2; geo=gb; depth=1

   Some organizations can have many mirrors, so it is up to the
   organization to configure the number of Link header fields the
   Metalink server will provide.  Such a decision could be a random
   selection or a hard-coded limit based on network proximity, file
   size, server load, or other factors.

3.1.  Mirror Priority

   Entries for mirror servers MAY have a "pri" value to designate the
   priority of a mirror.  Valid values for the "pri" attribute range
   from 1 to 999999.  Mirror servers with a lower "pri" value have a
   higher priority.  Mirrors with an undefined "pri" attribute are
   considered to have a value of 999999, which is the lowest priority.
   For example, a mirror with "pri=10" has higher priority than a mirror
   with "pri=20".
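   The ordering rule can be illustrated with a short Python sketch.  It
   assumes mirrors have already been parsed into (uri, params) pairs,
   where params maps attribute names to string values.  The function
   names are hypothetical.  Treating malformed or out-of-range values
   like a missing "pri" is a choice of this sketch, not a requirement
   of this document.

```python
LOWEST_PRIORITY = 999999  # value assumed when "pri" is absent


def effective_pri(params):
    """Return the numeric priority of a mirror entry.

    A missing "pri" counts as 999999.  Treating malformed or
    out-of-range values the same way is an assumption of this sketch.
    """
    raw = params.get("pri")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return LOWEST_PRIORITY
    return value if 1 <= value <= LOWEST_PRIORITY else LOWEST_PRIORITY


def order_by_priority(mirrors):
    """Sort (uri, params) pairs so that lower "pri" values come first.

    sorted() is stable, so mirrors with equal priority keep the order
    in which the server listed them.
    """
    return sorted(mirrors, key=lambda mirror: effective_pri(mirror[1]))
```

   Because the sort is stable, the server's own listing order acts as a
   tie-breaker among mirrors of equal priority.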
Metalink clients SHOULD use 330 mirrors with lower "pri" values first, but depending on other 331 conditions MAY decide to use other mirrors. 333 This is purely an expression of the server's preferences; it is up to 334 the client what it does with this information, particularly with 335 reference to how many servers to use at any one time. 337 3.2. Mirror Geographical Location 339 Entries for a mirror server MAY have a "geo" value, which is a 340 [ISO3166-1] alpha-2 two letter country code for the geographical 341 location of the physical server the URI is used to access. A client 342 MAY use this information to select a mirror, or set of mirrors, that 343 are geographically near (if the client has access to such 344 information), with the aim of reducing network load at inter-country 345 bottlenecks. 347 3.3. Coordinated Mirror Policies 349 There are two types of mirror servers: preferred and normal. Entries 350 for preferred HTTP mirror servers have a "pref" value and entries for 351 normal mirrors don't. Preferred mirror servers are HTTP mirror 352 servers that MUST share the same ETag policy as the originating 353 Metalink server and/or MUST provide Instance Digests using the same 354 algorithm as the Metalink server. Preferred mirrors make it possible 355 for Metalink clients to detect early on, before data is transferred, 356 if the file requested matches the desired file. This early file 357 mismatch detection is described in Section 7.1.1. Normal mirrors do 358 not necessarily share the same ETag policy or support Instance 359 Digests using the same algorithm as the Metalink server. FTP mirrors 360 are considered "normal", as they do not emit ETags or support 361 Instance Digests. 363 3.4. Mirror Depth 365 Some mirrors can mirror single files, whole directories, or multiple 366 directories. 368 Entries for mirror servers can have a "depth" value, where "depth=0" 369 is the default. 
   A value of 0 means that only that file is mirrored and that other URI
   path segments are not.  A value of 1 means that the file and all
   other files and URI path segments contained in the rightmost URI path
   segment are mirrored.  A value of 2 means that the URI path segment
   one level closer to the Host is mirrored, along with all files and
   URI path segments it contains.  In general, a value of N (N >= 1)
   means that the URI path segment N-1 levels closer to the Host than
   the rightmost one is mirrored, along with everything it contains.

   This example shows a mirror with a depth value of 4:

      Link: ;
      rel=duplicate; pri=1; pref; depth=4

   In the above example, the URI path segment 3 levels closer to the
   Host is mirrored.  That is, everything from /dir2/ downwards is
   mirrored, including all files and directories it contains.

4.  Peer-to-Peer / Metainfo

   Entries for metainfo files are specified with the Link header fields
   [RFC5988], a relation type of "describedby", and a type parameter.
   Metainfo files describe ways to download a file over Peer-to-Peer
   networks or otherwise.  The type parameter indicates the MIME type
   of the metadata available at the URI.  Metainfo files can sometimes
   describe multiple files.  Also, the filename might differ between
   the Metalink server and the metainfo file even though the content is
   the same.  For these cases, an OPTIONAL "name" attribute can be
   used.

   The following list contains an OPTIONAL attribute, which is defined
   in this document:

   o  "name": a file described within the metainfo file.

   This example shows a brief Metalink server response with .torrent and
   .meta4 files:

      Link: ; rel=describedby;
      type="application/x-bittorrent"; name="differentname.ext"
      Link: ; rel=describedby;
      type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Metalink/XML files for a given resource MAY be provided in a Link
   header field, as shown in the example in Section 4.
   Metalink/XML files are specified in [RFC5854].  They are
   particularly useful for providing metadata such as cryptographic
   hashes of parts of a file, which let a client recover from errors
   (see Section 7.1.2).  Metalink servers SHOULD provide Metalink/XML
   files with partial file hashes in Link header fields, and Metalink
   clients SHOULD use them for error recovery.

5.  Signatures

5.1.  OpenPGP Signatures

   OpenPGP signatures [RFC3156] of requested files are specified with
   the Link header fields [RFC5988], a relation type of "describedby",
   and a type parameter of "application/pgp-signature".

   This example shows a brief Metalink server response with an OpenPGP
   signature only:

      Link: ; rel=describedby;
      type="application/pgp-signature"

   Metalink clients SHOULD support the use of OpenPGP signatures.

5.2.  S/MIME Signatures

   S/MIME signatures [RFC5751] of requested files are specified with the
   Link header fields [RFC5988], a relation type of "describedby", and a
   type parameter of "application/pkcs7-mime".

   This example shows a brief Metalink server response with an S/MIME
   signature only:

      Link: ; rel=describedby;
      type="application/pkcs7-mime"

   Metalink clients SHOULD support the use of S/MIME signatures.

6.  Cryptographic Hashes of Whole Documents

   If Instance Digests are not provided by the Metalink server, the Link
   header fields pertaining to this specification MUST be ignored.

   This example shows a brief Metalink server response with an ETag, a
   mirror, and a cryptographic hash:

      ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel=duplicate
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  Metalink clients MAY include a Range
   header field in this request if desired.
      GET /distribution/example.ext HTTP/1.1
      Host: www.example.com

   The Metalink server responds with the data and these header fields:

      HTTP/1.1 200 OK
      Accept-Ranges: bytes
      Content-Length: 14867603
      Content-Type: application/x-cd-image
      ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel=duplicate; pref
      Link: ; rel=duplicate
      Link: ; rel=describedby;
      type="application/x-bittorrent"
      Link: ; rel=describedby;
      type="application/metalink4+xml"
      Link: ; rel=describedby;
      type="application/pgp-signature"
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   Alternatively, Metalink clients can begin with a HEAD request to the
   Metalink server to discover mirrors via Link header fields.  They
   then apply the decisions described below to every available mirror
   server found via the Link header fields.

   After that, the client follows with GET requests to the desired
   mirrors.

   From the Metalink server response, the client learns some or all of
   the following metadata about the requested object.  Meanwhile, it
   also starts to receive the object itself:

   o  Mirror profile links, which can describe each mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, its geographical location, and its mirror depth.
   o  The Instance Digest, which is the whole file cryptographic hash.
   o  The ETag.
   o  The object size, from the Content-Length header field.
   o  Metalink/XML, which can include partial file cryptographic hashes
      to repair a file.
   o  Peer-to-peer information.
   o  Digital signatures.
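   Once the object size is known from Content-Length, a client has to
   decide how to divide the object among the sources it intends to use.
   This document leaves range sizing to the client.  The following
   Python sketch shows one simple policy: equal, contiguous, inclusive
   byte ranges.  The policy and the function names are assumptions made
   for illustration.  With the Content-Length of 14867603 bytes above
   and two sources, the sketch reproduces the split used in the ranged
   request that follows, whose range starts at byte 7433802.

```python
def split_ranges(size, parts):
    """Divide bytes 0..size-1 into `parts` contiguous inclusive ranges.

    Earlier ranges absorb the remainder, so range lengths differ by at
    most one byte.
    """
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)  # never create empty ranges
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(first, last, size):
    """Render a Range header value; a range ending at the last byte is
    written open-ended, as in "bytes=7433802-"."""
    if last == size - 1:
        return f"bytes={first}-"
    return f"bytes={first}-{last}"
```

   Here, split_ranges(14867603, 2) returns [(0, 7433801),
   (7433802, 14867602)].  The second range renders as "bytes=7433802-".
   A real client would more likely adapt range sizes to the throughput
   it observes from each mirror, as discussed later in this section.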
   Next, the Metalink client requests a range of the object from a
   preferred mirror server, so that it can use If-Match conditions:

      GET /example.ext HTTP/1.1
      Host: www2.example.com
      Range: bytes=7433802-
      If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Referer: http://www.example.com/distribution/example.ext

   Metalink clients SHOULD use preferred mirrors, if possible, as they
   allow early file mismatch detection as described in Section 7.1.1.
   Preferred mirrors have coordinated ETags, as described in
   Section 3.3.  Metalink clients SHOULD use If-Match conditions based
   on the ETag from the Metalink server response to quickly detect
   out-of-date mirrors.  Metalink clients SHOULD use partial file
   cryptographic hashes as described in Section 7.1.2, if available, to
   detect whether the mirror server returned the correct data.

   Optimally, the mirror server will also include an Instance Digest in
   its response to the client's GET request.  The client can also use
   it to detect a mismatch early.  Suppose a mirror supports Instance
   Digests and uses the same algorithm as the Metalink server.  If the
   mirror's Instance Digest does not match the Instance Digest reported
   by the Metalink server, Metalink clients MUST reject that individual
   download.  If normal mirrors are used, then a mismatch cannot be
   detected until the completed object is verified.  Error correction,
   as described in Section 7.1.2, can detect errors in transmission and
   substitutions of incorrect data on mirrors, whether deliberate or
   accidental.

   Here, the preferred mirror server has the correct file (the If-Match
   conditions match).  It responds with a 206 (Partial Content) status
   code and appropriate Content-Length, Content-Range, ETag, and
   Instance Digest header fields.
   In this example, the mirror server responds to the above request with
   data:

      HTTP/1.1 206 Partial Content
      Accept-Ranges: bytes
      Content-Length: 7433801
      Content-Range: bytes 7433802-14867602/14867603
      ETag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   Metalink clients MAY start a number of parallel ranged downloads (one
   per selected mirror server other than the first).  They use mirrors
   provided by the Link header fields with the "duplicate" relation
   type.  Metalink clients MUST limit the number of parallel connections
   to mirror servers.  Ideally, the limit is based on observing how the
   aggregate throughput changes as connections are opened.  It would be
   pointless to blindly open more connections once the path bottleneck
   is filled.  Metalink clients SHOULD put the location of the original
   GET request in the "Referer" header field of these ranged requests.

   The Metalink client can determine the size and number of ranges
   requested from each server.  It bases this on the type and number of
   mirrors and the performance observed from each mirror.  Note that
   Range requests impose an overhead on servers, and clients need to be
   aware of that and not abuse them.  When downloading a particular
   file, Metalink clients MUST NOT make more than one concurrent request
   to each mirror server that they download from.

   Some mirrors might not support HTTP ranges, so a ranged request made
   after the first request can return a complete response instead of a
   partial one.  If that happens and the goal is the fastest transfer,
   Metalink clients SHOULD close all but the fastest connection.
   Metalink clients MAY monitor mirror conditions and dynamically switch
   between mirrors to achieve the fastest download possible.
   Similarly, Metalink clients SHOULD abort extremely slow or stalled
   range requests and finish those ranges using other mirrors.  If all
   ranges have finished except for the final one, the Metalink client
   can split the final range into multiple range requests to other
   mirrors so that the transfer finishes faster.

   Suppose the first request was a GET without a Range header field and
   the client later decides to issue Range requests.  In that case, the
   client SHOULD close the first connection when it catches up with the
   other parallel ranged downloads of the same object.  The first
   connection is then sacrificed.  To avoid this, Metalink clients can
   send a HEAD request first, if possible, to find out whether there are
   any Link header fields.  They can then make Range-based requests to
   the mirror servers without sacrificing a first connection.

   Metalink clients MUST reject individual downloads from mirrors where
   the file size does not match the file size reported by the Metalink
   server.

   If a file is available only via download methods that a Metalink
   client does not support (such as FTP or BitTorrent), the download
   cannot complete.

   Metalink clients MUST verify the cryptographic hash of the file once
   the download has completed.  The Metalink server offers that hash in
   the Instance Digest.  If it does not match the cryptographic hash of
   the downloaded file, see Section 7.1.2 for a possible way to repair
   errors.

   If the download cannot be repaired, it is considered corrupt.  The
   client can attempt to re-download the file.

   Metalink clients that support verifying digital signatures MUST
   verify digital signatures of requested files if they are included.
   Digital signatures MUST validate back to a trust anchor as described
   in the validation rules in [RFC3156] and [RFC5280].

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests provided by Metalink servers.  Error detection, which finds
   errors in transfer after the transfers have completed, requires
   Instance Digests.  Error correction, or download repair, is possible
   with partial file cryptographic hashes.

   Note that cryptographic hashes obtained from Instance Digests are in
   base64 encoding, while those from Metalink/XML are in hexadecimal.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the merging of ranges from multiple responses SHOULD
   be verified with a strong validator, which in this context is either
   an Instance Digest or a shared ETag from the Metalink server that
   matches the one provided by a preferred mirror server.  In most
   cases, it is sufficient that the Metalink server provides mirrors and
   Instance Digest information, but operation will be more robust and
   efficient if the mirror servers also implement a shared ETag policy
   or Instance Digests.  There is no need to specify how the ETag is
   generated, just that it needs to be shared between the Metalink
   server and the mirror servers.  The benefit of having mirror servers
   return an Instance Digest is that the client can then detect
   mismatches early even if ETags are not used.  Mirrors that support
   both a shared ETag and Instance Digests provide additional value,
   but either one alone is sufficient for early detection of
   mismatches.  If the mirror server provides neither a shared ETag nor
   an Instance Digest, then early detection of mismatches is not
   possible unless the file length also differs.
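
   The early mismatch decision above can be illustrated with a small,
   non-normative Python sketch that compares what the Metalink server
   reported with what a mirror returns in its response header fields,
   before any ranges are merged.  The Validators structure and the
   shared_etag_policy flag are assumptions of this sketch: a client
   generally cannot tell from the header fields alone whether a mirror
   shares the Metalink server's ETag policy, so ETags are compared only
   when the caller knows that it does, and weak ETags are ignored
   because a strong validator is needed.  Only a SHA-256 Instance Digest
   is considered here.

```python
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class Validators:
    size: Optional[int] = None        # total object size, e.g. from Content-Range
    etag: Optional[str] = None        # ETag header field value
    sha256_b64: Optional[str] = None  # base64 value of "Digest: SHA-256=..."


def is_strong(etag: Optional[str]) -> bool:
    """Weak ETags (prefixed with W/) cannot serve as strong validators."""
    return bool(etag) and not etag.startswith("W/")


def early_check(metalink: Validators, mirror: Validators,
                shared_etag_policy: bool = False) -> str:
    """Return "mismatch", "match", or "unknown" for a mirror response."""
    # A different total size is always a mismatch.
    if (metalink.size is not None and mirror.size is not None
            and metalink.size != mirror.size):
        return "mismatch"
    # An Instance Digest describes the content itself, so prefer it.
    if metalink.sha256_b64 and mirror.sha256_b64:
        same = (base64.b64decode(metalink.sha256_b64)
                == base64.b64decode(mirror.sha256_b64))
        return "match" if same else "mismatch"
    # Otherwise fall back on a shared, strong ETag (hypothetical flag).
    if shared_etag_policy and is_strong(metalink.etag) and is_strong(mirror.etag):
        return "match" if metalink.etag == mirror.etag else "mismatch"
    # No usable validator: only the final whole-file hash check remains.
    return "unknown"
```

   A result of "unknown" means that early detection is not possible for
   that mirror, so the client depends on verifying the hash of the
   merged download afterwards.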

   Finally, errors are still detectable after the download has
   completed, when the cryptographic hash of the merged response is
   verified.

   ETags cannot be used for verifying the integrity of the received
   content.  However, if the ETag given by the mirror server matches the
   ETag given by the Metalink server, then the Metalink client assumes
   the responses are valid for that object.

   This guarantees that a mismatch will be detected using only the
   shared ETag from a Metalink server and mirror server.  Mirror servers
   will respond with an error if ETags do not match, which will prevent
   accidental merges of ranges from different versions of files with the
   same name.

   A shared ETag or Instance Digest cannot strictly protect against
   malicious attacks or server or network errors replacing content.  An
   attacker can make a mirror server seemingly respond with the expected
   Instance Digest or ETags even if the file contents have been
   modified.  The same goes for various system failures, which would
   also cause bad data (i.e., corrupted files) to be returned.  The
   Metalink client has to rely on the Instance Digest returned by the
   Metalink server in the first response for the verification of the
   downloaded object as a whole.  To verify the individual ranges, which
   might have been requested from different sources, see Section 7.1.2.

7.1.2.  Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download.  Metalink servers SHOULD provide Metalink/XML files
   with partial file hashes in Link header fields as specified in
   Section 4.1, and Metalink clients SHOULD use them for error
   correction.

   An error in transfer or a substitution attack will be detected by a
   cryptographic hash of the object not matching the Instance Digest
   from the Metalink server.
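
   The whole-object check can be sketched in Python as follows.  This
   non-normative example assumes that the merged ranges are available
   as byte chunks in file order and that the Metalink server sent a
   SHA-256 Instance Digest; the function names are hypothetical.
   Because Instance Digests are base64-encoded while Metalink/XML hashes
   are hexadecimal, both are decoded to raw bytes before comparison.

```python
import base64
import hashlib
from typing import Dict, Iterable, Optional


def sha256_of_chunks(chunks: Iterable[bytes]) -> bytes:
    """Hash the merged download, chunk by chunk, in file order."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.digest()


def parse_digest_header(value: str) -> Dict[str, bytes]:
    """Parse an RFC 3230 Digest field value, e.g. "SHA-256=<base64>",
    into {lower-cased algorithm name: raw digest bytes}."""
    digests: Dict[str, bytes] = {}
    for item in value.split(","):
        # Split at the first "=" only; base64 padding also uses "=".
        algorithm, sep, encoded = item.strip().partition("=")
        if sep:
            digests[algorithm.strip().lower()] = base64.b64decode(encoded.strip())
    return digests


def verify_download(chunks: Iterable[bytes], digest_header: str,
                    metalink_xml_hex: Optional[str] = None) -> bool:
    """True if the merged data matches the SHA-256 Instance Digest
    (and the hex hash from Metalink/XML, when one is given)."""
    actual = sha256_of_chunks(chunks)
    expected = parse_digest_header(digest_header).get("sha-256")
    if expected is None or actual != expected:
        return False
    if metalink_xml_hex is not None and actual != bytes.fromhex(metalink_xml_hex):
        return False
    return True
```

   When this check fails, the partial file hashes from Metalink/XML can
   narrow down which ranges need to be fetched again, instead of
   discarding the whole download.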

   If the cryptographic hash of the object does not match the Instance
   Digest from the Metalink server, then the client SHOULD fetch the
   Metalink/XML (if available).  The Metalink/XML may contain partial
   file cryptographic hashes, which allow the client to detect which
   mirror server returned incorrect data.  Metalink clients SHOULD use
   the Metalink/XML data to figure out which ranges of the downloaded
   data can be recovered and which need to be fetched again.

   Other methods can be used for error correction.  For example, some
   other metainfo files also include partial file hashes that can be
   used to check for errors.

8.  IANA Considerations

   IANA will make the following registration to the Link Relation Type
   registry at .

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources.  That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and
      should not be used for them.

9.  Security Considerations

9.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

9.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  Such information could deceive
   unaware downloaders into downloading a malicious or worthless file.
   Section 2 advises Metalink clients to prevent loops, for example from
   a mirror server to a Metalink server and back again.  As with all
   downloads, users should only download from trusted sources.
   Also, malicious publishers could attempt a distributed denial of
   service attack by inserting unrelated URIs into Metalinks.  [RFC4732]
   contains information on amplification attacks and denial of service
   attacks.

9.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.  These include the whole
   Message Digest family of algorithms, which are not suitable for
   cryptographically strong verification.  Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e., the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

9.4.  Signing

   Metalinks SHOULD include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication and message integrity, and
   enable non-repudiation with proof of origin.

10.  References

10.1.  Normative References

   [BITTORRENT]
              Cohen, B., "The BitTorrent Protocol Specification",
              BITTORRENT 11031, February 2008, .

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:2006. Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC0959]  Postel, J. and J. Reynolds, "File Transfer Protocol",
              STD 9, RFC 959, October 1985.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
              "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, May 2008.

   [RFC5751]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
              Mail Extensions (S/MIME) Version 3.2 Message
              Specification", RFC 5751, January 2010.

   [RFC5854]  Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, "The
              Metalink Download Description Format", RFC 5854,
              June 2010.

   [RFC5988]  Nottingham, M., "Web Linking", RFC 5988, October 2010.

10.2.  Informative References

   [RFC4732]  Handley, M., Rescorla, E., and IAB, "Internet
              Denial-of-Service Considerations", RFC 4732,
              December 2006.

   [RFC5843]  Bryan, A., "Additional Hash Algorithms for HTTP Instance
              Digests", RFC 5843, April 2010.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Alexey Melnikov, Julian Reschke,
   Mark Nottingham, Daniel Stenberg, Matt Domsch, Micah Cowan, David
   Morris, Yves Lafon, Juergen Schoenwaelder, Ben Campbell, Lars Eggert,
   Sean Turner, Robert Sparks, and the HTTPBIS Working Group.

   Thanks to Alan Ford and Mark Handley for spurring us on to publish
   this document.

   This document is dedicated to Zimmy Bryan and Juanita Anthony.

Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format [RFC5854]:

   o  (+) Reuses existing HTTP standards, with little new besides a Link
      Relation Type.
      It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      cryptographic hashes.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination could be difficult or impossible unless one group is
      in control of all servers on the mirror network.

   o  (-) Requires software or configuration changes to the originating
      server.
   o  (-?) Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      the user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, this method
      still gives access to all download information (with no changes
      needed to servers) from all mirrors (FTP included).
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  None.

   -21 : February 27, 2011.
   o  IESG review.

   -20 : February 14, 2011.
   o  Yves Lafon's apps-team review, Juergen Schoenwaelder's secdir
      review, Ben Campbell's Gen-ART review.

   -19 : January 20, 2011.
   o  Julian Reschke's review.

   -18 : January 1, 2011.
   o  AD review by Alexey Melnikov.

   -17 : September 13, 2010.
   o  RFC 5854 Metalink/XML.

   -16 : April 16, 2010.
   o  Add draft-ietf-ftpext2-hash reference and FTP mirror coordination.

   -15 : February 20, 2010.
   o  Update references and terminology.

   -14 : December 31, 2009.
   o  Baseline file hash: SHA-256.

   -13 : November 22, 2009.
   o  Metalink/XML for partial file cryptographic hashes.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 13, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 7, 2009.
   o  Content-MD5 for partial file cryptographic hashes.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Tatsuhiro Tsujikawa
   Shiga
   Japan

   Email: tatsuhiro.t@gmail.com
   URI:   http://aria2.sourceforge.net

   Dr. med. Peter Poeml
   MirrorBrain
   Venloer Str. 317
   Koeln 50823
   DE

   Phone: +49 221 6778 333 8
   Email: peter@poeml.de
   URI:   http://mirrorbrain.org/~poeml/

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/