Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: March 17, 2011                                     T. Tsujikawa

                                                                P. Poeml
                                                             MirrorBrain
                                                                 A. Ford
                                                     Roke Manor Research
                                                      September 13, 2010


    Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
                      draft-bryan-metalinkhttp-17

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP Headers, a different way to get information that is
   usually contained in the Metalink XML-based download description
   format.  Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP headers.
   Clients can transparently use this information to make file transfers
   more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 17, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the Trust
   Legal Provisions and are provided without warranty as described in
   the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Cryptographic Hashes of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1.  URIs and IRIs
     10.2.  Spoofing
     10.3.  Cryptographic Hashes
     10.4.  Signing
   11. Normative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [RFC5854].  Metalink/HTTP attempts to provide as much
   functionality as the Metalink/XML format by using existing standards
   such as Web Linking [draft-nottingham-http-link-header], Instance
   Digests in HTTP [RFC3230], and ETags.  Metalink/HTTP is used to list
   information about a file to be downloaded.  This can include lists of
   multiple URIs (mirrors), Peer-to-Peer information, cryptographic
   hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.  At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.  Users
   will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular server
   getting a disproportionate share of load.  The use of suboptimal
   mirrors can lead to the user canceling and restarting the download
   to try to manually find a better source.
   During downloads, errors in transmission can corrupt the file.
   There are no easy ways to repair these files.  For large downloads
   this can be extremely troublesome.  Any of the problems that can
   occur during a download leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized.  All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more can be transferred in
   coordinated HTTP Headers.  This Metalink transfers the knowledge of
   the download server (and mirror database) to the client.  Clients can
   fall back to other mirrors if the current one has an issue.  With
   this knowledge, the client is enabled to work its way to a successful
   download even under adverse circumstances.  All this is done
   transparently to the user and the download is much more reliable and
   efficient.  In contrast, a traditional HTTP redirect to a mirror
   conveys only minimal information: one link to one server, and there
   is no provision in HTTP to handle failures.  Furthermore, in order to
   provide better load distribution across servers and potentially
   faster downloads to users, Metalink/HTTP facilitates multi-source
   downloads, where portions of a file are downloaded from multiple
   mirrors (and optionally, Peer-to-Peer) simultaneously.
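
   As an illustration only, the following Python sketch shows one way a
   client might divide an object of known size into contiguous byte
   ranges for a multi-source download.  The names split_ranges and
   range_header are hypothetical, and the even split is an assumption of
   this sketch; this document leaves the size and number of ranges to
   the client.

```python
def split_ranges(total_size, parts):
    """Divide bytes 0..total_size-1 into `parts` contiguous ranges.

    Returns a list of (first_byte, last_byte) pairs with inclusive ends,
    as used by the HTTP Range header.  Hypothetical helper: an even
    split is only one of many possible client policies.
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)          # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(first_byte, last_byte):
    """Format one inclusive byte range as a Range header value."""
    return f"bytes={first_byte}-{last_byte}"


if __name__ == "__main__":
    # 14867603 is the Content-Length used in the Section 7 example.
    for first, last in split_ranges(14867603, 2):
        print(range_header(first, last))
```

   With two sources and the object size from the Section 7 example, the
   second range starts at byte 7433802, which is the offset requested
   from the mirror in that example.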

   [[ Discussion of this draft should take place on the IETF HTTP WG
   mailing list at ietf-http-wg@w3.org or the Metalink discussion
   mailing list located at metalink-discussion@googlegroups.com.  To
   join the list, visit
   http://groups.google.com/group/metalink-discussion . ]]

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a
   cryptographic hash of the whole resource.  The client will then be
   able to request chunks of the file from the various sources,
   scheduling appropriately in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and a cryptographic hash of the whole file:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP which consists of
   mirrors and cryptographic hashes in HTTP Headers as described in this
   document.  "Metalink/XML" refers to the XML format described in
   [RFC5854].
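
   As a non-normative illustration, the following Python sketch shows
   how a client might read Link header values such as those in
   Section 1.2 and group them by relation type.  The function names and
   the example URIs are assumptions made for this sketch.  The parser is
   deliberately simplified: it expects one link per header value and
   does not handle several comma-separated links in a single value.

```python
import re

# Matches ;name=value or ;name="value" after the <URI-Reference> part.
_PARAM = re.compile(r';\s*([A-Za-z0-9_-]+)\s*=\s*("([^"]*)"|[^;,\s]+)')


def parse_link(value):
    """Parse one Link header value of the form <uri>; k=v; k="v".

    Returns (uri, params) where params maps lower-cased parameter names
    to their unquoted values.  Simplified, hypothetical helper.
    """
    match = re.match(r'\s*<([^>]*)>', value)
    if match is None:
        raise ValueError("Link value does not start with <URI-Reference>")
    params = {}
    for name, raw, quoted in _PARAM.findall(value[match.end():]):
        # Keep the first occurrence of a repeated parameter.
        params.setdefault(name.lower(), quoted if raw.startswith('"') else raw)
    return match.group(1), params


def group_by_rel(link_values):
    """Group parsed links by relation type, e.g. "duplicate"."""
    groups = {}
    for value in link_values:
        uri, params = parse_link(value)
        for rel in params.get("rel", "").split():
            groups.setdefault(rel.lower(), []).append((uri, params))
    return groups


if __name__ == "__main__":
    # The URIs below are illustrative assumptions only.
    headers = [
        '<http://www2.example.com/example.ext>; rel="duplicate"',
        '<http://example.com/example.ext.metalink>; rel="describedby"; '
        'type="application/metalink4+xml"',
    ]
    for rel, links in group_by_rel(headers).items():
        print(rel, [uri for uri, _ in links])
```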

   Metalink resources include a Link header
   [draft-nottingham-http-link-header] to present a list of mirrors in
   the response to a client request for the resource.  The cryptographic
   hash of a resource must be included via Instance Digests in HTTP
   [RFC3230].

   Metalink servers are HTTP servers with one or more Metalink
   resources.  Mirror and cryptographic hash information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy (ETag Synchronization), i.e., one based on the
   file contents (cryptographic hash) and not on server-unique
   filesystem metadata.  The emitted ETag MAY be implemented the same as
   the Instance Digest for simplicity.  Metalink servers MAY offer
   Metalink/XML documents that contain cryptographic hashes of parts of
   the file and other information.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  HTTP mirror servers SHOULD share the same ETag policy as
   the originating Metalink server.  HTTP mirror servers SHOULD support
   Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   the Link header [draft-nottingham-http-link-header].  Metalink
   clients MUST support HTTP and MAY support FTP, BitTorrent, or other
   download methods.  Metalink clients MUST switch downloads from one
   mirror to another if the mirror becomes unreachable.  Metalink
   clients SHOULD support multi-source, or parallel, downloads, where
   portions of a file are downloaded from multiple mirrors
   simultaneously (and optionally, from Peer-to-Peer sources).
   Metalink clients MUST support Instance Digests in HTTP [RFC3230] by
   requesting and verifying cryptographic hashes.  Metalink clients MAY
   make use of digital signatures if they are offered.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: ; rel="duplicate";
   pri=1; pref=1
   Link: ; rel="duplicate";
   pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send.  Such
   a decision could rely on a hard-coded limit, a random selection, the
   file size, or the server load.

3.1.  Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.  A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2.  Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 two-letter country code for the geographical location of the
   physical server the URI is used to access.  A client may use this
   information to select a mirror, or set of mirrors, that are
   geographically near (if the client has access to such information),
   with the aim of reducing network load at inter-country bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server.  Preferred
   mirrors make it possible to detect early on, before data is
   transferred, if the file requested matches the desired file.
   Preferred HTTP mirror servers have a "pref" value of 1.  If "pref" is
   unspecified, mirrors are considered "normal" by default and do not
   share the same ETag policy.  FTP mirrors, as they do not emit ETags,
   MUST always be considered "normal".  ([draft-bryan-ftp-hash] allows
   for FTP mirrors to be coordinated and provide file hashes.)

   HTTP mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].  Optimally, mirror servers will share the same ETag policy
   and support Instance Digests in HTTP.

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4.  Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default.  A value of 0 means ONLY that file is mirrored.  A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored.  A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

   Link: ;
   rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4.  Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over Peer-to-
   Peer networks or otherwise, are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter that indicates the MIME type of
   the metadata available at the URI.
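
   The following Python sketch, offered only as an illustration, shows
   how a client might pick one metainfo link among several
   "describedby" links using the type parameter.  The function name
   pick_metainfo, the preference order, and the example URIs are
   assumptions of this sketch; this document does not define a
   selection policy.

```python
# Media types this hypothetical client can handle, in its own order of
# preference (an assumption of this sketch, not a requirement).
CLIENT_PREFERENCE = (
    "application/metalink4+xml",
    "application/x-bittorrent",
)


def pick_metainfo(links, preference=CLIENT_PREFERENCE):
    """Return the URI of the preferred supported metainfo link, or None.

    `links` is an iterable of (uri, params) pairs, where `params` maps
    lower-cased Link parameter names to their values.
    """
    by_type = {}
    for uri, params in links:
        rels = params.get("rel", "").lower().split()
        media_type = params.get("type", "").lower()
        if "describedby" in rels and media_type:
            by_type.setdefault(media_type, uri)  # keep first URI per type
    for media_type in preference:
        if media_type in by_type:
            return by_type[media_type]
    return None


if __name__ == "__main__":
    # Illustrative links only; the URIs are assumptions.
    example = [
        ("http://example.com/example.ext.torrent",
         {"rel": "describedby", "type": "application/x-bittorrent"}),
        ("http://example.com/example.ext.metalink",
         {"rel": "describedby", "type": "application/metalink4+xml"}),
    ]
    print(pick_metainfo(example))
```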

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as cryptographic hashes of parts of a file, allowing a
   client to recover from partial errors (see Section 7.1.2).

5.  OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
   type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.

6.  Cryptographic Hashes of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors.  Mirror servers SHOULD as well.

   A brief Metalink server response with cryptographic hash:

   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.  Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file cryptographic hashes
      to repair a file.
   o  Digital signature.
   o  Instance Digest, which is the whole file cryptographic hash.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and is delivered more slowly than expected,
   then the Metalink client starts a number of parallel ranged downloads
   (one per selected mirror server other than the first) using mirrors
   provided by the Link header with "duplicate" relation type, using the
   location of the original GET request in the "Referer" header field.
   The size and number of ranges requested from each server are for the
   client to decide, based upon the performance observed from each
   server.
   Further discussion of performance considerations is presented in
   Section 8.

   If no range limit was given in the original request, the client
   works from the tail of the object (the first request is still running
   and will eventually catch up); otherwise, it continues after the
   range requested in the first request.  If no Range was provided, the
   original connection must be terminated once all parts of the resource
   have been retrieved.  It is recommended that a HEAD request is
   undertaken first, so that the client can find out if there are any
   Link headers, and then Range-based requests are undertaken to the
   mirror servers as well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag SHOULD be used
   to quickly detect out-of-date mirrors by using the ETag from the
   Metalink server response.  If no indication of ETag
   synchronization/knowledge is given, then If-Match should not be used.
   Optimally, the mirror response will then carry an Instance Digest
   that can be used to detect a mismatch early; if not, a mismatch will
   not be detected until the completed object is verified.  Early file
   mismatch detection is described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.
   The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited then abort it by closing
   the connection when it catches up with the other parallel downloads
   of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   cryptographic hash of the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and
   cryptographic hashes.  Error detection requires Instance Digests, or
   cryptographic hashes, to determine after transfers if there has been
   an error.  Error correction, or download repair, is possible with
   partial file cryptographic hashes.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers do implement a
   synchronized ETag as well.  In fact, the emitted ETag may be
   implemented the same as the Instance Digest for simplicity, but there
   is no need to specify how the ETag is generated, just that it needs
   to be shared among the mirror servers.
   If the mirror server provides neither a synchronized ETag nor an
   Instance Digest, then early detection of mismatches is not possible
   unless file length also differs.  Even so, the error is still
   detectable, after the download has completed, when the merged
   response is verified.

   ETags cannot be used for verifying the integrity of the received
   content.  An ETag is, however, a guarantee issued by the Metalink
   server that the content is correct for that ETag.  If the ETag given
   by the mirror server matches the ETag given by the master server,
   then we have a chain of trust where the master server authorizes
   these responses as valid for that object.

   This guarantees that a mismatch will be detected by using only the
   synchronized ETag from a master server and mirror server.  The mirror
   servers themselves can even signal the mismatch by responding with an
   error, preventing accidental merges of ranges from different versions
   of files with the same name.  This even covers many, but not all,
   malicious attacks where the data on the mirror has been replaced by
   some other file.

   Synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can
   Instance Digest on the mirror servers: an attacker can almost
   certainly make the server respond with the expected Instance Digest
   even if the file contents have been modified, just as with ETag, and
   the same holds for various system failures that cause bad data to be
   returned.  The Metalink client has to rely on the Instance Digest
   returned by the Metalink master server in the first response for the
   verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   The set of trusted mirrors that do so can, if desired, act as master
   servers accepting the initial request.

   The benefit of having slave mirror servers (those not trusted as
   masters) return Instance Digest is that the client then can detect
   mismatches early even if ETag is not used.  Both ETag and slave
   mirror Instance Digest do provide value, but just one is sufficient
   for early detection of mismatches.  If neither is provided then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

   If FTP servers support the FTP HASH command [draft-bryan-ftp-hash]
   and the same hash algorithm as the originating Metalink server, then
   that information can be used for early file mismatch detection.

7.1.2.  Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download.  Metalink servers are not required to offer partial
   file cryptographic hashes, but they are encouraged to do so.

   If the cryptographic hash of the object does not match the Instance
   Digest, the client fetches the Metalink/XML as specified in
   Section 4.1, where partial file cryptographic hashes may be found,
   allowing detection of which server returned incorrect data.  From the
   partial file cryptographic hashes, if available, the client
   determines which of the downloaded data can be recovered and what
   needs to be fetched again.  If no partial cryptographic hashes are
   available, then the client MUST fetch the complete object from other
   mirrors.

8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved.  But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster.  Based on laboratory experiments,
   we suggest a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests.  At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized.  Thus we currently
   recommend that a chunk size of at least 10KBytes should be used.  If
   the file being transferred is very large, or the download speed very
   high, this can be increased to perhaps 1MByte.
   As network bandwidths increase, we expect these numbers to increase
   appropriately, so that the time to transfer a chunk remains
   significantly larger than the latency of requesting a chunk from a
   server.

9.  IANA Considerations

   Accordingly, IANA has made the following registration to the Link
   Relation Type registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources.  That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and
      should not be used for them.

10.  Security Considerations

10.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  This could deceive unaware
   downloaders into downloading a malicious or worthless file.  Also,
   malicious publishers could attempt a distributed denial of service
   attack by inserting unrelated URIs into Metalinks.

10.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.  These include the whole
   Message Digest family of algorithms, which are not suitable for
   cryptographically strong verification.  Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e. the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.
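
   As a hedged, non-normative illustration, the following Python sketch
   shows how a client might ignore weak digest algorithms when checking
   a Digest header value against downloaded data.  The names TRUSTED,
   parse_digest, and verify are hypothetical.  The sketch assumes the
   digest value is the base64 encoding of the binary hash output and
   that the algorithm tokens are "SHA-256" and "SHA-512"; servers that
   encode values differently would need a different comparison.

```python
import base64
import hashlib

# Algorithms this sketch is willing to trust, strongest first.  The
# token names are assumptions; MD5 is intentionally absent.
TRUSTED = {"sha-512": hashlib.sha512, "sha-256": hashlib.sha256}


def parse_digest(header_value):
    """Split 'alg=value, alg=value' into a dict keyed by lower-cased alg."""
    digests = {}
    for item in header_value.split(","):
        # partition() splits at the first "=", so base64 padding is kept.
        name, separator, value = item.strip().partition("=")
        if separator:
            digests[name.strip().lower()] = value.strip()
    return digests


def verify(data, header_value):
    """Return True only if a trusted digest is present and matches data."""
    digests = parse_digest(header_value)
    for name, factory in TRUSTED.items():
        if name in digests:
            expected = base64.b64encode(factory(data).digest()).decode("ascii")
            return expected == digests[name]
    return False  # only weak or unknown algorithms were offered


if __name__ == "__main__":
    payload = b"example payload"
    value = base64.b64encode(hashlib.sha256(payload).digest()).decode("ascii")
    print(verify(payload, "SHA-256=" + value))
```

   Returning False when only weak algorithms are offered is a design
   choice of this sketch: it treats an MD5-only response the same as a
   response with no usable digest at all.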

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include "sha-256" which is SHA-256, as specified in
   [FIPS-180-3], or stronger.  It MAY also include other hashes.

10.4.  Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

11.  Normative References

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization, "ISO 3166-
              1:2006.  Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC5854]  Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, "The
              Metalink Download Description Format", RFC 5854,
              June 2010.

   [draft-bryan-ftp-hash]
              Bryan, A., Kosse, T., and D. Stenberg, "FTP Extensions for
              Cryptographic Hashes", draft-bryan-ftp-hash-07 (work in
              progress), August 2010.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-10 (work in progress),
              May 2010.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, Matt Domsch, Micah
   Cowan, and David Morris.

   Support for simultaneous download from multiple mirrors is based upon
   work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.

Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format [RFC5854]:

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.  It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      cryptographic hashes.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to originating
      server.
   o  (-?) Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      access to users at all mirrors (FTP included) to all download
      information with no changes needed to the server.
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Some organizations have many mirrors.  Should all be sent, or only
      a certain number?  All should be included in the Metalink/XML, if
      used.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e. q=1.0 through q=0.0?
   o  Using Metalink/XML for partial file cryptographic hashes.  That
      adds XML dependency to apps for an important feature.  Is there a
      better method?
   o  Do we need an "official" MIME type for .torrent files or allow
      "application/x-bittorrent"?

   -17 : August , 2010.
   o  RFC 5854 Metalink/XML.

   -16 : April 16, 2010.
   o  Add draft-bryan-ftp-hash reference and FTP mirror coordination.

   -15 : December 31, 2009.
   o  Update references and terminology.

   -14 : December 31, 2009.
   o  Baseline file hash: SHA-256.

   -13 : November 22, 2009.
   o  Metalink/XML for partial file cryptographic hashes.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 12, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.
   o  Content-MD5 for partial file cryptographic hashes.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Tatsuhiro Tsujikawa
   Shiga
   Japan

   Email: tatsuhiro.t@gmail.com
   URI:   http://aria2.sourceforge.net

   Dr. med. Peter Poeml
   MirrorBrain
   Venloer Str. 317
   Koeln  50823
   DE

   Phone: +49 221 6778 333 8
   Email: peter@poeml.de
   URI:   http://mirrorbrain.org/~poeml/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk