Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: July 5, 2011                                       T. Tsujikawa
                                                                P. Poeml
                                                             MirrorBrain
                                                                 A. Ford
                                                     Roke Manor Research
                                                         January 1, 2011

    Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
                      draft-bryan-metalinkhttp-18

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP Headers, a different way to get information that is
   usually contained in the Metalink XML-based download description
   format. Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP headers.
   Clients can transparently use this information to make file transfers
   more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 5, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Operation Overview
     1.2. Examples
     1.3. Notational Conventions
   2. Requirements
   3. Mirrors / Multiple Download Locations
     3.1. Mirror Priority
     3.2. Mirror Geographical Location
     3.3. Coordinated Mirror Policies
     3.4. Mirror Depth
   4. Peer-to-Peer / Metainfo
     4.1. Metalink/XML Files
   5. OpenPGP Signatures
   6. Cryptographic Hashes of Whole Files
   7. Client / Server Multi-source Download Interaction
     7.1. Error Prevention, Detection, and Correction
       7.1.1. Error Prevention (Early File Mismatch Detection)
       7.1.2. Error Correction
   8. Multi-server Performance
   9. IANA Considerations
   10. Security Considerations
     10.1. URIs and IRIs
     10.2. Spoofing
     10.3. Cryptographic Hashes
     10.4. Signing
   11. Normative References
   Appendix A. Acknowledgements and Contributors
   Appendix B. Comparisons to Similar Options
   Appendix C. Document History
   Authors' Addresses

1. Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [RFC5854]. Metalink/HTTP attempts to provide as much
   functionality as the Metalink/XML format by using existing standards
   such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
   and ETags [RFC2616]. Metalink/HTTP is used to list information about
   a file to be downloaded. This can include lists of multiple URIs
   (mirrors), Peer-to-Peer information, cryptographic hashes, and
   digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer). In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth. This distributes the load across multiple servers, and
   should also increase throughput and resilience. At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated. Users
   will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular server
   getting a disproportionate share of load. The use of suboptimal
   mirrors can lead to the user canceling and restarting the download
   to try to manually find a better source.
   During downloads, errors in transmission can corrupt the file. There
   are no easy ways to repair these files. For large downloads this can
   be extremely troublesome. Any of the problems that can occur during
   a download leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput. Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized. All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more can be transferred in
   coordinated HTTP Headers. This Metalink transfers the knowledge of
   the download server (and mirror database) to the client. Clients can
   fall back to other mirrors if the current one has an issue. With
   this knowledge, the client is enabled to work its way to a successful
   download even under adverse circumstances. All this is done
   transparently to the user and the download is much more reliable and
   efficient. In contrast, a traditional HTTP redirect to a mirror
   conveys only extremely minimal information - one link to one server,
   and there is no provision in the HTTP protocol to handle failures.
   Furthermore, in order to provide better load distribution across
   servers and potentially faster downloads to users, Metalink/HTTP
   facilitates multi-source downloads, where portions of a file are
   downloaded from multiple mirrors (and optionally, Peer-to-Peer)
   simultaneously.

   [[ Discussion of this draft should take place on IETF HTTP WG mailing
   list at ietf-http-wg@w3.org or the Metalink discussion mailing list
   located at metalink-discussion@googlegroups.com. To join the list,
   visit http://groups.google.com/group/metalink-discussion . ]]

1.1. Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a
   cryptographic hash of the whole resource. The client will then be
   able to request chunks of the file from the various sources,
   scheduling appropriately in order to maximise the download rate.

1.2. Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and a cryptographic hash of the whole file:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.3. Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2. Requirements

   In this context, "Metalink" refers to Metalink/HTTP which consists of
   mirrors and cryptographic hashes in HTTP Headers as described in this
   document. "Metalink/XML" refers to the XML format described in
   [RFC5854].

   Metalink resources include a Link header [RFC5988] to present a list
   of mirrors in the response to a client request for the resource.
   Metalink servers MUST include the cryptographic hash of a resource
   via Instance Digests in HTTP [RFC3230]. Valid algorithms are found
   in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest
   Algorithm Values" at
   http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml .

   Metalink servers are HTTP servers with one or more Metalink
   resources. Metalink servers MUST support the Link header for listing
   mirrors and MUST support Instance Digests in HTTP [RFC3230]. Mirror
   and cryptographic hash information provided by the originating
   Metalink server MUST be considered authoritative. Metalink servers
   and their associated mirror servers are RECOMMENDED to all share the
   same ETag policy (ETag Synchronization), i.e. based on the file
   contents (cryptographic hash) and not server-unique filesystem
   metadata. The emitted ETag MAY be implemented the same as the
   Instance Digest for simplicity. Metalink servers MAY offer Metalink/
   XML documents that contain cryptographic hashes of parts of the file
   and other information.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server. That is, they provide identical copies of (at least
   some) files that are also on the mirrored server. Mirror servers MAY
   be Metalink servers. Mirror servers are RECOMMENDED to support
   serving partial content. HTTP mirror servers are RECOMMENDED to
   share the same ETag policy as the originating Metalink server. HTTP
   Mirror servers are RECOMMENDED to support Instance Digests in HTTP
   [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header [RFC5988]. Metalink clients MUST support HTTP and are
   RECOMMENDED to support FTP [RFC0959].
   Metalink clients MAY support BitTorrent [BITTORRENT], or other
   download methods. Metalink clients
   are RECOMMENDED to switch downloads from one mirror to another if a
   mirror becomes unreachable. Metalink clients are RECOMMENDED to
   support multi-source, or parallel, downloads, where portions of a
   file can be downloaded from multiple mirrors simultaneously (and
   optionally, from Peer-to-Peer sources). Metalink clients MUST
   support Instance Digests in HTTP [RFC3230] by requesting and
   verifying cryptographic hashes. Metalink clients MAY make use of
   digital signatures if they are offered.

3. Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header [RFC5988] and a relation
   type of "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: ; rel="duplicate";
   pri=1; pref=1
   Link: ; rel="duplicate";
   pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors. Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send. Such
   a decision could be a hard-coded limit, a random selection, based on
   file size, or based on server load.

3.1. Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time. A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2. Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 two-letter country code for the geographical location of the
   physical server the URI is used to access. A client may use this
   information to select a mirror, or set of mirrors, that are
   geographically near (if the client has access to such information),
   with the aim of reducing network load at inter-country bottlenecks.

3.3. Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server. Preferred
   mirrors make it possible to detect early on, before data is
   transferred, if the file requested matches the desired file.
   Preferred HTTP mirror servers have a "pref" value of 1. If "pref" is
   unspecified, mirrors are considered "normal" by default and do not
   necessarily share the same ETag policy. FTP mirrors, as they do not
   emit ETags, are considered "normal". ([draft-ietf-ftpext2-hash]
   allows for FTP mirrors to be coordinated and provide file hashes).

   HTTP Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230]. Optimally, mirror servers will share the same ETag policy
   and support Instance Digests in HTTP.

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4. Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default. A value of 0 means ONLY that file is mirrored. A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored. A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.
   For each higher value,
   another directory closer to the root is mirrored.

   A mirror with a depth value of 4:

   Link: ;
   rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4. Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over Peer-to-
   Peer networks or otherwise, are specified with the Link header
   [RFC5988] and a relation type of "describedby" and a type parameter
   that indicates the MIME type of the metadata available at the URI.

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1. Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4. This is particularly useful for providing
   metadata such as cryptographic hashes of parts of a file, allowing a
   client to recover from partial errors (see Section 7.1.2).

5. OpenPGP Signatures

   OpenPGP signatures [RFC3156] are specified with the Link header
   [RFC5988] and a relation type of "describedby" and a type parameter
   of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
   type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.

6. Cryptographic Hashes of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors. Mirror servers SHOULD as well.

   A brief Metalink server response with cryptographic hash:

   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7. Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server. A Range limit is optional, not
   required. Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   also starting to receive the object:

   o Object size.
   o ETag.
   o Mirror profile link, which may describe the mirror's priority,
     whether it shares the ETag policy of the originating Metalink
     server, geographical location, and mirror depth.
   o Peer-to-peer information.
   o Metalink/XML, which can include partial file cryptographic hashes
     to repair a file.
   o Digital signature.
   o Instance Digest, which is the whole file cryptographic hash.
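
   The mirror metadata listed above could be extracted from the Link
   header values roughly as follows. This is a minimal Python sketch,
   not part of the specification: Mirror, parse_link and
   mirrors_from_links are hypothetical names, the parser assumes one
   link per header value with no ";" inside quoted parameters, and
   placing mirrors without a "pri" value after those that have one is
   an assumption of this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Mirror:
    """One rel="duplicate" link and its optional parameters (hypothetical type)."""
    uri: str
    pri: Optional[int] = None   # lower values are used first
    pref: bool = False          # True when pref=1 (shares the ETag policy)
    geo: Optional[str] = None   # ISO 3166-1 alpha-2 country code
    depth: int = 0              # 0 is the default mirror depth


_LINK_RE = re.compile(r'^\s*<([^>]*)>\s*(.*)$', re.S)


def parse_link(value: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split one Link header value into (uri, params); None if malformed."""
    m = _LINK_RE.match(value)
    if not m:
        return None
    params: Dict[str, str] = {}
    # Simplification: assumes no ';' appears inside quoted parameter values.
    for part in m.group(2).split(';'):
        part = part.strip()
        if not part:
            continue
        name, _, val = part.partition('=')
        params[name.strip().lower()] = val.strip().strip('"')
    return m.group(1), params


def mirrors_from_links(link_values: List[str]) -> List[Mirror]:
    """Collect rel="duplicate" links, ordered by ascending "pri"."""
    mirrors = []
    for value in link_values:
        parsed = parse_link(value)
        if parsed is None:
            continue
        uri, p = parsed
        if p.get('rel') != 'duplicate':
            continue
        mirrors.append(Mirror(
            uri=uri,
            pri=int(p['pri']) if 'pri' in p else None,
            pref=p.get('pref') == '1',
            geo=p.get('geo'),
            depth=int(p.get('depth', '0')),
        ))
    # Stable sort: mirrors without "pri" keep header order, after the rest.
    return sorted(mirrors, key=lambda m: (m.pri is None, m.pri or 0))
```

   Given header values that carry a URI in angle brackets, for example
   a placeholder such as <http://mirror-a.example/example.ext>, the
   function returns the mirrors in the order a client would try them.
   Links with other relation types ("describedby") are ignored here and
   would be handled separately.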

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and gets delivered slower than expected then
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first) using mirrors
   provided by the Link header with "duplicate" relation type, using the
   location of the original GET request in the "Referer" header field.
   The size and number of ranges requested from each server is for the
   client to decide, based upon the performance observed from each
   server. Further discussion of performance considerations is
   presented in Section 8.

   If no range limit was given in the original request then the client
   works from the tail of the object (the first request is still running
   and will eventually catch up); otherwise it continues after the range
   requested in the first request. If no Range was provided, the
   original connection must be terminated once all parts of the resource
   have been retrieved. It is recommended that a HEAD request is
   undertaken first, so that the client can find out if there are any
   Link headers, and then Range-based requests are undertaken to the
   mirror servers as well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag SHOULD be used
   to quickly detect out-of-date mirrors by using the ETag from the
   Metalink server response. If no indication of ETag synchronisation/
   knowledge is given then If-Match should not be used. Optimally, the
   mirror response will then contain an Instance Digest that the client
   can use to detect a mismatch early; if not, a mismatch won't be
   detected until the completed object is verified.
   Early file mismatch
   detection is described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields. The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited then abort it by closing
   the connection when it catches up with the other parallel downloads
   of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   If a Metalink client does not support certain download methods (such
   as FTP or BitTorrent) that a file is available from, and there are no
   available download methods that the client supports, then the
   download will have no way to complete.

   Once the download has completed, the Metalink client MUST verify the
   cryptographic hash of the file.

7.1. Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and
   cryptographic hashes. Error detection requires Instance Digests, or
   cryptographic hashes, to determine after transfers if there has been
   an error. Error correction, or download repair, is possible with
   partial file cryptographic hashes.
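
   Error detection after transfer could look roughly like the following
   Python sketch. It is illustrative only: digest_header_value and
   matches_digest are hypothetical names, only SHA-256 is handled, and
   the sketch assumes the Digest value is the base64 encoding of the
   raw digest bytes.

```python
import base64
import hashlib


def digest_header_value(path: str, chunk_size: int = 65536) -> str:
    """Compute a 'SHA-256=<base64>' value for the file at path."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in blocks so large downloads are not loaded into memory at once.
        for block in iter(lambda: f.read(chunk_size), b''):
            h.update(block)
    return 'SHA-256=' + base64.b64encode(h.digest()).decode('ascii')


def matches_digest(path: str, header_value: str) -> bool:
    """Compare a downloaded file with a Digest header value.

    The header may list several algorithms separated by commas; this
    sketch only looks for SHA-256 and returns False if it is absent.
    """
    for item in header_value.split(','):
        algo, _, value = item.strip().partition('=')
        if algo.upper() == 'SHA-256':
            return digest_header_value(path) == 'SHA-256=' + value
    return False
```

   A client would call matches_digest() with the path of the completed
   download and the Digest value received from the Metalink server, and
   treat a False result as a failed download.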

   Note that cryptographic hashes obtained from Instance Digests are in
   base64 encoding, while those from Metalink/XML and FTP HASH are in
   hexadecimal.

7.1.1. Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers do implement a
   synchronized ETag as well. In fact, the emitted ETag may be
   implemented the same as the Instance Digest for simplicity, but there
   is no need to specify how the ETag is generated, just that it needs
   to be shared among the mirror servers. If the mirror server provides
   neither synchronized ETag nor Instance Digest, then early detection
   of mismatches is not possible unless file length also differs. Even
   then, the error is still detectable, after the download has
   completed, when the merged response is verified.

   ETags cannot be used for verifying the integrity of the received
   content. An ETag is, however, a guarantee issued by the Metalink
   server that the content is correct for that ETag. And if the ETag
   given by the mirror server matches the ETag given by the master
   server, then we have a chain of trust where the master server
   authorizes these responses as valid for that object.

   This guarantees that a mismatch will be detected using only the
   synchronized ETag from a master server and mirror server. The mirror
   servers themselves can even signal it by responding with an error,
   which prevents accidental merges of ranges from different versions of
   files with the same name.
   This even includes many malicious attacks
   where the data on the mirror has been replaced by some other file,
   but not all.

   Synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can
   Instance Digest on the mirror servers: an attacker can almost
   certainly make the server respond with the expected Instance Digest
   even if the file contents have been modified, just as they can with
   ETag, and the same applies to various system failures that cause bad
   data to be returned. The Metalink client has to rely on the Instance
   Digest returned by the Metalink master server in the first response
   for the verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   The set of trusted mirrors doing that can be substituted as master
   servers accepting the initial request if one likes.

   The benefit of having slave mirror servers (those not trusted as
   masters) return Instance Digest is that the client then can detect
   mismatches early even if ETag is not used. Both ETag and slave
   mirror Instance Digest do provide value, but just one is sufficient
   for early detection of mismatches. If neither is provided then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

   If FTP servers support the FTP HASH command [draft-ietf-ftpext2-hash]
   and the same hash algorithm as the originating Metalink server, then
   that information can be used for early file mismatch detection.

7.1.2. Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download. Metalink servers are not required to offer partial
   file cryptographic hashes, but they are encouraged to do so.
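
   As an illustration of how partial file hashes can localize an error,
   the Python sketch below checks each piece of a downloaded object and
   reports which byte ranges would need to be fetched again. The names
   bad_pieces and byte_range are hypothetical, and the sketch assumes
   fixed-length pieces with hexadecimal SHA-256 hashes, with the piece
   length and hash list already obtained from a Metalink/XML document.

```python
import hashlib
from typing import List


def bad_pieces(data: bytes, piece_length: int, piece_hashes: List[str]) -> List[int]:
    """Return the indexes of pieces whose SHA-256 (hex) does not match."""
    bad = []
    for i, expected in enumerate(piece_hashes):
        piece = data[i * piece_length:(i + 1) * piece_length]
        if hashlib.sha256(piece).hexdigest() != expected.lower():
            bad.append(i)
    return bad


def byte_range(index: int, piece_length: int, total_size: int) -> str:
    """Build the Range header value that re-fetches one piece."""
    start = index * piece_length
    # The last piece may be shorter than piece_length.
    end = min(start + piece_length, total_size) - 1
    return f'bytes={start}-{end}'
```

   Each index returned by bad_pieces() can be turned into a Range
   request with byte_range() and sent to a different mirror, so that
   only the damaged pieces are transferred again.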

   If the cryptographic hash of the downloaded object does not match the
   Instance Digest, the client fetches the Metalink/XML as specified in
   Section 4.1, where partial file cryptographic hashes may be found.
   These allow the client to detect which server returned incorrect
   data, and to work out which of the downloaded data can be recovered
   and which needs to be fetched again. If no partial cryptographic
   hashes are available, then the client MUST fetch the complete object
   from other mirrors.

8. Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o The number of servers used simultaneously.
   o The ability to pipeline sufficient or sufficiently large range
     requests to each server so as to avoid connections going idle.
   o The ability to pipeline sufficiently few or sufficiently small
     range requests to servers so that all the servers finish their
     final chunks simultaneously.
   o The ability to switch between mirrors dynamically so as to use the
     fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved. But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster. Based on laboratory experiments,
   we suggest a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.
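
   The suggested default could be expressed as the following Python
   sketch. It is one possible reading of the paragraph above, not a
   prescribed algorithm: choose_mirrors is a hypothetical name,
   "measured" maps a mirror URI to the throughput the client has
   observed for it, and "untested" lists mirrors not yet tried.

```python
from typing import Dict, List


def choose_mirrors(measured: Dict[str, float],
                   untested: List[str],
                   max_connections: int = 4) -> List[str]:
    """Pick the fastest mirrors seen so far plus one exploratory mirror."""
    # Reserve one connection for evaluating a mirror not tried yet, if any.
    explore = untested[:1]
    keep = max_connections - len(explore)
    # Highest observed throughput first.
    best = sorted(measured, key=measured.get, reverse=True)[:keep]
    return best + explore
```

   With four connections this keeps the three best mirrors and probes
   one new mirror; once every mirror has been measured, all four
   connections go to the fastest mirrors.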

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests. At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized. Thus we currently
   recommend that a chunk size of at least 10KBytes should be used. If
   the file being transferred is very large, or the download speed very
   high, this can be increased to perhaps 1MByte. As network bandwidths
   increase, we expect these numbers to increase appropriately, so that
   the time to transfer a chunk remains significantly larger than the
   latency of requesting a chunk from a server.

9. IANA Considerations

   Accordingly, IANA has made the following registration to the Link
   Relation Type registry.

   o Relation Name: duplicate

   o Description: Refers to a resource whose available representations
   are byte-for-byte identical with the corresponding representations of
   the context IRI.

   o Reference: This specification.

   o Notes: This relation is for static resources. That is, an HTTP GET
   request on any duplicate will return the same representation. It
   does not make sense for dynamic or POSTable resources and should not
   be used for them.

10. Security Considerations

10.1. URIs and IRIs

   Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2. Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information. This could deceive unaware
   downloaders into downloading a malicious or worthless file.
   Also, malicious publishers could attempt a distributed denial of
   service attack by inserting unrelated URIs into Metalinks.

10.3. Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure. These include the whole
   Message Digest family of algorithms which are not suitable for
   cryptographically strong verification. Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e. the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include SHA-256, as specified in [FIPS-180-3], or stronger.
   It MAY also include other hashes.

10.4. Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

11. Normative References

   [BITTORRENT]
              Cohen, B., "The BitTorrent Protocol Specification",
              BITTORRENT 11031, February 2008.

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization, "ISO 3166-
              1:2006. Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC0959]  Postel, J. and J. Reynolds, "File Transfer Protocol",
              STD 9, RFC 0959, October 1985.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T.
              Roessler, "MIME Security with OpenPGP", RFC 3156,
              August 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC5854]  Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, "The
              Metalink Download Description Format", RFC 5854,
              June 2010.

   [RFC5988]  Nottingham, M., "Web Linking", RFC 5988, October 2010.

   [draft-ietf-ftpext2-hash]
              Bryan, A., Kosse, T., and D. Stenberg, "FTP Extensions for
              Cryptographic Hashes", draft-ietf-ftpext2-hash-00 (work in
              progress), November 2010.

Appendix A. Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Nottingham, Daniel Stenberg,
   Alexey Melnikov, Matt Domsch, Micah Cowan, and David Morris.

   Mark Handley and Javier Vela Diago did work on simultaneous download
   from multiple mirrors, which also provided validation of the benefits
   of this approach.

Appendix B. Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format [RFC5854] :

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type. It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      cryptographic hashes.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to originating
      server.
   o  (-?) Tied to HTTP, not as generic. FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support. Metalink/XML can be created by
      user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, this method
      still gives access to all download information (with no changes
      needed to servers) from all mirrors (FTP included).
   o  (-) Not portable/archivable/emailable. Metalink/XML is used to
      import/export transfer queues. Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C. Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Some organizations have many mirrors. Should all be sent, or only
      a certain number? All should be included in the Metalink/XML, if
      used.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e. q=1.0 through q=0.0 ?
   o  Using Metalink/XML for partial file cryptographic hashes. That
      adds XML dependency to apps for an important feature. Is there a
      better method?
   o  Do we need an "official" MIME type for .torrent files or keep
      using "application/x-bittorrent"?

   -18 : November , 2010.
   o  More realistic requirements.

   -17 : September 13 , 2010.
   o  RFC 5854 Metalink/XML.

   -16 : April 16, 2010.
   o  Add draft-ietf-ftpext2-hash reference and FTP mirror coordination.

   -15 : February 20, 2010.
   o  Update references and terminology.

   -14 : December 31, 2009.
   o  Baseline file hash: SHA-256.

   -13 : November 22, 2009.
   o  Metalink/XML for partial file cryptographic hashes.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 13, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 7, 2009.
   o  Content-MD5 for partial file cryptographic hashes.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI: http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI: http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI: http://www.henriknordstrom.net/

   Tatsuhiro Tsujikawa
   Shiga
   Japan

   Email: tatsuhiro.t@gmail.com
   URI: http://aria2.sourceforge.net

   Dr. med. Peter Poeml
   MirrorBrain
   Venloer Str. 317
   Koeln 50823
   DE

   Phone: +49 221 6778 333 8
   Email: peter@poeml.de
   URI: http://mirrorbrain.org/~poeml/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk