Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: August 24, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                       February 20, 2010


    Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
                      draft-bryan-metalinkhttp-15

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP Headers, a different way to get information that is
   usually contained in the Metalink XML-based download description
   format.  Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP headers.
   Clients can transparently use this information to make file transfers
   more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 24, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the BSD
   License.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Cryptographic Hashes of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1. URIs and IRIs
     10.2. Spoofing
     10.3. Cryptographic Hashes
     10.4. Signing
   11. Normative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink].  Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags.  Metalink/HTTP is used
   to list information about a file to be downloaded.  This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   cryptographic hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.  At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.
   Users will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first one in a list.
   That choice may not be optimal for their needs, and it gives that
   server a disproportionate share of the load.  The use of suboptimal
   mirrors can lead the user to cancel and restart the download in an
   attempt to find a better source manually.  During downloads, errors
   in transmission can corrupt the file, and there are no easy ways to
   repair such files.  For large downloads this can be extremely
   troublesome.  Any of the problems that can occur during a download
   lead to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be realized automatically and more effectively.  All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more, can be transferred in
   coordinated HTTP headers.  This Metalink transfers the knowledge of
   the download server (and mirror database) to the client.  Clients
   can fall back to other mirrors if the current one has an issue.  With
   this knowledge, the client is enabled to work its way to a successful
   download even under adverse circumstances.  All this is done
   transparently to the user, and the download is much more reliable
   and efficient.
   In contrast, a traditional HTTP redirect to a mirror conveys only
   minimal information: one link to one server, with no provision in
   the HTTP protocol for handling failures.  Furthermore, in order to
   provide better load distribution across servers and potentially
   faster downloads to users, Metalink/HTTP facilitates multi-source
   downloads, where portions of a file are downloaded from multiple
   mirrors (and optionally, Peer-to-Peer) simultaneously.

   [[ Discussion of this draft should take place on the IETF HTTP WG
   mailing list at ietf-http-wg@w3.org or the Metalink discussion
   mailing list located at metalink-discussion@googlegroups.com.  To
   join the list, visit http://groups.google.com/group/metalink-discussion . ]]

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a
   cryptographic hash of the whole resource.  The client will then be
   able to request chunks of the file from the various sources,
   scheduling appropriately in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with ETag, mirrors, .torrent,
   .metalink, OpenPGP signature, and a cryptographic hash of the whole
   file:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
      type="application/x-bittorrent"
   Link: ; rel="describedby";
      type="application/metalink4+xml"
   Link: ; rel="describedby";
      type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP, which consists
   of mirrors and cryptographic hashes in HTTP headers as described in
   this document.  "Metalink/XML" refers to the XML format described in
   [draft-bryan-metalink].

   Metalink resources include a Link header
   [draft-nottingham-http-link-header] to present a list of mirrors in
   the response to a client request for the resource.  The cryptographic
   hash of a resource must be included via Instance Digests in HTTP
   [RFC3230].

   Metalink servers are HTTP servers with one or more Metalink
   resources.  Mirror and cryptographic hash information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy (ETag Synchronization), i.e., based on the file
   contents (cryptographic hash) and not on server-unique filesystem
   metadata.  The emitted ETag MAY be implemented the same as the
   Instance Digest for simplicity.  Metalink servers MAY offer
   Metalink/XML documents that contain cryptographic hashes of parts of
   the file and other information.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  HTTP mirror servers SHOULD share the same ETag policy as
   the originating Metalink server.  HTTP mirror servers SHOULD support
   Instance Digests in HTTP [RFC3230].
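
   The ETag Synchronization policy above can be met by deriving both
   the Instance Digest and the ETag from the file contents alone.  The
   following Python sketch is only an illustration.  The function names
   are hypothetical, and the sketch assumes that the Digest value is the
   base64 encoding of the raw SHA-256 octets, in the way [RFC3230]
   encodes its SHA algorithm.

```python
import base64
import hashlib


def sha256_b64(data: bytes) -> str:
    """Base64 of the raw SHA-256 digest (assumed RFC 3230-style encoding)."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def instance_digest_header(data: bytes) -> str:
    """Value for a 'Digest' response header covering the whole file."""
    return "SHA-256=" + sha256_b64(data)


def synchronized_etag(data: bytes) -> str:
    """Strong ETag derived only from the file contents.

    No filesystem metadata (inode, mtime) is mixed in, so every mirror
    holding an identical copy produces the identical ETag.  The ETag
    simply reuses the Instance Digest value, which Section 2 permits.
    """
    return '"' + sha256_b64(data) + '"'


if __name__ == "__main__":
    body = b"example file contents"
    print("Digest:", instance_digest_header(body))
    print("ETag:", synchronized_etag(body))
```

   Neither value depends on server-unique metadata.  A real server would
   most likely hash the file incrementally with hashlib's update() rather
   than hold it in memory.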

   Metalink clients use the mirrors provided by a Metalink server in the
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the mirror becomes unreachable.  Metalink clients SHOULD
   support multi-source, or parallel, downloads, where portions of a
   file are downloaded from multiple mirrors simultaneously (and
   optionally, from Peer-to-Peer sources).  Metalink clients MUST
   support Instance Digests in HTTP [RFC3230] by requesting and
   verifying cryptographic hashes.  Metalink clients MAY make use of
   digital signatures if they are offered.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: ; rel="duplicate";
      pri=1; pref=1
   Link: ; rel="duplicate";
      pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send.  The
   decision could be a hard-coded limit or a random selection, or it
   could depend on file size or server load.

3.1.  Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.  A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2.  Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 (two-letter) country code for the geographical location of
   the physical server the URI is used to access.  A client may use this
   information to select a mirror, or set of mirrors, that are
   geographically near (if the client has access to such information),
   with the aim of reducing network load at inter-country bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server.  Optimally, they
   will also support Instance Digests in HTTP [RFC3230].  Preferred
   mirrors make it possible to detect early on, before data is
   transferred, whether the requested file matches the desired file.
   Preferred HTTP mirror servers have a "pref" value of 1.  If "pref" is
   unspecified, a mirror is considered "normal" by default and is not
   assumed to share the same ETag policy.  FTP mirrors, as they do not
   emit ETags, MUST always be considered "normal".

   HTTP mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4.  Mirror Depth

   A mirror may mirror a single file, a whole directory, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default.  A value of 0 means ONLY that file is mirrored.  A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored.  A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

   Link: ;
      rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.
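
   The Python sketch below illustrates how a client might consume the
   mirror parameters of Sections 3.1 through 3.4.  It is not part of
   this specification, and it rests on several assumptions:

   o  The parser is deliberately simplified.  It handles one link per
      header value and no commas or semicolons inside quoted strings.
   o  Mirrors without "pri" keep their response order after all
      prioritized ones.
   o  "geo" only breaks ties between equal "pri" values, so the server's
      priority ordering is still respected.
   o  The client knows its own country code.

   All function names, and the URIs in the test data, are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# "<uri>" followed by the ";"-separated link parameters.
_LINK_VALUE = re.compile(r"^\s*<([^>]*)>\s*(.*)$")


@dataclass
class Mirror:
    uri: str
    params: Dict[str, str] = field(default_factory=dict)
    order: int = 0  # position of the Link header in the response

    @property
    def pri(self) -> Optional[int]:
        value = self.params.get("pri")
        return int(value) if value is not None else None

    @property
    def preferred(self) -> bool:
        # pref=1 marks a mirror sharing the ETag policy (Section 3.3).
        return self.params.get("pref") == "1"

    @property
    def depth(self) -> int:
        return int(self.params.get("depth", "0"))  # Section 3.4 default


def parse_link_value(value: str) -> Optional[Mirror]:
    """Parse one Link header value (simplified, see the lead-in)."""
    match = _LINK_VALUE.match(value)
    if match is None:
        return None
    uri, rest = match.groups()
    params: Dict[str, str] = {}
    for part in rest.split(";"):
        name, sep, raw = part.partition("=")
        if sep:
            params[name.strip().lower()] = raw.strip().strip('"')
    return Mirror(uri=uri, params=params)


def ordered_mirrors(values: List[str], client_geo: Optional[str] = None) -> List[Mirror]:
    """Return the rel="duplicate" links, most preferred first."""
    mirrors = []
    for index, value in enumerate(values):
        mirror = parse_link_value(value)
        if mirror is None or "duplicate" not in mirror.params.get("rel", "").split():
            continue
        mirror.order = index
        mirrors.append(mirror)

    def sort_key(m: Mirror):
        pri = m.pri if m.pri is not None else float("inf")
        near = client_geo is not None and m.params.get("geo", "").lower() == client_geo.lower()
        # Lower pri first; on equal pri, same-country mirrors first;
        # otherwise keep the server's order.
        return (pri, not near, m.order)

    return sorted(mirrors, key=sort_key)


def mirrored_scope(path: str, depth: int) -> str:
    """Path prefix covered by a mirror, under one reading of Section 3.4:
    depth=0 is only the file, depth=1 its directory, depth=2 the
    directory above, and so on."""
    if depth <= 0:
        return path
    parts = [p for p in path.split("/") if p]
    keep = max(len(parts) - depth, 0)
    return "/" + "".join(p + "/" for p in parts[:keep])
```

   Suppose a client in Great Britain is offered two "pri=2" mirrors, one
   in "de" and one in "gb".  Under these assumptions it tries the "gb"
   mirror first, but only after any "pri=1" mirror.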

4.  Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over
   Peer-to-Peer networks or otherwise, are specified with the Link
   header [draft-nottingham-http-link-header], a relation type of
   "describedby", and a type parameter that indicates the MIME type of
   the metadata available at the URI.

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
      type="application/x-bittorrent"
   Link: ; rel="describedby";
      type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as cryptographic hashes of parts of a file, allowing a
   client to recover from partial errors (see Section 7.1.2).

5.  OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header], a relation type of
   "describedby", and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
      type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.

6.  Cryptographic Hashes of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors.  Mirror servers SHOULD as well.

   A brief Metalink server response with cryptographic hash:

   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.
   Alternatively, Metalink clients can begin with a HEAD request to the
   Metalink server to discover mirrors via Link headers.  After that,
   the client follows with GET requests to the desired mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
      type="application/x-bittorrent"
   Link: ; rel="describedby";
      type="application/metalink4+xml"
   Link: ; rel="describedby";
      type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file cryptographic hashes
      to repair a file.
   o  Digital signature.
   o  Instance Digest, which is the whole file cryptographic hash.
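
   Of the metadata listed above, the Instance Digest is the value the
   client MUST check once the download is complete.  A possible Python
   sketch of that check follows.  It assumes that the Digest header is a
   comma-separated list of algorithm=value pairs, with each value being
   the base64 encoding of the raw digest octets.  It also assumes a
   client policy, in the spirit of Section 10.3, of accepting only
   SHA-512 or SHA-256.  The helper names and the algorithm table are
   hypothetical.

```python
import base64
import binascii
import hashlib
import hmac
from typing import Dict, Optional, Tuple

# Assumed policy: Digest algorithm tokens the client trusts, strongest
# first, mapped to hashlib names.  MD5 and SHA-1 are deliberately absent
# (see Section 10.3).
TRUSTED_ALGORITHMS = [("sha-512", "sha512"), ("sha-256", "sha256")]


def parse_digest_header(value: str) -> Dict[str, str]:
    """'SHA-256=abc=, MD5=xyz' -> {'sha-256': 'abc=', 'md5': 'xyz'}."""
    digests: Dict[str, str] = {}
    for item in value.split(","):
        # Split on the first "=" only; base64 padding stays in the value.
        algorithm, sep, encoded = item.strip().partition("=")
        if sep:
            digests[algorithm.strip().lower()] = encoded.strip()
    return digests


def strongest_trusted(digests: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the strongest offered algorithm that the client trusts."""
    for token, hashlib_name in TRUSTED_ALGORITHMS:
        if token in digests:
            return hashlib_name, digests[token]
    return None


def verify_whole_file(data: bytes, digest_header: str) -> bool:
    """True only if the strongest trusted digest offered matches `data`."""
    choice = strongest_trusted(parse_digest_header(digest_header))
    if choice is None:
        return False  # nothing trustworthy to verify against
    hashlib_name, encoded = choice
    try:
        expected = base64.b64decode(encoded, validate=True)
    except binascii.Error:
        return False  # malformed base64 never verifies
    actual = hashlib.new(hashlib_name, data).digest()
    return hmac.compare_digest(actual, expected)
```

   The client uses the strongest trusted algorithm on offer rather than
   any algorithm that happens to match.  This way, a response that also
   advertises a weak digest cannot lower the bar.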

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and is delivered more slowly than expected,
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first).  It uses mirrors
   provided by the Link header with the "duplicate" relation type, and
   sends the location of the original GET request in the "Referer"
   header field.  The client decides the size and number of ranges to
   request from each server, based upon the performance observed from
   each server.  Section 8 discusses performance considerations further.

   If no Range limit was given in the original request, the client works
   from the tail of the object (the first request is still running and
   will eventually catch up).  Otherwise, it continues after the range
   requested in the first request.  If no Range was provided, the
   original connection must be terminated once all parts of the resource
   have been retrieved.  It is recommended that the client make a HEAD
   request first, so that it can find out whether there are any Link
   headers, and then make Range-based requests to the mirror servers as
   well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3.  If-Match conditions based on the ETag from the Metalink
   server response SHOULD be used to detect out-of-date mirrors quickly.
   If no indication of ETag synchronization is given, If-Match should
   not be used.  Optimally, the mirror response will then contain an
   Instance Digest that the client can use to detect a mismatch early.
   If it does not, a mismatch will not be detected until the completed
   object is verified.
   Early file mismatch detection is described in detail in
   Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.  The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
      DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited, the client aborts it by
   closing the connection when it catches up with the other parallel
   downloads of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   cryptographic hash of the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests.  Error detection requires Instance Digests, or cryptographic
   hashes, to determine after transfers whether there has been an error.
   Error correction, or download repair, is possible with partial file
   cryptographic hashes.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
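
   The range bookkeeping of Section 7 can be sketched in Python, together
   with the rule that ranges may only be merged under a strong
   validator.  This is a hypothetical helper, not a normative algorithm.
   It splits the object into one contiguous range per source.  It then
   checks a mirror's reply against the size and ETag learned from the
   Metalink server, assuming the caller has collected the response
   header fields into a dictionary with lower-case names.  With the
   object size from the example in Section 7 and two sources, the
   helper yields the same second range (bytes 7433802-14867602) as the
   mirror response shown there.

```python
import re
from typing import Dict, List, Optional, Tuple

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def split_ranges(size: int, sources: int) -> List[Tuple[int, int]]:
    """Split bytes 0..size-1 into at most `sources` inclusive ranges."""
    if size < 0 or sources < 1:
        raise ValueError("size must be >= 0 and sources >= 1")
    base, extra = divmod(size, sources)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(sources):
        length = base + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(first: int, last: int, open_ended: bool = False) -> str:
    # An open-ended range ("bytes=N-") asks for everything up to the end,
    # as in the mirror request example in Section 7.
    return f"bytes={first}-" if open_ended else f"bytes={first}-{last}"


def check_partial_response(
    status: int,
    headers: Dict[str, str],
    expected_size: int,
    expected_etag: Optional[str],
    first: int,
    last: int,
) -> str:
    """Return "ok" or the reason this mirror response must be discarded.

    `expected_etag` is the Metalink server's ETag, or None when the
    mirror is not known to share its ETag policy.
    """
    if status == 412:
        return "etag-mismatch"  # the mirror rejected our If-Match
    if status != 206:
        return "not-partial"
    match = _CONTENT_RANGE.match(headers.get("content-range", "").strip())
    if match is None:
        return "bad-content-range"
    got_first, got_last, total = (int(g) for g in match.groups())
    if total != expected_size:
        return "size-mismatch"  # Section 7: such downloads MUST be aborted
    if (got_first, got_last) != (first, last):
        return "unexpected-range"
    etag = headers.get("etag")
    if expected_etag is not None and etag is not None:
        # Only a strong, identical ETag allows merging this range.
        if etag.startswith("W/") or etag != expected_etag:
            return "etag-mismatch"
    return "ok"
```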

   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers also implement a
   synchronized ETag.  The emitted ETag may even be implemented the same
   as the Instance Digest for simplicity.  There is no need to specify
   how the ETag is generated, only that it is shared among the mirror
   servers.  If the mirror server provides neither a synchronized ETag
   nor an Instance Digest, then early detection of mismatches is not
   possible unless the file length also differs.  The error is still
   detectable after the download has completed, when the merged
   response is verified.

   An ETag cannot be used for verifying the integrity of the received
   content.  However, it is a guarantee issued by the Metalink server
   that the content is correct for that ETag.  If the ETag given by the
   mirror server matches the ETag given by the master server, there is a
   chain of trust in which the master server authorizes these responses
   as valid for that object.

   As a result, the synchronized ETags of the master server and mirror
   server are enough on their own to guarantee that a mismatch will be
   detected.  The mirror servers will even signal the mismatch
   themselves by answering the If-Match request with an error (412
   Precondition Failed).  This prevents accidental merges of ranges from
   different versions of files with the same name.  It even covers many,
   though not all, malicious attacks where the data on the mirror has
   been replaced by some other file.
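
   The early-detection reasoning above comes down to one decision per
   mirror response.  The Python function below is a hypothetical reading
   of it, not a normative algorithm.  The response is rejected before
   its ranges are merged if any of the following holds:

   o  The file length differs.
   o  A synchronized ETag differs or is weak.
   o  The mirror Instance Digest differs from the Metalink server's.

   A matching validator accepts the response.  With no usable validator,
   the decision is deferred to whole-file verification.  The function
   assumes that both Instance Digests use the same algorithm token and
   encoding, so that they can be compared as strings.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"  # a strong validator matched; ranges may be merged
    REJECT = "reject"  # mismatch detected early; stop using this mirror
    DEFER = "defer"    # no validator; rely on whole-file verification


def early_check(
    master_size: int,
    master_etag: Optional[str],
    master_digest: Optional[str],
    mirror_size: Optional[int],
    mirror_etag: Optional[str],
    mirror_digest: Optional[str],
    etag_synchronized: bool,
) -> Verdict:
    """Classify a mirror response before merging its data.

    `etag_synchronized` is True only for mirrors known to share the
    Metalink server's ETag policy (for example, "pref=1").
    """
    # A different total length always means a different file.
    if mirror_size is not None and mirror_size != master_size:
        return Verdict.REJECT
    matched = False
    if etag_synchronized and master_etag and mirror_etag:
        # Weak ETags are not strong validators, so they cannot vouch
        # for merging byte ranges.
        if mirror_etag.startswith("W/") or mirror_etag != master_etag:
            return Verdict.REJECT
        matched = True
    if master_digest and mirror_digest:
        # Assumes both values use the same algorithm token and encoding.
        if mirror_digest != master_digest:
            return Verdict.REJECT
        matched = True
    return Verdict.ACCEPT if matched else Verdict.DEFER
```

   An accepted response is still subject to the final whole-file check.
   Neither validator returned by a mirror protects against a malicious
   or faulty server.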

   A synchronized ETag cannot strictly protect against malicious attacks,
   or against server or network errors that replace content.  Neither
   can an Instance Digest on the mirror servers.  An attacker can almost
   certainly make the server respond with the expected Instance Digest
   even if the file contents have been modified, just as they can with
   an ETag, and various system failures can likewise cause bad data to
   be returned.  The Metalink client has to rely on the Instance Digest
   returned by the Metalink master server in the first response for the
   verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   Mirrors that are trusted to do both can, if desired, be used as
   master servers that accept the initial request.

   The benefit of having slave mirror servers (those not trusted as
   masters) return an Instance Digest is that the client can then detect
   mismatches early even if ETag is not used.  Both ETag and slave
   mirror Instance Digest provide value, but just one is sufficient for
   early detection of mismatches.  If neither is provided, then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

7.1.2.  Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download.  Metalink servers are not required to offer partial
   file cryptographic hashes, but they are encouraged to do so.

   If the object cryptographic hash does not match the Instance Digest,
   the client fetches the Metalink/XML as specified in Section 4.1.
   Partial file cryptographic hashes may be found there, allowing the
   client to detect which server returned incorrect data.
   If the Instance Digest computation does not match, then the client
   needs to fetch the partial file cryptographic hashes, if available.
   From these it determines which of the downloaded data can be
   recovered and which needs to be fetched again.  If no partial
   cryptographic hashes are available, then the client MUST fetch the
   complete object from other mirrors.

8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved.  But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster.  Based on laboratory experiments,
   we suggest that a good default number of simultaneous connections is
   probably four.  Three of these are used for the best three mirrors
   found so far, and one is used to evaluate whether any other mirror
   might offer better performance.

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests.
   At the same time, the amount of time wasted waiting for the last
   chunk to download from the last server after all the other servers
   have finished should be minimized.  Thus we currently recommend a
   chunk size of at least 10 KBytes.  If the file being transferred is
   very large, or the download speed very high, this can be increased to
   perhaps 1 MByte.  As network bandwidths increase, we expect these
   numbers to increase accordingly, so that the time to transfer a chunk
   remains significantly larger than the latency of requesting a chunk
   from a server.

9.  IANA Considerations

   IANA has made the following registration in the Link Relation Type
   registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources.  That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and should
      not be used for them.

10.  Security Considerations

10.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  This could deceive unaware users
   into downloading a malicious or worthless file.  Also, malicious
   publishers could attempt a distributed denial of service attack by
   inserting unrelated URIs into Metalinks.

10.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.
These include the whole 641 Message Digest family of algorithms which are not suitable for 642 cryptographically strong verification. Malicious people could 643 provide files that appear to be identical to another file because of 644 a collision, i.e. the weak cryptographic hashes of the intended file 645 and a substituted malicious file could match. 647 If a Metalink contains whole file hashes as described in Section 6, 648 it SHOULD include "sha-256" which is SHA-256, as specified in 649 [FIPS-180-3], or stronger. It MAY also include other hashes. 651 10.4. Signing 653 Metalinks should include digital signatures, as described in 654 Section 5. 656 Digital signatures provide authentication, message integrity, and 657 non-repudiation with proof of origin. 659 11. Normative References 661 [FIPS-180-3] 662 National Institute of Standards and Technology (NIST), 663 "Secure Hash Standard (SHS)", FIPS PUB 180-3, 664 October 2008. 666 [ISO3166-1] 667 International Organization for Standardization, "ISO 3166- 668 1:2006. Codes for the representation of names of 669 countries and their subdivisions -- Part 1: Country 670 codes", November 2006. 672 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 673 Requirement Levels", BCP 14, RFC 2119, March 1997. 675 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 676 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 677 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 679 [RFC3230] Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", 680 RFC 3230, January 2002. 682 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 683 Resource Identifier (URI): Generic Syntax", STD 66, 684 RFC 3986, January 2005. 686 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 687 Identifiers (IRIs)", RFC 3987, January 2005. 689 [draft-bryan-metalink] 690 Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. 
              Poeml, "The Metalink Download Description Format",
              draft-bryan-metalink-28 (work in progress),
              February 2010.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-07 (work in progress),
              January 2010.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, Matt Domsch, Micah
   Cowan, and David Morris.

   Support for simultaneous download from multiple mirrors is based upon
   work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.

Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format
   [draft-bryan-metalink]:

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.  It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      cryptographic hashes.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to the originating
      server.
   o  (-?) Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      the user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      users at all mirrors (FTP included) access to all download
      information, with no changes needed to the server.
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Some organizations have many mirrors.  Should all be sent, or only
      a certain number?  All should be included in the Metalink/XML, if
      used.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e., q=1.0 through q=0.0?
   o  Using Metalink/XML for partial file cryptographic hashes adds an
      XML dependency to apps for an important feature.  Is there a
      better method?
   o  Do we need an "official" MIME type for .torrent files or allow
      "application/x-bittorrent"?

   -15 : December 31, 2009.
   o  Update references and terminology.

   -14 : December 31, 2009.
   o  Baseline file hash: SHA-256.

   -13 : November 22, 2009.
   o  Metalink/XML for partial file cryptographic hashes.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 12, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.
   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.
   o  Content-MD5 for partial file cryptographic hashes.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk
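   Section 8 states the chunk-size trade-off informally.  The time to
   transfer a chunk should stay significantly larger than the latency
   of requesting it, so a chunk should be much larger than the download
   rate multiplied by the request latency.  The Python sketch below
   expresses that as arithmetic.  The factor of 10 used for
   "significantly larger" and the function name suggested_chunk_size
   are assumptions for illustration.  Only the 10 KByte floor comes
   from the recommendation in Section 8.

```python
# Illustrative sketch, not part of Metalink/HTTP: choose a chunk size so that
# transferring a chunk takes much longer than requesting it from a server.

MIN_CHUNK_BYTES = 10 * 1024  # Section 8: at least 10 KBytes
SAFETY_FACTOR = 10           # assumption: "significantly larger" read as 10x


def suggested_chunk_size(bytes_per_second: float, request_latency_s: float) -> int:
    """Return a chunk size whose transfer time is SAFETY_FACTOR times the
    request latency, but never smaller than the 10 KByte floor."""
    if bytes_per_second <= 0 or request_latency_s < 0:
        raise ValueError("rate must be positive and latency non-negative")
    # transfer_time = size / rate  >=  SAFETY_FACTOR * latency
    size = SAFETY_FACTOR * bytes_per_second * request_latency_s
    return max(MIN_CHUNK_BYTES, int(size))


if __name__ == "__main__":
    # 1 MByte/s download with 100 ms request latency
    print(suggested_chunk_size(1024 * 1024, 0.1))
    # Slow link: the 10 KByte floor applies
    print(suggested_chunk_size(1000, 0.01))
```

   For a 1 MByte/s download with 100 ms of request latency the sketch
   yields roughly 1 MByte.  This is in line with the "perhaps 1 MByte"
   figure for fast downloads.  The sketch only enforces the lower bound.
   The concern about waiting on the last chunk argues against making
   chunks much larger than this.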
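   Section 9 registers the "duplicate" relation for use in HTTP Link
   headers [draft-nottingham-http-link-header].  The Python sketch below
   shows how a client might collect mirror URIs from such a header.  It
   extracts every target whose rel parameter includes "duplicate".  It
   is a deliberately simplified parser with a hypothetical function
   name, not an implementation of the full Link header grammar.  For
   example, it does not handle commas inside quoted parameter values.

```python
import re

# Simplified matcher for one link-value: <URI> followed by ";"-separated
# parameters. Commas inside quoted parameter values are not supported.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")


def duplicate_uris(link_header: str) -> list:
    """Return the target URIs whose rel parameter includes 'duplicate'."""
    uris = []
    for match in LINK_VALUE.finditer(link_header):
        uri, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() != "rel":
                continue
            # rel may be quoted and may list several space-separated types.
            if "duplicate" in value.strip().strip('"').lower().split():
                uris.append(uri)
    return uris


if __name__ == "__main__":
    header = ('<http://mirror1.example.com/file.iso>; rel=duplicate, '
              '<http://mirror2.example.com/file.iso>; rel="duplicate", '
              '<http://example.com/file.iso.asc>; rel=describedby')
    print(duplicate_uris(header))
```

   Because the relation is only meaningful for static resources, a
   client using such a parser would still verify each mirrored copy
   against the published hashes rather than trust the Link header alone.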
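   Section 10.3 recommends "sha-256" over the weaker Message Digest
   family.  The Python sketch below shows one way a client might build
   and check a SHA-256 Instance Digest [RFC3230] while ignoring MD5
   values.  It assumes that the SHA-256 digest value is the base64
   encoding of the raw hash output, as in the SHA-256 digest-algorithm
   registration for HTTP Instance Digests.  The function names are
   hypothetical.

```python
import base64
import hashlib

# Hypothetical helpers for illustration; not defined by Metalink/HTTP.


def sha256_instance_digest(data: bytes) -> str:
    """Return an RFC 3230 Digest header value carrying a SHA-256 digest.

    Assumes the value is the base64 encoding of the raw 32-byte hash."""
    raw = hashlib.sha256(data).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def parse_digest_header(value: str) -> dict:
    """Split 'alg1=v1, alg2=v2' into {algorithm: value}.

    Algorithm names are case-insensitive, so they are lowercased. Only the
    first '=' separates name from value, because base64 padding uses '='."""
    digests = {}
    for item in value.split(","):
        name, sep, encoded = item.strip().partition("=")
        if sep:
            digests[name.lower()] = encoded
    return digests


def verify_sha256(data: bytes, header_value: str) -> bool:
    """Accept data only if the header carries a matching SHA-256 digest.

    MD5-family values are deliberately ignored (see Section 10.3)."""
    expected = parse_digest_header(header_value).get("sha-256")
    if expected is None:
        return False
    actual = sha256_instance_digest(data).partition("=")[2]
    return actual == expected


if __name__ == "__main__":
    body = b"example file contents"
    header = sha256_instance_digest(body)
    print("Digest:", header)
    print(verify_sha256(body, header))         # True
    print(verify_sha256(body + b"!", header))  # False
```

   Rejecting a response that carries only an MD5 digest mirrors the
   SHOULD in Section 10.3.  A real client might instead fall back to
   other checks, such as a digital signature (Section 10.4).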