idnits 2.17.1

draft-bryan-metalinkhttp-14.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 31, 2009) is 5230 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FIPS-180-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 3230 (Obsoleted by RFC 9530)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: July 4, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                       December 31, 2009

          Metalink/HTTP: Mirrors and Checksums in HTTP Headers
                      draft-bryan-metalinkhttp-14

Abstract

   This document specifies Metalink/HTTP: Mirrors and Checksums in HTTP
   Headers, a different way to get information that is usually contained
   in the Metalink XML-based download description format.  Metalink/HTTP
   describes multiple download locations (mirrors), Peer-to-Peer,
   checksums, digital signatures, and other information using existing
   standards for HTTP headers.  Clients can transparently use this
   information to make file transfers more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 4, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Checksums of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1.  URIs and IRIs
     10.2.  Spoofing
     10.3.  Cryptographic Hashes
     10.4.  Signing
   11. Normative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink].  Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags.  Metalink/HTTP is used
   to list information about a file to be downloaded.  This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   checksums, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.  At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.  Users
   will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular
   server getting a disproportionate share of load.  The use of
   suboptimal mirrors can lead to the user canceling and restarting the
   download to try to manually find a better source.
   During downloads,
   errors in transmission can corrupt the file.  There are no easy ways
   to repair these files.  For large downloads this can be extremely
   troublesome.  Any of the problems that can occur during a
   download leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized.  All the
   information about a download, including mirrors, checksums, digital
   signatures, and more can be transferred in coordinated HTTP Headers.
   This Metalink transfers the knowledge of the download server (and
   mirror database) to the client.  Clients can fall back to other
   mirrors if the current one has an issue.  With this knowledge, the
   client is enabled to work its way to a successful download even under
   adverse circumstances.  All this is done transparently to the user
   and the download is much more reliable and efficient.  In contrast, a
   traditional HTTP redirect to a mirror conveys only extremely minimal
   information - one link to one server, and there is no provision in
   the HTTP protocol to handle failures.  Furthermore, in order to
   provide better load distribution across servers and potentially
   faster downloads to users, Metalink/HTTP facilitates multi-source
   downloads, where portions of a file are downloaded from multiple
   mirrors (and optionally, Peer-to-Peer) simultaneously.

   [[ Discussion of this draft should take place on IETF HTTP WG mailing
   list at ietf-http-wg@w3.org or the Metalink discussion mailing list
   located at metalink-discussion@googlegroups.com.  To join the list,
   visit http://groups.google.com/group/metalink-discussion . ]]

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a checksum
   of the whole resource.  The client will then be able to request
   chunks of the file from the various sources, scheduling appropriately
   in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and whole file checksum:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP which consists of
   mirrors and checksums in HTTP Headers as described in this document.
   "Metalink/XML" refers to the XML format described in
   [draft-bryan-metalink].
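
   The Link headers shown in the Section 1.2 example can be separated
   into mirrors and metadata links with a few lines of code.  The
   following Python sketch is illustrative only: the function names
   parse_link and classify_links, and the URIs in the usage note, are
   assumptions made for this example and are not defined by this
   document.  The parser handles only the simple '<URI>; name=value'
   form, not the full Web Linking grammar.

```python
import re

# Matches one ';name=value' or ';name="value"' parameter.
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))')


def parse_link(value):
    """Split one Link header value into (uri, params)."""
    match = re.match(r'\s*<([^>]*)>(.*)', value, re.S)
    if match is None:
        raise ValueError("Link value does not start with <URI>")
    uri, rest = match.groups()
    params = {}
    for name, quoted, bare in _PARAM.findall(rest):
        params[name.lower()] = quoted if quoted else bare
    return uri, params


def classify_links(values):
    """Group Link values into mirrors and 'describedby' metadata."""
    mirrors, described_by = [], {}
    for value in values:
        uri, params = parse_link(value)
        if params.get("rel") == "duplicate":
            # A mirror: keep its parameters for later selection.
            mirrors.append((uri, params))
        elif params.get("rel") == "describedby":
            # Metadata (torrent, Metalink/XML, signature) keyed by type.
            described_by[params.get("type")] = uri
    return mirrors, described_by
```

   For instance, passing the hypothetical value
   '<http://www2.example.com/example.ext>; rel="duplicate"' to
   classify_links yields one mirror entry, while a value with
   rel="describedby" and type="application/pgp-signature" is stored
   under that media type.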

   Metalink resources include a Link header
   [draft-nottingham-http-link-header] to present a list of mirrors in
   the response to a client request for the resource.  The checksum of a
   resource must be included via Instance Digests in HTTP [RFC3230].

   Metalink servers are HTTP servers with one or more Metalink
   resources.  Mirror and checksum information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy (ETag Synchronization), i.e. based on the file
   contents (checksum) and not server-unique filesystem metadata.  The
   emitted ETag MAY be implemented the same as the Instance Digest for
   simplicity.  Metalink servers MAY offer Metalink/XML documents that
   contain partial file checksums and other information.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  HTTP mirror servers SHOULD share the same ETag policy as
   the originating Metalink server.  HTTP Mirror servers SHOULD support
   Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the mirror becomes unreachable.  Metalink clients SHOULD
   support multi-source, or parallel, downloads, where portions of a
   file are downloaded from multiple mirrors simultaneously (and
   optionally, from Peer-to-Peer sources).  Metalink clients MUST
   support Instance Digests in HTTP [RFC3230] by requesting and
   verifying checksums.
   Metalink clients MAY make use of digital
   signatures if they are offered.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: ; rel="duplicate";
   pri=1; pref=1
   Link: ; rel="duplicate";
   pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send.  Such
   a decision could be a hard-coded limit, a random selection, based on
   file size, or based on server load.

3.1.  Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.  A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2.  Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 two-letter country code for the geographical location of the
   physical server the URI is used to access.  A client may use this
   information to select a mirror, or set of mirrors, that are
   geographically near (if the client has access to such information),
   with the aim of reducing network load at inter-country bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server.
   Optimally, they
   will do both.  Preferred mirrors make it possible to detect early on,
   before data is transferred, if the file requested matches the desired
   file.  Preferred HTTP mirror servers have a "pref" value of 1.  If
   "pref" is unspecified, mirrors are considered "normal" by default and
   do not share the same ETag policy.  FTP mirrors, as they do not emit
   ETags, MUST always be considered "normal".

   HTTP Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4.  Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default.  A value of 0 means ONLY that file is mirrored.  A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored.  A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

   Link: ;
   rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4.  Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over Peer-to-
   Peer networks or otherwise, are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter that indicates the MIME type of
   the metadata available at the URI.

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as checksums of chunks, allowing a client to recover
   from partial errors (see Section 7.1.2).

5.  OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
   type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.

6.  Checksums of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors.  Mirror servers SHOULD as well.

   A brief Metalink server response with checksum:

   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.  Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   also starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file checksums to repair a
      file.
   o  Digital signature.
   o  Instance Digest, which is the whole file checksum.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and gets delivered slower than expected then
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first) using mirrors
   provided by the Link header with "duplicate" relation type, using the
   location of the original GET request in the "Referer" header field.
   The size and number of ranges requested from each server is for the
   client to decide, based upon the performance observed from each
   server.
   Further discussion of performance considerations is
   presented in Section 8.

   If no range limit was given in the original request then work from
   the tail of the object (the first request is still running and will
   eventually catch up), otherwise continue after the range requested in
   the first request.  If no Range was provided, the original connection
   must be terminated once all parts of the resource have been
   retrieved.  It is recommended that a HEAD request is undertaken
   first, so that the client can find out if there are any Link headers,
   and then Range-based requests are undertaken to the mirror servers as
   well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag SHOULD be used
   to quickly detect out-of-date mirrors by using the ETag from the
   Metalink server response.  If no indication of ETag synchronization
   or knowledge is given then If-Match should not be used.  Optimally
   there will be an Instance Digest in the mirror response, which the
   client can use to detect a mismatch early; if not, a mismatch won't
   be detected until the completed object is verified.  Early file
   mismatch detection is described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.
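
   The construction of a ranged request to a mirror, as described
   above, can be sketched as follows.  This is a minimal Python
   illustration under stated assumptions: the function name
   mirror_request_headers and its parameters are hypothetical, and the
   values in the usage note are taken from the example request above.
   If-Match is added only for preferred mirrors, since other mirrors
   give no indication of ETag synchronization.

```python
def mirror_request_headers(origin_url, first_byte, last_byte=None,
                           etag=None, preferred=False):
    """Build the headers for one ranged request to a mirror."""
    # An open-ended range ("bytes=N-") asks for everything from N on.
    end = "" if last_byte is None else str(last_byte)
    headers = {
        "Range": f"bytes={first_byte}-{end}",
        # The location of the original GET request goes in Referer.
        "Referer": origin_url,
    }
    # Only preferred mirrors share the origin server's ETag policy.
    if preferred and etag is not None:
        headers["If-Match"] = etag
    return headers
```

   Calling mirror_request_headers with the origin URL
   http://www.example.com/distribution/example.ext, a first byte of
   7433802, the ETag from the Metalink server response, and
   preferred=True produces the Range, If-Match, and Referer fields of
   the example request.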
   The mirror server response, with data, to the above mirror request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited then abort it by closing
   the connection when it catches up with the other parallel downloads
   of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   checksum of the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests.  Error detection requires Instance Digests, or checksums,
   to determine after transfers if there has been an error.  Error
   correction, or download repair, is possible with partial file
   checksums.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers do implement a
   synchronized ETag as well.  In fact, the emitted ETag may be
   implemented the same as the Instance Digest for simplicity, but there
   is no need to specify how the ETag is generated, just that it needs
   to be shared among the mirror servers.
   If the mirror server provides
   neither synchronized ETag nor Instance Digest, then early detection
   of mismatches is not possible unless file length also differs.
   Finally, the error is still detectable, after the download has
   completed, when the merged response is verified.

   ETag cannot be used for verifying the integrity of the received
   content.  But it is a guarantee issued by the Metalink server that
   the content is correct for that ETag.  And if the ETag given by the
   mirror server matches the ETag given by the master server, then we
   have a chain of trust where the master server authorizes these
   responses as valid for that object.

   This guarantees that a mismatch will be detected using only the
   synchronized ETag from a master server and mirror server.  The
   mirror servers themselves can even signal the mismatch by responding
   with an error, which prevents accidental merges of ranges from
   different versions of files with the same name.  This even covers
   many, but not all, malicious attacks where the data on the mirror
   has been replaced by some other file.

   Synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can
   Instance Digest on the mirror servers, as an attacker can most
   certainly make the server respond with the expected Instance Digest
   even if the file contents have been modified, just as with ETag.
   The same holds for various system failures that cause bad data to be
   returned.  The Metalink client has to rely on the Instance Digest
   returned by the Metalink master server in the first response for the
   verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   The set of trusted mirrors doing that can be substituted as master
   servers accepting the initial request if one likes.

   The benefit of having slave mirror servers (those not trusted as
   masters) return Instance Digest is that the client then can detect
   mismatches early even if ETag is not used.  Both ETag and slave
   mirror Instance Digest do provide value, but just one is sufficient
   for early detection of mismatches.  If none is provided then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

7.1.2.  Error Correction

   Partial file checksums can be used to detect errors during the
   download.  Metalink servers are not required to offer partial file
   checksums, but they are encouraged to do so.

   If the object checksum does not match the Instance Digest then fetch
   the Metalink/XML as specified in Section 4.1, where partial file
   checksums may be found, allowing detection of which server returned
   incorrect data.  If the Instance Digest computation does not match
   then the client needs to fetch the partial file checksums, if
   available, and from there figure out what of the downloaded data can
   be recovered and what needs to be fetched again.  If no partial
   checksums are available, then the client MUST fetch the complete
   object from other mirrors.

8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved.  But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster.  Based on laboratory experiments,
   we suggest a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests.  At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized.  Thus we currently
   recommend that a chunk size of at least 10KBytes should be used.  If
   the file being transferred is very large, or the download speed very
   high, this can be increased to perhaps 1MByte.  As network bandwidths
   increase, we expect these numbers to increase appropriately, so that
   the time to transfer a chunk remains significantly larger than the
   latency of requesting a chunk from a server.

9.  IANA Considerations

   Accordingly, IANA has made the following registration to the Link
   Relation Type registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources.  That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and
      should not be used for them.

10.  Security Considerations

10.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  This could deceive unaware
   downloaders into downloading a malicious or worthless file.  Also,
   malicious publishers could attempt a distributed denial of service
   attack by inserting unrelated URIs into Metalinks.

10.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.  These include the whole
   Message Digest family of algorithms which are not suitable for
   cryptographically strong verification.  Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e. the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include "sha-256" which is SHA-256, as specified in
   [FIPS-180-3], or stronger.  It MAY also include other hashes.

10.4.  Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

11.  Normative References

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization, "ISO 3166-
              1:2006.  Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [draft-bryan-metalink]
              Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml,
              "The Metalink Download Description Format",
              draft-bryan-metalink-16 (work in progress), August 2009.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-06 (work in progress),
              July 2009.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, Matt Domsch, Micah
   Cowan, and David Morris.

   Support for simultaneous download from multiple mirrors is based upon
   work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.

Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format
   [draft-bryan-metalink] :

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.  It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      checksums.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to originating
      server.
   o  (-?) Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      access to users at all mirrors (FTP included) to all download
      information with no changes needed to the server.
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Use of Link header to describe Mirrors.  Only send a few mirrors
      with Link header, or only send them if Want-Digest is used?  Some
      organizations have many mirrors.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e. q=1.0 through q=0.0 ?
749 o Using Metalink/XML for partial file checksums. That adds XML 750 dependency to apps for an important feature. Is there a better 751 method? 752 o Do we need an "official" MIME type for .torrent files or allow 753 "application/x-bittorrent"? 755 -14 : December 31, 2009. 756 o Baseline file hash: SHA-256. 758 -13 : November 22, 2009. 759 o Metalink/XML for partial file checksums. 761 -12 : November 11, 2009. 762 o Clarifications. 764 -11 : October 23, 2009. 765 o Mirror changes. 767 -10 : October 15, 2009. 768 o Mirror coordination changes. 770 -09 : October 12, 2009. 771 o Mirror location, coordination, and depth. 772 o Split HTTP Digest Algorithm Values Registration into 773 draft-bryan-http-digest-algorithm-values-update. 775 -08 : October 4, 2009. 776 o Clarifications. 778 -07 : September 29, 2009. 779 o Preferred mirror servers. 781 -06 : September 24, 2009. 782 o Add Mismatch Detection, Error Recovery, and Digest Algorithm 783 values. 784 o Remove Content-MD5 and Want-Digest. 786 -05 : September 19, 2009. 788 o ETags, preferably matching the Instance Digests. 790 -04 : September 17, 2009. 791 o Temporarily remove .torrent. 793 -03 : September 16, 2009. 794 o Mention HEAD request, negotiate mirrors if Want-Digest is used. 796 -02 : September 6, 2009. 797 o Content-MD5 for partial file checksums. 799 -01 : September 1, 2009. 800 o Link Relation Type Registration: "duplicate" 802 -00 : August 24, 2009. 803 o Initial draft. 805 Authors' Addresses 807 Anthony Bryan 808 Pompano Beach, FL 809 USA 811 Email: anthonybryan@gmail.com 812 URI: http://www.metalinker.org 814 Neil McNab 816 Email: neil@nabber.org 817 URI: http://www.nabber.org 819 Henrik Nordstrom 821 Email: henrik@henriknordstrom.net 822 URI: http://www.henriknordstrom.net/ 823 Alan Ford 824 Roke Manor Research 825 Old Salisbury Lane 826 Romsey, Hampshire SO51 0ZN 827 UK 829 Phone: +44 1794 833 465 830 Email: alan.ford@roke.co.uk