Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: April 26, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                        October 23, 2009


          Metalink/HTTP: Mirrors and Checksums in HTTP Headers
                      draft-bryan-metalinkhttp-11

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 26, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

   This document specifies Metalink/HTTP: Mirrors and Checksums in HTTP Headers, an alternative to the Metalink XML-based download description format.
   Metalink/HTTP describes multiple download locations (mirrors), Peer-to-Peer information, checksums, digital signatures, and other information using existing standards for HTTP headers. Clients can transparently use this information to make file transfers more robust and reliable.

Table of Contents

   1. Introduction
     1.1. Operation Overview
     1.2. Examples
     1.3. Notational Conventions
   2. Requirements
   3. Mirrors / Multiple Download Locations
     3.1. Mirror Priority
     3.2. Mirror Geographical Location
     3.3. Coordinated Mirror Policies
     3.4. Mirror Depth
   4. Peer-to-Peer / Metainfo
     4.1. Metalink/XML Files
   5. OpenPGP Signatures
   6. Checksums of Whole Files
   7. Client / Server Multi-source Download Interaction
     7.1. Error Prevention, Detection, and Correction
       7.1.1. Error Prevention (Early File Mismatch Detection)
       7.1.2. Error Correction
   8. Multi-server Performance
   9. IANA Considerations
   10. Security Considerations
     10.1. URIs and IRIs
     10.2. Spoofing
     10.3. Cryptographic Hashes
     10.4. Signing
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A. Acknowledgements and Contributors
   Appendix B. Comparisons to Similar Options
   Appendix C. Document History
   Authors' Addresses

1. Introduction

   Metalink/HTTP is an alternative representation of Metalink information, which is usually presented as an XML-based document format [draft-bryan-metalink]. Metalink/HTTP attempts to provide as much functionality as the Metalink/XML format by using existing standards such as Web Linking [draft-nottingham-http-link-header], Instance Digests in HTTP [RFC3230], and ETags. Metalink/HTTP is used to list information about a file to be downloaded. This can include lists of multiple URIs (mirrors), Peer-to-Peer information, checksums, and digital signatures.

   Identical copies of a file are frequently accessible in multiple locations on the Internet over a variety of protocols (such as FTP, HTTP, and Peer-to-Peer). In some cases, users are shown a list of these multiple download locations (mirrors) and must manually select a single one on the basis of geographical location, priority, or bandwidth. This distributes the load across multiple servers, and should also increase throughput and resilience. At times, however, individual servers can be slow, outdated, or unreachable, but this cannot be determined until the download has been initiated.
   Users will rarely have sufficient information to choose the most appropriate server, and will often choose the first in a list, which may not be optimal for their needs and leads to that server getting a disproportionate share of the load. The use of suboptimal mirrors can lead to the user canceling and restarting the download to try to manually find a better source. During downloads, errors in transmission can corrupt the file, and there are no easy ways to repair such files. For large downloads this can be extremely troublesome. Any of the many problems that can occur during a download leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using DNS load balancing, both to approximately balance load between servers and to direct clients to nearby servers in the hope that this improves throughput. DNS load balancing can balance long-term server load fairly effectively, but it is less effective at delivering the best throughput to users when the bottleneck is the network rather than the server.

   This document describes a mechanism by which the benefit of mirrors can be realized automatically and more effectively. All the information about a download, including mirrors, checksums, digital signatures, and more, can be transferred in coordinated HTTP headers. This Metalink transfers the knowledge of the download server (and mirror database) to the client. Clients can fall back to other mirrors if the current one has an issue. With this knowledge, the client is able to work its way to a successful download even under adverse circumstances. All this is done transparently to the user, and the download is much more reliable and efficient.
   In contrast, a traditional HTTP redirect to a mirror conveys only minimal information: one link to one server, with no provision in the HTTP protocol for handling failures. Furthermore, in order to provide better load distribution across servers and potentially faster downloads to users, Metalink/HTTP facilitates multi-source downloads, where portions of a file are downloaded from multiple mirrors (and optionally, Peer-to-Peer) simultaneously.

   [[ Discussion of this draft should take place on the IETF HTTP WG mailing list at ietf-http-wg@w3.org or the Metalink discussion mailing list located at metalink-discussion@googlegroups.com. To join the list, visit http://groups.google.com/group/metalink-discussion . ]]

1.1. Operation Overview

   Detailed discussion of Metalink operation is covered in Section 7; this section presents a very brief, high-level overview of how Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive information about other sources of the same resource and a checksum of the whole resource. The client will then be able to request chunks of the file from the various sources, scheduling appropriately in order to maximise the download rate.

1.2. Examples

   A brief Metalink server response with ETag, mirrors, .torrent, .metalink, OpenPGP signature, and whole file checksum:

      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel="duplicate"
      Link: ; rel="duplicate"
      Link: ; rel="describedby";
            type="application/x-bittorrent"
      Link: ; rel="describedby";
            type="application/metalink4+xml"
      Link: ; rel="describedby";
            type="application/pgp-signature"
      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

1.3. Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, [RFC2119], as scoped to those conformance targets.

2. Requirements

   In this context, "Metalink" refers to Metalink/HTTP, which consists of mirrors and checksums in HTTP headers as described in this document. "Metalink/XML" refers to the XML format described in [draft-bryan-metalink].

   Metalink servers are HTTP servers that use the Link header [draft-nottingham-http-link-header] to present a list of mirrors of a resource to a client. They MUST provide checksums of files via Instance Digests in HTTP [RFC3230], whether requested or not. Mirror and checksum information provided by the originating Metalink server MUST be considered authoritative. Metalink servers and their associated mirror servers SHOULD all share the same ETag policy (ETag Synchronization), i.e., based on the file contents (checksum) and not on server-unique filesystem metadata. The emitted ETag MAY be implemented the same as the Instance Digest for simplicity.

   Mirror servers are typically FTP or HTTP servers that "mirror" another server. That is, they provide identical copies of (at least some) files that are also on the mirrored server. Mirror servers MAY be Metalink servers. Mirror servers MUST support serving partial content. HTTP mirror servers SHOULD share the same ETag policy as the originating Metalink server. HTTP mirror servers SHOULD support Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with the Link header [draft-nottingham-http-link-header]. Metalink clients MUST support HTTP and MAY support FTP, BitTorrent, or other download methods. Metalink clients MUST switch downloads from one mirror to another if the mirror becomes unreachable.
   Metalink clients SHOULD support multi-source, or parallel, downloads, where portions of a file are downloaded from multiple mirrors simultaneously (and optionally, from Peer-to-Peer sources). Metalink clients MUST support Instance Digests in HTTP [RFC3230] by requesting and verifying checksums. Metalink clients MAY make use of digital signatures if they are offered.

3. Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header [draft-nottingham-http-link-header] and a relation type of "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

      Link: ; rel="duplicate";
            pri=1; pref=1
      Link: ; rel="duplicate";
            pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors. Only send a few mirrors, or only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send. Such a decision could be based on a hard-coded limit, a random selection, the file size, or the server load.

3.1. Mirror Priority

   Mirror servers are listed in order of priority (from most preferred to least) or have a "pri" value, where mirrors with lower values are used first.

   This is purely an expression of the server's preferences; it is up to the client what it does with this information, particularly with reference to how many servers to use at any one time. A client MUST respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e. q=1.0 through q=0.0 ?]]

3.2. Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1] alpha-2 two-letter country code for the geographical location of the physical server the IRI is used to access.
   A client may use this information to select a mirror, or set of mirrors, that are geographically near (if the client has access to such information), with the aim of reducing network load at inter-country bottlenecks.

3.3. Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal. Preferred mirror servers are HTTP mirror servers that MUST share the same ETag policy as the originating Metalink server. Optimally, they will also provide Instance Digests. Preferred mirrors make it possible to detect early on, before data is transferred, whether the requested file matches the desired file. Preferred HTTP mirror servers have a "pref" value of 1. By default, if unspecified, mirrors are considered "normal" and do not share the same ETag policy. FTP mirrors, as they do not emit ETags, MUST always be considered "normal".

   HTTP mirror servers SHOULD support Instance Digests in HTTP [RFC3230].

   [[Suggestion: In order for clients to identify servers that have coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

      ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4. Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the default. A value of 0 means ONLY that file is mirrored. A value of 1 means that file and all other files and subdirectories in the same directory are mirrored. A value of 2 means the directory above, and all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

      Link: ;
            rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on down.

4. Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over Peer-to-Peer networks or otherwise, are specified with the Link header [draft-nottingham-http-link-header], a relation type of "describedby", and a type parameter that indicates the MIME type of the metadata available at the IRI.

   A brief Metalink server response with .torrent and .metalink:

      Link: ; rel="describedby";
            type="application/x-bittorrent"
      Link: ; rel="describedby";
            type="application/metalink4+xml"

4.1. Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as shown in Section 4. This is particularly useful for providing metadata such as checksums of chunks, allowing a client to recover from partial errors (see Section 7.1.2).

5. OpenPGP Signatures

   OpenPGP signatures are specified with the Link header [draft-nottingham-http-link-header], a relation type of "describedby", and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

      Link: ; rel="describedby";
            type="application/pgp-signature"

6. Checksums of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for files they describe with mirrors. Mirror servers SHOULD as well.

   A brief Metalink server response with checksum:

      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

7. Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET request to the Metalink server. A Range limit is optional, not required. Alternatively, Metalink clients can begin with a HEAD request to the Metalink server to discover mirrors via Link headers. After that, the client follows with a GET request to the desired mirrors.

      GET /distribution/example.ext HTTP/1.1
      Host: www.example.com

   The Metalink server responds with the data and these headers:

      HTTP/1.1 200 OK
      Accept-Ranges: bytes
      Content-Length: 14867603
      Content-Type: application/x-cd-image
      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel="duplicate"; pref=1
      Link: ; rel="duplicate"
      Link: ; rel="describedby";
            type="application/x-bittorrent"
      Link: ; rel="describedby";
            type="application/metalink4+xml"
      Link: ; rel="describedby";
            type="application/pgp-signature"
      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   From the Metalink server response, the client learns some or all of the following metadata about the requested object, in addition to starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority, whether it shares the ETag policy of the originating Metalink server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file checksums to repair a file.
   o  Digital signature.
   o  Instance Digest, which is the whole file checksum.

   (Alternatively, the client could have requested a HEAD only, and then skipped to making the following decisions on every available mirror server found via the Link headers.)

   If the object is large and is delivered more slowly than expected, the Metalink client starts a number of parallel ranged downloads (one per selected mirror server other than the first) using mirrors provided by the Link header with the "duplicate" relation type, sending the location of the original GET request in the "Referer" header field. The size and number of ranges requested from each server is for the client to decide, based upon the performance observed from each server. Further discussion of performance considerations is presented in Section 8.
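   As a non-normative illustration of the client-side processing just described, the following minimal Python sketch parses Link header field values of the form used in this document, keeps those with the "duplicate" relation type, orders them by their "pri" value, and splits the object into byte ranges for parallel requests. The helper names (parse_link, mirrors_by_priority, plan_ranges, range_header) are hypothetical. The parser assumes one link per Link header line and no ";" inside URIs or quoted values, and it treats mirrors without a numeric "pri" as coming after those that have one, in header order; both are assumptions, not requirements of this draft. The example URIs are hypothetical, since the draft's own example URIs are not shown. The default chunk size follows the 10 KByte minimum suggested in Section 8.

```python
import re
from dataclasses import dataclass, field

# One Link field value: <URI> followed by ";"-separated parameters.
# Assumption: one link per Link header line, and no ";" inside the URI
# or inside quoted parameter values (true of this draft's examples).
LINK_RE = re.compile(r"^\s*<([^>]*)>\s*(.*)$")


@dataclass
class Link:
    uri: str
    params: dict = field(default_factory=dict)


def parse_link(value):
    """Parse one Link header value into a Link, or return None if malformed."""
    match = LINK_RE.match(value)
    if match is None:
        return None
    params = {}
    for part in match.group(2).split(";"):
        name, sep, val = part.strip().partition("=")
        if sep:  # skip empty pieces and bare tokens
            params[name.strip().lower()] = val.strip().strip('"')
    return Link(match.group(1), params)


def mirrors_by_priority(link_values):
    """Return rel="duplicate" links, lowest "pri" first.

    Assumption: links without a numeric "pri" keep their header order and
    come after links that have one.
    """
    mirrors = [
        link
        for link in map(parse_link, link_values)
        if link is not None and "duplicate" in link.params.get("rel", "").split()
    ]

    def sort_key(item):
        index, link = item
        pri = link.params.get("pri", "")
        return (0, int(pri), index) if pri.isdecimal() else (1, 0, index)

    return [link for _, link in sorted(enumerate(mirrors), key=sort_key)]


def plan_ranges(content_length, chunk_size=10 * 1024):
    """Split an object into inclusive (first, last) byte ranges."""
    return [
        (start, min(start + chunk_size, content_length) - 1)
        for start in range(0, content_length, chunk_size)
    ]


def range_header(first, last):
    """Format a Range header value for one chunk."""
    return f"bytes={first}-{last}"


if __name__ == "__main__":
    # Hypothetical header values; the URIs are illustrative only.
    headers = [
        '<ftp://ftp.example.com/example.ext>; rel="duplicate"; pri=2',
        '<http://www2.example.com/example.ext>; rel="duplicate"; pri=1; pref=1',
        '<http://www.example.com/example.ext.torrent>; rel="describedby"; '
        'type="application/x-bittorrent"',
    ]
    for mirror in mirrors_by_priority(headers):
        print(mirror.uri, mirror.params)
    # Content-Length from the example response, split into 1 MByte chunks.
    for first, last in plan_ranges(14867603, 1024 * 1024)[:3]:
        print(range_header(first, last))
```

   With these sample values the www2 mirror (pri=1) sorts ahead of the FTP mirror (pri=2), and the "describedby" link is ignored. A real client would assign the planned ranges to mirrors dynamically, based on observed performance, rather than statically.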

   If no Range limit was given in the original request, the client works from the tail of the object (the first request is still running and will eventually catch up); otherwise it continues after the range requested in the first request. If no Range was provided, the original connection must be terminated once all parts of the resource have been retrieved. It is recommended that a HEAD request be undertaken first, so that the client can find out whether there are any Link headers, and then Range-based requests are undertaken to the mirror servers as well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in Section 3.3, and If-Match conditions based on the ETag from the Metalink server response SHOULD be used to quickly detect out-of-date mirrors. If no indication of ETag synchronisation/knowledge is given, then If-Match should not be used; optimally, there will be an Instance Digest in the mirror response that the client can use to detect a mismatch early, and if not, a mismatch will not be detected until the completed object is verified. Early file mismatch detection is described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

      GET /example.ext HTTP/1.1
      Host: www2.example.com
      Range: bytes=7433802-
      If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status code and appropriate "Content-Length" and "Content-Range" header fields.

   The mirror server response, with data, to the above request:

      HTTP/1.1 206 Partial Content
      Accept-Ranges: bytes
      Content-Length: 7433801
      Content-Range: bytes 7433802-14867602/14867603
      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   If the first request was not Range limited, then abort it by closing the connection when it catches up with the other parallel downloads of the same object.

   Downloads from mirrors that do not have the same file size as the Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the checksum of the file.

7.1. Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible before file transfers with the use of file sizes, ETags, and Instance Digests. Error detection requires Instance Digests, or checksums, to determine after transfers whether there has been an error. Error correction, or download repair, is possible with partial file checksums.

7.1.1. Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from multiple responses must be verified with a strong validator, which in this context is either the Instance Digest or a strong ETag. In most cases it is sufficient that the Metalink server provides mirrors and Instance Digest information, but operation will be more robust and efficient if the mirror servers also implement a synchronized ETag. In fact, the emitted ETag may be implemented the same as the Instance Digest for simplicity, but there is no need to specify how the ETag is generated, just that it needs to be shared among the mirror servers. If the mirror server provides neither a synchronized ETag nor an Instance Digest, then early detection of mismatches is not possible unless the file length also differs.
   Even so, the error is still detectable after the download has completed, when the merged response is verified.

   An ETag cannot be used for verifying the integrity of the received content. But it is a guarantee issued by the Metalink server that the content is correct for that ETag. And if the ETag given by the mirror server matches the ETag given by the master server, then there is a chain of trust in which the master server authorizes these responses as valid for that object.

   This guarantees that a mismatch will be detected using only the synchronized ETag from the master server and the mirror server; when If-Match is used, the mirror servers themselves can even signal the mismatch by responding with an error. This prevents accidental merges of ranges from different versions of files with the same name, and also covers many, but not all, malicious attacks in which the data on the mirror has been replaced by some other file.

   A synchronized ETag cannot strictly protect against malicious attacks or server or network errors replacing content, but neither can an Instance Digest on the mirror servers: an attacker can make the server seemingly respond with the expected Instance Digest even if the file contents have been modified, just as they can with the ETag, and the same is true of various system failures that cause bad data to be returned. The Metalink client has to rely on the Instance Digest returned by the Metalink master server in the first response for the verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, that is a bonus, just as having them return the right set of Link headers is. Trusted mirrors that do so can, if desired, be substituted as master servers accepting the initial request.
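   The checks described in this section can be sketched in Python as follows. This is a minimal, non-normative sketch under stated assumptions: response header fields are supplied as dictionaries with lower-cased names, the Metalink server's first response was a complete 200 response (so its Content-Length is the object size), and only the "SHA" algorithm from Instance Digests in HTTP [RFC3230] (base64 of the SHA-1 hash of the whole instance) is compared. The function names are hypothetical. early_mismatch() applies the early checks to one mirror response; verify_whole_file() is the final check against the master server's Instance Digest, which remains the authoritative one.

```python
import base64
import hashlib
import re

CONTENT_RANGE_RE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def sha_instance_digest(data):
    """RFC 3230 "SHA" instance digest: base64 of SHA-1 over the whole body."""
    return "SHA=" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def parse_digest(value):
    """Map lower-cased algorithm -> value, e.g. from "SHA=...,MD5=..."."""
    digests = {}
    for item in value.split(","):
        # Split on the first "=" only; base64 padding also uses "=".
        alg, sep, val = item.strip().partition("=")
        if sep:
            digests[alg.lower()] = val
    return digests


def early_mismatch(status, master, mirror, preferred):
    """Return a reason to reject a mirror response, or None if no
    mismatch is detectable yet.

    master/mirror: dicts of lower-cased response header fields.
    preferred: True for mirrors with pref=1 (coordinated ETags).
    """
    if status == 412:
        # If-Match failed: the mirror itself reports a different ETag.
        return "If-Match precondition failed"
    if status != 206:
        return f"unexpected status {status}"
    match = CONTENT_RANGE_RE.match(mirror.get("content-range", ""))
    if match is None:
        return "missing or unparsable Content-Range"
    if int(match.group(3)) != int(master["content-length"]):
        return "file size differs from the Metalink server"
    if preferred and mirror.get("etag") != master.get("etag"):
        return "ETag differs from the Metalink server"
    master_sha = parse_digest(master.get("digest", "")).get("sha")
    mirror_sha = parse_digest(mirror.get("digest", "")).get("sha")
    if master_sha and mirror_sha and master_sha != mirror_sha:
        return "Instance Digest differs from the Metalink server"
    return None  # no early evidence of a mismatch


def verify_whole_file(data, master):
    """Final check of the merged download against the master's digest."""
    expected = parse_digest(master.get("digest", "")).get("sha")
    return expected is not None and sha_instance_digest(data) == "SHA=" + expected
```

   The order of the checks follows the text: file size first, then the synchronized ETag for preferred mirrors, then a mirror-supplied Instance Digest if one is present. When none of these is available, early_mismatch() returns None and any error is only caught later by verify_whole_file().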

   The benefit of having slave mirror servers (those not trusted as masters) return an Instance Digest is that the client can then detect mismatches early even if the ETag is not used. Both the ETag and the slave mirror Instance Digest provide value, but just one is sufficient for early detection of mismatches. If neither is provided, early detection of mismatches is not possible unless the file length also differs, but the error is still detected when the merged response is verified.

7.1.2. Error Correction

   If the object checksum does not match the Instance Digest, then fetch the Metalink/XML or other recovery profile link, where partial file checksums can be found, allowing detection of which server returned bad information. If the Instance Digest computation does not match, then the client needs to fetch the partial file checksums and from there determine which of the downloaded data can be recovered and which needs to be fetched again. If no partial checksums are available, then the client MUST fetch the complete object from a trusted Metalink server.

   Partial file checksums can be used to detect errors during the download.

8. Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there are a number of factors (both within and outside the influence of the client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small range requests to servers so that all the servers finish their final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or other traffic sharing a bottleneck link will be starved. But at the same time, good performance requires that the client can simultaneously download from at least one fast mirror while exploring whether any other mirror is faster. Based on laboratory experiments, we suggest a good default number of simultaneous connections is probably four, with three of these being used for the best three mirrors found so far, and one being used to evaluate whether any other mirror might offer better performance.

   The size of chunks chosen by the client should be sufficiently large that the chunk request headers and response headers represent negligible overhead, and sufficiently large that they can be pipelined effectively without needing a very high rate of chunk requests. At the same time, the amount of time wasted waiting for the last chunk to download from the last server after all the other servers have finished should be minimized. Thus we currently recommend that a chunk size of at least 10 KBytes should be used. If the file being transferred is very large, or the download speed very high, this can be increased to perhaps 1 MByte. As network bandwidths increase, we expect these numbers to increase appropriately, so that the time to transfer a chunk remains significantly larger than the latency of requesting a chunk from a server.

9. IANA Considerations

   IANA has made the following registration to the Link Relation Type registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations are byte-for-byte identical with the corresponding representations of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources. That is, an HTTP GET request on any duplicate will return the same representation. It does not make sense for dynamic or POSTable resources and should not be used for them.

10. Security Considerations

10.1. URIs and IRIs

   Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986] and Section 8 of [RFC3987] for security considerations related to their handling and use.

10.2. Spoofing

   There is potential for spoofing attacks where the attacker publishes Metalinks with false information. Such Metalinks could deceive unaware users into downloading a malicious or worthless file. Also, malicious publishers could attempt a distributed denial of service attack by inserting unrelated IRIs into Metalinks.

10.3. Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in HTTP [RFC3230] are considered insecure. These include the whole Message Digest family of algorithms, which are not suitable for cryptographically strong verification. Malicious people could provide files that appear to be identical to another file because of a collision, i.e., the weak cryptographic hashes of the intended file and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6, it SHOULD include "sha", which is SHA-1 as specified in [RFC3174], or stronger. It MAY also include other hashes.

10.4. Signing

   Metalinks should include digital signatures, as described in Section 5.

   Digital signatures provide authentication, message integrity, and non-repudiation with proof of origin.

11. References

11.1. Normative References

   [ISO3166-1]
              International Organization for Standardization, "ISO 3166-1:2006.
              Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes", November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking", draft-nottingham-http-link-header-06 (work in progress), July 2009.

11.2. Informative References

   [draft-bryan-metalink]
              Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml, "The Metalink Download Description Format", draft-bryan-metalink-16 (work in progress), August 2009.

Appendix A. Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham, Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, and Matt Domsch.

   Support for simultaneous download from multiple mirrors is based upon work by Mark Handley and Javier Vela Diago, who also provided validation of the benefits of this approach.

Appendix B. Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format [draft-bryan-metalink]:

   o  (+) Reuses existing HTTP standards without much new besides a Link Relation Type.
      It is more of a coordinated collection of existing features.
   o  (?) The existing standards do not seem to be widely implemented.
   o  (+) No XML dependency, unless Metalink/XML is used for partial
      file checksums.
   o  (+) Existing Metalink/XML clients can easily be converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not
      required. Coordination may be difficult or impossible unless you
      control all servers on the mirror network.
   o  (-) Requires software or configuration changes to the
      originating server.
   o  (-?) Tied to HTTP, so less generic. Unlike Metalink/XML, it will
      not be used by FTP/P2P clients unless they also support HTTP.
   o  (-) Requires server-side support. Metalink/XML can be created by
      the user (or by the server, but server components or changes are
      not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      users at all mirrors (FTP included) access to all download
      information, with no changes needed to the server.
   o  (-) Not portable, archivable, or emailable. Metalink/XML is used
      to import and export transfer queues. Possibly not as easy for
      search engines to index.
   o  (-) No way to show mirror geographical location (yet).
   o  (-) Less rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      a directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Use of the Link header to describe mirrors. Should only a few
      mirrors be sent in the Link header, or should they only be sent
      if Want-Digest is used? Some organizations have many mirrors.
   o  Would it make more sense to use qvalue-style policies to
      describe mirror priority, i.e., q=1.0 through q=0.0?
   o  Will we use Metalink/XML for partial file checksums?
      That would add an XML dependency to applications for an
      important feature.
   o  Do we need an official MIME type for .torrent files, or should
      "application/x-bittorrent" be allowed?

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 12, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.
   o  Content-MD5 for partial file checksums.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate".

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk