idnits 2.17.1

draft-bryan-metalinkhttp-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 15, 2009) is 5305 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 3174

  ** Obsolete normative reference: RFC 3230 (Obsoleted by RFC 9530)

     Summary: 4 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: April 18, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                        October 15, 2009


          Metalink/HTTP: Mirrors and Checksums in HTTP Headers
                      draft-bryan-metalinkhttp-10

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 18, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document specifies Metalink/HTTP: Mirrors and Checksums in HTTP
   Headers, an alternative to the Metalink XML-based download
   description format.
   Metalink/HTTP describes multiple download locations (mirrors),
   Peer-to-Peer, checksums, digital signatures, and other information
   using existing standards for HTTP headers.  Clients can transparently
   use this information to make file transfers more robust and reliable.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Checksums of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1.  URIs and IRIs
     10.2.  Spoofing
     10.3.  Cryptographic Hashes
     10.4.  Signing
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options (to be removed by
                RFC Editor before publication)
   Appendix C.  Document History (to be removed by RFC Editor
                before publication)
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink].  Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags.  Metalink/HTTP is used
   to list information about a file to be downloaded.  This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   checksums, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.  At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.
   Users will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular server
   getting a disproportionate share of load.  The use of suboptimal
   mirrors can lead to the user canceling and restarting the download to
   try to manually find a better source.  During downloads, errors in
   transmission can corrupt the file.  There are no easy ways to repair
   these files.  For large downloads this can be extremely troublesome.
   Any of the problems that can occur during a download leads to
   frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized.  All the
   information about a download, including mirrors, checksums, digital
   signatures, and more can be transferred in coordinated HTTP Headers.
   This Metalink transfers the knowledge of the download server (and
   mirror database) to the client.  Clients can fall back to other
   mirrors if the current one has an issue.  With this knowledge, the
   client is enabled to work its way to a successful download even under
   adverse circumstances.  All this is done transparently to the user
   and the download is much more reliable and efficient.
   In contrast, a traditional HTTP redirect to a mirror conveys only
   extremely minimal information - one link to one server, and there is
   no provision in the HTTP protocol to handle failures.  Furthermore,
   in order to provide better load distribution across servers and
   potentially faster downloads to users, Metalink/HTTP facilitates
   multi-source downloads, where portions of a file are downloaded from
   multiple mirrors (and optionally, Peer-to-Peer) simultaneously.

   [[ Discussion of this draft should take place on IETF HTTP WG mailing
   list at ietf-http-wg@w3.org or the Metalink discussion mailing list
   located at metalink-discussion@googlegroups.com.  To join the list,
   visit http://groups.google.com/group/metalink-discussion . ]]

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a checksum
   of the whole resource.  The client will then be able to request
   chunks of the file from the various sources, scheduling appropriately
   in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and whole file checksum:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
      type="application/x-bittorrent"
   Link: ; rel="describedby";
      type="application/metalink4+xml"
   Link: ; rel="describedby";
      type="application/pgp-signature"
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP which consists of
   mirrors and checksums in HTTP Headers as described in this document.
   "Metalink/XML" refers to the XML format described in
   [draft-bryan-metalink].

   Metalink servers are HTTP servers that use the Link header
   [draft-nottingham-http-link-header] to present a list of mirrors of a
   resource to a client.  They MUST provide checksums of files via
   Instance Digests in HTTP [RFC3230], whether requested or not.  Mirror
   and checksum information provided by the originating Metalink server
   MUST be considered authoritative.  Metalink servers and their
   associated mirror servers SHOULD all share the same ETag policy (ETag
   Synchronization), i.e. based on the file contents (checksum) and not
   server-unique filesystem metadata.  The emitted ETag MAY be
   implemented the same as the Instance Digest for simplicity.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  HTTP mirror servers SHOULD share the same ETag policy as
   the originating Metalink server.  HTTP Mirror servers SHOULD support
   Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the mirror becomes unreachable.
   Metalink clients SHOULD support multi-source, or parallel, downloads,
   where portions of a file are downloaded from multiple mirrors
   simultaneously (and optionally, from Peer-to-Peer sources).  Metalink
   clients MUST support Instance Digests in HTTP [RFC3230] by requesting
   and verifying checksums.  Metalink clients MAY make use of digital
   signatures if they are offered.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: ; rel="duplicate";
      pri=1; pref=1
   Link: ; rel="duplicate";
      pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send.  Such
   a decision could rely on a hard-coded limit, a random selection, the
   file size, or the server load.

3.1.  Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.  A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2.  Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 two-letter country code for the geographical location of the
   physical server the IRI is used to access.
   A client may use this information to select a mirror, or set of
   mirrors, that are geographically near (if the client has access to
   such information), with the aim of reducing network load at
   inter-country bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server.  Optimally, they
   will do both.  Preferred mirrors make it possible to detect early on,
   before data is transferred, if the file requested matches the desired
   file.  Preferred HTTP mirror servers have a "pref" value of 1.  If
   "pref" is unspecified, mirrors are considered "normal" by default and
   do not share the same ETag policy.  FTP mirrors, as they do not emit
   ETags, MUST always be considered "normal".

   HTTP Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4.  Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default.  A value of 0 means ONLY that file is mirrored.  A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored.  A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

   Link: ;
      rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4.  Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over Peer-to-
   Peer networks or otherwise, are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter that indicates the MIME type of
   the metadata available at the IRI.

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
      type="application/x-bittorrent"
   Link: ; rel="describedby";
      type="application/metalink4+xml"

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as checksums of chunks, allowing a client to recover
   from partial errors (see Section 7.1.2).

5.  OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
      type="application/pgp-signature"

6.  Checksums of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors.  Mirror servers SHOULD as well.

   A brief Metalink server response with checksum:

   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.  Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
      type="application/x-bittorrent"
   Link: ; rel="describedby";
      type="application/metalink4+xml"
   Link: ; rel="describedby";
      type="application/pgp-signature"
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   also starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file checksums to repair a
      file.
   o  Digital signature.
   o  Instance Digest, which is the whole file checksum.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and is delivered more slowly than expected,
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first) using mirrors
   provided by the Link header with "duplicate" relation type, using the
   location of the original GET request in the "Referer" header field.
   The size and number of ranges requested from each server are for the
   client to decide, based upon the performance observed from each
   server.  Further discussion of performance considerations is
   presented in Section 8.
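
   As a rough illustration of the range planning described above, the
   following Python sketch splits an object into one contiguous byte
   range per source and builds the request headers for a ranged
   download.  The function names plan_ranges and range_request_headers
   are hypothetical, and the even split is only an assumption made for
   illustration; this document leaves the size and number of ranges to
   the client.

```python
def plan_ranges(total_size, sources):
    """Split bytes [0, total_size) into one contiguous range per source.

    Returns a list of (source, first_byte, last_byte) tuples with
    inclusive byte positions, as used by the HTTP Range header.
    The even split is an illustrative assumption, not a requirement.
    """
    if total_size <= 0 or not sources:
        return []
    count = len(sources)
    chunk = (total_size + count - 1) // count  # integer ceiling division
    plan = []
    for index, source in enumerate(sources):
        first = index * chunk
        if first >= total_size:
            break  # more sources than bytes; leave the rest unused
        last = min(first + chunk, total_size) - 1
        plan.append((source, first, last))
    return plan


def range_request_headers(first, last, referer=None, etag=None):
    """Build the header fields for one ranged request to a mirror.

    referer: location of the original GET request.
    etag: only pass this for mirrors with a coordinated ETag policy
          (preferred mirrors); it is sent as an If-Match condition.
    """
    headers = {"Range": "bytes=%d-%d" % (first, last)}
    if referer is not None:
        headers["Referer"] = referer
    if etag is not None:
        headers["If-Match"] = etag
    return headers
```

   With the 14867603-byte object from the example response and two
   sources, this plan assigns bytes 0-7433801 to the original connection
   and bytes 7433802-14867602 to the mirror.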

   If no Range limit was given in the original request, the client works
   from the tail of the object (the first request is still running and
   will eventually catch up); otherwise it continues after the range
   requested in the first request.  If no Range was provided, the
   original connection must be terminated once all parts of the resource
   have been retrieved.  It is recommended that a HEAD request be
   undertaken first, so that the client can find out whether there are
   any Link headers, and that Range-based requests then be made to the
   mirror servers as well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag from the
   Metalink server response SHOULD be used to quickly detect out-of-date
   mirrors.  If no indication of ETag synchronization/knowledge is
   given, then If-Match should not be used.  Optimally, there will be an
   Instance Digest in the mirror response which the client can use to
   detect a mismatch early; if not, a mismatch won't be detected until
   the completed object is verified.  Early file mismatch detection is
   described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.
   The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   If the first request was not Range limited, the client aborts it by
   closing the connection when it catches up with the other parallel
   downloads of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   checksum of the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests.  Error detection requires Instance Digests, or checksums, to
   determine after transfers if there has been an error.  Error
   correction, or download repair, is possible with partial file
   checksums.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers do implement a
   synchronized ETag as well.  In fact, the emitted ETag may be
   implemented the same as the Instance Digest for simplicity, but there
   is no need to specify how the ETag is generated, just that it needs
   to be shared among the mirror servers.  If the mirror server provides
   neither synchronized ETag nor Instance Digest, then early detection
   of mismatches is not possible unless file length also differs.
   Finally, the error is still detectable, after the download has
   completed, when the merged response is verified.

   ETag cannot be used for verifying the integrity of the received
   content.  But it is a guarantee issued by the Metalink server that
   the content is correct for that ETag.  And if the ETag given by the
   mirror server matches the ETag given by the master server, then we
   have a chain of trust where the master server authorizes these
   responses as valid for that object.

   Using only the synchronized ETag from a master server and a mirror
   server, a mismatch will therefore be detected; the mirror servers
   themselves can even signal it by responding with an error.  This
   prevents accidental merges of ranges from different versions of files
   with the same name.  It also covers many, but not all, malicious
   attacks in which the data on the mirror has been replaced by some
   other file.

   Synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can
   Instance Digest on the mirror servers.  An attacker can almost
   certainly make the server seemingly respond with the expected
   Instance Digest even if the file contents have been modified, just as
   with ETag, and the same holds for various system failures that cause
   bad data to be returned.  The Metalink client has to rely on the
   Instance Digest returned by the Metalink master server in the first
   response for the verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   The set of trusted mirrors doing that can be substituted as master
   servers accepting the initial request if one likes.
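
   The checks discussed in this section can be sketched in Python as
   follows.  This is a minimal sketch under stated assumptions, not a
   definitive implementation: the function names and the dictionary keys
   "size", "etag", and "digest" are hypothetical, and only the "SHA"
   (SHA-1) digest with base64 encoding, as used in the examples in this
   document, is computed.

```python
import base64
import hashlib


def sha_instance_digest(data):
    """Return a Digest header value of the form "SHA=<base64 SHA-1>"."""
    raw = hashlib.sha1(data).digest()
    return "SHA=" + base64.b64encode(raw).decode("ascii")


def same_digest(left, right):
    """Compare two "ALGORITHM=value" strings.

    The algorithm token is compared case-insensitively; the encoded
    value is compared exactly.
    """
    alg_l, _, val_l = left.partition("=")
    alg_r, _, val_r = right.partition("=")
    return (alg_l.strip().lower() == alg_r.strip().lower()
            and val_l.strip() == val_r.strip())


def early_mismatch(origin, mirror, preferred):
    """Return "size", "etag" or "digest" if the mirror response appears
    to describe a different file than the Metalink server, else None.

    origin and mirror are dicts with optional keys "size", "etag" and
    "digest" (hypothetical names). ETags are only compared for
    preferred mirrors, i.e. those with a coordinated ETag policy.
    """
    o_size, m_size = origin.get("size"), mirror.get("size")
    if o_size is not None and m_size is not None and o_size != m_size:
        return "size"
    o_etag, m_etag = origin.get("etag"), mirror.get("etag")
    if preferred and o_etag is not None and m_etag is not None:
        if o_etag != m_etag:
            return "etag"
    o_digest, m_digest = origin.get("digest"), mirror.get("digest")
    if o_digest is not None and m_digest is not None:
        if not same_digest(o_digest, m_digest):
            return "digest"
    return None


def verify_whole_file(data, origin_digest):
    """Final check against the Instance Digest from the first response."""
    return same_digest(sha_instance_digest(data), origin_digest)
```

   A None result from early_mismatch only means that no mismatch was
   visible in the headers; the merged object still has to pass
   verify_whole_file against the Instance Digest returned by the
   Metalink server.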

   The benefit of having slave mirror servers (those not trusted as
   masters) return Instance Digest is that the client then can detect
   mismatches early even if ETag is not used.  Both ETag and slave
   mirror Instance Digest do provide value, but just one is sufficient
   for early detection of mismatches.  If neither is provided then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

7.1.2.  Error Correction

   If the object checksum does not match the Instance Digest, the client
   fetches the Metalink/XML or other recovery profile link, where
   partial file checksums can be found, allowing detection of which
   server returned bad information.  From the partial file checksums the
   client determines which of the downloaded data can be recovered and
   what needs to be fetched again.  If no partial checksums are
   available, then the client MUST fetch the complete object from a
   trusted Metalink server.

   Partial file checksums can be used to detect errors during the
   download.

8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved.  But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster.  Based on laboratory experiments,
   we suggest a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests.  At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized.  Thus we currently
   recommend a chunk size of at least 10 KBytes.  If the file being
   transferred is very large, or the download speed very high, this can
   be increased to perhaps 1 MByte.  As network bandwidths increase, we
   expect these numbers to increase appropriately, so that the time to
   transfer a chunk remains significantly larger than the latency of
   requesting a chunk from a server.

9.  IANA Considerations

   Accordingly, IANA has made the following registration to the Link
   Relation Type registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.
598 o Notes: This relation is for static resources. That is, an HTTP GET 599 request on any duplicate will return the same representation. It 600 does not make sense for dynamic or POSTable resources and should not 601 be used for them. 603 10. Security Considerations 604 10.1. URIs and IRIs 606 Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986] 607 and Section 8 of [RFC3987] for security considerations related to 608 their handling and use. 610 10.2. Spoofing 612 There is potential for spoofing attacks where the attacker publishes 613 Metalinks with false information. In that case, this could deceive 614 unaware downloaders that they are downloading a malicious or 615 worthless file. Also, malicious publishers could attempt a 616 distributed denial of service attack by inserting unrelated IRIs into 617 Metalinks. 619 10.3. Cryptographic Hashes 621 Currently, some of the digest values defined in Instance Digests in 622 HTTP [RFC3230] are considered insecure. These include the whole 623 Message Digest family of algorithms which are not suitable for 624 cryptographically strong verification. Malicious people could 625 provide files that appear to be identical to another file because of 626 a collision, i.e. the weak cryptographic hashes of the intended file 627 and a substituted malicious file could match. 629 If a Metalink contains whole file hashes as described in Section 6, 630 it SHOULD include "sha" which is SHA-1, as specified in [RFC3174], or 631 stronger. It MAY also include other hashes. 633 10.4. Signing 635 Metalinks should include digital signatures, as described in 636 Section 5. 638 Digital signatures provide authentication, message integrity, and 639 non-repudiation with proof of origin. 641 11. References 643 11.1. Normative References 645 [ISO3166-1] 646 International Organization for Standardization, "ISO 3166- 647 1:2006. 
              Codes for the representation of names of countries and
              their subdivisions -- Part 1: Country codes",
              November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
              (SHA1)", RFC 3174, September 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-06 (work in progress),
              July 2009.

11.2.  Informative References

   [draft-bryan-metalink]
              Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml,
              "The Metalink Download Description Format",
              draft-bryan-metalink-16 (work in progress), August 2009.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, and Matt Domsch.

   Support for simultaneous download from multiple mirrors is based
   upon work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.

Appendix B.  Comparisons to Similar Options (to be removed by RFC
             Editor before publication)

   This draft, compared to the Metalink/XML format
   [draft-bryan-metalink]:

   o  (+) Reuses existing HTTP standards without much new besides a
      Link Relation Type.  It's more of a collection/coordinated
      feature set.
   o  (?)
      The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, unless we use Metalink/XML for partial
      file checksums.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not
      required.  Coordination may be difficult or impossible unless you
      are in control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to originating
      server.
   o  (-?) Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      access to users at all mirrors (FTP included) to all download
      information with no changes needed to the server.
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) No way to show mirror geographical location (yet).
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C.  Document History (to be removed by RFC Editor before
             publication)

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:

   o  Use of Link header to describe Mirrors.  Only send a few mirrors
      with Link header, or only send them if Want-Digest is used?  Some
      organizations have many mirrors.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e. q=1.0 through q=0.0 ?
   o  Will we use Metalink/XML for partial file checksums?  That would
      add XML dependency to apps for an important feature.
   o  Do we need an official MIME type for .torrent files or allow
      "application/x-bittorrent"?

   -10 : October 15, 2009.

   o  Mirror coordination changes.

   -09 : October 12, 2009.

   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.

   o  Clarifications.

   -07 : September 29, 2009.

   o  Preferred mirror servers.

   -06 : September 24, 2009.

   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.

   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.

   o  Temporarily remove .torrent.

   -03 : September 16, 2009.

   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.

   o  Content-MD5 for partial file checksums.

   -01 : September 1, 2009.

   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.

   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk