idnits 2.17.1

draft-bryan-metalinkhttp-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice
     from 12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  -- The document date (October 13, 2009) is 5306 days in the past.  Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3166-1'

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230,
     RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 3174

  ** Obsolete normative reference: RFC 3230 (Obsoleted by RFC 9530)

     Summary: 4 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: April 16, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                        October 13, 2009


          Metalink/HTTP: Mirrors and Checksums in HTTP Headers
                      draft-bryan-metalinkhttp-09

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 16, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document specifies Metalink/HTTP: Mirrors and Checksums in HTTP
   Headers, an alternative to the Metalink XML-based download
   description format.
   Metalink/HTTP describes multiple download locations (mirrors),
   Peer-to-Peer, checksums, digital signatures, and other information
   using existing standards for HTTP headers.  Clients can
   transparently use this information to make file transfers more
   robust and reliable.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Checksums of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1.  URIs and IRIs
     10.2.  Spoofing
     10.3.  Cryptographic Hashes
     10.4.  Signing
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options (to be removed by RFC
                Editor before publication)
   Appendix C.  Document History (to be removed by RFC Editor before
                publication)
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink].  Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags.  Metalink/HTTP is used
   to list information about a file to be downloaded.  This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   checksums, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.  At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.
   Users will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular server
   getting a disproportionate share of load.  The use of suboptimal
   mirrors can lead to the user canceling and restarting the download
   to try to manually find a better source.  During downloads, errors
   in transmission can corrupt the file.  There are no easy ways to
   repair these files.  For large downloads this can be extremely
   troublesome.  Any of these problems can lead to frustration on the
   part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  DNS load balancing can balance long-term
   server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized.  All the
   information about a download, including mirrors, checksums, digital
   signatures, and more, can be transferred in coordinated HTTP headers.
   This Metalink information transfers the knowledge of the download
   server (and mirror database) to the client.  Clients can fall back to
   other mirrors if the current one has an issue.  With this knowledge,
   the client can work its way to a successful download even under
   adverse circumstances.  All of this happens transparently to the
   user, and the download becomes more reliable and efficient.
   In contrast, a traditional HTTP redirect to a mirror conveys only
   minimal information: one link to one server, with no provision in
   the HTTP protocol to handle failures.  Furthermore, in order to
   provide better load distribution across servers and potentially
   faster downloads to users, Metalink/HTTP facilitates multi-source
   downloads, where portions of a file are downloaded from multiple
   mirrors (and optionally, Peer-to-Peer) simultaneously.

   [[ Discussion of this draft should take place on the IETF HTTP WG
   mailing list at ietf-http-wg@w3.org or the Metalink discussion
   mailing list located at metalink-discussion@googlegroups.com.  To
   join the list, visit http://groups.google.com/group/metalink-discussion . ]]

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Sections 2
   and 7; this section presents a very brief, high-level overview of
   how Metalink achieves its goals.

   Upon connecting to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a checksum
   of the whole resource.  The client will then be able to request
   chunks of the file from the various sources, scheduling appropriately
   in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with checksum, mirrors, .torrent,
   .metalink, and OpenPGP signature:

      Link: ; rel="duplicate"
      Link: ; rel="duplicate"
      Link: ; rel="describedby";
         type="application/x-bittorrent"
      Link: ; rel="describedby";
         type="application/metalink4+xml"
      Link: ; rel="describedby";
         type="application/pgp-signature"
      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP, which consists
   of mirrors and checksums in HTTP headers as described in this
   document.  "Metalink/XML" refers to the XML format described in
   [draft-bryan-metalink].

   Metalink servers are HTTP servers that use the Link header
   [draft-nottingham-http-link-header] to present a list of mirrors of a
   resource to a client.  They MUST provide checksums of files via
   Instance Digests in HTTP [RFC3230], whether requested or not.  Mirror
   and checksum information provided by the originating Metalink server
   MUST be considered authoritative.  Metalink servers and their
   associated mirror servers MUST all share the same ETag policy (ETag
   Synchronization), i.e., based on the file contents (checksum) and not
   on server-unique filesystem metadata.  The emitted ETag MAY be
   implemented the same as the Instance Digest for simplicity.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].  HTTP mirror servers MUST share the same ETag policy as
   the originating Metalink server.

   Metalink clients use the mirrors provided by a Metalink server in the
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the mirror becomes unreachable.
   Metalink clients SHOULD support multi-source, or parallel, downloads,
   where portions of a file are downloaded from multiple mirrors
   simultaneously (and optionally, from Peer-to-Peer sources).  Metalink
   clients MUST support Instance Digests in HTTP [RFC3230] by requesting
   and verifying checksums.  Metalink clients MAY make use of digital
   signatures if they are offered.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

      Link: ; rel="duplicate";
         pri=1; pref=1
      Link: ; rel="duplicate";
         pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send.  Such
   a decision could be based on a hard-coded limit, a random selection,
   the file size, or the server load.

3.1.  Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.  A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.,
   q=1.0 through q=0.0?]]

3.2.  Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 (two-letter) country code for the geographical location of
   the physical server the IRI is used to access.
   A client may use this information to select a mirror, or set of
   mirrors, that are geographically near (if the client has access to
   such information), with the aim of reducing network load at
   inter-country bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Optimally, HTTP mirror servers will share the same ETag policy as the
   Metalink server, provide Instance Digests, or both.  These mirrors
   are preferred, and make it possible to detect early, before data is
   transferred, whether the requested file matches the desired file.
   Preferred mirror servers MUST share the same ETag policy or MUST
   support Instance Digests.  Preferred HTTP mirror servers have a
   "pref" value of 1.

   [[Suggestion to relax earlier MUSTs: In order for clients to identify
   servers that have coordinated ETag policies, the ETag MUST begin with
   "Metalink:", e.g.,

      ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4.  Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default.  A value of 0 means ONLY that file is mirrored.  A value of
   1 means that file and all other files and subdirectories in the same
   directory are mirrored.  A value of 2 means the parent directory and
   all of its files and subdirectories are mirrored.

   A mirror with a depth value of 4:

      Link: ;
         rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4.  Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over
   Peer-to-Peer networks or otherwise, are specified with the Link
   header [draft-nottingham-http-link-header], a relation type of
   "describedby", and a type parameter that indicates the MIME type of
   the metadata available at the IRI.
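
   The Link header conventions of Sections 3 and 4 (rel="duplicate"
   with "pri", "pref", "geo", and "depth" parameters for mirrors;
   rel="describedby" with a "type" parameter for metainfo) might be
   handled on the client roughly as below.  This is a minimal Python
   sketch, not a conforming Web Linking parser: it assumes URIs contain
   no ">", quoted parameter values contain no commas or escaped quotes,
   and multiple Link header fields have already been joined into one
   comma-separated value.  The function names and example URIs are
   hypothetical, and placing mirrors without "pri" after those with one
   is an assumption, not a rule from this document.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# One link-value: "<URI>" followed by zero or more ";"-separated params.
# Simplification: URIs contain no ">", parameter values contain no ",".
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,<]+)*)')
# One parameter: name=token or name="quoted string" (no escaped quotes).
_PARAM_RE = re.compile(r';\s*([^=;\s]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


@dataclass
class Link:
    uri: str
    params: Dict[str, str] = field(default_factory=dict)

    def rels(self) -> List[str]:
        # "rel" may carry several space-separated relation types.
        return self.params.get("rel", "").lower().split()


def parse_link_header(value: str) -> List[Link]:
    links = []
    for m in _LINK_RE.finditer(value):
        params = {}
        for p in _PARAM_RE.finditer(m.group(2)):
            quoted, token = p.group(2), p.group(3)
            params[p.group(1).lower()] = quoted if quoted is not None else token
        links.append(Link(m.group(1), params))
    return links


def mirrors(links: List[Link]) -> List[Link]:
    """rel="duplicate" links, lowest "pri" first; links without "pri"
    keep their header order after those that have one."""
    dup = [l for l in links if "duplicate" in l.rels()]
    return sorted(dup, key=lambda l: int(l.params["pri"])
                  if "pri" in l.params else float("inf"))


def described_by(links: List[Link], media_type: str) -> List[str]:
    """URIs of rel="describedby" links whose "type" matches media_type."""
    return [l.uri for l in links
            if "describedby" in l.rels()
            and l.params.get("type", "").lower() == media_type.lower()]


# Hypothetical combined Link header value.
header = ('<http://www2.example.com/example.ext>; rel="duplicate"; '
          'pri=2; geo="gb"; depth=1, '
          '<ftp://ftp.example.com/example.ext>; rel="duplicate"; pri=1; pref=1, '
          '<http://www.example.com/example.ext.torrent>; rel="describedby"; '
          'type="application/x-bittorrent"')
links = parse_link_header(header)
print([m.uri for m in mirrors(links)])
```

   Sorting is stable, so mirrors that share a "pri" value (or have none)
   stay in the order the server listed them, which preserves the
   server's ordering as Section 3.1 requires.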

   A brief Metalink server response with .torrent and .metalink:

      Link: ; rel="describedby";
         type="application/x-bittorrent"
      Link: ; rel="describedby";
         type="application/metalink4+xml"

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as checksums of chunks, allowing a client to recover
   from partial errors (see Section 7.1.2).

5.  OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header], a relation type of
   "describedby", and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

      Link: ; rel="describedby";
         type="application/pgp-signature"

6.  Checksums of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors.  Mirror servers SHOULD do so as
   well.

   A brief Metalink server response with checksum:

      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.
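
   The whole-file check required by Section 6, which the client performs
   once the download described in this section completes, can be
   sketched as follows.  This assumes the RFC 3230 "SHA" algorithm,
   whose value is the base64 encoding of the SHA-1 hash of the whole
   file; the function names are hypothetical and this is an
   illustration, not a normative algorithm.

```python
import base64
import hashlib
from typing import Dict


def sha_digest_header(data: bytes) -> str:
    """Digest header value for the RFC 3230 "SHA" algorithm:
    base64 of the SHA-1 hash of the whole file."""
    return "SHA=" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def parse_digest_header(value: str) -> Dict[str, str]:
    """Split e.g. "SHA=..., MD5=..." into {"sha": "...", "md5": "..."}.
    Names are case-insensitive; only the first "=" separates the name,
    because base64 values may end in "=" padding."""
    digests = {}
    for item in value.split(","):
        name, sep, encoded = item.strip().partition("=")
        if sep:
            digests[name.strip().lower()] = encoded.strip()
    return digests


def verify_whole_file(data: bytes, digest_header: str) -> bool:
    """True if data matches the "SHA" Instance Digest sent by the
    Metalink server; raises ValueError if no SHA digest was offered."""
    encoded = parse_digest_header(digest_header).get("sha")
    if encoded is None:
        raise ValueError("no SHA instance digest to verify against")
    # Compare raw digest bytes rather than base64 text.
    return base64.b64decode(encoded) == hashlib.sha1(data).digest()


print(sha_digest_header(b"abc"))  # SHA=qZk+NkcGgWq6PiVxeFDCbJzQ2J0=
```

   For large files, hashlib objects also accept data incrementally via
   update(), so the file can be hashed in blocks as it is written to
   disk instead of being held in memory.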

      GET /distribution/example.ext HTTP/1.1
      Host: www.example.com

   The Metalink server responds with the data and these headers:

      HTTP/1.1 200 OK
      Accept-Ranges: bytes
      Content-Length: 14867603
      Content-Type: application/x-cd-image
      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel="duplicate"
      Link: ; rel="duplicate"
      Link: ; rel="describedby";
         type="application/x-bittorrent"
      Link: ; rel="describedby";
         type="application/metalink4+xml"
      Link: ; rel="describedby";
         type="application/pgp-signature"
      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   From the Metalink server response, the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o  Mirror profile link.
   o  Instance Digest.
   o  Object size.
   o  ETag.
   o  Peer-to-peer information.
   o  Digital signature.
   o  Metalink/XML, which can include partial file checksums to repair a
      file.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and is delivered more slowly than expected,
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first) using the mirrors
   provided by the Link header with the "duplicate" relation type, and
   sends the location of the original GET request in the "Referer"
   header field.  The size and number of ranges requested from each
   server are for the client to decide, based upon the performance
   observed from each server.  Further discussion of performance
   considerations is presented in Section 8.
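
   The byte arithmetic behind the parallel ranged downloads can be
   sketched as below.  This assumes an even split across sources, which
   is only one possible choice (the document leaves range sizes to the
   client, and a real client would adapt them to observed throughput as
   discussed in Section 8); the function names are hypothetical.
   Splitting the 14867603-byte example object between two sources
   reproduces the "Range: bytes=7433802-" request and the
   "Content-Length: 7433801" response used in this section's mirror
   example.

```python
from typing import List, Optional, Tuple


def split_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split bytes 0..total_size-1 into `parts` contiguous ranges with
    inclusive ends, as HTTP byte ranges use.  When the size does not
    divide evenly, the earlier ranges get one extra byte."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never produce empty ranges
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(first: int, last: Optional[int] = None) -> str:
    """Range header value; last=None asks for everything from `first`
    to the end of the object (e.g. when working from the tail)."""
    return f"bytes={first}-" if last is None else f"bytes={first}-{last}"


# Two sources for the example object: the original connection keeps
# bytes 0..7433801 and one mirror is asked for the rest.
first_half, second_half = split_ranges(14867603, 2)
print(range_header(second_half[0]))         # bytes=7433802-
print(second_half[1] - second_half[0] + 1)  # 7433801
```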

   If no Range limit was given in the original request, the client works
   from the tail of the object (the first request is still running and
   will eventually catch up); otherwise, it continues after the range
   requested in the first request.  If no Range was provided, the
   original connection must be terminated once all parts of the
   resource have been retrieved.  It is recommended that the client
   first issue a HEAD request, to find out whether there are any Link
   headers, and then issue Range requests to the mirror servers as well
   as on the original connection.

   If ETags are coordinated between mirrors, If-Match conditions based
   on the ETag from the Metalink server response SHOULD be used to
   quickly detect out-of-date mirrors.  If no indication of ETag
   synchronization is given, If-Match should not be used.  Optimally,
   the mirror response will then carry an Instance Digest that the
   client can use to detect a mismatch early; otherwise, a mismatch will
   not be detected until the completed object is verified.

   One of the client requests to a mirror server:

      GET /example.ext HTTP/1.1
      Host: www2.example.com
      Range: bytes=7433802-
      If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.  The mirror server response, with data, to the above request:

      HTTP/1.1 206 Partial Content
      Accept-Ranges: bytes
      Content-Length: 7433801
      Content-Range: bytes 7433802-14867602/14867603
      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   If the first request was not Range limited, the client aborts it by
   closing the connection when it catches up with the other parallel
   downloads of the same object.
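
   The checks a client might apply to each mirror's ranged response,
   based on the If-Match/206 exchange above and the file-size rule of
   this section, can be sketched as follows.  This is an illustrative
   Python sketch under stated assumptions: header names are already
   lowercased, the expected ETag is passed only when ETags are known to
   be coordinated, the expected "SHA" value (base64 text without the
   "SHA=" prefix) comes from the Metalink server's own response, and the
   function names are hypothetical.

```python
import re
from typing import Dict, Optional

# "bytes first-last/complete-length" (complete length may be "*").
_CONTENT_RANGE_RE = re.compile(r"bytes\s+(\d+)-(\d+)/(\d+|\*)$")


def _sha_value(digest_header: str) -> Optional[str]:
    # Pull the "SHA" value out of an RFC 3230 Digest header, if present.
    for item in digest_header.split(","):
        name, sep, encoded = item.strip().partition("=")
        if sep and name.strip().lower() == "sha":
            return encoded.strip()
    return None


def check_mirror_response(status: int, headers: Dict[str, str], *,
                          requested_first: int, total_size: int,
                          etag: Optional[str] = None,
                          sha: Optional[str] = None) -> Optional[str]:
    """Return None if the mirror's ranged response is usable, otherwise
    a short reason for dropping that mirror."""
    if status == 412:  # If-Match precondition failed
        return "If-Match failed: mirror has a different version"
    if status != 206:
        return f"expected 206 Partial Content, got {status}"
    m = _CONTENT_RANGE_RE.match(headers.get("content-range", "").strip())
    if m is None:
        return "missing or malformed Content-Range"
    first, last, complete = m.groups()
    if complete == "*" or int(complete) != total_size:
        return "file size differs from the Metalink server"
    if int(first) != requested_first:
        return "range does not start where requested"
    length = headers.get("content-length")
    if length is not None and int(length) != int(last) - int(first) + 1:
        return "Content-Length does not match Content-Range"
    if etag is not None and headers.get("etag") != etag:
        return "ETag differs from the Metalink server"
    offered = _sha_value(headers.get("digest", ""))
    if sha is not None and offered is not None and offered != sha:
        return "Instance Digest differs from the Metalink server"
    return None
```

   Called with the 206 response shown above, the Metalink server's ETag,
   and its SHA value, the function returns None; a mirror holding a
   different file size is rejected before any of its data is merged.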

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   checksum of the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers begin with the use of file sizes, ETags, and
   Instance Digests.  Error detection requires Instance Digests, or
   checksums, to determine after transfers whether there has been an
   error.  Error correction, or download repair, is possible with
   partial file checksums.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is either the Instance Digest or a strong ETag.  In most
   cases it is sufficient that the Metalink server provides mirrors and
   Instance Digest information, but operation will be more robust and
   efficient if the mirror servers implement a synchronized ETag as
   well.  In fact, the emitted ETag may be implemented the same as the
   Instance Digest for simplicity, but there is no need to specify how
   the ETag is generated, only that it is shared among the mirror
   servers.  If the mirror server provides neither a synchronized ETag
   nor an Instance Digest, then early detection of mismatches is not
   possible unless the file length also differs.  Even then, the error
   is still detected after the download has completed, when the merged
   response is verified.

   An ETag cannot be used to verify the integrity of the received
   content, but it is a guarantee issued by the Metalink server that
   the content is correct for that ETag.
   If the ETag given by the mirror server matches the ETag given by the
   master server, then there is a chain of trust in which the master
   server authorizes these responses as valid for that object.

   This guarantees that a mismatch will be detected using only the
   synchronized ETag from a master server and mirror server; the mirror
   servers themselves can even signal the mismatch by responding to an
   If-Match request with an error (412 Precondition Failed), preventing
   accidental merges of ranges from different versions of files with
   the same name.  This even covers many, but not all, malicious attacks
   where the data on the mirror has been replaced by some other file.

   A synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can an
   Instance Digest on the mirror servers: an attacker can make the
   server appear to respond with the expected Instance Digest even if
   the file contents have been modified, just as he can with the ETag,
   and various system failures can likewise cause bad data to be
   returned.  The Metalink client has to rely on the Instance Digest
   returned by the Metalink master server in the first response for the
   verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   Trusted mirrors that do so can, if desired, act as master servers
   accepting the initial request.

   The benefit of having slave mirror servers (those not trusted as
   masters) return an Instance Digest is that the client can then
   detect mismatches early even if the ETag is not used.  Both the ETag
   and the slave mirror Instance Digest provide value, but either one
   alone is sufficient for early detection of mismatches.
   If neither is provided, then early detection of mismatches is not
   possible unless the file length also differs, but the error is still
   detected when the merged response is verified.

7.1.2.  Error Correction

   If the checksum of the downloaded object does not match the Instance
   Digest, the client fetches the Metalink/XML or other recovery profile
   link, where partial file checksums can be found.  From these, the
   client can detect which server returned bad information, determine
   which of the downloaded data can be recovered, and determine what
   needs to be fetched again.  If no partial checksums are available,
   then the client MUST fetch the complete object from a trusted
   Metalink server.

   Partial file checksums can also be used to detect errors during the
   download.

8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficiently many or sufficiently large
      range requests to each server so as to avoid connections going
      idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   We do not want to use too many simultaneous connections, or other
   traffic sharing a bottleneck link will be starved.  At the same time,
   good performance requires that the client can simultaneously download
   from at least one fast mirror while exploring whether any other
   mirror is faster.
   Based on laboratory experiments, we suggest that a good default
   number of simultaneous connections is probably four, with three of
   these being used for the best three mirrors found so far, and one
   being used to evaluate whether any other mirror might offer better
   performance.

   The chunks chosen by the client should be large enough that the
   chunk request and response headers represent negligible overhead,
   and large enough that they can be pipelined effectively without
   needing a very high rate of chunk requests.  At the same time, the
   time wasted waiting for the last chunk to download from the last
   server after all the other servers have finished should be
   minimized.  Thus we currently recommend a chunk size of at least
   10 KBytes.  If the file being transferred is very large, or the
   download speed very high, this can be increased to perhaps 1 MByte.
   As network bandwidths increase, we expect these numbers to increase
   accordingly, so that the time to transfer a chunk remains
   significantly larger than the latency of requesting a chunk from a
   server.

9.  IANA Considerations

   IANA has made the following registration in the Link Relation Type
   registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources.  That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and
      should not be used for them.

10.  Security Considerations

10.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  Such information could deceive
   unaware downloaders into downloading a malicious or worthless file.
   Also, malicious publishers could attempt a distributed denial of
   service attack by inserting unrelated IRIs into Metalinks.

10.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.  These include the entire
   Message Digest family of algorithms, which are not suitable for
   cryptographically strong verification.  Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e., the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include "sha", which is SHA-1 as specified in [RFC3174], or
   a stronger hash.  It MAY also include other hashes.

10.4.  Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

11.  References

11.1.  Normative References

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:2006.  Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
              (SHA1)", RFC 3174, September 2001.

   [RFC3230]  Mogul, J. and A.
              Van Hoff, "Instance Digests in HTTP", RFC 3230,
              January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-06 (work in progress),
              July 2009.

11.2.  Informative References

   [draft-bryan-metalink]
              Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml,
              "The Metalink Download Description Format",
              draft-bryan-metalink-16 (work in progress), August 2009.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, and Matt Domsch.

   Support for simultaneous download from multiple mirrors is based upon
   work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.

Appendix B.  Comparisons to Similar Options (to be removed by RFC Editor
             before publication)

   This draft, compared to the Metalink/XML format
   [draft-bryan-metalink]:

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.  It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, unless we use Metalink/XML for partial file
      checksums.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (---) Requires changes to server software.
   o  (-?) Tied to HTTP, not as generic.
      FTP/P2P clients will not use it unless they also support HTTP,
      unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      the user (or by the server, but server components or changes are
      not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      users at all mirrors (FTP included) access to all download
      information, with no changes needed to the server.
   o  (-) Not portable, archivable, or emailable.  Metalink/XML is used
      to import and export transfer queues.  Not as easy for search
      engines to index?
   o  (-) No way to show mirror geographical location (yet).
   o  (-) Metadata is not as rich.
   o  (-) Not able to add multiple files to a download queue or create a
      directory structure.

Appendix C.  Document History (to be removed by RFC Editor before
             publication)

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Use of the Link header to describe mirrors.  Should only a few
      mirrors be sent in the Link header, or should they only be sent if
      Want-Digest is used?  Some organizations have many mirrors.
   o  ADDRESSED IN THIS DRAFT: A way to differentiate between mirrors
      that have synchronized ETags and those that do not.
   o  ADDRESSED IN THIS DRAFT: Do we want a way to show that whole
      directories are mirrored, instead of individual files?
   o  Will we use Metalink/XML for partial file checksums?  That would
      add an XML dependency to applications for an important feature.
   o  Do we need an official MIME type for .torrent files, or should
      "application/x-bittorrent" be allowed?

   -09 : October 12, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.
   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.
   o  Content-MD5 for partial file checksums.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk