Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: May 26, 2010                                            A. Ford
                                                     Roke Manor Research
                                                       November 22, 2009


          Metalink/HTTP: Mirrors and Checksums in HTTP Headers
                      draft-bryan-metalinkhttp-13

Abstract

   This document specifies Metalink/HTTP: Mirrors and Checksums in HTTP
   Headers, a different way to get information that is usually contained
   in the Metalink XML-based download description format. Metalink/HTTP
   describes multiple download locations (mirrors), Peer-to-Peer,
   checksums, digital signatures, and other information using existing
   standards for HTTP headers. Clients can transparently use this
   information to make file transfers more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 26, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the BSD
   License.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Checksums of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1. URIs and IRIs
     10.2. Spoofing
     10.3. Cryptographic Hashes
     10.4. Signing
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1. Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink]. Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags. Metalink/HTTP is used
   to list information about a file to be downloaded. This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   checksums, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer). In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth. This distributes the load across multiple servers, and
   should also increase throughput and resilience. At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated. Users
   will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular server
   getting a disproportionate share of load.
   The use of suboptimal mirrors can lead to the user canceling and
   restarting the download to try to manually find a better source.
   During downloads, errors in transmission can corrupt the file. There
   are no easy ways to repair these files. For large downloads this can
   be extremely troublesome. Any of the problems that can occur during
   a download leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput. Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized. All the
   information about a download, including mirrors, checksums, digital
   signatures, and more can be transferred in coordinated HTTP Headers.
   This Metalink transfers the knowledge of the download server (and
   mirror database) to the client. Clients can fall back to other
   mirrors if the current one has an issue. With this knowledge, the
   client is enabled to work its way to a successful download even under
   adverse circumstances. All this is done transparently to the user,
   and the download is much more reliable and efficient. In contrast, a
   traditional HTTP redirect to a mirror conveys only minimal
   information (one link to one server), and the HTTP protocol has no
   provision for handling failures.
   Furthermore, in order to provide better load distribution across
   servers and potentially faster downloads to users, Metalink/HTTP
   facilitates multi-source downloads, where portions of a file are
   downloaded from multiple mirrors (and optionally, Peer-to-Peer)
   simultaneously.

   [[ Discussion of this draft should take place on IETF HTTP WG mailing
   list at ietf-http-wg@w3.org or the Metalink discussion mailing list
   located at metalink-discussion@googlegroups.com. To join the list,
   visit http://groups.google.com/group/metalink-discussion . ]]

1.1. Operation Overview

   Detailed discussion of Metalink operation is covered in Section 7;
   this section presents a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a checksum
   of the whole resource. The client will then be able to request
   chunks of the file from the various sources, scheduling appropriately
   in order to maximise the download rate.

1.2. Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and whole file checksum:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
    type="application/x-bittorrent"
   Link: ; rel="describedby";
    type="application/metalink4+xml"
   Link: ; rel="describedby";
    type="application/pgp-signature"
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

1.3. Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2. Requirements

   In this context, "Metalink" refers to Metalink/HTTP, which consists
   of mirrors and checksums in HTTP Headers as described in this
   document. "Metalink/XML" refers to the XML format described in
   [draft-bryan-metalink].

   Metalink resources include a Link header
   [draft-nottingham-http-link-header] to present a list of mirrors in
   the response to a client request for the resource. The checksum of a
   resource MUST be included via Instance Digests in HTTP [RFC3230].

   Metalink servers are HTTP servers with one or more Metalink
   resources. Mirror and checksum information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy (ETag Synchronization), i.e. one based on the
   file contents (checksum) and not on server-unique filesystem
   metadata. The emitted ETag MAY be implemented the same as the
   Instance Digest for simplicity. Metalink servers MAY offer
   Metalink/XML documents that contain partial file checksums and other
   information.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server. That is, they provide identical copies of (at least
   some) files that are also on the mirrored server. Mirror servers MAY
   be Metalink servers. Mirror servers MUST support serving partial
   content. HTTP mirror servers SHOULD share the same ETag policy as
   the originating Metalink server. HTTP mirror servers SHOULD support
   Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server in the
   Link header [draft-nottingham-http-link-header]. Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods. Metalink clients MUST switch downloads from one mirror to
   another if the mirror becomes unreachable.
   Metalink clients SHOULD support multi-source, or parallel, downloads,
   where portions of a file are downloaded from multiple mirrors
   simultaneously (and optionally, from Peer-to-Peer sources). Metalink
   clients MUST support Instance Digests in HTTP [RFC3230] by requesting
   and verifying checksums. Metalink clients MAY make use of digital
   signatures if they are offered.

3. Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

   Link: ; rel="duplicate";
    pri=1; pref=1
   Link: ; rel="duplicate";
    pri=2; geo="gb"; depth=1

   [[Some organizations have many mirrors. Only send a few mirrors, or
   only use the Link header if Want-Digest is used?]]

   It is up to the server to choose how many Link headers to send. Such
   a decision could be a hard-coded limit, a random selection, or based
   on file size or server load.

3.1. Mirror Priority

   Mirror servers are listed in order of priority (from most preferred
   to least) or have a "pri" value, where mirrors with lower values are
   used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time. A client MUST
   respect the server's priority ordering, however.

   [[Would it make more sense to use qvalue-style policies here, i.e.
   q=1.0 through q=0.0 ?]]

3.2. Mirror Geographical Location

   Mirror servers MAY have a "geo" value, which is an [ISO3166-1]
   alpha-2 two-letter country code for the geographical location of the
   physical server that the URI is used to access.
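   As an illustration only, the following sketch shows one way a client
   that knows its own country might combine "pri" and "geo" when
   ordering mirrors. The dictionary shape, the function name
   order_mirrors, and the choice to place mirrors without a "pri" value
   after those that have one are assumptions made for this example; the
   text above requires only that the server's priority ordering be
   respected.

```python
def order_mirrors(mirrors, client_country=None):
    """Return mirrors sorted by "pri", using "geo" only to break ties.

    Each mirror is a dict with a "uri" key and optional "pri" (int) and
    "geo" (two-letter country code) keys. This shape is hypothetical.
    """
    def sort_key(indexed):
        position, mirror = indexed
        pri = mirror.get("pri")
        # Assumption: mirrors without "pri" follow those that have one.
        missing_pri = pri is None
        # A mirror in the client's own country wins a tie on "pri".
        geo_mismatch = (client_country is None
                        or mirror.get("geo") != client_country)
        # The original position keeps the server's listed order otherwise.
        return (missing_pri, pri or 0, geo_mismatch, position)

    return [mirror for _, mirror in sorted(enumerate(mirrors), key=sort_key)]
```

   Because "geo" is consulted only when two mirrors share a "pri" value,
   the server's stated priorities are never overridden.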
   A client may use this information to select a mirror, or set of
   mirrors, that are geographically near (if the client has access to
   such information), with the aim of reducing network load at
   inter-country bottlenecks.

3.3. Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server. Optimally, they
   will also support Instance Digests in HTTP [RFC3230]. Preferred
   mirrors make it possible to detect early on, before data is
   transferred, whether the file requested matches the desired file.
   Preferred HTTP mirror servers have a "pref" value of 1. If "pref" is
   unspecified, mirrors are considered "normal" by default and are not
   assumed to share the same ETag policy. FTP mirrors, as they do not
   emit ETags, MUST always be considered "normal".

   HTTP mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   [[Suggestion: In order for clients to identify servers that have
   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="

   ]]

3.4. Mirror Depth

   Some mirrors may mirror single files, whole directories, or multiple
   directories.

   Mirror servers MAY have a "depth" value, where "depth=0" is the
   default. A value of 0 means ONLY that file is mirrored. A value of
   1 means that file and all other files and subdirectories in the
   directory are mirrored. A value of 2 means the directory above, and
   all files and subdirectories, are mirrored.

   A mirror with a depth value of 4:

   Link: ;
    rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4. Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over
   Peer-to-Peer networks or otherwise, are specified with the Link
   header [draft-nottingham-http-link-header], a relation type of
   "describedby", and a type parameter that indicates the MIME type of
   the metadata available at the URI.

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
    type="application/x-bittorrent"
   Link: ; rel="describedby";
    type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files, but that is not required.

4.1. Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4. This is particularly useful for providing
   metadata such as checksums of chunks, allowing a client to recover
   from partial errors (see Section 7.1.2).

5. OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header], a relation type of
   "describedby", and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
    type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures, but that
   is not required.

6. Checksums of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors. Mirror servers SHOULD as well.

   A brief Metalink server response with checksum:

   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

7. Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server. A Range limit is optional.
   Alternatively, Metalink clients can begin with a HEAD request to the
   Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
    type="application/x-bittorrent"
   Link: ; rel="describedby";
    type="application/metalink4+xml"
   Link: ; rel="describedby";
    type="application/pgp-signature"
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which may describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file checksums to repair a
      file.
   o  Digital signature.
   o  Instance Digest, which is the whole file checksum.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and is delivered more slowly than expected,
   then the Metalink client starts a number of parallel ranged downloads
   (one per selected mirror server other than the first) using mirrors
   provided by the Link header with "duplicate" relation type, using the
   location of the original GET request in the "Referer" header field.
   The size and number of ranges requested from each server is for the
   client to decide, based upon the performance observed from each
   server.
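   The parallel ranged downloads described above can be sketched as
   follows. This is a minimal illustration, not a prescribed algorithm:
   it splits the object evenly, whereas the text leaves the size and
   number of ranges to the client. The function names split_ranges and
   mirror_request_headers are hypothetical.

```python
def split_ranges(size, parts):
    """Split an object of `size` bytes into `parts` contiguous byte ranges.

    Returns inclusive (first, last) byte positions, as used by the HTTP
    Range header. An even split is an assumption made for illustration.
    """
    if size < 0 or parts < 1:
        raise ValueError("size must be >= 0 and parts must be >= 1")
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` ranges each carry one additional byte.
        length = base + (1 if index < extra else 0)
        if length == 0:
            continue
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def mirror_request_headers(byte_range, referer, etag=None):
    """Build headers for one ranged GET request to a mirror.

    `etag` should be passed only for mirrors known to share the ETag
    policy of the originating server (preferred mirrors).
    """
    first, last = byte_range
    headers = {"Range": f"bytes={first}-{last}", "Referer": referer}
    if etag is not None:
        headers["If-Match"] = etag
    return headers
```

   For the 14867603-byte object in the example response, splitting into
   two parts gives the ranges 0-7433801 and 7433802-14867602.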
   Further discussion of performance considerations is presented in
   Section 8.

   If no Range limit was given in the original request, then work from
   the tail of the object (the first request is still running and will
   eventually catch up); otherwise continue after the range requested in
   the first request. If no Range was provided, the original connection
   must be terminated once all parts of the resource have been
   retrieved. It is recommended that a HEAD request be made first, so
   that the client can find out if there are any Link headers, and that
   Range-based requests then be made to the mirror servers as well as on
   the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag from the
   Metalink server response SHOULD be used to quickly detect out-of-date
   mirrors. If no indication of ETag synchronisation is given, then
   If-Match should not be used. In that case, optimally there will be an
   Instance Digest in the mirror response which the client can use to
   detect a mismatch early; if not, a mismatch won't be detected until
   the completed object is verified. Early file mismatch detection is
   described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.
   The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

   If the first request was not Range limited, then abort it by closing
   the connection when it catches up with the other parallel downloads
   of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   checksum of the file.

7.1. Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests. Error detection requires Instance Digests, or checksums,
   to determine after transfers if there has been an error. Error
   correction, or download repair, is possible with partial file
   checksums.

7.1.1. Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers implement a synchronized
   ETag as well. In fact, the emitted ETag may be implemented the same
   as the Instance Digest for simplicity, but there is no need to
   specify how the ETag is generated, just that it needs to be shared
   among the mirror servers. If the mirror server provides neither a
   synchronized ETag nor an Instance Digest, then early detection of
   mismatches is not possible unless file length also differs.
   Finally, the error is still detectable after the download has
   completed, when the merged response is verified.

   An ETag cannot be used for verifying the integrity of the received
   content. It is, however, a guarantee issued by the Metalink server
   that the content is correct for that ETag. If the ETag given by the
   mirror server matches the ETag given by the master server, then there
   is a chain of trust in which the master server authorizes these
   responses as valid for that object.

   This guarantees that a mismatch will be detected using only the
   synchronized ETag from a master server and mirror server. The mirror
   servers themselves can even signal the mismatch by responding with an
   error, which prevents accidental merges of ranges from different
   versions of files with the same name. This covers many, but not all,
   malicious attacks in which the data on the mirror has been replaced
   by some other file.

   A synchronized ETag cannot strictly protect against malicious attacks
   or against server or network errors that replace content, but neither
   can an Instance Digest on the mirror servers. An attacker can
   certainly make the server respond with the expected Instance Digest
   even if the file contents have been modified, just as with an ETag,
   and the same applies to various system failures that cause bad data
   to be returned. The Metalink client has to rely on the Instance
   Digest returned by the Metalink master server in the first response
   for the verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link headers is.
   The set of trusted mirrors doing that can be substituted as master
   servers accepting the initial request if one likes.
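   The whole-object verification against the master server's Instance
   Digest can be sketched as follows. This is a minimal illustration
   that handles only the "SHA" algorithm, which [RFC3230] defines as the
   base64 encoding of the SHA-1 output; the function names are
   hypothetical, and a real client would hash the file incrementally
   rather than hold it in memory.

```python
import base64
import hashlib


def parse_digest_header(value):
    """Split a Digest header value such as 'SHA=...,MD5=...' into a dict."""
    digests = {}
    for item in value.split(","):
        # Split on the first '=' only; base64 values may end in '=' padding.
        algorithm, _, encoded = item.strip().partition("=")
        digests[algorithm.upper()] = encoded
    return digests


def sha1_instance_digest(data):
    """Return the base64-encoded SHA-1 digest of the given bytes."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def matches_master_digest(data, master_digest_header):
    """Check downloaded bytes against the master server's Digest header."""
    expected = parse_digest_header(master_digest_header).get("SHA")
    return expected is not None and sha1_instance_digest(data) == expected
```

   The comparison uses the Digest value from the first (master)
   response, not a value returned by a mirror, in line with the trust
   model described above.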
   The benefit of having slave mirror servers (those not trusted as
   masters) return an Instance Digest is that the client can then detect
   mismatches early even if ETag is not used. Both ETag and slave
   mirror Instance Digest provide value, but just one is sufficient
   for early detection of mismatches. If neither is provided, then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

7.1.2. Error Correction

   Partial file checksums can be used to detect errors during the
   download. Metalink servers are not required to offer partial file
   checksums, but they are encouraged to do so.

   If the object checksum does not match the Instance Digest, then the
   client fetches the Metalink/XML as specified in Section 4.1, where
   partial file checksums may be found. If partial file checksums are
   available, they allow the client to detect which server returned
   incorrect data, and from there to determine which of the downloaded
   data can be kept and which needs to be fetched again. If no partial
   checksums are available, then the client MUST fetch the complete
   object from other mirrors.

8. Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved. But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster. Based on laboratory experiments,
   we suggest a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be
   pipelined effectively without needing a very high rate of chunk
   requests. At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized. Thus we currently
   recommend that a chunk size of at least 10KBytes should be used. If
   the file being transferred is very large, or the download speed very
   high, this can be increased to perhaps 1MByte. As network bandwidths
   increase, we expect these numbers to increase appropriately, so that
   the time to transfer a chunk remains significantly larger than the
   latency of requesting a chunk from a server.

9. IANA Considerations

   IANA has made the following registration to the Link Relation Type
   registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.
   o  Notes: This relation is for static resources. That is, an HTTP GET
      request on any duplicate will return the same representation. It
      does not make sense for dynamic or POSTable resources and should
      not be used for them.

10. Security Considerations

10.1. URIs and IRIs

   Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2. Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information. This could deceive unaware
   downloaders into downloading a malicious or worthless file. Also,
   malicious publishers could attempt a distributed denial of service
   attack by inserting unrelated URIs into Metalinks.

10.3. Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure. These include the whole
   Message Digest family of algorithms, which are not suitable for
   cryptographically strong verification. Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e. the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include "sha", which is SHA-1 as specified in [RFC3174], or
   stronger. It MAY also include other hashes.

10.4. Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

11. References

11.1. Normative References

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:2006.
              Codes for the representation of names of countries and
              their subdivisions -- Part 1: Country codes",
              November 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
              (SHA1)", RFC 3174, September 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [draft-nottingham-http-link-header]
              Nottingham, M., "Web Linking",
              draft-nottingham-http-link-header-06 (work in progress),
              July 2009.

11.2.  Informative References

   [draft-bryan-metalink]
              Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml,
              "The Metalink Download Description Format",
              draft-bryan-metalink-16 (work in progress), August 2009.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Mark Handley, Mark Nottingham,
   Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, Matt Domsch, Micah
   Cowan, and David Morris.

   Support for simultaneous download from multiple mirrors is based upon
   work by Mark Handley and Javier Vela Diago, who also provided
   validation of the benefits of this approach.

Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format
   [draft-bryan-metalink]:

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.
      It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      checksums.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination may be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to originating
      server.
   o  (-?) Tied to HTTP, not as generic.  FTP/P2P clients won't be
      using it unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      user (or server, but server component/changes not required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, it still gives
      access to users at all mirrors (FTP included) to all download
      information with no changes needed to the server.
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Use of Link header to describe Mirrors.  Only send a few mirrors
      with Link header, or only send them if Want-Digest is used?  Some
      organizations have many mirrors.
   o  Would it make more sense to use qvalue-style policies to describe
      mirror priority, i.e. q=1.0 through q=0.0 ?
   o  Using Metalink/XML for partial file checksums.  That adds XML
      dependency to apps for an important feature.  Is there a better
      method?
   o  Do we need an "official" MIME type for .torrent files or allow
      "application/x-bittorrent"?

   -13 : November 22, 2009.
   o  Metalink/XML for partial file checksums.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 12, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 6, 2009.
   o  Content-MD5 for partial file checksums.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk