Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: October 18, 2010
                                                                 A. Ford
                                                     Roke Manor Research
                                                          April 16, 2010

    Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers
                      draft-bryan-metalinkhttp-16

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP Headers, a different way to get information that is
   usually contained in the Metalink XML-based download description
   format. Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP headers.
   Clients can transparently use this information to make file transfers
   more robust and reliable.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 18, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Operation Overview
      1.2. Examples
      1.3. Notational Conventions
   2. Requirements
   3. Mirrors / Multiple Download Locations
      3.1. Mirror Priority
      3.2. Mirror Geographical Location
      3.3. Coordinated Mirror Policies
      3.4. Mirror Depth
   4. Peer-to-Peer / Metainfo
      4.1. Metalink/XML Files
   5. OpenPGP Signatures
   6. Cryptographic Hashes of Whole Files
   7. Client / Server Multi-source Download Interaction
      7.1. Error Prevention, Detection, and Correction
         7.1.1. Error Prevention (Early File Mismatch Detection)
         7.1.2. Error Correction
   8. Multi-server Performance
   9. IANA Considerations
   10. Security Considerations
      10.1. URIs and IRIs
      10.2. Spoofing
      10.3. Cryptographic Hashes
      10.4. Signing
   11. Normative References
   Appendix A. Acknowledgements and Contributors
   Appendix B. Comparisons to Similar Options
   Appendix C. Document History
   Authors' Addresses

1. Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [draft-bryan-metalink]. Metalink/HTTP attempts to provide as
   much functionality as the Metalink/XML format by using existing
   standards such as Web Linking [draft-nottingham-http-link-header],
   Instance Digests in HTTP [RFC3230], and ETags. Metalink/HTTP is used
   to list information about a file to be downloaded. This can include
   lists of multiple URIs (mirrors), Peer-to-Peer information,
   cryptographic hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer). In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth. This distributes the load across multiple servers, and
   should also increase throughput and resilience. At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated. Users
   will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   may not be optimal for their needs and leads to a particular
   server getting a disproportionate share of load.
   The use of
   suboptimal mirrors can lead to the user canceling and restarting the
   download to try to manually find a better source. During downloads,
   errors in transmission can corrupt the file. There are no easy ways
   to repair these files. For large downloads this can be extremely
   troublesome. Any of the problems that can occur during a download
   leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput. Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized. All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more can be transferred in
   coordinated HTTP Headers. This Metalink transfers the knowledge of
   the download server (and mirror database) to the client. Clients can
   fall back to other mirrors if the current one has an issue. With this
   knowledge, the client is enabled to work its way to a successful
   download even under adverse circumstances. All this is done
   transparently to the user and the download is much more reliable and
   efficient. In contrast, a traditional HTTP redirect to a mirror
   conveys only extremely minimal information: one link to one server,
   and there is no provision in the HTTP protocol to handle failures.
142 Furthermore, in order to provide better load distribution across 143 servers and potentially faster downloads to users, Metalink/HTTP 144 facilitates multi-source downloads, where portions of a file are 145 downloaded from multiple mirrors (and optionally, Peer-to-Peer) 146 simultaneously. 148 [[ Discussion of this draft should take place on IETF HTTP WG mailing 149 list at ietf-http-wg@w3.org or the Metalink discussion mailing list 150 located at metalink-discussion@googlegroups.com. To join the list, 151 visit http://groups.google.com/group/metalink-discussion . ]] 153 1.1. Operation Overview 155 Detailed discussion of Metalink operation is covered in Section 2; 156 this section will present a very brief, high-level overview of how 157 Metalink achieves its goals. 159 Upon connection to a Metalink/HTTP server, a client will receive 160 information about other sources of the same resource and a 161 cryptographic hash of the whole resource. The client will then be 162 able to request chunks of the file from the various sources, 163 scheduling appropriately in order to maximise the download rate. 165 1.2. Examples 167 A brief Metalink server response with ETag, mirrors, .metalink, 168 OpenPGP signature, and a cryptographic hash of the whole file: 170 Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5=" 171 Link: ; rel="duplicate" 172 Link: ; rel="duplicate" 173 Link: ; rel="describedby"; 174 type="application/x-bittorrent" 175 Link: ; rel="describedby"; 176 type="application/metalink4+xml" 177 Link: ; rel="describedby"; 178 type="application/pgp-signature" 179 Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO 180 DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ== 182 1.3. Notational Conventions 184 This specification describes conformance of Metalink/HTTP. 
186 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 187 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 188 document are to be interpreted as described in BCP 14, [RFC2119], as 189 scoped to those conformance targets. 191 2. Requirements 193 In this context, "Metalink" refers to Metalink/HTTP which consists of 194 mirrors and cryptographic hashes in HTTP Headers as described in this 195 document. "Metalink/XML" refers to the XML format described in 196 [draft-bryan-metalink]. 198 Metalink resources include a Link header 199 [draft-nottingham-http-link-header] to present a list of mirrors in 200 the response to a client request for the resource. The cryptographic 201 hash of a resource must be included via Instance Digests in HTTP 202 [RFC3230]. 204 Metalink servers are HTTP servers with one or more Metalink 205 resources. Mirror and cryptographic hash information provided by the 206 originating Metalink server MUST be considered authoritative. 207 Metalink servers and their associated mirror servers SHOULD all share 208 the same ETag policy (ETag Synchronization), i.e. based on the file 209 contents (cryptographic hash) and not server-unique filesystem 210 metadata. The emitted ETag MAY be implemented the same as the 211 Instance Digest for simplicity. Metalink servers MAY offer Metalink/ 212 XML documents that contain cryptographic hashes of parts of the file 213 and other information. 215 Mirror servers are typically FTP or HTTP servers that "mirror" 216 another server. That is, they provide identical copies of (at least 217 some) files that are also on the mirrored server. Mirror servers MAY 218 be Metalink servers. Mirror servers MUST support serving partial 219 content. HTTP mirror servers SHOULD share the same ETag policy as 220 the originating Metalink server. HTTP Mirror servers SHOULD support 221 Instance Digests in HTTP [RFC3230]. 
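   The note above that the emitted ETag MAY be implemented the same as
   the Instance Digest can be illustrated with a minimal Python sketch.
   It assumes SHA-256 and base64 encoding of the raw digest bytes, which
   is how this sketch reads the Instance Digest convention. The function
   names are hypothetical and are not defined by this document.

```python
import base64
import hashlib


def instance_digest(data: bytes) -> str:
    """Return a Digest header value: SHA-256 of the bytes, base64-encoded."""
    raw = hashlib.sha256(data).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def etag_from_digest(digest_value: str) -> str:
    """Reuse the base64 part of the digest as a quoted, content-based ETag.

    Assumption: an ETag derived purely from file contents is what the
    mirrors agree on; the document does not mandate this derivation.
    """
    return '"' + digest_value.split("=", 1)[1] + '"'
```

   Because both values depend only on the file contents, every mirror
   that runs the same derivation emits the same ETag without any shared
   state.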
223 Metalink clients use the mirrors provided by a Metalink server with 224 Link header [draft-nottingham-http-link-header]. Metalink clients 225 MUST support HTTP and MAY support FTP, BitTorrent, or other download 226 methods. Metalink clients MUST switch downloads from one mirror to 227 another if the mirror becomes unreachable. Metalink clients SHOULD 228 support multi-source, or parallel, downloads, where portions of a 229 file are downloaded from multiple mirrors simultaneously (and 230 optionally, from Peer-to-Peer sources). Metalink clients MUST 231 support Instance Digests in HTTP [RFC3230] by requesting and 232 verifying cryptographic hashes. Metalink clients MAY make use of 233 digital signatures if they are offered. 235 3. Mirrors / Multiple Download Locations 237 Mirrors are specified with the Link header 238 [draft-nottingham-http-link-header] and a relation type of 239 "duplicate" as defined in Section 9. 241 A brief Metalink server response with two mirrors only: 243 Link: ; rel="duplicate"; 244 pri=1; pref=1 245 Link: ; rel="duplicate"; 246 pri=2; geo="gb"; depth=1 248 [[Some organizations have many mirrors. Only send a few mirrors, or 249 only use the Link header if Want-Digest is used?]] 251 It is up to the server to choose how many Link headers to send. Such 252 a decision could be a hard-coded limit, a random selection, based on 253 file size, or based on server load. 255 3.1. Mirror Priority 257 Mirror servers are listed in order of priority (from most preferred 258 to least) or have a "pri" value, where mirrors with lower values are 259 used first. 261 This is purely an expression of the server's preferences; it is up to 262 the client what it does with this information, particularly with 263 reference to how many servers to use at any one time. A client MUST 264 respect the server's priority ordering, however. 266 [[Would it make more sense to use qvalue-style policies here, i.e. 267 q=1.0 through q=0.0 ?]] 269 3.2. 
Mirror Geographical Location 271 Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2 272 two letter country code for the geographical location of the physical 273 server the URI is used to access. A client may use this information 274 to select a mirror, or set of mirrors, that are geographically near 275 (if the client has access to such information), with the aim of 276 reducing network load at inter-country bottlenecks. 278 3.3. Coordinated Mirror Policies 280 There are two types of mirror servers: preferred and normal. 281 Preferred mirror servers are HTTP mirror servers that MUST share the 282 same ETag policy as the originating Metalink server. Preferred 283 mirrors make it possible to detect early on, before data is 284 transferred, if the file requested matches the desired file. 285 Preferred HTTP mirror servers have a "pref" value of 1. By default, 286 if unspecified then mirrors are considered "normal" and do not share 287 the same ETag policy. FTP mirrors, as they do not emit ETags, MUST 288 always be considered "normal". ([draft-bryan-ftp-hash] allows for FTP 289 mirrors to be coordinated and provide file hashes). 291 HTTP Mirror servers SHOULD support Instance Digests in HTTP 292 [RFC3230]. Optimally, mirror servers will share the same ETag policy 293 and support Instance Digests in HTTP. 295 [[Suggestion: In order for clients to identify servers that have 296 coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g. 298 ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=" 300 ]] 302 3.4. Mirror Depth 304 Some mirrors may mirror single files, whole directories, or multiple 305 directories. 307 Mirror servers MAY have a "depth" value, where "depth=0" is the 308 default. A value of 0 means ONLY that file is mirrored. A value of 309 1 means that file and all other files and subdirectories in the 310 directory are mirrored. A value of 2 means the directory above, and 311 all files and subdirectories, are mirrored. 
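   As an illustration of the depth rule just described, the following
   Python sketch computes which path prefix a mirror Link covers. The
   function name and the URL used in the usage note are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def mirrored_root(mirror_url: str, depth: int = 0) -> str:
    """Return the path covered by a mirror Link with the given depth."""
    path = urlsplit(mirror_url).path
    if depth == 0:
        return path  # depth=0: only this one file is mirrored
    directory = posixpath.dirname(path)  # depth=1: the file's own directory
    for _ in range(depth - 1):
        directory = posixpath.dirname(directory)  # each extra level goes one up
    return directory.rstrip("/") + "/"
```

   For a hypothetical mirror URL
   http://www2.example.com/dir1/dir2/dir3/dir4/dir5/example.ext with
   depth=4, the sketch returns "/dir1/dir2/", meaning everything from
   /dir2/ on down is mirrored.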
   A mirror with a depth value of 4:

   Link: ;
   rel="duplicate"; pri=1; pref=1; depth=4

   In the above example, 4 directories up are mirrored, from /dir2/ on
   down.

4. Peer-to-Peer / Metainfo

   Metainfo files, which describe ways to download a file over
   Peer-to-Peer networks or otherwise, are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter that indicates the MIME type of
   the metadata available at the URI.

   A brief Metalink server response with .torrent and .metalink:

   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1. Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4. This is particularly useful for providing
   metadata such as cryptographic hashes of parts of a file, allowing a
   client to recover from partial errors (see Section 7.1.2).

5. OpenPGP Signatures

   OpenPGP signatures are specified with the Link header
   [draft-nottingham-http-link-header] and a relation type of
   "describedby" and a type parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

   Link: ; rel="describedby";
   type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.

6. Cryptographic Hashes of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors. Mirror servers SHOULD as well.

   A brief Metalink server response with cryptographic hash:

   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.
Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server. A Range limit is optional, not
   required. Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link headers.
   After that, the client follows with a GET request to the desired
   mirrors.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these headers:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel="duplicate"; pref=1
   Link: ; rel="duplicate"
   Link: ; rel="describedby";
   type="application/x-bittorrent"
   Link: ; rel="describedby";
   type="application/metalink4+xml"
   Link: ; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o Object size.
   o ETag.
   o Mirror profile link, which may describe the mirror's priority,
     whether it shares the ETag policy of the originating Metalink
     server, geographical location, and mirror depth.
   o Peer-to-peer information.
   o Metalink/XML, which can include partial file cryptographic hashes
     to repair a file.
   o Digital signature.
   o Instance Digest, which is the whole file cryptographic hash.
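   How a client might extract the mirror metadata listed above from the
   Link header values can be sketched in Python as follows. This is a
   simplified parser, not a full Web Linking implementation: it assumes
   one link per header value and no ";" inside quoted parameter values.
   The class and function names, and the URLs in the usage note, are
   hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkEntry:
    uri: str
    rel: str = ""
    params: dict = field(default_factory=dict)


def parse_link(value: str) -> LinkEntry:
    """Parse one simple Link header value: <uri>; name=value; ..."""
    target, _, rest = value.partition(">")
    entry = LinkEntry(uri=target.strip().lstrip("<"))
    for part in rest.split(";"):
        name, sep, val = part.strip().partition("=")
        if not sep:
            continue  # skip empty segments and valueless parameters
        val = val.strip().strip('"')
        if name.lower() == "rel":
            entry.rel = val
        else:
            entry.params[name.lower()] = val
    return entry


def mirrors_by_priority(link_values):
    """Return rel="duplicate" entries, lowest "pri" first.

    Entries without "pri" sort last and keep their header order,
    because sorted() is stable.
    """
    dup = [e for e in map(parse_link, link_values) if e.rel == "duplicate"]
    return sorted(dup, key=lambda e: int(e.params.get("pri", 999999)))
```

   Given the values
   '<http://www2.example.com/example.ext>; rel="duplicate"; pri=2; geo="gb"'
   and '<http://www3.example.com/example.ext>; rel="duplicate"; pri=1; pref=1',
   mirrors_by_priority returns the www3 entry first, and the "pref",
   "geo", and "depth" values are available in each entry's params.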
   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link headers.)

   If the object is large and is delivered more slowly than expected,
   the Metalink client starts a number of parallel ranged downloads (one
   per selected mirror server other than the first) using mirrors
   provided by the Link header with the "duplicate" relation type, and
   puts the location of the original GET request in the "Referer" header
   field. The size and number of ranges requested from each server is
   for the client to decide, based upon the performance observed from
   each server. Further discussion of performance considerations is
   presented in Section 8.

   If no range limit was given in the original request, then work from
   the tail of the object (the first request is still running and will
   eventually catch up); otherwise continue after the range requested in
   the first request. If no Range was provided, the original connection
   must be terminated once all parts of the resource have been
   retrieved. It is recommended that a HEAD request is undertaken
   first, so that the client can find out if there are any Link headers,
   and then Range-based requests are undertaken to the mirror servers as
   well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag SHOULD be used
   to quickly detect out-of-date mirrors by using the ETag from the
   Metalink server response. If no indication of ETag synchronisation
   or knowledge is given, then If-Match should not be used. Optimally
   there will be an Instance Digest in the mirror response, which can be
   used to detect a mismatch early; if not, a mismatch won't be
   detected until the completed object is verified.
   Early file mismatch
   detection is described in detail in Section 7.1.1.

   One of the client requests to a mirror server:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields. The mirror server response, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited then abort it by closing
   the connection when it catches up with the other parallel downloads
   of the same object.

   Downloads from mirrors that do not have the same file size as the
   Metalink server MUST be aborted.

   Once the download has completed, the Metalink client MUST verify the
   cryptographic hash of the file.

7.1. Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and
   cryptographic hashes. Error detection requires Instance Digests, or
   cryptographic hashes, to determine after transfers if there has been
   an error. Error correction, or download repair, is possible with
   partial file cryptographic hashes.

7.1.1. Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is the same as either Instance Digest or a strong ETag.
   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers implement a
   synchronized ETag as well. In fact, the emitted ETag may be
   implemented the same as the Instance Digest for simplicity, but there
   is no need to specify how the ETag is generated, just that it needs
   to be shared among the mirror servers. If the mirror server provides
   neither a synchronized ETag nor an Instance Digest, then early
   detection of mismatches is not possible unless the file length also
   differs. Even then, the error is still detectable after the download
   has completed, when the merged response is verified.

   ETags cannot be used for verifying the integrity of the received
   content. An ETag is, however, a guarantee issued by the Metalink
   server that the content is correct for that ETag. If the ETag given
   by the mirror server matches the ETag given by the master server,
   then we have a chain of trust where the master server authorizes
   these responses as valid for that object.

   This guarantees that a mismatch will be detected using only the
   synchronized ETag from the master server and the mirror server. The
   mirror servers themselves can even signal the mismatch by responding
   with an error, which prevents accidental merges of ranges from
   different versions of files with the same name. This covers many,
   but not all, malicious attacks where the data on the mirror has been
   replaced by some other file.
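   The early checks discussed above (If-Match on preferred mirrors, file
   size, and an optional mirror Instance Digest) might be combined as in
   the following Python sketch. It assumes headers are held in a plain
   dict keyed by canonical header names, that a failed If-Match yields a
   412 status, and that the Digest header carries a single value. The
   function names are hypothetical.

```python
def ranged_request_headers(start, end, origin_url, etag=None, preferred=False):
    """Build request headers for one ranged GET to a mirror."""
    last = "" if end is None else end  # open-ended range when end is None
    headers = {"Range": f"bytes={start}-{last}", "Referer": origin_url}
    if preferred and etag:
        # Only send If-Match when the mirror is known to coordinate ETags.
        headers["If-Match"] = etag
    return headers


def early_mismatch(status, headers, expected_size, expected_digest=None):
    """Return a reason string if the mirror response reveals a mismatch."""
    if status == 412:
        return "etag mismatch (If-Match failed)"
    if status != 206:
        return f"unexpected status {status}"
    # Content-Range looks like "bytes first-last/total"; compare the total.
    total = headers.get("Content-Range", "").rpartition("/")[2]
    if total.isdigit() and int(total) != expected_size:
        return "size mismatch"
    digest = headers.get("Digest")
    if expected_digest and digest and digest != expected_digest:
        return "digest mismatch"
    return None  # nothing suspicious yet; whole-file verification still applies
```

   A None result only means that no mismatch was visible early. The
   completed object still has to be verified against the Instance Digest
   from the Metalink server.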
519 Synchronized ETag can not strictly protect against malicious attacks 520 or server or network errors replacing content, but neither can 521 Instance Digest on the mirror servers as the attacker most certainly 522 can make the server seemingly respond with the expected Instance 523 Digest even if the file contents have been modified, just as he can 524 with ETag, and the same for various system failures also causing bad 525 data to be returned. The Metalink client has to rely on the Instance 526 Digest returned by the Metalink master server in the first response 527 for the verification of the downloaded object as a whole. 529 If the mirror servers do return an Instance Digest, then that is a 530 bonus, just as having them return the right set of Link headers is. 531 The set of trusted mirrors doing that can be substituted as master 532 servers accepting the initial request if one likes. 534 The benefit of having slave mirror servers (those not trusted as 535 masters) return Instance Digest is that the client then can detect 536 mismatches early even if ETag is not used. Both ETag and slave 537 mirror Instance Digest do provide value, but just one is sufficient 538 for early detection of mismatches. If none is provided then early 539 detection of mismatches is not possible unless the file length also 540 differs, but the error is still detected when the merged response is 541 verified. 543 If FTP servers support the FTP HASH command [draft-bryan-ftp-hash] 544 and the same hash algorithm as the originating Metalink server, then 545 that information can be used for early file mismatch detection. 547 7.1.2. Error Correction 549 Partial file cryptographic hashes can be used to detect errors during 550 the download. Metalink servers are not required to offer partial 551 file cryptographic hashes, but they are encouraged to do so. 
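   A minimal Python sketch of using partial file cryptographic hashes to
   locate damaged data follows. It assumes fixed-length pieces with
   SHA-256 hex digests, which is an assumption about how the piece
   hashes are supplied and not something this document defines. The
   function names are hypothetical.

```python
import hashlib


def corrupt_pieces(data: bytes, piece_length: int, piece_hashes: list) -> list:
    """Return indexes of pieces whose SHA-256 hex digest does not match."""
    bad = []
    for index, expected in enumerate(piece_hashes):
        piece = data[index * piece_length:(index + 1) * piece_length]
        if hashlib.sha256(piece).hexdigest() != expected.lower():
            bad.append(index)
    return bad


def byte_range(index: int, piece_length: int, total_size: int) -> tuple:
    """Inclusive (first, last) byte positions to re-request for one piece."""
    start = index * piece_length
    return start, min(start + piece_length, total_size) - 1
```

   Only the byte ranges of the failing pieces then need to be fetched
   again, ideally from a different mirror than the one that supplied
   them.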
553 If the object cryptographic hash does not match the Instance Digest 554 then fetch the Metalink/XML as specified in Section 4.1, where 555 partial file cryptographic hashes may be found, allowing detection of 556 which server returned incorrect data. If the Instance Digest 557 computation does not match then the client needs to fetch the partial 558 file cryptographic hashes, if available, and from there figure out 559 what of the downloaded data can be recovered and what needs to be 560 fetched again. If no partial cryptographic hashes are available, 561 then the client MUST fetch the complete object from other mirrors. 563 8. Multi-server Performance 565 When opting to download simultaneously from multiple mirrors, there 566 are a number of factors (both within and outside the influence of the 567 client software) that are relevant to the performance achieved: 569 o The number of servers used simultaneously. 570 o The ability to pipeline sufficient or sufficiently large range 571 requests to each server so as to avoid connections going idle. 572 o The ability to pipeline sufficiently few or sufficiently small 573 range requests to servers so that all the servers finish their 574 final chunks simultaneously. 575 o The ability to switch between mirrors dynamically so as to use the 576 fastest mirrors at any moment in time 578 Obviously we do not want to use too many simultaneous connections, or 579 other traffic sharing a bottleneck link will be starved. But at the 580 same time, good performance requires that the client can 581 simultaneously download from at least one fast mirror while exploring 582 whether any other mirror is faster. Based on laboratory experiments, 583 we suggest a good default number of simultaneous connections is 584 probably four, with three of these being used for the best three 585 mirrors found so far, and one being used to evaluate whether any 586 other mirror might offer better performance. 
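   The suggested default of four connections, three for the best mirrors
   found so far and one for exploration, could be sketched in Python as
   follows. The throughput figures are whatever the client has measured
   itself. The function name, and the choice to use all four slots for
   known mirrors when nothing is left to explore, are assumptions of
   this sketch.

```python
def pick_mirrors(throughput: dict, untested: list, max_connections: int = 4):
    """Choose the fastest known mirrors plus one exploratory candidate.

    throughput maps mirror URI -> observed rate (any consistent unit);
    untested lists mirrors that have not been measured yet.
    """
    explore = untested[:1]  # at most one slot is spent on exploration
    keep = max_connections - len(explore)
    best = sorted(throughput, key=throughput.get, reverse=True)[:keep]
    return best, explore
```

   Calling the function again after each completed chunk lets the client
   promote an explored mirror into the "best" set as soon as its
   measured rate beats one of the current three.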
   The size of chunks chosen by the client should be sufficiently large
   that the chunk request headers and response headers represent
   negligible overhead, and sufficiently large that they can be pipelined
   effectively without needing a very high rate of chunk requests. At
   the same time, the amount of time wasted waiting for the last chunk
   to download from the last server after all the other servers have
   finished should be minimized. Thus we currently recommend that a
   chunk size of at least 10KBytes should be used. If the file being
   transferred is very large, or the download speed very high, this can
   be increased to perhaps 1MByte. As network bandwidths increase, we
   expect these numbers to increase appropriately, so that the time to
   transfer a chunk remains significantly larger than the latency of
   requesting a chunk from a server.

9. IANA Considerations

   Accordingly, IANA has made the following registration to the Link
   Relation Type registry.

   o Relation Name: duplicate

   o Description: Refers to a resource whose available representations
   are byte-for-byte identical with the corresponding representations of
   the context IRI.

   o Reference: This specification.

   o Notes: This relation is for static resources. That is, an HTTP GET
   request on any duplicate will return the same representation. It
   does not make sense for dynamic or POSTable resources and should not
   be used for them.

10. Security Considerations

10.1. URIs and IRIs

   Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2. Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information. This could deceive unaware
   downloaders into downloading a malicious or worthless file.
Also, malicious publishers could attempt a 634 distributed denial of service attack by inserting unrelated URIs into 635 Metalinks. 637 10.3. Cryptographic Hashes 639 Currently, some of the digest values defined in Instance Digests in 640 HTTP [RFC3230] are considered insecure. These include the whole 641 Message Digest family of algorithms which are not suitable for 642 cryptographically strong verification. Malicious people could 643 provide files that appear to be identical to another file because of 644 a collision, i.e. the weak cryptographic hashes of the intended file 645 and a substituted malicious file could match. 647 If a Metalink contains whole file hashes as described in Section 6, 648 it SHOULD include "sha-256" which is SHA-256, as specified in 649 [FIPS-180-3], or stronger. It MAY also include other hashes. 651 10.4. Signing 653 Metalinks should include digital signatures, as described in 654 Section 5. 656 Digital signatures provide authentication, message integrity, and 657 non-repudiation with proof of origin. 659 11. Normative References 661 [FIPS-180-3] 662 National Institute of Standards and Technology (NIST), 663 "Secure Hash Standard (SHS)", FIPS PUB 180-3, 664 October 2008. 666 [ISO3166-1] 667 International Organization for Standardization, "ISO 3166- 668 1:2006. Codes for the representation of names of 669 countries and their subdivisions -- Part 1: Country 670 codes", November 2006. 672 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 673 Requirement Levels", BCP 14, RFC 2119, March 1997. 675 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 676 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 677 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 679 [RFC3230] Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", 680 RFC 3230, January 2002. 682 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 683 Resource Identifier (URI): Generic Syntax", STD 66, 684 RFC 3986, January 2005. 
686 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 687 Identifiers (IRIs)", RFC 3987, January 2005. 689 [draft-bryan-ftp-hash] 690 Bryan, A., Kosse, T., and D. Stenberg, "FTP Extensions for 691 Cryptographic Hashes", draft-bryan-ftp-hash-02 (work in 692 progress), April 2010. 694 [draft-bryan-metalink] 695 Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, "The 696 Metalink Download Description Format", 697 draft-bryan-metalink-28 (work in progress), February 2010. 699 [draft-nottingham-http-link-header] 700 Nottingham, M., "Web Linking", 701 draft-nottingham-http-link-header-09 (work in progress), 702 April 2010. 704 Appendix A. Acknowledgements and Contributors 706 Thanks to the Metalink community, Mark Handley, Mark Nottingham, 707 Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, Matt Domsch, Micah 708 Cowan, and David Morris. 710 Support for simultaneous download from multiple mirrors is based upon 711 work by Mark Handley and Javier Vela Diago, who also provided 712 validation of the benefits of this approach. 714 Appendix B. Comparisons to Similar Options 716 [[ to be removed by the RFC editor before publication as an RFC. ]] 718 This draft, compared to the Metalink/XML format 719 [draft-bryan-metalink] : 721 o (+) Reuses existing HTTP standards without much new besides a Link 722 Relation Type. It's more of a collection/coordinated feature set. 723 o (?) The existing standards don't seem to be widely implemented. 724 o (+) No XML dependency, except for Metalink/XML for partial file 725 cryptographic hashes. 726 o (+) Existing Metalink/XML clients can be easily converted to 727 support this as well. 728 o (+) Coordination of mirror servers is preferred, but not required. 729 Coordination may be difficult or impossible unless you are in 730 control of all servers on the mirror network. 731 o (-) Requires software or configuration changes to originating 732 server. 733 o (-?) Tied to HTTP, not as generic. 
FTP/P2P clients won't be 734 using it unless they also support HTTP, unlike Metalink/XML. 735 o (-) Requires server-side support. Metalink/XML can be created by 736 user (or server, but server component/changes not required). 737 o (-) Also, Metalink/XML files are easily mirrored on all servers. 738 Even if usage in that case is not as transparent, it still gives 739 access to users at all mirrors (FTP included) to all download 740 information with no changes needed to the server. 741 o (-) Not portable/archivable/emailable. Metalink/XML is used to 742 import/export transfer queues. Not as easy for search engines to 743 index? 744 o (-) Not as rich metadata. 745 o (-) Not able to add multiple files to a download queue or create 746 directory structure. 748 Appendix C. Document History 750 [[ to be removed by the RFC editor before publication as an RFC. ]] 752 Known issues concerning this draft: 753 o Some organizations have many mirrors. Should all be sent, or only 754 a certain number? All should be included in the Metalink/XML, if 755 used. 756 o Would it make more sense to use qvalue-style policies to describe 757 mirror priority, i.e. q=1.0 through q=0.0 ? 758 o Using Metalink/XML for partial file cryptographic hashes. That 759 adds XML dependency to apps for an important feature. Is there a 760 better method? 761 o Do we need an "official" MIME type for .torrent files or allow 762 "application/x-bittorrent"? 764 -16 : April , 2010. 765 o Add draft-bryan-ftp-hash reference and FTP mirror coordination. 767 -15 : December 31, 2009. 768 o Update references and terminology. 770 -14 : December 31, 2009. 771 o Baseline file hash: SHA-256. 773 -13 : November 22, 2009. 774 o Metalink/XML for partial file cryptographic hashes. 776 -12 : November 11, 2009. 777 o Clarifications. 779 -11 : October 23, 2009. 780 o Mirror changes. 782 -10 : October 15, 2009. 783 o Mirror coordination changes. 785 -09 : October 12, 2009. 787 o Mirror location, coordination, and depth. 
788 o Split HTTP Digest Algorithm Values Registration into 789 draft-bryan-http-digest-algorithm-values-update. 791 -08 : October 4, 2009. 792 o Clarifications. 794 -07 : September 29, 2009. 795 o Preferred mirror servers. 797 -06 : September 24, 2009. 798 o Add Mismatch Detection, Error Recovery, and Digest Algorithm 799 values. 800 o Remove Content-MD5 and Want-Digest. 802 -05 : September 19, 2009. 803 o ETags, preferably matching the Instance Digests. 805 -04 : September 17, 2009. 806 o Temporarily remove .torrent. 808 -03 : September 16, 2009. 809 o Mention HEAD request, negotiate mirrors if Want-Digest is used. 811 -02 : September 6, 2009. 812 o Content-MD5 for partial file cryptographic hashes. 814 -01 : September 1, 2009. 815 o Link Relation Type Registration: "duplicate" 817 -00 : August 24, 2009. 818 o Initial draft. 820 Authors' Addresses 822 Anthony Bryan 823 Pompano Beach, FL 824 USA 826 Email: anthonybryan@gmail.com 827 URI: http://www.metalinker.org 828 Neil McNab 830 Email: neil@nabber.org 831 URI: http://www.nabber.org 833 Henrik Nordstrom 835 Email: henrik@henriknordstrom.net 836 URI: http://www.henriknordstrom.net/ 838 Alan Ford 839 Roke Manor Research 840 Old Salisbury Lane 841 Romsey, Hampshire SO51 0ZN 842 UK 844 Phone: +44 1794 833 465 845 Email: alan.ford@roke.co.uk