Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            H. Nordstrom
Expires: July 24, 2011                                      T. Tsujikawa
                                                                P. Poeml
                                                             MirrorBrain
                                                                 A. Ford
                                                     Roke Manor Research
                                                        January 20, 2011


   Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Header Fields
                       draft-bryan-metalinkhttp-19

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP header fields, a different way to get information that
   is usually contained in the Metalink XML-based download description
   format.  Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP header
   fields.  Clients can use this information to make file transfers more
   robust and reliable.

Editorial Note (To be removed by RFC Editor)

   Discussion of this draft should take place on the HTTPBIS working
   group mailing list (ietf-http-wg@w3.org), although this draft is not
   a WG item.

   The changes in this draft are summarized in Appendix C.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 24, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Operation Overview
     1.2.  Examples
     1.3.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Cryptographic Hashes of Whole Files
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  Multi-server Performance
   9.  IANA Considerations
   10. Security Considerations
     10.1.  URIs and IRIs
     10.2.  Spoofing
     10.3.  Cryptographic Hashes
     10.4.  Signing
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative representation of Metalink
   information, which is usually presented as an XML-based document
   format [RFC5854].  Metalink/HTTP attempts to provide as much
   functionality as the Metalink/XML format by using existing standards
   such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
   and Entity Tags (also known as ETags) [RFC2616].  Metalink/HTTP is
   used to list information about a file to be downloaded.  This can
   include lists of multiple URIs (mirrors), Peer-to-Peer information,
   cryptographic hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer).  In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth.  This distributes the load across multiple servers, and
   should also increase throughput and resilience.
   At times, however, individual servers can be slow, outdated, or
   unreachable, but this cannot be determined until the download has
   been initiated.  Users will rarely have sufficient information to
   choose the most appropriate server, and will often choose the first
   one in a list, which might not be optimal for their needs and leads
   to that server getting a disproportionate share of the load.  The
   use of suboptimal mirrors can lead to the user canceling and
   restarting the download to try to manually find a better source.
   During downloads, errors in transmission can corrupt the file.  There
   are no easy ways to repair these files.  For large downloads this can
   be extremely troublesome.  Any of the many problems that can occur
   during a download can frustrate users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput.  Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized.  All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more, can be transferred in
   coordinated HTTP header fields hereafter referred to as a Metalink.
   This Metalink transfers the knowledge of the download server (and
   mirror database) to the client.  Clients can fall back to other
   mirrors if the current one has an issue.  With this knowledge, the
   client is enabled to work its way to a successful download even under
   adverse circumstances.
   All this can be done without complicated user interaction, and the
   download can be much more reliable and efficient.  In contrast, a
   traditional HTTP redirect to a mirror conveys only minimal
   information: one link to one server, and there is no provision in
   the HTTP protocol to handle failures.  Furthermore, in order to
   provide better load distribution across servers and potentially
   faster downloads to users, Metalink/HTTP facilitates multi-source
   downloads, where portions of a file are downloaded from multiple
   mirrors (and optionally, Peer-to-Peer) simultaneously.

1.1.  Operation Overview

   Detailed discussion of Metalink operation is covered in Section 2;
   this section will present a very brief, high-level overview of how
   Metalink achieves its goals.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a
   cryptographic hash of the whole resource.  The client will then be
   able to request chunks of the file from the various sources,
   scheduling appropriately in order to maximise the download rate.

1.2.  Examples

   A brief Metalink server response with ETag, mirrors, .metalink,
   OpenPGP signature, and a cryptographic hash of the whole file:

      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel=duplicate
      Link: ; rel=duplicate
      Link: ; rel=describedby;
          type="application/x-bittorrent"
      Link: ; rel=describedby;
          type="application/metalink4+xml"
      Link: ; rel=describedby;
          type="application/pgp-signature"
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
          DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.3.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP, which consists
   of mirrors and cryptographic hashes in HTTP header fields as
   described in this document.  "Metalink/XML" refers to the XML format
   described in [RFC5854].

   Metalink resources include Link header fields [RFC5988] to present a
   list of mirrors in the response to a client request for the resource.
   Metalink servers MUST include the cryptographic hash of a resource
   via Instance Digests in HTTP [RFC3230].  Valid algorithms are found
   in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest
   Algorithm Values" at .
   SHA-256 and SHA-512 were added by [RFC5843].

   Metalink servers are HTTP servers with one or more Metalink
   resources.  Metalink servers MUST support the Link header fields for
   listing mirrors and MUST support Instance Digests in HTTP [RFC3230].
   Metalink servers MUST return the same Link header fields and Instance
   Digests on HEAD requests.  Metalink servers and their associated
   mirror servers SHOULD all share the same ETag policy.  To have the
   same ETag policy means that ETags are synchronized across servers for
   resources that are mirrored, i.e., byte-for-byte identical files will
   have the same ETag on mirrors that they have on the Metalink server.
   ETags could be based on the file contents (cryptographic hash) and
   not server-unique filesystem metadata.  The emitted ETag could be
   implemented the same as the Instance Digest for simplicity.  Metalink
   servers can offer Metalink/XML documents that contain cryptographic
   hashes of parts of the file and other information.
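
   The following Python sketch illustrates the server side of these
   requirements; it is not part of this specification.  It assumes
   that the Instance Digest value is the base64 encoding of the raw
   SHA-256 output, as [RFC5843] describes.  The content-derived ETag
   policy (the hypothetical function content_etag) is only one possible
   way to synchronize ETags across mirrors, following the suggestion
   that the ETag could be implemented the same as the Instance Digest.

```python
import base64
import hashlib


def instance_digest(data: bytes) -> str:
    """Digest header value for data, e.g. "SHA-256=<base64>".

    Assumes the RFC 5843 encoding: base64 of the raw SHA-256 output.
    """
    raw = hashlib.sha256(data).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def content_etag(data: bytes) -> str:
    """Hypothetical ETag policy: a strong ETag derived only from the file
    contents, so identical copies on every mirror get the same ETag
    regardless of filesystem metadata such as inode or mtime."""
    return '"' + instance_digest(data).partition("=")[2] + '"'


def digest_matches(header_value: str, data: bytes) -> bool:
    """Check the SHA-256 instance in a (possibly multi-valued) Digest
    header against data; other algorithms are ignored here."""
    expected = instance_digest(data).partition("=")[2]
    for instance in header_value.split(","):
        algorithm, _, value = instance.strip().partition("=")
        # Algorithm names are compared case-insensitively.
        if algorithm.strip().lower() == "sha-256":
            return value.strip() == expected
    return False


if __name__ == "__main__":
    body = b"example file contents"
    print("Digest:", instance_digest(body))
    print("ETag:", content_etag(body))
```

   Because content_etag depends only on the bytes of the file, every
   mirror holding a byte-for-byte identical copy emits the same ETag,
   which is what sharing an ETag policy requires.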

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers can
   also be Metalink servers.  Mirror servers SHOULD support serving
   partial content.  HTTP mirror servers SHOULD share the same ETag
   policy as the originating Metalink server.  HTTP mirror servers
   SHOULD support Instance Digests in HTTP [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header fields [RFC5988].  Metalink clients MUST support HTTP and
   SHOULD support FTP [RFC0959].  Metalink clients MAY support
   BitTorrent [BITTORRENT], or other download methods.  Metalink clients
   SHOULD switch downloads from one mirror to another if a mirror
   becomes unreachable.  Metalink clients MAY support multi-source, or
   parallel, downloads, where portions of a file can be downloaded from
   multiple mirrors simultaneously (and optionally, from Peer-to-Peer
   sources).  Metalink clients MUST support Instance Digests in HTTP
   [RFC3230] by requesting and verifying cryptographic hashes.  Metalink
   clients MAY make use of digital signatures if they are offered.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header fields [RFC5988] and a
   relation type of "duplicate" as defined in Section 9.

   A brief Metalink server response with two mirrors only:

      Link: ; rel=duplicate;
          pri=1; pref
      Link: ; rel=duplicate;
          pri=2; geo=gb; depth=1

   [[Some organizations have many mirrors.  Only send a few mirrors, or
   only use the Link header fields if Want-Digest is used?]]

   It is up to the server to choose how many Link header fields to
   send.  Such a decision could be a hard-coded limit, a random
   selection, based on file size, or based on server load.

3.1.  Mirror Priority

   Entries for mirror servers are listed in order of priority (from most
   preferred to least) or have a "pri" value, where mirrors with lower
   values are used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.

3.2.  Mirror Geographical Location

   Entries for mirror servers can have a "geo" value, which is an
   [ISO3166-1] alpha-2 two-letter country code for the geographical
   location of the physical server the URI is used to access.  A client
   can use this information to select a mirror, or set of mirrors, that
   are geographically near (if the client has access to such
   information), with the aim of reducing network load at inter-country
   bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server.  Preferred
   mirrors make it possible to detect early on, before data is
   transferred, if the file requested matches the desired file.  Entries
   for preferred HTTP mirror servers have a "pref" value.  If
   unspecified, mirrors are considered "normal" by default and do not
   necessarily share the same ETag policy.  FTP mirrors, as they do not
   emit ETags, are considered "normal".  ([draft-ietf-ftpext2-hash]
   allows FTP mirrors to be coordinated and to provide file hashes.)

   HTTP mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].  Optimally, mirror servers will share the same ETag policy
   and support Instance Digests in HTTP.

3.4.  Mirror Depth

   Some mirrors can mirror single files, whole directories, or multiple
   directories.

   Entries for mirror servers can have a "depth" value, where "depth=0"
   is the default.
   A value of 0 means ONLY that file is mirrored and that other URI path
   segments are not.  A value of 1 means that file and all other files
   and URI path segments contained in the rightmost URI path segment are
   mirrored.  A value of 2 means going up one URI path segment above,
   and all files and URI path segments contained are mirrored.  In
   general, a value of N means going up N-1 URI path segments above; for
   each higher value, another URI path segment closer to the Host is
   mirrored.

   A mirror with a depth value of 4:

      Link: ;
          rel=duplicate; pri=1; pref; depth=4

   In the above example, 4 URI path segments up are mirrored, from
   /dir2/ on down.

4.  Peer-to-Peer / Metainfo

   Entries for metainfo files, which describe ways to download a file
   over Peer-to-Peer networks or otherwise, are specified with the Link
   header fields [RFC5988], a relation type of "describedby", and a type
   parameter that indicates the MIME type of the metadata available at
   the URI.  Since metainfo files can sometimes describe multiple files,
   or the filename might not be the same on the Metalink server and in
   the metainfo file but still have the same content, an optional name
   parameter can be used.

   A brief Metalink server response with .torrent and .metalink:

      Link: ; rel=describedby;
          type="application/x-bittorrent"; name="differentname.ext"
      Link: ; rel=describedby;
          type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in Section 4.  This is particularly useful for providing
   metadata such as cryptographic hashes of parts of a file, allowing a
   client to recover from partial errors (see Section 7.1.2).

5.  OpenPGP Signatures

   OpenPGP signatures [RFC3156] are specified with the Link header
   fields [RFC5988], a relation type of "describedby", and a type
   parameter of "application/pgp-signature".

   A brief Metalink server response with OpenPGP signature only:

      Link: ; rel=describedby;
          type="application/pgp-signature"

   Metalink clients MAY support the use of OpenPGP signatures.

6.  Cryptographic Hashes of Whole Files

   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
   files they describe with mirrors via Link header fields.  Mirror
   servers SHOULD do so as well.  If Instance Digests are not provided
   by the Metalink servers, the Link header fields MUST be ignored.

   A brief Metalink server response with cryptographic hash:

      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
          DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server.  A Range limit is optional, not
   required.  Alternatively, Metalink clients can begin with a HEAD
   request to the Metalink server to discover mirrors via Link header
   fields.  After that, the client follows with a GET request to the
   desired mirrors.

      GET /distribution/example.ext HTTP/1.1
      Host: www.example.com

   The Metalink server responds with the data and these header fields:

      HTTP/1.1 200 OK
      Accept-Ranges: bytes
      Content-Length: 14867603
      Content-Type: application/x-cd-image
      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Link: ; rel=duplicate; pref
      Link: ; rel=duplicate
      Link: ; rel=describedby;
          type="application/x-bittorrent"
      Link: ; rel=describedby;
          type="application/metalink4+xml"
      Link: ; rel=describedby;
          type="application/pgp-signature"
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
          DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   From the Metalink server response, the client learns some or all of
   the following metadata about the requested object, in addition to
   starting to receive the object:

   o  Object size.
   o  ETag.
   o  Mirror profile link, which can describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Peer-to-peer information.
   o  Metalink/XML, which can include partial file cryptographic hashes
      to repair a file.
   o  Digital signature.
   o  Instance Digest, which is the whole file cryptographic hash.

   (Alternatively, the client could have requested a HEAD only, and then
   skipped to making the following decisions on every available mirror
   server found via the Link header fields.)

   If the object is large and gets delivered slower than expected, the
   Metalink client starts a number of parallel ranged downloads (one per
   selected mirror server other than the first) using mirrors provided
   by the Link header fields with the "duplicate" relation type, and
   sends the location of the original GET request in the "Referer"
   header field.  The size and number of ranges requested from each
   server is for the client to decide, based upon the performance
   observed from each server.
   Further discussion of performance considerations is presented in
   Section 8.

   If no range limit was given in the original request, the client works
   from the tail of the object (the first request is still running and
   will eventually catch up); otherwise, it continues after the range
   requested in the first request.  If no Range was provided, the
   original connection must be terminated once all parts of the
   resource have been retrieved.  It is recommended that a HEAD request
   is undertaken first, so that the client can find out if there are any
   Link header fields, and then Range-based requests are undertaken to
   the mirror servers as well as on the original connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and If-Match conditions based on the ETag from the
   Metalink server response SHOULD be used to quickly detect out-of-date
   mirrors.  If no indication of ETag synchronization is given, If-Match
   should not be used; ideally, the mirror response will then include an
   Instance Digest that can be used to detect a mismatch early.
   Otherwise, a mismatch will not be detected until the completed object
   is verified.  Early file mismatch detection is described in detail in
   Section 7.1.1.

   One of the client requests to a mirror server:

      GET /example.ext HTTP/1.1
      Host: www2.example.com
      Range: bytes=7433802-
      If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Referer: http://www.example.com/distribution/example.ext

   The mirror servers respond with a 206 Partial Content HTTP status
   code and appropriate "Content-Length" and "Content-Range" header
   fields.
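
   As an illustration only, the Python sketch below shows how a client
   might build the ranged request to a mirror and check the mirror's
   206 response before merging its data.  The helper names are
   hypothetical; the check assumes the client already learned the object
   size (Content-Length) from the Metalink server, and it accepts only
   the "bytes first-last/length" form of Content-Range.

```python
import re
from typing import Dict, Optional, Tuple

# Accepts only "bytes first-last/complete-length"; the unsatisfied form
# "bytes */length" is treated as unusable.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def mirror_request_headers(start: int, etag: Optional[str], referer: str,
                           preferred: bool) -> Dict[str, str]:
    """Headers for an open-ended ranged GET to one mirror.

    If-Match is sent only to preferred mirrors, which share the ETag
    policy of the Metalink server (Section 3.3)."""
    headers = {"Range": f"bytes={start}-", "Referer": referer}
    if preferred and etag is not None:
        headers["If-Match"] = etag
    return headers


def check_partial_response(status: int, content_range: str,
                           requested_start: int,
                           expected_size: int) -> Tuple[int, int]:
    """Validate a mirror's response before merging its data.

    Returns the (first, last) byte offsets carried by the response, or
    raises ValueError if the mirror should not be used."""
    if status != 206:
        # e.g. 412 when If-Match fails: the mirror has another version.
        raise ValueError(f"expected 206 Partial Content, got {status}")
    match = _CONTENT_RANGE.match(content_range.strip())
    if match is None:
        raise ValueError(f"unparseable Content-Range: {content_range!r}")
    first, last, total = (int(group) for group in match.groups())
    if total != expected_size:
        # A different file size from the Metalink server: unusable mirror.
        raise ValueError(f"size mismatch: {total} != {expected_size}")
    if first != requested_start or last < first or last >= total:
        raise ValueError(f"unexpected range {first}-{last}")
    return first, last
```

   With the example values above, check_partial_response(206,
   "bytes 7433802-14867602/14867603", 7433802, 14867603) returns
   (7433802, 14867602), a span of 7433801 bytes that agrees with the
   mirror's Content-Length.  A mirror that no longer has the file
   version named in If-Match answers 412 (Precondition Failed) instead,
   which the sketch rejects.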
   The mirror server response, with data, to the above request:

      HTTP/1.1 206 Partial Content
      Accept-Ranges: bytes
      Content-Length: 7433801
      Content-Range: bytes 7433802-14867602/14867603
      Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
      Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
          DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the first request was not Range limited, the client aborts it by
   closing the connection when it catches up with the other parallel
   downloads of the same object.

   Downloads from mirrors that report a different file size than the
   Metalink server are considered unusable, and the client can deal with
   them as it sees fit.

   If a file is available only through download methods (such as FTP or
   BitTorrent) that the Metalink client does not support, the download
   has no way to complete.

   Once the download has completed, the Metalink client MUST verify the
   cryptographic hash of the file.  If the cryptographic hash offered by
   the Metalink server with Instance Digests does not match the
   cryptographic hash of the downloaded file, see Section 7.1.2 for a
   possible way to repair errors.

   If the download cannot be repaired, it is considered corrupt.  The
   client can attempt to re-download the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and
   cryptographic hashes.  Error detection requires Instance Digests, or
   cryptographic hashes, to determine after transfers if there has been
   an error.  Error correction, or download repair, is possible with
   partial file cryptographic hashes.

   Note that cryptographic hashes obtained from Instance Digests are in
   base64 encoding, while those from Metalink/XML and FTP HASH are in
   hexadecimal.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the requirement is that merging of ranges from
   multiple responses must be verified with a strong validator, which in
   this context is either an Instance Digest or a strong ETag.

   In most cases it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers also implement a
   synchronized ETag.  In fact, the emitted ETag can be implemented the
   same as the Instance Digest for simplicity, but there is no need to
   specify how the ETag is generated, just that it needs to be shared
   among the mirror servers.  If the mirror server provides neither a
   synchronized ETag nor an Instance Digest, then early detection of
   mismatches is not possible unless the file length also differs.  The
   error is still detectable, however, after the download has completed,
   when the merged response is verified.

   ETags cannot be used for verifying the integrity of the received
   content, but an ETag is a guarantee issued by the Metalink server
   that the content is correct for that ETag.  If the ETag given by the
   mirror server matches the ETag given by the master server, then there
   is a chain of trust in which the master server authorizes these
   responses as valid for that object.

   Using only the synchronized ETags of the master server and a mirror
   server, a mismatch is therefore guaranteed to be detected, and it can
   even be signaled by the mirror servers themselves responding with an
   error.  This prevents accidental merges of ranges from different
   versions of files with the same name.  It also covers many, but not
   all, malicious attacks in which the data on the mirror has been
   replaced by some other file.
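
   The decision rules of this section can be summarized in a small,
   non-normative Python sketch.  The ObjectInfo record and early_check
   function are hypothetical names.  The sketch treats weak ETags (those
   prefixed with W/) as unusable, since only a strong validator may be
   used to merge ranges, and it compares Instance Digests only for
   algorithms that both servers report.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ObjectInfo:
    """Metadata seen in one response; None means the field was absent."""
    size: int
    etag: Optional[str] = None
    digest: Optional[str] = None  # raw Digest header, e.g. "SHA-256=..."


def _digests(header: Optional[str]) -> Dict[str, str]:
    """Parse "ALG=value, ALG2=value2" into {lower-cased alg: value}."""
    result = {}
    for instance in (header or "").split(","):
        algorithm, sep, value = instance.strip().partition("=")
        if sep:
            result[algorithm.strip().lower()] = value.strip()
    return result


def _strong(etag: Optional[str]) -> bool:
    """Only strong ETags may be used to validate merged ranges."""
    return etag is not None and not etag.startswith("W/")


def early_check(master: ObjectInfo, mirror: ObjectInfo,
                etag_synchronized: bool) -> str:
    """Return "mismatch", "match", or "unknown" for a mirror response.

    "unknown" means no usable validator was present, so an error can
    only be caught when the merged object is verified at the end."""
    if mirror.size != master.size:
        return "mismatch"  # a length difference is always detectable
    verdicts: List[bool] = []
    if etag_synchronized and _strong(master.etag) and _strong(mirror.etag):
        verdicts.append(mirror.etag == master.etag)
    master_d, mirror_d = _digests(master.digest), _digests(mirror.digest)
    for algorithm in master_d.keys() & mirror_d.keys():
        verdicts.append(master_d[algorithm] == mirror_d[algorithm])
    if not verdicts:
        return "unknown"
    return "match" if all(verdicts) else "mismatch"
```

   A result of "unknown" does not mean the mirror is bad; it only means
   that an error, if any, will be found when the merged object is
   verified against the Instance Digest from the master server.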

   A synchronized ETag cannot strictly protect against malicious attacks
   or server or network errors replacing content, but neither can an
   Instance Digest on the mirror servers: an attacker can make the
   server seemingly respond with the expected Instance Digest even if
   the file contents have been modified, just as with an ETag, and the
   same applies to various system failures that cause bad data to be
   returned.  The Metalink client has to rely on the Instance Digest
   returned by the Metalink master server in the first response for the
   verification of the downloaded object as a whole.

   If the mirror servers do return an Instance Digest, then that is a
   bonus, just as having them return the right set of Link header fields
   is.  Trusted mirrors that do so can, if desired, act as master
   servers accepting the initial request.

   The benefit of having slave mirror servers (those not trusted as
   masters) return an Instance Digest is that the client can then detect
   mismatches early even if ETags are not used.  Both ETags and slave
   mirror Instance Digests provide value, but just one is sufficient for
   early detection of mismatches.  If neither is provided, then early
   detection of mismatches is not possible unless the file length also
   differs, but the error is still detected when the merged response is
   verified.

   If FTP servers support the FTP HASH command [draft-ietf-ftpext2-hash]
   and the same hash algorithm as the originating Metalink server, then
   that information can be used for early file mismatch detection.

7.1.2.  Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download.  Metalink servers are not required to offer partial
   file cryptographic hashes in Metalink/XML as specified in
   Section 4.1, but they are encouraged to do so.
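
   A minimal Python sketch of download repair follows.  It assumes the
   client has already extracted, from the Metalink/XML document, a fixed
   piece length and a list of hexadecimal SHA-256 piece hashes (that
   parsing is omitted, and the function names are hypothetical).  It
   also converts a base64 Instance Digest value to hexadecimal, since
   the two sources use different encodings.

```python
import base64
import hashlib
from typing import List


def bad_pieces(data: bytes, piece_length: int,
               piece_hashes_hex: List[str]) -> List[int]:
    """Indices of pieces whose SHA-256 differs from the listed hash.

    Assumes fixed-size pieces; the last piece may be shorter."""
    bad = []
    for index, expected in enumerate(piece_hashes_hex):
        piece = data[index * piece_length:(index + 1) * piece_length]
        if hashlib.sha256(piece).hexdigest() != expected.strip().lower():
            bad.append(index)
    return bad


def byte_ranges(indices: List[int], piece_length: int,
                total_size: int) -> List[str]:
    """Range header values that re-fetch only the given pieces."""
    ranges = []
    for index in indices:
        first = index * piece_length
        last = min(first + piece_length, total_size) - 1
        ranges.append(f"bytes={first}-{last}")
    return ranges


def digest_b64_to_hex(value: str) -> str:
    """Convert a base64 Instance Digest value to lowercase hexadecimal,
    the encoding used by Metalink/XML and FTP HASH."""
    return base64.b64decode(value).hex()
```

   Only the byte ranges returned by byte_ranges then need to be
   requested again, preferably from a mirror other than the one that
   supplied the bad data.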

   If the cryptographic hash of the downloaded object does not match the
   Instance Digest, the client fetches the Metalink/XML, if available,
   where partial file cryptographic hashes can be found.  These allow
   the client to detect which server returned incorrect data and to
   determine which parts of the downloaded data can be recovered and
   which need to be fetched again.  If no partial cryptographic hashes
   are available, then the client MUST fetch the complete object from
   other mirrors.

8.  Multi-server Performance

   When opting to download simultaneously from multiple mirrors, there
   are a number of factors (both within and outside the influence of the
   client software) that are relevant to the performance achieved:

   o  The number of servers used simultaneously.
   o  The ability to pipeline sufficient or sufficiently large range
      requests to each server so as to avoid connections going idle.
   o  The ability to pipeline sufficiently few or sufficiently small
      range requests to servers so that all the servers finish their
      final chunks simultaneously.
   o  The ability to switch between mirrors dynamically so as to use the
      fastest mirrors at any moment in time.

   Obviously we do not want to use too many simultaneous connections, or
   other traffic sharing a bottleneck link will be starved.  But at the
   same time, good performance requires that the client can
   simultaneously download from at least one fast mirror while exploring
   whether any other mirror is faster.  Based on laboratory experiments,
   we suggest that a good default number of simultaneous connections is
   probably four, with three of these being used for the best three
   mirrors found so far, and one being used to evaluate whether any
   other mirror might offer better performance.
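
   One way a client could apply this suggested default is sketched below
   in Python.  The function choose_mirrors and its inputs are
   hypothetical: it assumes the client keeps an observed throughput for
   each mirror (None for mirrors not yet tried), fills up to three slots
   with the fastest mirrors measured so far, and gives the remaining
   slot or slots to mirrors being evaluated, untried ones first.

```python
import random
from typing import Dict, List, Optional


def choose_mirrors(throughput: Dict[str, Optional[float]],
                   connections: int = 4,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick the mirrors to keep busy for the next round of chunks.

    throughput maps a mirror URI to its observed rate in bytes per
    second, or None if the mirror has not been tried yet.  Up to
    connections - 1 slots go to the fastest mirrors measured so far;
    the remaining slot(s) evaluate other mirrors, untried ones first."""
    rng = rng or random.Random()
    measured = sorted((uri for uri, rate in throughput.items()
                       if rate is not None),
                      key=lambda uri: throughput[uri], reverse=True)
    chosen = measured[:connections - 1]
    others = [uri for uri in throughput if uri not in chosen]
    untried = [uri for uri in others if throughput[uri] is None]
    tried = [uri for uri in others if throughput[uri] is not None]
    rng.shuffle(untried)
    rng.shuffle(tried)
    chosen += (untried + tried)[:connections - len(chosen)]
    return chosen
```

   Calling it again each time a chunk completes lets the client switch
   dynamically to faster mirrors.  For example, with observed rates
   {"a": 10.0, "b": 50.0, "c": 30.0, "d": 20.0, "e": None}, it picks
   b, c, and d, plus the untried mirror e.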

   The size of chunks chosen by the client should be sufficiently large
   that the chunk request header fields and response header fields
   represent negligible overhead, and sufficiently large that they can
   be pipelined effectively without needing a very high rate of chunk
   requests.  At the same time, the amount of time wasted waiting for
   the last chunk to download from the last server after all the other
   servers have finished should be minimized.  Note that Range requests
   impose an overhead on servers, and clients need to be aware of that
   and not abuse them.

9.  IANA Considerations

   IANA will make the following registration in the Link Relation Type
   registry.

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources.  That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and
      should not be used for them.

10. Security Considerations

10.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs.  See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

10.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information.  This could deceive unaware
   downloaders into downloading a malicious or worthless file.  Also,
   malicious publishers could attempt a distributed denial of service
   attack by inserting unrelated URIs into Metalinks.

10.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure.
   These include the whole Message Digest family of algorithms (such as
   MD5), which are not suitable for cryptographically strong
   verification.  Malicious people could provide files that appear to be
   identical to another file because of a collision, i.e., the weak
   cryptographic hashes of the intended file and a substituted malicious
   file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include SHA-256, as specified in [FIPS-180-3], or stronger.
   It MAY also include other hashes.

10.4.  Signing

   Metalinks should include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

11.  References

11.1.  Normative References

   [BITTORRENT]
              Cohen, B., "The BitTorrent Protocol Specification",
              BITTORRENT 11031, February 2008,
              .

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:2006. Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [RFC0959]  Postel, J. and J. Reynolds, "File Transfer Protocol",
              STD 9, RFC 0959, October 1985.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
              "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L.
              Masinter, "Uniform Resource Identifier (URI): Generic
              Syntax", STD 66, RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC5854]  Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, "The
              Metalink Download Description Format", RFC 5854,
              June 2010.

   [RFC5988]  Nottingham, M., "Web Linking", RFC 5988, October 2010.

   [draft-ietf-ftpext2-hash]
              Bryan, A., Kosse, T., and D. Stenberg, "FTP Extensions for
              Cryptographic Hashes", draft-ietf-ftpext2-hash-00 (work in
              progress), November 2010.

11.2.  Informative References

   [RFC5843]  Bryan, A., "Additional Hash Algorithms for HTTP Instance
              Digests", RFC 5843, April 2010.

Appendix A.  Acknowledgements and Contributors

   Thanks to the Metalink community, Alexey Melnikov, Julian Reschke,
   Mark Nottingham, Daniel Stenberg, Matt Domsch, Micah Cowan, and David
   Morris.

   Mark Handley and Javier Vela Diago worked on simultaneous download
   from multiple mirrors; their work also validated the benefits of this
   approach.

Appendix B.  Comparisons to Similar Options

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   This draft, compared to the Metalink/XML format [RFC5854]:

   o  (+) Reuses existing HTTP standards without much new besides a Link
      Relation Type.  It's more of a collection/coordinated feature set.
   o  (?) The existing standards don't seem to be widely implemented.
   o  (+) No XML dependency, except for Metalink/XML for partial file
      cryptographic hashes.
   o  (+) Existing Metalink/XML clients can be easily converted to
      support this as well.
   o  (+) Coordination of mirror servers is preferred, but not required.
      Coordination could be difficult or impossible unless you are in
      control of all servers on the mirror network.
   o  (-) Requires software or configuration changes to the originating
      server.
   o  (-?)
      Tied to HTTP, not as generic.  FTP/P2P clients won't be using it
      unless they also support HTTP, unlike Metalink/XML.
   o  (-) Requires server-side support.  Metalink/XML can be created by
      a user (or server, but server components/changes are not
      required).
   o  (-) Also, Metalink/XML files are easily mirrored on all servers.
      Even if usage in that case is not as transparent, this method
      still gives access to all download information (with no changes
      needed to servers) from all mirrors (FTP included).
   o  (-) Not portable/archivable/emailable.  Metalink/XML is used to
      import/export transfer queues.  Not as easy for search engines to
      index?
   o  (-) Not as rich metadata.
   o  (-) Not able to add multiple files to a download queue or create
      a directory structure.

Appendix C.  Document History

   [[ to be removed by the RFC editor before publication as an RFC. ]]

   Known issues concerning this draft:
   o  Some organizations have many mirrors.  Should all be sent, or only
      a certain number?  All should be included in the Metalink/XML, if
      used.
   o  Using Metalink/XML for partial file cryptographic hashes.  That
      adds XML dependency to apps for an important feature.  Is there a
      better method?

   -19 : January 20, 2011.
   o  Julian Reschke's review.

   -18 : January 1, 2011.
   o  AD review by Alexey Melnikov.

   -17 : September 13, 2010.
   o  RFC 5854 Metalink/XML.

   -16 : April 16, 2010.
   o  Add draft-ietf-ftpext2-hash reference and FTP mirror coordination.

   -15 : February 20, 2010.
   o  Update references and terminology.

   -14 : December 31, 2009.
   o  Baseline file hash: SHA-256.

   -13 : November 22, 2009.
   o  Metalink/XML for partial file cryptographic hashes.

   -12 : November 11, 2009.
   o  Clarifications.

   -11 : October 23, 2009.
   o  Mirror changes.

   -10 : October 15, 2009.
   o  Mirror coordination changes.

   -09 : October 13, 2009.
   o  Mirror location, coordination, and depth.
   o  Split HTTP Digest Algorithm Values Registration into
      draft-bryan-http-digest-algorithm-values-update.

   -08 : October 4, 2009.
   o  Clarifications.

   -07 : September 29, 2009.
   o  Preferred mirror servers.

   -06 : September 24, 2009.
   o  Add Mismatch Detection, Error Recovery, and Digest Algorithm
      values.
   o  Remove Content-MD5 and Want-Digest.

   -05 : September 19, 2009.
   o  ETags, preferably matching the Instance Digests.

   -04 : September 17, 2009.
   o  Temporarily remove .torrent.

   -03 : September 16, 2009.
   o  Mention HEAD request, negotiate mirrors if Want-Digest is used.

   -02 : September 7, 2009.
   o  Content-MD5 for partial file cryptographic hashes.

   -01 : September 1, 2009.
   o  Link Relation Type Registration: "duplicate"

   -00 : August 24, 2009.
   o  Initial draft.

Authors' Addresses

   Anthony Bryan
   Pompano Beach, FL
   USA

   Email: anthonybryan@gmail.com
   URI:   http://www.metalinker.org

   Neil McNab

   Email: neil@nabber.org
   URI:   http://www.nabber.org

   Henrik Nordstrom

   Email: henrik@henriknordstrom.net
   URI:   http://www.henriknordstrom.net/

   Tatsuhiro Tsujikawa
   Shiga
   Japan

   Email: tatsuhiro.t@gmail.com
   URI:   http://aria2.sourceforge.net

   Dr. med. Peter Poeml
   MirrorBrain
   Venloer Str. 317
   Koeln  50823
   DE

   Phone: +49 221 6778 333 8
   Email: peter@poeml.de
   URI:   http://mirrorbrain.org/~poeml/

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk