Network Working Group                                           A. Bryan
Internet-Draft                                                  N. McNab
Intended status: Standards Track                            T. Tsujikawa
Expires: August 18, 2011
                                                                P. Poeml
                                                             MirrorBrain
                                                            H. Nordstrom
                                                       February 14, 2011


 Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Header Fields
                      draft-bryan-metalinkhttp-20

Abstract

   This document specifies Metalink/HTTP: Mirrors and Cryptographic
   Hashes in HTTP header fields, a different way to get information that
   is usually contained in the Metalink XML-based download description
   format. Metalink/HTTP describes multiple download locations
   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
   and other information using existing standards for HTTP header
   fields. Clients can use this information to make file transfers more
   robust and reliable.

Editorial Note (To be removed by RFC Editor)

   Discussion of this draft should take place on the HTTPBIS working
   group mailing list (ietf-http-wg@w3.org), although this draft is not
   a WG item.

   The changes in this draft are summarized in Appendix C.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 18, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Examples
     1.2.  Notational Conventions
   2.  Requirements
   3.  Mirrors / Multiple Download Locations
     3.1.  Mirror Priority
     3.2.  Mirror Geographical Location
     3.3.  Coordinated Mirror Policies
     3.4.  Mirror Depth
   4.  Peer-to-Peer / Metainfo
     4.1.  Metalink/XML Files
   5.  OpenPGP Signatures
   6.  Cryptographic Hashes of Whole Documents
   7.  Client / Server Multi-source Download Interaction
     7.1.  Error Prevention, Detection, and Correction
       7.1.1.  Error Prevention (Early File Mismatch Detection)
       7.1.2.  Error Correction
   8.  IANA Considerations
   9.  Security Considerations
     9.1.  URIs and IRIs
     9.2.  Spoofing
     9.3.  Cryptographic Hashes
     9.4.  Signing
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Acknowledgements and Contributors
   Appendix B.  Comparisons to Similar Options
   Appendix C.  Document History
   Authors' Addresses

1.  Introduction

   Metalink/HTTP is an alternative and complementary representation of
   Metalink information, which is usually presented as an XML-based
   document format [RFC5854]. Metalink/HTTP attempts to provide as much
   functionality as the Metalink/XML format by using existing standards
   such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
   and Entity Tags (also known as ETags) [RFC2616]. Metalink/HTTP is
   used to list information about a file to be downloaded. This can
   include lists of multiple URIs (mirrors), Peer-to-Peer information,
   cryptographic hashes, and digital signatures.

   Identical copies of a file are frequently accessible in multiple
   locations on the Internet over a variety of protocols (such as FTP,
   HTTP, and Peer-to-Peer). In some cases, users are shown a list of
   these multiple download locations (mirrors) and must manually select
   a single one on the basis of geographical location, priority, or
   bandwidth. This distributes the load across multiple servers, and
   should also increase throughput and resilience. At times, however,
   individual servers can be slow, outdated, or unreachable, but this
   cannot be determined until the download has been initiated.
   Users will rarely have sufficient information to choose the most
   appropriate server, and will often choose the first in a list, which
   might not be optimal for their needs and leads to a particular
   server getting a disproportionate share of load. The use of
   suboptimal mirrors can lead to the user canceling and restarting the
   download to try to manually find a better source. During downloads,
   errors in transmission can corrupt the file. There are no easy ways
   to repair these files. For large downloads this can be extremely
   troublesome. Any of the problems that can occur during a download
   leads to frustration on the part of users.

   Some popular sites automate the process of selecting mirrors using
   DNS load balancing, both to approximately balance load between
   servers, and to direct clients to nearby servers with the hope that
   this improves throughput. Indeed, DNS load balancing can balance
   long-term server load fairly effectively, but it is less effective at
   delivering the best throughput to users when the bottleneck is not
   the server but the network.

   This document describes a mechanism by which the benefit of mirrors
   can be automatically and more effectively realized. All the
   information about a download, including mirrors, cryptographic
   hashes, digital signatures, and more can be transferred in
   coordinated HTTP header fields hereafter referred to as a Metalink.
   This Metalink transfers the knowledge of the download server (and
   mirror database) to the client. Clients can fall back to other
   mirrors if the current one has an issue. With this knowledge, the
   client is enabled to work its way to a successful download even under
   adverse circumstances. All this can be done without complicated user
   interaction and the download can be much more reliable and efficient.
   In contrast, a traditional HTTP redirect to a mirror conveys only
   extremely minimal information - one link to one server, and there is
   no provision in the HTTP protocol to handle failures. Furthermore,
   in order to provide better load distribution across servers and
   potentially faster downloads to users, Metalink/HTTP facilitates
   multi-source downloads, where portions of a file are downloaded from
   multiple mirrors (and optionally, Peer-to-Peer) simultaneously.

   Upon connection to a Metalink/HTTP server, a client will receive
   information about other sources of the same resource and a
   cryptographic hash of the whole resource. The client will then be
   able to request chunks of the file from the various sources,
   scheduling appropriately in order to maximize the download rate.

1.1.  Examples

   This example shows a brief Metalink server response with ETag,
   mirrors, .metalink, OpenPGP signature, and a cryptographic hash of
   the whole file:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel=duplicate
   Link: ; rel=duplicate
   Link: ; rel=describedby;
   type="application/x-bittorrent"
   Link: ; rel=describedby;
   type="application/metalink4+xml"
   Link: ; rel=describedby;
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

1.2.  Notational Conventions

   This specification describes conformance of Metalink/HTTP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, [RFC2119], as
   scoped to those conformance targets.

2.  Requirements

   In this context, "Metalink" refers to Metalink/HTTP which consists of
   mirrors and cryptographic hashes in HTTP header fields as described
   in this document. "Metalink/XML" refers to the XML format described
   in [RFC5854].

   Metalink resources include Link header fields [RFC5988] to present a
   list of mirrors in the response to a client request for the resource.
   Metalink servers MUST include the cryptographic hash of a resource
   via Instance Digests in HTTP [RFC3230]. Valid algorithms are found
   in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest
   Algorithm Values" at
   .
   SHA-256 and SHA-512 were added to the registry by [RFC5843].

   Metalink servers are HTTP servers with one or more Metalink
   resources. Metalink servers MUST support the Link header fields for
   listing mirrors and MUST support Instance Digests in HTTP [RFC3230].
   Metalink servers MUST return the same Link header fields and Instance
   Digests on HEAD requests. Metalink servers and their associated
   mirror servers SHOULD all share the same ETag policy. It is up to
   the administrator of the Metalink server to communicate the details
   of the shared ETag policy to the administrators of the mirror servers
   so that the mirror servers can be configured with the same ETag
   policy. To have the same ETag policy means that ETags are
   synchronized across servers for resources that are mirrored, i.e.
   byte-for-byte identical files will have the same ETag on mirrors that
   they have on the Metalink server. ETags could be based on the file
   contents (cryptographic hash) and not server-unique filesystem
   metadata. The emitted ETag could be implemented the same as the
   Instance Digest for simplicity. Metalink servers SHOULD offer
   Metalink/XML documents that contain cryptographic hashes of parts of
   the file (and other information) if error recovery is desirable.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server. That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.
   Mirror servers SHOULD support serving partial content. HTTP mirror
   servers SHOULD share the same ETag policy as the originating Metalink
   server. HTTP mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230] using the same algorithm as the Metalink server.
   Optimally, mirror servers will share the same ETag policy and support
   Instance Digests in HTTP.

   Metalink clients use the mirrors provided by a Metalink server in
   Link header fields [RFC5988]. Metalink clients MUST support HTTP and
   SHOULD support FTP [RFC0959]. Metalink clients MAY support
   BitTorrent [BITTORRENT], or other download methods. Metalink clients
   SHOULD switch downloads from one mirror to another if a mirror
   becomes unreachable. Metalink clients MAY support multi-source, or
   parallel, downloads, where portions of a file can be downloaded from
   multiple mirrors simultaneously (and optionally, from Peer-to-Peer
   sources). Metalink clients MUST support Instance Digests in HTTP
   [RFC3230] by requesting and verifying cryptographic hashes. Metalink
   clients SHOULD support error recovery by using the cryptographic
   hashes of parts of the file listed in Metalink/XML files. Metalink
   clients SHOULD support checking digital signatures.

3.  Mirrors / Multiple Download Locations

   Mirrors are specified with the Link header fields [RFC5988] and a
   relation type of "duplicate" as defined in Section 8.

   This example shows a brief Metalink server response with two mirrors
   only:

   Link: ; rel=duplicate;
   pri=1; pref
   Link: ; rel=duplicate;
   pri=2; geo=gb; depth=1

   As some organizations can have many mirrors, it is up to the
   organization to configure the number of Link header fields the
   Metalink server will provide. Such a decision could be a random
   selection or a hard-coded limit based on network proximity, file
   size, server load, or other factors.

3.1.  Mirror Priority

   Entries for mirror servers are listed in order of priority (from most
   preferred to least) or have a "pri" value, where mirrors with lower
   values are used first.

   This is purely an expression of the server's preferences; it is up to
   the client what it does with this information, particularly with
   reference to how many servers to use at any one time.

3.2.  Mirror Geographical Location

   Entries for mirror servers can have a "geo" value, which is an
   [ISO3166-1] alpha-2 two letter country code for the geographical
   location of the physical server the URI is used to access. A client
   can use this information to select a mirror, or set of mirrors, that
   are geographically near (if the client has access to such
   information), with the aim of reducing network load at inter-country
   bottlenecks.

3.3.  Coordinated Mirror Policies

   There are two types of mirror servers: preferred and normal.
   Preferred mirror servers are HTTP mirror servers that MUST share the
   same ETag policy as the originating Metalink server and/or MUST
   provide Instance Digests using the same algorithm as the Metalink
   server. Preferred mirrors make it possible to detect early on,
   before data is transferred, if the file requested matches the desired
   file. Entries for preferred HTTP mirror servers have a "pref" value.
   By default, if "pref" is unspecified, mirrors are considered "normal"
   and do not necessarily share the same ETag policy or support Instance
   Digests using the same algorithm as the Metalink server. FTP mirrors
   are considered "normal", as they do not emit ETags or support
   Instance Digests.

3.4.  Mirror Depth

   Some mirrors can mirror single files, whole directories, or multiple
   directories.

   Entries for mirror servers can have a "depth" value, where "depth=0"
   is the default. A value of 0 means ONLY that file is mirrored and
   that other URI path segments are not.
   A value of 1 means that file and all other files and URI path
   segments contained in the rightmost URI path segment are mirrored.
   For values of N, the client will go up N-1 URI path segments above.
   A value of 2 means going up one URI path segment above, and all
   files and URI path segments contained are mirrored. For each higher
   value, another URI path segment closer to the Host is mirrored.

   This example shows a mirror with a depth value of 4:

   Link: ;
   rel=duplicate; pri=1; pref; depth=4

   In the above example, 4 URI path segments up are mirrored, from
   /dir2/ on down.

4.  Peer-to-Peer / Metainfo

   Entries for metainfo files, which describe ways to download a file
   over Peer-to-Peer networks or otherwise, are specified with the Link
   header fields [RFC5988] and a relation type of "describedby" and a
   type parameter that indicates the MIME type of the metadata available
   at the URI. Since metainfo files can sometimes describe multiple
   files, or the filename may not be the same on the Metalink server and
   in the metainfo file but still have the same content, an optional
   name parameter can be used.

   This example shows a brief Metalink server response with .torrent and
   .metalink:

   Link: ; rel=describedby;
   type="application/x-bittorrent"; name="differentname.ext"
   Link: ; rel=describedby;
   type="application/metalink4+xml"

   Metalink clients MAY support the use of metainfo files for
   downloading files.

4.1.  Metalink/XML Files

   Full Metalink/XML files for a given resource can be specified as
   shown in the example in Section 4. This is particularly useful for
   providing metadata such as cryptographic hashes of parts of a file,
   allowing a client to recover from errors (see Section 7.1.2).
   Metalink servers SHOULD provide Metalink/XML files with partial file
   hashes in Link header fields and Metalink clients SHOULD use them for
   error recovery.

5.  OpenPGP Signatures

   OpenPGP signatures [RFC3156] are specified with the Link header
   fields [RFC5988] and a relation type of "describedby" and a type
   parameter of "application/pgp-signature".

   This example shows a brief Metalink server response with OpenPGP
   signature only:

   Link: ; rel=describedby;
   type="application/pgp-signature"

   Metalink clients SHOULD support the use of OpenPGP signatures.

6.  Cryptographic Hashes of Whole Documents

   If Instance Digests are not provided by the Metalink servers, the
   Link header fields pertaining to this specification MUST be ignored.

   This example shows a brief Metalink server response with ETag,
   mirror, and cryptographic hash:

   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel=duplicate
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

7.  Client / Server Multi-source Download Interaction

   Metalink clients begin a download with a standard HTTP [RFC2616] GET
   request to the Metalink server. Metalink clients MAY use a Range
   limit if desired.

   GET /distribution/example.ext HTTP/1.1
   Host: www.example.com

   The Metalink server responds with the data and these header fields:

   HTTP/1.1 200 OK
   Accept-Ranges: bytes
   Content-Length: 14867603
   Content-Type: application/x-cd-image
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Link: ; rel=duplicate; pref
   Link: ; rel=duplicate
   Link: ; rel=describedby;
   type="application/x-bittorrent"
   Link: ; rel=describedby;
   type="application/metalink4+xml"
   Link: ; rel=describedby;
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   Alternatively, Metalink clients can begin with a HEAD request to the
   Metalink server to discover mirrors via Link header fields, and then
   skip to making the following decisions on every available mirror
   server found via the Link header fields.

   After that, the client follows with a GET request to the desired
   mirrors.

   From the Metalink server response the client learns some or all of
   the following metadata about the requested object, in addition to
   also starting to receive the object:

   o  Mirror profile link, which can describe the mirror's priority,
      whether it shares the ETag policy of the originating Metalink
      server, geographical location, and mirror depth.
   o  Instance Digest, which is the whole file cryptographic hash.
   o  ETag.
   o  Object size from the Content-Length header field.
   o  Metalink/XML, which can include partial file cryptographic hashes
      to repair a file.
   o  Peer-to-peer information.
   o  Digital signature.
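
   As a rough illustration of how a client might extract the metadata
   listed above, the following Python sketch parses Link and Digest
   field values. It is not part of this specification. The helper
   names (parse_link, mirrors_by_priority, mirrored_prefix,
   parse_digest) and the example URIs are hypothetical, mirrors without
   a "pri" value are assumed to sort last, and the parser assumes that
   parameter values contain no ";" or "," characters.

```python
import base64
import re
from urllib.parse import urlsplit


def parse_link(value):
    """Parse one Link field value such as '<uri>; rel=duplicate; pri=1; pref'."""
    match = re.match(r"\s*<([^>]*)>(.*)$", value, re.S)
    if match is None:
        raise ValueError(f"not a Link field value: {value!r}")
    params = {}
    for part in match.group(2).split(";"):
        name, _, raw = part.strip().partition("=")
        if name:
            # A bare parameter such as "pref" has no value; record it as True.
            params[name.strip().lower()] = raw.strip().strip('"') if raw else True
    return match.group(1), params


def mirrors_by_priority(links):
    """Return (uri, params) pairs with rel=duplicate, lowest "pri" first."""
    mirrors = [link for link in links if link[1].get("rel") == "duplicate"]
    # Assumption: a mirror without "pri" sorts after all numbered mirrors.
    # sorted() is stable, so such mirrors keep the order of the header fields.
    return sorted(mirrors, key=lambda link: int(link[1].get("pri", 999999)))


def mirrored_prefix(uri, depth):
    """Return the URI path covered by a mirror entry with the given depth."""
    path = urlsplit(uri).path
    if depth <= 0:
        return path  # depth=0: only this one file is mirrored
    directories = path.split("/")[1:-1]  # segments between host and file name
    kept = directories[: max(len(directories) - (depth - 1), 0)]
    return "/" + "".join(segment + "/" for segment in kept)


def parse_digest(value):
    """Map each algorithm name in a Digest field value to its raw hash bytes."""
    digests = {}
    for item in value.split(","):
        algorithm, _, encoded = item.strip().partition("=")
        if algorithm:
            digests[algorithm.upper()] = base64.b64decode(encoded)
    return digests


# Hypothetical field values; the URIs are placeholders, not real mirrors.
fields = [
    "<http://mirror-b.example/dir1/dir2/example.ext>; rel=duplicate; pri=2; geo=gb; depth=1",
    "<http://mirror-a.example/dir1/dir2/example.ext>; rel=duplicate; pri=1; pref",
    '<http://www.example.com/example.ext.asc>; rel=describedby; type="application/pgp-signature"',
]
for uri, params in mirrors_by_priority([parse_link(f) for f in fields]):
    print(uri, params)
```

   Calling .hex() on a value returned by parse_digest gives the
   hexadecimal form of the hash, which is convenient when comparing it
   with hashes that are written in hexadecimal, as in Metalink/XML.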

   Next, the Metalink client requests a Range of the object from a
   preferred mirror server, so it can use If-Match conditions:

   GET /example.ext HTTP/1.1
   Host: www2.example.com
   Range: bytes=7433802-
   If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Referer: http://www.example.com/distribution/example.ext

   Here, the preferred mirror server has the correct file (the If-Match
   conditions match) and responds with a 206 Partial Content HTTP status
   code and appropriate "Content-Length", "Content-Range", ETag, and
   Instance Digest header fields. In this example, the mirror server
   responds, with data, to the above request:

   HTTP/1.1 206 Partial Content
   Accept-Ranges: bytes
   Content-Length: 7433801
   Content-Range: bytes 7433802-14867602/14867603
   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

   If the object is large and gets delivered slower than expected, then
   the Metalink client MAY start a number of parallel ranged downloads
   (one per selected mirror server other than the first) using mirrors
   provided by the Link header fields with "duplicate" relation type.
   Metalink clients SHOULD use the location of the original GET request
   in the "Referer" header field for these ranged requests.

   The Metalink client can determine the size and number of ranges
   requested from each server, based upon the type and number of mirrors
   and performance observed from each mirror. Note that Range requests
   impose an overhead on servers and clients need to be aware of that
   and not abuse them. Metalink clients SHOULD NOT make more than one
   concurrent Range request to each mirror server that they download
   from.
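
   The range planning described above can be sketched as follows. This
   is a minimal Python illustration, not a required algorithm: it
   assumes an equal split with exactly one range per mirror, whereas a
   real client could size ranges by observed mirror performance. The
   function names split_ranges and plan_requests and the mirror URIs
   are hypothetical.

```python
def split_ranges(total_size, parts):
    """Split bytes 0..total_size-1 into `parts` contiguous inclusive ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for index in range(parts):
        # The first `extra` ranges are one byte longer than the rest.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def plan_requests(mirrors, total_size, etag, referer):
    """Build one ranged request per mirror as a list of (uri, headers) pairs.

    `mirrors` is a list of (uri, preferred) pairs, already in priority order.
    """
    if not mirrors:
        return []
    plan = []
    ranges = split_ranges(total_size, len(mirrors))
    for (uri, preferred), (first, last) in zip(mirrors, ranges):
        headers = {"Range": f"bytes={first}-{last}", "Referer": referer}
        if preferred:
            # Only preferred mirrors share the ETag policy, so only
            # they are sent an If-Match condition.
            headers["If-Match"] = etag
        plan.append((uri, headers))
    return plan


# Hypothetical mirrors; size, ETag, and Referer are taken from the example above.
for uri, headers in plan_requests(
    [("http://mirror-a.example/example.ext", True),
     ("http://mirror-b.example/example.ext", False)],
    14867603,
    '"thvDyvhfIqlvFe+A9MYgxAfm1q5="',
    "http://www.example.com/distribution/example.ext",
):
    print(uri, headers)
```

   Because each mirror appears once in the plan, the sketch never
   issues more than one concurrent Range request to the same mirror.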

   If the goal is the fastest transfer and any ranged request generated
   after the first request ends up with a complete response instead of
   a partial response (as some mirrors might not support HTTP ranges),
   Metalink clients SHOULD close all but the fastest connection.
   Metalink clients MAY monitor mirror conditions and dynamically switch
   between mirrors to achieve the fastest download possible. Similarly,
   Metalink clients SHOULD abort extremely slow or stalled range
   requests and finish the request on other mirrors. If all ranges have
   finished except for the final one, the Metalink client can split the
   final range into multiple range requests to other mirrors so the
   transfer finishes faster.

   If the first request was GET and no Range header field was sent and
   the client determines later that it will issue a Range request, then
   the client SHOULD close the first connection when it catches up with
   the other parallel ranged downloads of the same object. This means
   the first connection was sacrificed. Metalink clients can use a HEAD
   request first, if possible, so that the client can find out if there
   are any Link header fields, and then Range-based requests are
   undertaken to the mirror servers without sacrificing a first
   connection.

   Preferred mirrors have coordinated ETags, as described in
   Section 3.3, and Metalink clients SHOULD use If-Match conditions
   based on the ETag to quickly detect out-of-date mirrors by using the
   ETag from the Metalink server response. Optimally, the mirror server
   will include an Instance Digest in the mirror response to the client
   GET request, which the client can also use to detect a mismatch
   early. If the mirror did not include the pref parameter or an
   Instance Digest, then a mismatch cannot be detected until the
   completed object is verified. Early file mismatch detection is
   described in detail in Section 7.1.1.
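
   One possible way to classify a mirror's reply to a ranged, If-Match
   request is sketched below in Python. The function name
   check_mirror_response and its verdict strings are hypothetical, and
   the sketch also compares the total length reported in Content-Range
   with the size announced by the Metalink server. It only inspects
   the status code and header fields, so it cannot replace verifying
   the completed download.

```python
import re


def parse_digest(value):
    """Map each algorithm name in a Digest field value to its encoded hash."""
    digests = {}
    for item in value.split(","):
        algorithm, _, encoded = item.strip().partition("=")
        if algorithm:
            digests[algorithm.upper()] = encoded
    return digests


def check_mirror_response(status, headers, expected_etag, expected_size,
                          expected_digest, preferred):
    """Classify a mirror's response to a ranged GET before merging its data."""
    fields = {name.lower(): value for name, value in headers.items()}
    if status == 412:
        return "mismatch"       # If-Match failed: the mirror has another version
    if status == 200:
        return "full-response"  # the mirror ignored the Range header field
    if status != 206:
        return "error"
    verified = False
    etag = fields.get("etag")
    if preferred and etag is not None:
        # Only preferred mirrors share the Metalink server's ETag policy.
        if etag != expected_etag:
            return "mismatch"
        verified = True
    total = re.fullmatch(r"bytes \d+-\d+/(\d+)", fields.get("content-range", ""))
    if total and int(total.group(1)) != expected_size:
        return "mismatch"
    wanted = parse_digest(expected_digest)
    for algorithm, encoded in parse_digest(fields.get("digest", "")).items():
        # Digests are comparable only when both sides used the same algorithm.
        if algorithm in wanted:
            if wanted[algorithm] != encoded:
                return "mismatch"
            verified = True
    # "unverified": nothing contradicts the expected file, but nothing
    # confirms it either, so a mismatch would only show up at the end.
    return "ok" if verified else "unverified"
```

   A client might drop a mirror on "mismatch", keep downloading on "ok"
   or "unverified", and treat "full-response" as a mirror that does not
   support ranges.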

   Metalink clients MUST reject downloads from mirrors where the file
   size does not match the file size as reported by the Metalink server.

   Metalink clients MUST reject downloads from mirrors that support
   Instance Digests if the Instance Digest from the mirror does not
   match the Instance Digest as reported by the Metalink server and the
   same algorithm is used.

   If a Metalink client does not support certain download methods (such
   as FTP or BitTorrent) that a file is available from, and there are no
   available download methods that the client supports, then the
   download will have no way to complete.

   Metalink clients MUST verify the cryptographic hash of the file once
   the download has completed. If the cryptographic hash offered by the
   Metalink server with Instance Digests does not match the
   cryptographic hash of the downloaded file, see Section 7.1.2 for a
   possible way to repair errors.

   If the download cannot be repaired, it is considered corrupt. The
   client can attempt to re-download the file.

7.1.  Error Prevention, Detection, and Correction

   Error prevention, or early file mismatch detection, is possible
   before file transfers with the use of file sizes, ETags, and Instance
   Digests provided by Metalink servers. Error detection requires
   Instance Digests to detect errors in transfer after the transfers
   have completed. Error correction, or download repair, is possible
   with partial file cryptographic hashes.

   Note that cryptographic hashes obtained from Instance Digests are in
   base64 encoding, while those from Metalink/XML are in hexadecimal.

7.1.1.  Error Prevention (Early File Mismatch Detection)

   In HTTP terms, the merging of ranges from multiple responses can be
   verified with a strong validator, which in this context is either an
   Instance Digest or a shared ETag.
   In most cases, it is sufficient that the Metalink server provides
   mirrors and Instance Digest information, but operation will be more
   robust and efficient if the mirror servers implement a shared ETag
   policy or Instance Digests as well. There is no need to specify how
   the ETag is generated, just that it needs to be shared between the
   Metalink server and the mirror servers. The benefit of having mirror
   servers return an Instance Digest is that the client can then detect
   mismatches early even if ETags are not used. Mirrors that support
   both a shared ETag and Instance Digests do provide value, but just
   one is sufficient for early detection of mismatches. If the mirror
   server provides neither shared ETag nor Instance Digest, then early
   detection of mismatches is not possible unless file length also
   differs. Finally, errors are still detectable after the download has
   completed, when the cryptographic hash of the merged response is
   verified.

   ETags cannot be used for verifying the integrity of the received
   content. An ETag is, however, a guarantee issued by the Metalink
   server that the content is correct for that ETag. If the ETag given
   by the mirror server matches the ETag given by the Metalink server,
   then there is a chain of trust where the Metalink server authorizes
   these responses as valid for that object.

   This guarantees that a mismatch will be detected by using only the
   shared ETag from a Metalink server and mirror server. Mirror servers
   will respond with an error if ETags do not match, which will prevent
   accidental merges of ranges from different versions of files with the
   same name.

   A shared ETag or Instance Digest cannot strictly protect against
   malicious attacks or server or network errors replacing content.
   An attacker can make a mirror server seemingly respond with the
   expected Instance Digest or ETags even if the file contents have been
   modified. The same goes for various system failures which would also
   cause bad data (i.e. corrupted files) to be returned. The Metalink
   client has to rely on the Instance Digest returned by the Metalink
   server in the first response for the verification of the downloaded
   object as a whole.

7.1.2.  Error Correction

   Partial file cryptographic hashes can be used to detect errors during
   the download. Metalink servers SHOULD provide Metalink/XML files
   with partial file hashes in Link header fields as specified in
   Section 4.1, and Metalink clients SHOULD use them for error
   correction.

   If the cryptographic hash of the object does not match the Instance
   Digest from the Metalink server, then the client SHOULD fetch the
   Metalink/XML (if available) that could contain partial file
   cryptographic hashes which will allow detection of which mirror
   server returned incorrect data. Metalink clients SHOULD figure out
   what ranges of the downloaded data can be recovered and what needs to
   be fetched again.

   Other methods can be used for error correction. For example, some
   other metainfo files also include partial file hashes that can be
   used to check for errors.

8.  IANA Considerations

   Accordingly, IANA will make the following registration to the Link
   Relation Type registry at .

   o  Relation Name: duplicate

   o  Description: Refers to a resource whose available representations
      are byte-for-byte identical with the corresponding representations
      of the context IRI.

   o  Reference: This specification.

   o  Notes: This relation is for static resources. That is, an HTTP
      GET request on any duplicate will return the same representation.
      It does not make sense for dynamic or POSTable resources and
      should not be used for them.

9.  Security Considerations

9.1.  URIs and IRIs

   Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986]
   and Section 8 of [RFC3987] for security considerations related to
   their handling and use.

9.2.  Spoofing

   There is potential for spoofing attacks where the attacker publishes
   Metalinks with false information. In that case, this could deceive
   unaware downloaders into downloading a malicious or worthless file.
   As with all downloads, users should only download from trusted
   sources. Also, malicious publishers could attempt a distributed
   denial of service attack by inserting unrelated URIs into Metalinks.

9.3.  Cryptographic Hashes

   Currently, some of the digest values defined in Instance Digests in
   HTTP [RFC3230] are considered insecure. These include the whole
   Message Digest family of algorithms which are not suitable for
   cryptographically strong verification. Malicious people could
   provide files that appear to be identical to another file because of
   a collision, i.e. the weak cryptographic hashes of the intended file
   and a substituted malicious file could match.

   If a Metalink contains whole file hashes as described in Section 6,
   it SHOULD include SHA-256, as specified in [FIPS-180-3], or stronger.
   It MAY also include other hashes.

9.4.  Signing

   Metalinks SHOULD include digital signatures, as described in
   Section 5.

   Digital signatures provide authentication, message integrity, and
   non-repudiation with proof of origin.

10.  References

10.1.  Normative References

   [BITTORRENT]
              Cohen, B., "The BitTorrent Protocol Specification",
              BITTORRENT 11031, February 2008,
              .

   [FIPS-180-3]
              National Institute of Standards and Technology (NIST),
              "Secure Hash Standard (SHS)", FIPS PUB 180-3,
              October 2008.

   [ISO3166-1]
              International Organization for Standardization,
              "ISO 3166-1:2006.
Codes for the representation of names of 677 countries and their subdivisions -- Part 1: Country 678 codes", November 2006. 680 [RFC0959] Postel, J. and J. Reynolds, "File Transfer Protocol", 681 STD 9, RFC 0959, October 1985. 683 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 684 Requirement Levels", BCP 14, RFC 2119, March 1997. 686 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 687 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 688 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 690 [RFC3156] Elkins, M., Del Torto, D., Levien, R., and T. Roessler, 691 "MIME Security with OpenPGP", RFC 3156, August 2001. 693 [RFC3230] Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", 694 RFC 3230, January 2002. 696 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 697 Resource Identifier (URI): Generic Syntax", STD 66, 698 RFC 3986, January 2005. 700 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource 701 Identifiers (IRIs)", RFC 3987, January 2005. 703 [RFC5854] Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, "The 704 Metalink Download Description Format", RFC 5854, 705 June 2010. 707 [RFC5988] Nottingham, M., "Web Linking", RFC 5988, October 2010. 709 10.2. Informative References 711 [RFC5843] Bryan, A., "Additional Hash Algorithms for HTTP Instance 712 Digests", RFC 5843, April 2010. 714 Appendix A. Acknowledgements and Contributors 716 Thanks to the Metalink community, Alexey Melnikov, Julian Reschke, 717 Mark Nottingham, Daniel Stenberg, Matt Domsch, Micah Cowan, David 718 Morris, Yves Lafon, Juergen Schoenwaelder, Ben Campbell, and the 719 HTTPBIS Working Group. 721 Thanks to Alan Ford and Mark Handley for spurring us on to publish 722 this document. 724 Appendix B. Comparisons to Similar Options 726 [[ to be removed by the RFC editor before publication as an RFC. 
]] 728 This draft, compared to the Metalink/XML format [RFC5854] : 730 o (+) Reuses existing HTTP standards without much new besides a Link 731 Relation Type. It's more of a collection/coordinated feature set. 732 o (?) The existing standards don't seem to be widely implemented. 733 o (+) No XML dependency, except for Metalink/XML for partial file 734 cryptographic hashes. 735 o (+) Existing Metalink/XML clients can be easily converted to 736 support this as well. 737 o (+) Coordination of mirror servers is preferred, but not required. 738 Coordination could be difficult or impossible unless one group is 739 in control of all servers on the mirror network. 740 o (-) Requires software or configuration changes to originating 741 server. 742 o (-?) Tied to HTTP, not as generic. FTP/P2P clients won't be 743 using it unless they also support HTTP, unlike Metalink/XML. 744 o (-) Requires server-side support. Metalink/XML can be created by 745 user (or server, but server component/changes not required). 746 o (-) Also, Metalink/XML files are easily mirrored on all servers. 747 Even if usage in that case is not as transparent, this method 748 still gives access to all download information (with no changes 749 needed to servers) from all mirrors (FTP included). 750 o (-) Not portable/archivable/emailable. Metalink/XML is used to 751 import/export transfer queues. Not as easy for search engines to 752 index? 754 o (-) Not as rich metadata. 755 o (-) Not able to add multiple files to a download queue or create 756 directory structure. 758 Appendix C. Document History 760 [[ to be removed by the RFC editor before publication as an RFC. ]] 762 Known issues concerning this draft: 763 o None. 765 -20 : January , 2011. 766 o Yves Lafon's apps-team review, Juergen Schoenwaelder's secdir 767 review, Ben Campbell's Gen-ART review. 769 -19 : January 20, 2011. 770 o Julian Reschke's review. 772 -18 : January 1, 2010. 773 o AD review by Alexey Melnikov. 775 -17 : September 13, 2010. 
776 o RFC 5854 Metalink/XML. 778 -16 : April 16, 2010. 779 o Add draft-ietf-ftpext2-hash reference and FTP mirror coordination. 781 -15 : February 20, 2010. 782 o Update references and terminology. 784 -14 : December 31, 2009. 785 o Baseline file hash: SHA-256. 787 -13 : November 22, 2009. 788 o Metalink/XML for partial file cryptographic hashes. 790 -12 : November 11, 2009. 791 o Clarifications. 793 -11 : October 23, 2009. 794 o Mirror changes. 796 -10 : October 15, 2009. 797 o Mirror coordination changes. 799 -09 : October 13, 2009. 801 o Mirror location, coordination, and depth. 802 o Split HTTP Digest Algorithm Values Registration into 803 draft-bryan-http-digest-algorithm-values-update. 805 -08 : October 4, 2009. 806 o Clarifications. 808 -07 : September 29, 2009. 809 o Preferred mirror servers. 811 -06 : September 24, 2009. 812 o Add Mismatch Detection, Error Recovery, and Digest Algorithm 813 values. 814 o Remove Content-MD5 and Want-Digest. 816 -05 : September 19, 2009. 817 o ETags, preferably matching the Instance Digests. 819 -04 : September 17, 2009. 820 o Temporarily remove .torrent. 822 -03 : September 16, 2009. 823 o Mention HEAD request, negotiate mirrors if Want-Digest is used. 825 -02 : September 7, 2009. 826 o Content-MD5 for partial file cryptographic hashes. 828 -01 : September 1, 2009. 829 o Link Relation Type Registration: "duplicate" 831 -00 : August 24, 2009. 832 o Initial draft. 834 Authors' Addresses 836 Anthony Bryan 837 Pompano Beach, FL 838 USA 840 Email: anthonybryan@gmail.com 841 URI: http://www.metalinker.org 842 Neil McNab 844 Email: neil@nabber.org 845 URI: http://www.nabber.org 847 Tatsuhiro Tsujikawa 848 Shiga 849 Japan 851 Email: tatsuhiro.t@gmail.com 852 URI: http://aria2.sourceforge.net 854 Dr. med. Peter Poeml 855 MirrorBrain 856 Venloer Str. 
317 857 Koeln 50823 858 DE 860 Phone: +49 221 6778 333 8 861 Email: peter@poeml.de 862 URI: http://mirrorbrain.org/~poeml/ 864 Henrik Nordstrom 866 Email: henrik@henriknordstrom.net 867 URI: http://www.henriknordstrom.net/
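
Illustrative note (non-normative). The error correction procedure of
Section 7.1.2 can be sketched roughly as follows. This is only a
minimal sketch under stated assumptions, not part of the
specification: it is written in Python, it assumes the partial file
hashes are SHA-256 values in lowercase hexadecimal that have already
been extracted from the Metalink/XML, and the function and parameter
names (instance_digest_sha256, ranges_to_refetch, piece_length,
piece_hashes) are hypothetical. Fetching the Metalink/XML and issuing
the Range requests are left out.

```python
import base64
import hashlib
from typing import List, Tuple


def instance_digest_sha256(data: bytes) -> str:
    """Return the base64-encoded SHA-256 of data, the encoding an
    Instance Digest value is assumed to use here."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def ranges_to_refetch(data: bytes, size: int, piece_length: int,
                      piece_hashes: List[str]) -> List[Tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) ranges that fail
    verification against the per-piece hashes.

    size is the expected length of the whole object; piece_hashes holds
    one hex SHA-256 value per piece, in order (an assumption of this
    sketch). Adjacent bad pieces are merged into a single range.
    """
    bad: List[Tuple[int, int]] = []
    for index, expected in enumerate(piece_hashes):
        start = index * piece_length
        end = min(start + piece_length, size) - 1
        actual = hashlib.sha256(data[start:end + 1]).hexdigest()
        if actual != expected.lower():
            if bad and bad[-1][1] + 1 == start:
                # Extend the previous range instead of adding a new one.
                bad[-1] = (bad[-1][0], end)
            else:
                bad.append((start, end))
    return bad


if __name__ == "__main__":
    original = b"A" * 10 + b"B" * 10 + b"C" * 5
    hashes = [hashlib.sha256(original[i:i + 10]).hexdigest()
              for i in range(0, len(original), 10)]
    # Simulate one corrupted byte inside the second piece.
    corrupted = original[:12] + b"X" + original[13:]
    if instance_digest_sha256(corrupted) != instance_digest_sha256(original):
        print(ranges_to_refetch(corrupted, len(original), 10, hashes))
```

A client following this approach would compare the whole-object
digest first and consult the partial hashes only on a mismatch; each
returned range could then be requested again, possibly from a
different mirror.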