idnits 2.17.1

draft-soilandreyes-app-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (January 17, 2018) is 2262 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 549

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  ** Obsolete normative reference: RFC 5785 (Obsoleted by RFC 8615)

  ** Obsolete normative reference: RFC 7320 (Obsoleted by RFC 8820)

  == Outdated reference: A later version (-17) exists of
     draft-kunze-bagit-14

  Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

  Run idnits with the --verbose option for more detailed information about
  the items above.

--------------------------------------------------------------------------------

2  Network Working Group                                   S. Soiland-Reyes
3  Internet-Draft                              The University of Manchester
4  Intended status: Informational                          January 17, 2018
5  Expires: July 21, 2018

7                            The app URI scheme
8                         draft-soilandreyes-app-00

10  Abstract

12     This Internet-Draft proposes the "app" URI scheme for the Archive and
13     Packaging Protocol.

15     app URIs can be used to consume or reference hypermedia resources
16     bundled inside a file archive or a mobile application package, as
17     well as to resolve URIs for archive resources within a programmatic
18     framework.

20     This URI scheme provides mechanisms to generate a unique base URI to
21     represent the root of the archive, so that relative URI references in
22     a bundled resource can be resolved within the archive without having
23     to extract the archive content on the local file system.

25     An app URI can be used for purposes of isolation (e.g. when consuming
26     multiple archives), security constraints (avoiding "climb out" from
27     the archive), or for externally identifying sub-resources in other
28     hypermedia formats.

30     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
31     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
32     document are to be interpreted as described in [RFC2119].

34  Status of This Memo

36     This Internet-Draft is submitted in full conformance with the
37     provisions of BCP 78 and BCP 79.

39     Internet-Drafts are working documents of the Internet Engineering
40     Task Force (IETF). Note that other groups may also distribute
41     working documents as Internet-Drafts. The list of current Internet-
42     Drafts is at https://datatracker.ietf.org/drafts/current/.

44     Internet-Drafts are draft documents valid for a maximum of six months
45     and may be updated, replaced, or obsoleted by other documents at any
46     time. It is inappropriate to use Internet-Drafts as reference
47     material or to cite them other than as "work in progress."

48     This Internet-Draft will expire on July 21, 2018.

50  Copyright Notice

52     Copyright (c) 2018 IETF Trust and the persons identified as the
53     document authors. All rights reserved.

55     This document is subject to BCP 78 and the IETF Trust's Legal
56     Provisions Relating to IETF Documents
57     (https://trustee.ietf.org/license-info) in effect on the date of
58     publication of this document. Please review these documents
59     carefully, as they describe your rights and restrictions with respect
60     to this document. Code Components extracted from this document must
61     include Simplified BSD License text as described in Section 4.e of
62     the Trust Legal Provisions and are provided without warranty as
63     described in the Simplified BSD License.

65  Table of Contents

67     1.  Background
68     2.  Scheme syntax
69       2.1.  Authority
70       2.2.  Path
71     3.  Scheme semantics
72       3.1.  Resolution protocol
73       3.2.  Resolving from a .well-known endpoint
74     4.  Encoding considerations
75     5.  Interoperability considerations
76     6.  Security Considerations
77     7.  IANA Considerations
78     8.  References
79       8.1.  Normative References
80       8.2.  Informative References
81       8.3.  URIs
82     Appendix A.  Examples
83       A.1.  Sandboxing
84       A.2.  Origin-based
85       A.3.  Hash-based
86       A.4.  Archives that are not files
87       A.5.  Resolution of packaged resources
88     Appendix B.  History
89     Author's Address

91  1.  Background

93     Applications that are accessing resources bundled inside a file
94     archive (e.g. zip or tar.gz) can struggle to consume hypermedia
95     content types that use relative URI references [RFC3986], as it is
96     challenging to determine the base URI in a consistent fashion.

98     Frequently the archive must be unpacked locally to synthesize base
99     URIs like "file:///tmp/a1b27ae03865/" to represent the root of the
100     archive. Such URIs are transient, might not be globally unique, and
101     could be vulnerable to attacks such as "climbing out" of the root
102     directory.

104     Mobile and Web applications that are distributed as packages may
105     bundle resources such as stylesheets with relative URI references to
106     images and fonts.

108     An archive containing multiple HTML or Linked Data resources, such as
109     in a BagIt archive [I-D.draft-kunze-bagit-14], may be using relative
110     URIs to cross-reference constituent files.

112     Consumption of archives might be performed in memory or through a
113     common framework, abstracting away any local file location.

115     Consumption of an archive with a consistent base URI should be
116     possible no matter from which location it was retrieved, or on which
117     device it is inspected.

119     When consuming multiple archives from untrusted sources it would be
120     beneficial to have a Same Origin policy [RFC6454] so that relative
121     hyperlinks can't escape the particular archive.

123     The "file:" URI scheme [RFC8089] can be ill-suited for purposes such
124     as above, where a location-independent URI scheme is more flexible,
125     secure and globally unique.

127  2.  Scheme syntax

129     The "app" URI scheme follows the [RFC3986] syntax for hierarchical
130     URIs according to the following production:

132        appURI = "app://" app-authority [ path-absolute ]
133                    [ "?" query ] [ "#" fragment ]

135     The "app-authority" component provides a unique identifier for the
136     opened archive. See Section 2.1 for details.

138     The "path-absolute" component provides the absolute path of a
139     resource (e.g. a file or directory) within the archive. See
140     Section 2.2 for details.

142     The semantics of the "query" component are undefined by this Internet-
143     Draft. Implementations SHOULD NOT generate a query component for app
144     URIs.

146     The "fragment" component MAY be used by implementations according to
147     [RFC3986] and the implied media type [RFC2046] of the resource at the
148     path. This Internet-Draft does not specify how to determine the
149     media type.

151  2.1.  Authority

153     The purpose of the "authority" component in an app URI is to build a
154     unique base URI for a particular archive. The authority is NOT
155     intended to be resolvable without prior knowledge of the archive.

157     The authority of an app URI MUST be valid according to this
158     production:

160        app-authority = UUID | alg-val | authority

162     The "UUID" production matches its definition in [RFC4122], e.g.
163     "2a47c495-ac70-4ed1-850b-8800a57618cf"

165     The "alg-val" production matches its definition in [RFC6920], e.g.
166     "sha-256;JCS7yveugE3UaZiHCs1XpRVfSHaewxAKka0o5q2osg8"

168     The "authority" production matches its definition in [RFC3986], e.g.
169     "example.com". As this production necessarily also matches the "UUID"
170     and "alg-val" productions, consumers of app URIs should attempt to
171     match those first. While [RFC7320] section 2.2 says an extension may
172     not "define the structure or the semantics for URI authorities",
173     extensions of this Internet-Draft *are* permitted to do so, if using
174     a DNS domain name under their control. For instance, a vendor owning
175     "example.com" may specify that "{OID}" in "{OID}.oid.example.com" has
176     special semantics.

178     The choice of authority depends on the purpose of the app URI within
179     the implementation. Below are some recommendations:

181     1.  _Sandboxing_, when independently interpreting resources in an
182         archive, the authority SHOULD be a UUID v4 [RFC4122] created with
183         a suitable random number generator [RFC4086]. This ensures with
184         high probability that the app base URI is globally unique. An
185         application MAY choose to reuse a previously assigned UUID that
186         is associated with the archive.

188     2.  _Location-based_, for referencing resources in an archive
189         accessed at a particular URL, the authority SHOULD be generated
190         as a name-based UUID v5 [RFC4122]; that is, based on the SHA-1
191         hash of the concatenation of the URL namespace "6ba7b811-9dad-
192         11d1-80b4-00c04fd430c8" (as UUID bytes) and the ASCII bytes of
193         the particular URL. It is NOT RECOMMENDED to use this approach
194         with a file URI [RFC8089] without a fully qualified "host" name.

196     3.  _Hash-based_, for referencing resources in an archive as a
197         particular bytestream, independent of its location, the authority
198         SHOULD be a checksum of the archive bytes. The checksum MUST be
199         expressed according to [RFC6920]'s "alg-val" production, and
200         SHOULD use the "sha-256" algorithm. It is NOT RECOMMENDED to use
201         truncated hash methods.

203     The generic "authority" production MAY be used for extensions if the
204     above mechanisms are not suitable. Care should be taken so that the
205     custom "authority" does not match the "UUID" or "alg-val" productions.

207  2.2.  Path

209     The "path-absolute" component MUST match the production in [RFC3986]
210     and provide the absolute path of a resource (e.g. a file or
211     directory) within the archive.

213     Archive media types vary in constraints and flexibilities of how to
214     express paths. Here we assume an archive generally consists of a
215     single root directory, which can contain multiple directories and
216     files at arbitrary nesting levels.

218     Paths SHOULD be expressed using "/" as the directory separator. The
219     below productions are from [RFC3986]:

221        path-absolute = "/" [ segment-nz *( "/" segment ) ]
222        segment       = *pchar
223        segment-nz    = 1*pchar

225     In an app URI, each intermediate "segment" (or "segment-nz")
226     represents a directory name, while the last segment represents either a
227     directory or file name.

229     It is RECOMMENDED to include the trailing "/" if it is known the path
230     represents a directory.

232  3.  Scheme semantics

234     This Internet-Draft does not constrain what particular format might
235     constitute an _archive_, and neither does it require that the archive
236     is retrievable as a single bytestream or file. Examples of archive
237     media types include "application/zip", "application/
238     vnd.android.package-archive", "application/x-tar", "application/
239     x-gtar" and "application/x-7z-compressed".

241     The _authority_ component identifies the archive file.

243     The _path_ component of an app URI identifies individual resources
244     within a particular archive, typically a _directory_ or _file_.

246     o  If the _path_ is missing/empty - e.g. "app://833ebda2-f9a8-4462-
247        b74a-4fcdc1a02d22" - then the app URI represents the whole archive
248        file.

250     o  If the _path_ is "/" - e.g. "app://833ebda2-f9a8-4462-b74a-
251        4fcdc1a02d22/" - then the app URI represents the root directory of
252        the archive.

254     o  If the path ends with "/" then the path represents a directory in
255        the archive.

257     The app URIs can be used for uniquely identifying the resources
258     independent of the location of the archive, such as within an
259     information system.

261     Assuming an appropriate resolution mechanism that has knowledge of
262     the corresponding archive, an app URI can also be used for
263     resolution.

265  3.1.  Resolution protocol

267     This Internet-Draft does not directly specify the protocol to resolve
268     resources according to the app URI scheme. For instance, one
269     implementation might rewrite app URIs to localized "file:///" paths
270     in a temporary directory, while another implementation might use an
271     embedded HTTP server.

273     It is envisioned that an implementation will have extracted or opened
274     an archive in advance, and assigned it an appropriate authority
275     according to Section 2.1. Such an implementation can then resolve
276     app URIs programmatically, e.g. by using in-memory access or mapping
277     paths to the extracted archive on the local file system.

279     Implementations that support resolving app URIs SHOULD:

281     1.  Fail with the equivalent of _Not Found_ if the authority is
282         unknown.

284     2.  Fail with the equivalent of _Gone_ if the authority is known, but
285         the content of the archive is no longer available.

287     3.  Fail with the equivalent of _Not Found_ if the path does not map
288         to a file or directory within the archive.

290     4.  Return the corresponding (potentially uncompressed) bytestream if
291         the path maps to a file within the archive.

293     5.  Return an appropriate directory listing if the path maps to a
294         directory within the archive.

296     6.  Return an appropriate directory listing of the archive's root
297         directory if the path is "/".

299     7.  Return the archive file if the path component is missing/empty.

301     Not all archive formats or implementations will have the concept of a
302     directory listing, in which case the directory listing SHOULD fail
303     with the equivalent of "Not Implemented".

305     It is not specified in this Internet-Draft how an implementation can
306     determine the media type of a file within an archive. This may be
307     expressed in secondary resources (such as a manifest), or be determined
308     by file extensions or magic bytes.
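
The resolution rules listed in Section 3.1 can be illustrated with a small sketch. This is not part of the draft: it assumes archives are ZIP files held in memory, and the names AppResolver, register and resolve, as well as the tuple-style return value, are hypothetical choices made for illustration only. Dot segments are simply refused here, which is one possible policy rather than something the draft mandates.

```python
import io
import zipfile
from urllib.parse import quote, unquote, urlsplit

OK, NOT_FOUND, GONE = "OK", "Not Found", "Gone"


class AppResolver:
    """Hypothetical in-memory resolver for app URIs backed by ZIP archives."""

    def __init__(self):
        # authority -> archive bytes, or None once the content is unavailable
        self._archives = {}

    def register(self, authority, archive_bytes):
        self._archives[authority] = archive_bytes

    def resolve(self, app_uri):
        parts = urlsplit(app_uri)
        if parts.scheme != "app":
            raise ValueError("not an app URI: %r" % app_uri)
        authority = parts.netloc
        if authority not in self._archives:
            return NOT_FOUND, None                      # step 1
        data = self._archives[authority]
        if data is None:
            return GONE, None                           # step 2
        if parts.path == "":
            return OK, data                             # step 7
        member = unquote(parts.path)[1:]                # drop the leading "/"
        if any(seg in (".", "..") for seg in member.split("/")):
            return NOT_FOUND, None                      # refuse dot segments
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
            if member in names and not member.endswith("/"):
                return OK, zf.read(member)              # step 4
        # Otherwise treat the path as a directory and collect its children.
        prefix = member if member == "" or member.endswith("/") else member + "/"
        children = set()
        for name in names:
            if name.startswith(prefix) and name != prefix:
                head, sep, _ = name[len(prefix):].partition("/")
                children.add(head + sep)
        if prefix and not children and prefix not in names:
            return NOT_FOUND, None                      # step 3
        base = "app://%s/" % authority
        listing = [base + quote(prefix + child) for child in sorted(children)]
        return OK, listing                              # steps 5 and 6


# Usage: a two-file archive registered under a UUID v4 authority.
AUTH = "32a423d6-52ab-47e3-a9cd-54f418a48571"
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("doc.html", b"<a href='css/base.css'>style</a>")
    _zf.writestr("css/base.css", b"body {}")
resolver = AppResolver()
resolver.register(AUTH, _buf.getvalue())
resolver.register("gone-archive", None)  # known authority, content removed
```

The listing is returned as a list of app URIs that all start with the app URI of the directory, so it could be serialized as "text/uri-list" by joining the entries with CRLF. ZIP always allows a listing to be computed, so the "Not Implemented" case does not arise in this sketch.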

310     The media type "text/uri-list" [RFC2483] MAY be used to represent a
311     directory listing, in which case it SHOULD contain only URIs that
312     start with the app URI of the directory.

314     Some archive formats might support resources which are neither
315     directories nor regular files (e.g. device files, symbolic links).
316     This Internet-Draft does not specify the semantics of attempting to
317     resolve such resources.

319     This Internet-Draft does not specify how to change an archive or its
320     content using app URIs.

322  3.2.  Resolving from a .well-known endpoint

324     If the "authority" component of an app URI matches the "alg-val"
325     production, an application MAY attempt to resolve the authority from
326     any ".well-known/ni/" endpoint [RFC5785] as specified in [RFC6920]
327     section 4, in order to retrieve the complete archive. Applications
328     SHOULD verify the checksum of the retrieved archive before resolving
329     the individual path.

331  4.  Encoding considerations

333     The productions for "UUID" and "alg-val" are restricted to ASCII and
334     should not require any encoding considerations.

336     Care should be taken to %-encode the directory and file segments of
337     "path-absolute" according to [RFC3986] (for URIs) or [RFC3987] (for
338     IRIs).

340     When used as part of an IRI, paths SHOULD be expressed using
341     international Unicode characters instead of %-encoding as ASCII.

343     Not all archive media types have an explicit character encoding
344     specified for their paths. If no such information is available for
345     the archive format, implementations MAY assume that the path
346     component is encoded with UTF-8 [RFC2279].

348     Some archive media types are case-insensitive, in which case it is
349     RECOMMENDED to preserve the casing as expressed in the archive.

351  5.  Interoperability considerations

353     As multiple authorities are possible (Section 2.1), there could be
354     interoperability challenges when exchanging app URIs between
355     implementations. Some considerations:

357     1.  Two implementations describe the same archive (e.g. stored in the
358         same local file path), but using different v4 UUIDs. The
359         implementations may need to detect equality of the two UUIDs out
360         of band.

362     2.  Two implementations describe an archive retrieved from the same
363         URL, with the same v5 UUIDs, but retrieved at different times.
364         The implementations might disagree about the content of the
365         archive.

367     3.  Two implementations describe an archive retrieved from the same
368         URL, with the same v5 UUIDs, but retrieved using different
369         content negotiation resulting in different archive
370         representations. The implementations may disagree about path
371         encoding, file name casing or hierarchy.

373     4.  Two implementations describe the same archive bytestream using
374         the "alg-val" production, but they have used two different hash
375         algorithms. The implementations may need to negotiate to a
376         common hash algorithm.

378     5.  An implementation describes an archive using the "alg-val"
379         production, but a second implementation concurrently modifies the
380         archive's content. The first implementation may need to detect
381         changes to the archive or verify the checksum at the end of its
382         operations.

384     6.  Two implementations might have different views of the content of
385         the same archive if the format permits multiple entries with the
386         same path. Care should be taken to follow the convention and
387         specification of the particular archive format.

389     7.  Two implementations access the same archive, which contains
390         file paths with Unicode characters, but they extract to two
391         different file systems. Limitations and conventions for file
392         names in the local file system (e.g. Unicode normalization, case
393         insensitivity, total path length) may result in the
394         implementations having inconsistent or inaccessible paths.

396  6.  Security Considerations

398     As when handling any content, extra care should be taken when
399     consuming archives and app URIs from unknown sources.

401     An archive could contain compressed files that expand to fill all
402     available disk space.

404     A maliciously crafted archive could contain paths with characters
405     (e.g. backspace) which could make an app URI invalid or misleading if
406     used unescaped.

408     A maliciously crafted archive could contain paths (e.g. combined
409     Unicode sequences) that cause the app URI to be very long, causing
410     issues in information systems propagating said URI.

412     An archive might contain symbolic links that, if extracted to a local
413     file system, might address files outside the archive's directory
414     structure.

416     A maliciously crafted app URI might contain "../" segments, which if
417     naively converted to a "file:///" URI might address files outside the
418     archive's directory structure.

420     In particular for IRIs, an archive might contain multiple paths with
421     similar-looking characters or with different Unicode combining
422     sequences, which could be exploited to mislead users.

424     A URI hyperlink might use or guess an app URI authority to attempt
425     to climb into a different archive for malicious purposes.
426     Applications SHOULD employ Same Origin policy [RFC6454] checks.

428  7.  IANA Considerations

430     This Internet-Draft contains the Provisional IANA registration of the
431     app URI scheme according to [RFC7595].

433     Scheme name: app

435     Status: provisional

437     Applications/protocols that use this protocol: Hypermedia-consuming
438     applications that handle archives.

440     Contact: Stian Soiland-Reyes stain@apache.org [1]

442     Change controller: Stian Soiland-Reyes

444  8.  References

446  8.1.  Normative References

448     [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
449                Extensions (MIME) Part Two: Media Types", RFC 2046,
450                DOI 10.17487/RFC2046, November 1996.

453     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
454                Requirement Levels", BCP 14, RFC 2119,
455                DOI 10.17487/RFC2119, March 1997.

458     [RFC2279]  Yergeau, F., "UTF-8, a transformation format of ISO
459                10646", RFC 2279, DOI 10.17487/RFC2279, January 1998.

462     [RFC2483]  Mealling, M. and R. Daniel, "URI Resolution Services
463                Necessary for URN Resolution", RFC 2483,
464                DOI 10.17487/RFC2483, January 1999.

467     [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
468                Resource Identifier (URI): Generic Syntax", STD 66,
469                RFC 3986, DOI 10.17487/RFC3986, January 2005.

472     [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
473                Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
474                January 2005.

476     [RFC4086]  Eastlake 3rd, D., Schiller, J., and S. Crocker,
477                "Randomness Requirements for Security", BCP 106, RFC 4086,
478                DOI 10.17487/RFC4086, June 2005.

481     [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
482                Unique IDentifier (UUID) URN Namespace", RFC 4122,
483                DOI 10.17487/RFC4122, July 2005.

486     [RFC5785]  Nottingham, M. and E. Hammer-Lahav, "Defining Well-Known
487                Uniform Resource Identifiers (URIs)", RFC 5785,
488                DOI 10.17487/RFC5785, April 2010.

491     [RFC6454]  Barth, A., "The Web Origin Concept", RFC 6454,
492                DOI 10.17487/RFC6454, December 2011.

495     [RFC6920]  Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
496                Keranen, A., and P. Hallam-Baker, "Naming Things with
497                Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013.

500     [RFC7320]  Nottingham, M., "URI Design and Ownership", BCP 190,
501                RFC 7320, DOI 10.17487/RFC7320, July 2014.

504     [RFC7595]  Thaler, D., Ed., Hansen, T., and T. Hardie, "Guidelines
505                and Registration Procedures for URI Schemes", BCP 35,
506                RFC 7595, DOI 10.17487/RFC7595, June 2015.

509     [RFC8089]  Kerwin, M., "The "file" URI Scheme", RFC 8089,
510                DOI 10.17487/RFC8089, February 2017.

513  8.2.  Informative References

515     [CWLViewer]
516                Robinson, M., Soiland-Reyes, S., and M. Crusoe, "Common-
517                Workflow-Language/CWLviewer: CWL Viewer", Zenodo Software,
518                DOI 10.5281/zenodo.823534, August 2017.

521     [I-D.draft-kunze-bagit-14]
522                Kunze, J., Littman, J., Madden, L., Summers, E., Boyko,
523                A., and B. Vargas, "The BagIt File Packaging Format
524                (V0.97)", draft-kunze-bagit-14 (work in progress), October
525                2016.

527     [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
528                Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

531     [ROBundle]
532                Soiland-Reyes, S., Gamble, M., and R. Haines, "Research
533                Object Bundle 1.0", Zenodo report,
534                DOI 10.5281/zenodo.12586, November 2014.

537     [W3C.NOTE-app-uri-20150723]
538                Caceres, M., "The app: URL Scheme", World Wide Web
539                Consortium NOTE NOTE-app-uri-20150723, July 2015.

542     [W3C.NOTE-widgets-uri-20120313]
543                Caceres, M., "Widget URI scheme", World Wide Web
544                Consortium NOTE NOTE-widgets-uri-20120313, March 2012.

547  8.3.  URIs

549     [1] mailto:stain@apache.org

551  Appendix A.  Examples

553  A.1.  Sandboxing

555     A document store application has received a file "document.tar.gz"
556     whose content will be checked for consistency.

558     For sandboxing purposes it generates a UUID v4 "32a423d6-52ab-47e3-
559     a9cd-54f418a48571" using a pseudo-random generator. The app base URI
560     is thus "app://32a423d6-52ab-47e3-a9cd-54f418a48571/".

562     The archive contains the files:

564     o  "./doc.html" which links to "css/base.css"

566     o  "./css/base.css" which links to "../fonts/Coolie.woff"

568     o  "./fonts/Coolie.woff"

569     The application generates the corresponding app URIs and uses those
570     for URI resolutions:

572     o  app://32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html links to
573        app://32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css

575     o  app://32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css links to
576        app://32a423d6-52ab-47e3-a9cd-54f418a48571/fonts/Coolie.woff

578     o  app://32a423d6-52ab-47e3-a9cd-54f418a48571/fonts/Coolie.woff

580     The application is now confident that all hyperlinked files are
581     indeed present in the archive. In its database it notes which archive
582     file corresponds to "32a423d6-52ab-47e3-a9cd-54f418a48571".

584     If the application had encountered a malicious hyperlink
585     "../../../outside.txt" it would first resolve it to the absolute URI
586     "app://32a423d6-52ab-47e3-a9cd-54f418a48571/outside.txt" and conclude
587     from the _"Not Found"_ error that the path "/outside.txt" was not
588     present in the archive.

590  A.2.  Origin-based

592     A web crawler is about to index the content of the URL
593     "http://example.com/data.zip" and needs to generate absolute URIs as
594     it continues crawling inside the individual resources of the archive.
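
To make the link-resolution step of such a crawler concrete, here is a minimal sketch, not taken from the draft. It assumes the base URI uses the name-based UUID v5 authority recommended in Section 2.1, and it relies on appending "app" to the module-level lists urllib.parse.uses_relative and urllib.parse.uses_netloc, a common workaround (rather than a documented API) that makes urljoin apply RFC 3986 relative resolution to a scheme it does not know. The helper names app_base_for_url and resolve_link are hypothetical.

```python
import uuid
from urllib import parse

# Workaround (assumption): without this, urljoin returns relative references
# unchanged for schemes it does not recognise, such as "app".
for registry in (parse.uses_relative, parse.uses_netloc):
    if "app" not in registry:
        registry.append("app")


def app_base_for_url(archive_url):
    """Base app URI whose authority is a UUID v5 in the URL namespace."""
    return "app://%s/" % uuid.uuid5(uuid.NAMESPACE_URL, archive_url)


def resolve_link(base, resource_path, link):
    """Resolve a link found inside the archive entry at resource_path.

    Returns the absolute URI and whether it stays within the same archive.
    """
    resource_uri = parse.urljoin(base, parse.quote(resource_path))
    target = parse.urljoin(resource_uri, link)
    return target, target.startswith(base)


base = app_base_for_url("http://example.com/data.zip")
inside, ok_inside = resolve_link(base, "css/base.css", "../fonts/Coolie.woff")
escaped, ok_escaped = resolve_link(base, "doc.html", "../../../outside.txt")
external, ok_external = resolve_link(base, "doc.html", "http://example.org/x")
```

Because excess "../" segments are discarded during RFC 3986 resolution, the second link cannot climb above the archive root and ends up as a path directly under the base URI, whereas the third link is flagged as leaving the archive.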

596     The application generates a UUID v5 based on the URL namespace
597     "6ba7b811-9dad-11d1-80b4-00c04fd430c8" and the URL to the zip file:

599     >>> uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/data.zip")
600     UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')

602     Thus the base app URI is "app://b7749d0b-0e47-5fc4-999d-
603     f154abe68065/" for indexing the ZIP content, after which the crawler
604     finds:

606     o  app://b7749d0b-0e47-5fc4-999d-f154abe68065/

608     o  app://b7749d0b-0e47-5fc4-999d-f154abe68065/pics/

610     o  app://b7749d0b-0e47-5fc4-999d-f154abe68065/pics/flower.jpeg

612     When the application encounters "http://example.com/data.zip" some
613     time later it can recalculate the same base app URI. This time the
614     ZIP file has been modified upstream and the crawler finds
615     additionally:

617     o  app://b7749d0b-0e47-5fc4-999d-f154abe68065/pics/cloud.jpeg

619     If files had been removed from the updated ZIP file this would be
620     trivial for the crawler to clear from its database, as it used the
621     same base URI as in the last crawl.

623  A.3.  Hash-based

625     An application where users can upload software distributions for
626     virus checking needs to avoid duplication as users tend to upload
627     "foo-1.2.tar" multiple times.

629     The application calculates the _sha-256_ checksum of the uploaded
630     file to be
631     "17edf80f84d478e7c6d2c7a5cfb4442910e8e1778f91ec0f79062d8cbdef42cd" in
632     hexadecimal. The _base64url_ encoding [RFC4648] of the binary
633     version of the checksum is
634     "F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0".

636     The corresponding "alg-val" authority is thus "sha-
637     256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0" meaning the base app
638     URI is "app://sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/".

640     The application finds that its virus database already contains entries
641     for:

643     o  app://sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/bin/evil

645     and flags the upload as malicious without having to scan it again.

647  A.4.  Archives that are not files

649     An application is relating BagIt archives [I-D.draft-kunze-bagit-14]
650     on a shared file system, using structured folders and manifests
651     rather than individual archive files.

653     The BagIt payload manifest "/gfs/bags/scan15/manifest-md5.txt" lists
654     the files:

656     49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/images/q172.png
657     408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/images/q172.txt

659     The application generates a random UUID v4 "ff2d5a82-7142-4d3f-b8cc-
660     3e662d6de756" which it adds to the bag metadata file
661     "/gfs/bags/scan15/bag-info.txt":

663     External-Identifier: ff2d5a82-7142-4d3f-b8cc-3e662d6de756

664     It then generates app URIs for the files listed in the manifest:

666     app://ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/images/q172.png
667     app://ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/images/q172.txt

669  A.5.  Resolution of packaged resources

671     A virtual file system driver on a mobile operating system has mounted
672     several packaged applications for resolving common resources. An
673     application requests the rendering framework to resolve a picture
674     from "app://eb1edec9-d2eb-4736-a875-eb97b37c690e/img/logo.png" to
675     show it within a user interface.

677     The framework first checks that the authority "eb1edec9-d2eb-
678     4736-a875-eb97b37c690e" is valid to access according to the Same
679     Origin policies or permissions of the running application. It then
680     matches the authority to the corresponding application package.

682     The framework then resolves "/img/logo.png" from within that package,
683     and returns an image buffer it already had cached in memory.

685  Appendix B.  History

687     This Internet-Draft proposes the URI scheme "app", which was
688     originally proposed by [W3C.NOTE-app-uri-20150723] but never
689     registered with IANA. That W3C Note evolved from
690     [W3C.NOTE-widgets-uri-20120313] which proposed the URI scheme
691     "widget".

693     Neither W3C Note progressed further as a Recommendation track
694     document.

696     While the focus of the W3C Notes was to specify how to resolve resources
697     from within a packaged application, this Internet-Draft generalizes
698     the "app" URI scheme to support referencing and identifying resources
699     within any archive, and de-emphasizes the retrieval mechanism.

701     For compatibility with existing adoptions of the "app" URI scheme,
702     e.g. [ROBundle] and [CWLViewer], this Internet-Draft reuses the same
703     scheme name and remains compatible with the intentions of
704     [W3C.NOTE-app-uri-20150723].

706  Author's Address

707     Stian Soiland-Reyes
708     The University of Manchester
709     Oxford Road
710     Manchester
711     United Kingdom

713     Email: stain@apache.org