Network Working Group                                         J. Yasskin
Internet-Draft                                                    Google
Intended status: Informational                              27 July 2020
Expires: 28 January 2021

             Use Cases and Requirements for Web Packages
                   draft-yasskin-wpack-use-cases-01

Abstract
   This document lists use cases for signing and/or bundling collections
   of web pages, and extracts a set of requirements from them.

Note to Readers

   Discussion of this draft takes place on the ART area mailing list
   (art@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=art.

   The source code and issues list for this draft can be found in
   https://github.com/WICG/webpackage.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 28 January 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
Table of Contents

   1.  Introduction
   2.  Use cases
     2.1.  Essential
       2.1.1.  Offline installation
       2.1.2.  Offline browsing
       2.1.3.  Save and share a web page
       2.1.4.  Privacy-preserving prefetch
     2.2.  Nice-to-have
       2.2.1.  Packaged Web Publications
       2.2.2.  Avoiding Censorship
       2.2.3.  Third-party security review
       2.2.4.  Building packages from multiple libraries
       2.2.5.  Cross-CDN Serving
       2.2.6.  Pre-installed applications
       2.2.7.  Protecting Users from a Compromised Frontend
       2.2.8.  Installation from a self-extracting executable
       2.2.9.  Packages in version control
       2.2.10. Subresource bundling
       2.2.11. Archival
   3.  Requirements
     3.1.  Essential
       3.1.1.  Indexed by URL
       3.1.2.  Request headers
       3.1.3.  Response headers
       3.1.4.  Signing as an origin
       3.1.5.  Random access
       3.1.6.  Resources from multiple origins in a package
       3.1.7.  Cryptographic agility
       3.1.8.  Unsigned content
       3.1.9.  Certificate revocation
       3.1.10. Downgrade prevention
       3.1.11. Metadata
       3.1.12. Implementations are hard to get wrong
     3.2.  Nice to have
       3.2.1.  Streamed loading
       3.2.2.  Signing without origin trust
       3.2.3.  Additional signatures
       3.2.4.  Binary
       3.2.5.  Deduplication of diamond dependencies
       3.2.6.  Old crypto can be removed
       3.2.7.  Compress transfers
       3.2.8.  Compress stored packages
       3.2.9.  Subsetting and reordering
       3.2.10. Packaged validity information
       3.2.11. Signing uses existing TLS certificates
       3.2.12. External dependencies
       3.2.13. Trailing length
       3.2.14. Time-shifting execution
       3.2.15. Service Worker integration
   4.  Non-goals
     4.1.  Store confidential data
     4.2.  Generate packages on the fly
     4.3.  Non-origin identity
     4.4.  DRM
     4.5.  Ergonomic replacement for HTTP/2 PUSH
   5.  Security Considerations
   6.  IANA Considerations
   7.  Informative References
   Appendix A.  Acknowledgements
   Author's Address
1.  Introduction

   People would like to use content offline and in other situations
   where there isn't a direct connection to the server where the content
   originates. However, it's difficult to distribute and verify the
   authenticity of applications and content without a connection to the
   network. The W3C has addressed running applications offline with
   Service Workers ([ServiceWorkers]), but not the problem of
   distribution.

   Previous attempts at packaging web resources (e.g. Resource Packages
   (https://www.mnot.net/blog/2010/02/18/resource_packages) and the W3C
   TAG's packaging proposal
   (https://w3ctag.github.io/packaging-on-the-web/)) were motivated by
   speeding up the download of resources from a single server, which is
   probably better achieved through other mechanisms like HTTP/2 PUSH,
   possibly augmented with a simple manifest of URLs a page plans to use
   (https://lists.w3.org/Archives/Public/public-web-perf/2015Jan/0038.html).
   This attempt is instead motivated by avoiding a connection to the
   origin server at all. It may still be useful for the earlier use
   cases, so they're still listed, but they're not primary.
2.  Use cases

   These use cases are in rough descending priority order. If use cases
   have conflicting requirements, the design should enable more
   important use cases.

2.1.  Essential

2.1.1.  Offline installation

   Alex can download a file containing a website (a PWA
   (https://developers.google.com/web/progressive-web-apps/checklist))
   including a Service Worker from origin "O", and transmit it to their
   peer Bailey, and then Bailey can install the Service Worker with a
   proof that it came from "O". This saves Bailey the bandwidth costs
   of transferring the website.
   There are roughly two ways to accomplish this:

   1.  Package just the Service Worker Javascript and any other
       Javascript that it importScripts()
       (https://w3c.github.io/ServiceWorker/#importscripts), with their
       URLs and enough metadata to synthesize a
       navigator.serviceWorker.register(scriptURL, options) call
       (https://w3c.github.io/ServiceWorker/#navigator-service-worker-register),
       along with an uninterpreted but signature-checked blob of data
       that the Service Worker can interpret to fill in its caches.

   2.  Package the resources so that the Service Worker can fetch()
       them to populate its cache.
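   As a rough sketch of the first approach, a client could read the
   package's metadata and synthesize the register() call from it. The
   metadata field names below are illustrative only, not taken from any
   specification:

```javascript
// Hypothetical shape of the metadata a package might carry for
// approach 1; these field names are invented for illustration.
const exampleMetadata = {
  serviceWorker: {
    scriptUrl: "https://o.example/sw.js",
    scope: "/",
    type: "module",
  },
};

// Turn packaged metadata into the arguments for a
// navigator.serviceWorker.register(scriptURL, options) call.
function synthesizeRegisterCall(metadata) {
  const sw = metadata.serviceWorker;
  if (!sw || !sw.scriptUrl) {
    throw new Error("package metadata is missing a Service Worker entry");
  }
  return {
    scriptURL: sw.scriptUrl,
    options: { scope: sw.scope ?? "/", type: sw.type ?? "classic" },
  };
}

const call = synthesizeRegisterCall(exampleMetadata);
console.log(call.scriptURL); // "https://o.example/sw.js"
// In a browser, the client would then run:
//   navigator.serviceWorker.register(call.scriptURL, call.options);
```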
   Associated requirements for just the Service Worker:

   *  Indexed by URL: The "register()" and "importScripts()" calls have
      semantics that depend on the URL.

   *  Signing as an origin: To prove that the file came from "O".

   *  Signing uses existing TLS certificates: So "O" doesn't have to
      spend lots of money buying a specialized certificate.

   *  Cryptographic agility: Today's algorithms will eventually be
      obsolete and will need to be replaced.

   *  Certificate revocation: "O"'s certificate might be compromised or
      mis-issued, and the attacker shouldn't then get an infinite
      ability to mint packages.

   *  Downgrade prevention: "O"'s site might have an XSS vulnerability,
      and attackers with an old signed package shouldn't be able to
      take advantage of the XSS forever.

   *  Metadata: Just enough to generate the "register()" call, which is
      less than a full W3C Application Manifest.

   Additional associated requirements for packaged resources:

   *  Indexed by URL: Resources on the web are addressed by URL.

   *  Request headers: If Bailey's running a different browser from
      Alex or has a different language configured, the "accept*"
      headers are important for selecting which resource to use at
      each URL.

   *  Response headers: The meaning of a resource is heavily influenced
      by its HTTP response headers.

   *  Resources from multiple origins in a package: So the site can be
      built from multiple components (Section 2.2.4).

   *  Metadata: The browser needs to know which resource within a
      package file to treat as its Service Worker and/or initial HTML
      page.
2.1.1.1.  Online use

   Bailey may have an internet connection through which they can, in
   real time, fetch updates to the package they received from Alex.

2.1.1.2.  Fully offline use

   Or Bailey may not have any internet connection a significant
   fraction of the time, either because they have no internet at all,
   because they turn off internet except when intentionally downloading
   content, or because they use up their plan partway through each
   month.

   Associated requirements beyond Offline installation:

   *  Packaged validity information: Even without a direct internet
      connection, Bailey should be able to check that their package is
      still valid.
2.1.2.  Offline browsing

   Alex can download a file containing a large website (e.g. Wikipedia)
   from its origin, save it to transferrable storage (e.g. an SD card),
   and hand it to their peer Bailey. Then Bailey can browse the website
   with a proof that it came from "O". Bailey may not have the storage
   space to copy the website before browsing it.

   This use case is harder for publishers to support if we specialize
   Section 2.1.1 for Service Workers, since it requires the publisher
   to adopt Service Workers before they can sign their site.

   Associated requirements beyond Offline installation:

   *  Random access: To avoid needing a long linear scan before using
      the content.

   *  Compress stored packages: So that more content can fit on the
      same storage device.
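   To make the random-access requirement concrete, here is a minimal
   sketch assuming a hypothetical package layout in which an index maps
   each URL to a byte offset and length, so a reader can pull out one
   resource without scanning the whole file. None of these field names
   come from a specification:

```javascript
// Hypothetical in-memory package: an index of URL -> { offset, length }
// plus one contiguous buffer of response bodies. A real reader would
// pull the bytes from disk; Buffer stands in for the file here.
function readResource(pkg, url) {
  const entry = pkg.index.get(url);
  if (!entry) return null; // URL not in this package
  // Random access: slice just the bytes for this resource, rather
  // than scanning the package from the start.
  return pkg.data.subarray(entry.offset, entry.offset + entry.length);
}

const body1 = Buffer.from("<html>hi</html>");
const body2 = Buffer.from("body { color: red }");
const pkg = {
  index: new Map([
    ["https://o.example/", { offset: 0, length: body1.length }],
    ["https://o.example/style.css",
     { offset: body1.length, length: body2.length }],
  ]),
  data: Buffer.concat([body1, body2]),
};

console.log(readResource(pkg, "https://o.example/style.css").toString());
// "body { color: red }"
```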
2.1.3.  Save and share a web page

   Casey is viewing a web page and wants to save it either for offline
   use or to show it to their friend Dakota. Since Casey isn't the web
   page's publisher, they don't have the private key needed to sign the
   page. Browsers currently allow their users to save pages, but each
   browser uses a different format (MHTML, Web Archive, or files in a
   directory), so Dakota and Casey would need to be using the same
   browser. Casey could also take a screenshot, at the cost of losing
   links and accessibility.

   Associated requirements:

   *  Unsigned content: A client can't sign content as another origin.

   *  Resources from multiple origins in a package: General web pages
      include resources from multiple origins.

   *  Indexed by URL: Resources on the web are addressed by URL.

   *  Response headers: The meaning of a resource is heavily influenced
      by its HTTP response headers.
2.1.4.  Privacy-preserving prefetch

   Lots of websites link to other websites. Many of these source sites
   would like the targets of these links to load quickly. The source
   could use <link rel="prerender"> to prefetch the target of a link,
   but if the user doesn't actually click that link, that leaks the
   fact that the user saw a page that linked to the target. This can be
   true even if the prefetch is made without browser credentials,
   because of mechanisms like TLS session IDs.

   Because clients have limited data budgets to prefetch link targets,
   this use case is probably limited to sites that can accurately
   predict which link their users are most likely to click. For
   example, search engines can predict that their users will click one
   of the first couple results, and news aggregation sites like Reddit
   or Slashdot can hope that users will read the article if they've
   navigated to its discussion.
   Two search engines have built systems to do this with today's
   technology: Google's AMP (https://www.ampproject.org/) and Baidu's
   MIP (https://www.mipengine.org/) formats and caches allow them to
   prefetch search results while preserving privacy, at the cost of
   showing the wrong URLs for the results once the user has clicked. A
   good solution to this problem would show the right URLs but still
   avoid a request to the publishing origin until after the user
   clicks.

   Associated requirements:

   *  Signing as an origin: To prove the content came from the original
      origin.

   *  Streamed loading: If the user clicks before the target page is
      fully transferred, the browser should be able to start loading
      early parts before the source site finishes sending the whole
      page.

   *  Compress transfers

   *  Subsetting and reordering: If a prefetched page includes
      subresources, its publisher might want to provide and sign both
      WebP and PNG versions of an image, but the source site should be
      able to transfer only the best one for each client.
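   The subsetting idea above can be sketched as a small content-
   negotiation step: the publisher signs several variants of a
   subresource, and the serving site forwards only the variant the
   client's Accept header prefers. This is an illustration, not a
   specified algorithm, and the header parsing is deliberately
   simplified:

```javascript
// Pick which signed variant of a subresource to forward to a client,
// based on a simplified parse of its Accept header. A real server
// would also honor q-values and wildcard matching; this is a sketch.
function chooseVariant(acceptHeader, variants) {
  const accepted = acceptHeader
    .split(",")
    .map((t) => t.split(";")[0].trim());
  // Prefer the first packaged variant whose type the client accepts...
  for (const v of variants) {
    if (accepted.includes(v.type)) return v;
  }
  // ...falling back to a variant every client can decode.
  return variants.find((v) => v.fallback) ?? null;
}

const variants = [
  { type: "image/webp", url: "https://o.example/hero.webp" },
  { type: "image/png", url: "https://o.example/hero.png", fallback: true },
];

console.log(chooseVariant("image/webp,image/*;q=0.8", variants).url);
// "https://o.example/hero.webp"
console.log(chooseVariant("image/png", variants).url);
// "https://o.example/hero.png"
```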
2.2.  Nice-to-have

2.2.1.  Packaged Web Publications

   The W3C's Publishing Working Group
   (https://www.w3.org/publishing/groups/publ-wg/), merged from the
   International Digital Publishing Forum (IDPF) and in charge of EPUB
   maintenance, wants to be able to create publications on the web and
   then let them be copied to different servers or to other users via
   arbitrary protocols. See their Packaged Web Publications use cases
   (https://www.w3.org/TR/pwp-ucr/#pwp) for more details.

   Associated requirements:

   *  Indexed by URL: Resources on the web are addressed by URL.

   *  Signing as an origin: So that readers can be sure their copy is
      authentic and so that copying the package preserves the URLs of
      the content inside it.

   *  Downgrade prevention: An early version of a publication might
      contain incorrect content, and a publisher should be able to
      update that without worrying that an attacker can still show the
      old content to users.

   *  Metadata: A publication can have copyright and licensing
      concerns; a title, author, and cover image; an ISBN or DOI name;
      etc.; which should be included when that publication is packaged.

   Other requirements are similar to those from Offline installation:

   *  Random access: To avoid needing a long linear scan before using
      the content.

   *  Compress stored packages: So that more content can fit on the
      same storage device.

   *  Request headers: If different users' browsers have different
      capabilities or preferences, the "accept*" headers are important
      for selecting which resource to use at each URL.

   *  Response headers: The meaning of a resource is heavily influenced
      by its HTTP response headers.

   *  Signing uses existing TLS certificates: So a publisher doesn't
      have to spend lots of money buying a specialized certificate.

   *  Cryptographic agility: Today's algorithms will eventually be
      obsolete and will need to be replaced.

   *  Certificate revocation: The publisher's certificate might be
      compromised or mis-issued, and an attacker shouldn't then get an
      infinite ability to mint packages.
2.2.2.  Avoiding Censorship

   Some users want to retrieve resources that their governments or
   network providers don't want them to see. Right now, it's
   straightforward for someone in a privileged network position to
   block access to particular hosts, but TLS makes it difficult to
   block access to particular resources on those hosts.

   Today it's straightforward to retrieve blocked content from a third
   party, but there's no guarantee that the third party has sent the
   user an accurate representation of the content: the user has to
   trust the third party.

   With signed web packages, the user can regain assurance that the
   content is authentic, while still bypassing the censorship.
   Packages don't do anything to help discover this content.

   Systems that make censorship more difficult can also make legitimate
   content filtering more difficult. Because the client that processes
   a web package always knows the true URL, this forces content
   filtering to happen on the client instead of on the network.

   Associated requirements:

   *  Indexed by URL: So the user can see that they're getting the
      content they expected.

   *  Signing as an origin: So that readers can be sure their copy is
      authentic and so that copying the package preserves the URLs of
      the content inside it.
2.2.3.  Third-party security review

   Some users may want to grant certain permissions only to
   applications that have been reviewed for security by a trusted third
   party. These third parties could provide guarantees similar to those
   provided by the iOS, Android, or Chrome OS app stores, which might
   allow browsers to offer more powerful capabilities than have been
   deemed safe for unaudited websites.

   Binary transparency for websites is similar: as with Certificate
   Transparency [RFC6962], the transparency logs would sign the content
   of the package to provide assurance that experts had a chance to
   audit the exact package a client received.
   Associated requirements:

   *  Additional signatures
2.2.4.  Building packages from multiple libraries

   Large programs are built from smaller components. In the case of
   the web, components can be included either as Javascript files or as
   "