Network Working Group                                         M. Thomson
Internet-Draft
Intended status: Informational                             M. Nottingham
Expires: March 21, 2020                               September 18, 2019

   Report from the IAB Workshop on Exploring Synergy between Content
          Aggregation and the Publisher Ecosystem (ESCAPE)
                     draft-iab-escape-report-00

Abstract

   The Exploring Synergy between Content Aggregation and the Publisher
   Ecosystem (ESCAPE) Workshop was convened by the Internet Architecture
   Board (IAB) in July 2019. This report summarizes its significant
   points of discussion and identifies topics that may warrant further
   consideration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 21, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.
Table of Contents

   1. Introduction
     1.1. Mention of Specific Entities
   2. Use Cases
     2.1. Instant Navigation
     2.2. Offline Content Sharing
     2.3. Other Use Cases
       2.3.1. Book Publishing
       2.3.2. Web Archiving
   3. Interactions Between Web Publishers and Aggregators
     3.1. Incentives for Web Packages
     3.2. Operational Costs
     3.3. Content Regulation
     3.4. Web Performance
   4. Systemic Effects
     4.1. Consolidation
       4.1.1. Consolidation of Power in Linking Sites
       4.1.2. Consolidation of Power in Publishers
       4.1.3. Consolidation of User Preferences
     4.2. Effect on Web Security
     4.3. Privacy of Content
   5. AMP Issues Unrelated to Web Packaging
     5.1. AMP Governance
     5.2. Constraints on the AMP Format
     5.3. Performance
     5.4. Implementation of Paywalls
   6. Venues for Future Discussion
   7. Security Considerations
   8. References
     8.1. Informative References
     8.2. URIs
   Appendix A. About the Workshop
     A.1. Agenda
       A.1.1. Thursday 2019-07-18
       A.1.2. Friday 2019-07-19
     A.2. Workshop Attendees
   Appendix B. Web Packaging Overview
     B.1. Authority in HTTPS
     B.2. Authority in Web Packaging
     B.3. Applicability
     B.4. The AMP Format, Google Search Results, and Web Packaging
   Authors' Addresses

1. Introduction

   The IAB convened this workshop to examine some proposed changes to
   the Internet and the Web, and their potential effects on the Internet
   publishing landscape. Of particular interest was the Web Packaging
   proposal from Google, under consideration in the IETF, the W3C's Web
   Incubator Community Group (WICG), and the Web Hypertext Application
   Technology Working Group (WHATWG).

   In considering these proposals, we heard about both positive effects
   of Web Packaging, and concerns that it could have significant effects
   on the relationship between publishers (e.g., news Web sites) and
   content aggregators (e.g., search engines and social networks). As
   such, our focus was primarily on this relationship, rather than being
   a technical discussion.

   Online publishers do not regularly participate in standards
   activities directly. A Workshop format was used to solicit input
   from them.
   The workshop had 27 participants from a diverse set of
   backgrounds, including a small number of attendees from publishers,
   one aggregator (Google), plus representatives from browsers, the AMP
   community, CDNs, network operators, academia, and standards bodies.
   See the Workshop Call for Participation [CFP] for more information
   and a complete listing of submissions.

   As intended, the Workshop was primarily a forum for discussion, so it
   did not reach definite conclusions. Instead, this report is the
   primary output of the Workshop, as a record of that discussion.

   Section 2 of this report documents the use cases that were discussed,
   and Section 3 explains the interactions between publishers and
   aggregators that Web Packaging might affect. Appendix A includes
   more details about the Workshop itself. For those unfamiliar with
   Web Packaging, Appendix B provides a summary as background material.

1.1. Mention of Specific Entities

   Participants agreed to conduct the Workshop under the Chatham House
   Rule [CHATHAM-HOUSE], so this report does not attribute statements to
   individuals or organizations without express permission. Submissions
   to the Workshop were public, and thus attributable; they are used
   here to provide substance and context.

2. Use Cases

   Much of the Workshop concentrated on discussion of the validity and
   relative merits of the use cases that might be enabled by Web
   Packaging. See Appendix B for an overview of what Web Packaging is.

2.1. Instant Navigation

   The largest use of Web Packaging so far is in Google Search, where
   packages are intended to improve the perceived performance of
   navigation to pages that are linked from search results when
   "clicked".

   To enable this, when a linking (or referring) web page includes links
   to pages on another site, it also provides the browser with a
   packaged copy of the target content, signed by the origin of the
   target content. In effect, the referring page provides a cache for
   the target page's content. If navigation to one of those links
   occurs, having the Web Package gives a browser the assurance that the
   cache didn't change the content, so it can treat that content as if
   it were acquired directly from the server for the target page - even
   though it came from a different server. In many cases, this results
   in significantly lower perceived delay in displaying the target page.

   A vital characteristic of this technique is that the browser does not
   contact the target site before navigation. The browser does not make
   any requests to sites until after navigation occurs, and only then if
   the site requires additional content or makes a request directly.

   Similar improvements could also be realized by downloading content
   (packaged or otherwise) directly from the target site through a
   technique called prefetching. However, doing so would reveal
   information about the user's activity on the linking page to those
   sites - even when the user never actually navigates to them.

   Note: This technique that uses Web Packaging is also referred to as
   "privacy-preserving prefetch". This document avoids that term as
   there was some contention at the workshop about what aspects of
   privacy might be preserved by the technique.

   Sites bundled with Web Packaging can additionally be constructed in a
   way that ensures that they render without needing any additional
   network access. This makes it possible to provide near-instantaneous
   navigation. The proposed changes to web navigation in support of
   loading Web Packages are designed to support this use case.
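
   As a rough illustration of the navigation flow described above, the
   following Python sketch models a browser that holds packaged copies
   supplied by the linking page and contacts the target site only when
   no usable copy exists. All names (PackagedCopy, is_usable, navigate,
   fetch_from_origin) are hypothetical, and the signature_ok flag merely
   stands in for real signature verification, which this sketch does not
   implement. The 7-day bound is the proposed validity limit for Signed
   Exchanges mentioned in Section 2.3.1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Proposed upper bound on the validity of a Signed Exchange (7 days).
MAX_VALIDITY_SECONDS = 7 * 24 * 60 * 60


@dataclass(frozen=True)
class PackagedCopy:
    """Hypothetical model of a packaged copy supplied by the linking page."""
    url: str
    body: bytes
    signed_at: int      # seconds since the epoch
    expires_at: int     # seconds since the epoch
    signature_ok: bool  # assumption: stands in for real signature checks


def is_usable(copy: PackagedCopy, now: int) -> bool:
    """Return True if the copy may be treated as if it came from its origin."""
    within_limit = 0 < copy.expires_at - copy.signed_at <= MAX_VALIDITY_SECONDS
    currently_valid = copy.signed_at <= now < copy.expires_at
    return copy.signature_ok and within_limit and currently_valid


def navigate(
    url: str,
    prefetched: Dict[str, PackagedCopy],
    now: int,
    fetch_from_origin: Callable[[str], bytes],
) -> Tuple[bytes, bool]:
    """Return (content, contacted_origin) for a navigation to `url`."""
    copy = prefetched.get(url)
    if copy is not None and is_usable(copy, now):
        # The target site is not contacted: content comes from the package.
        return copy.body, False
    # No usable package: fall back to an ordinary request to the target site.
    return fetch_from_origin(url), True
```

   In this model, the second element of the returned tuple records
   whether the target site was contacted, which is the property the
   technique aims to avoid before navigation.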

   Workshop participants recognized the value of web performance for
   usability, as well as for business metrics like retention and bounce
   rates. Such improvements were seen as a valuable goal, but
   publishers raised questions about whether they justified the cost of
   supporting an additional format, while others raised concerns about
   different aspects of the Web Packaging proposal.

2.2. Offline Content Sharing

   Another primary use case discussed was the ability to share Web
   content between devices where neither has an active connection to the
   Internet. One of the stated goals of Web Packaging is to enable
   sharing of content offline.

   Several participants reported that in areas where Internet access is
   expensive, slow, or intermittent, the use of direct peer-to-peer file
   exchange (e.g., "saving a Web site and sharing it on a USB stick") is
   commonplace. Most Web browsers already have some affordances for
   this, but these are recognized as in need of improvement.

   In the discussion, several participants rejected an assumed
   requirement of this use case - that there be no difference between
   the treatment of a "normal" Web page and that of one loaded from an
   offline Web Package.

   The ability for a Web Package to provide clear attribution for
   content was seen as valuable by some participants for a range of
   reasons. However, reservations were expressed about the subtleties
   of the properties that signatures provide and the effect of this on
   Web security; see also Section 4.2 and Section 2.3.2.

   Many participants pointed out that using "unsigned bundles" - that
   is, Web Packages without Signed Exchanges - could be adequate for
   this use case, since most users don't need cryptographic proof of the
   site's identity. However, some expressed concerns that this might
   worsen the propagation of falsehood.

   Some suggested that the value of Signed Exchanges was not realized in
   small-scale interpersonal exchange of information, but in the
   building of systems for content delivery that might include
   capabilities like discovery and automated distribution. The
   contention here was that effective use of digital signatures in
   offline distribution of content implied considerably more
   infrastructure than was described in current proposals.

   No definite conclusions about offline sharing were reached during the
   workshop.

2.3. Other Use Cases

   A session on the second morning concentrated on two other significant
   potential use cases for Web Packages: book publishing and Web
   archiving. These were not seen as "primary" by the proponents of Web
   Packaging; the original intent was not to spend significant time on
   these subjects, but there was considerable interest from attendees.

2.3.1. Book Publishing

   The potential application of a packaging format to book publishing
   was discussed, with particular reference to ways that books differ
   from web content. Specialists from that industry pointed out that
   book delivery can vary greatly from typical web content delivery.

   Workshop participants briefly explored existing solutions. PDF was
   seen as particularly challenging for this use case, due to its
   limitations, and EPUB has constraints that also make it challenging
   for publishers.

   Although Web Packaging might help to address this use case, the
   question of how to identify book content was not resolved. The use
   of Signed Exchanges in this context might offer means of tying
   content in books to a Web site, but several limitations inherent in
   doing that were identified.

   In particular, book publication specialists represented that books
   don't have the same requirements for timeliness or currency as web
   pages.
   For instance, Dave Cramer's submission [CRAMER] observed that
   Moby Dick was published over 61,000 days ago, which is considerably
   longer than the proposed limit of 7 days for Signed Exchanges. The
   limited length of time that a Web Package can be considered valid was
   discussed at some length.

   Additionally, the risk of a publisher going out of business during
   the lifetime of a book is significant, because books - at least
   successful ones - often span generations in their applicability.
   For that reason, having a means of attributing content to a publisher
   was considered less practical, and potentially undesirable (much like
   the discussion above regarding "unsigned bundles").

   There were other aspects of book publication that participants saw as
   challenging for packaging. For example, it is currently not well
   understood what it means to refer to distinct parts of a book.
   Participants saw this as an area where providing stable references
   for bundles of content might offer possibilities, but nothing
   concrete came from that discussion.

   The potential for active content in a bundle to use Web APIs to
   enrich content or enable new features was considered valuable.
   Models for enabling paywalls were discussed at some length (see
   Section 5.4).

2.3.2. Web Archiving

   Web archiving is a complicated discipline that is made more difficult
   by the complex nature of the web itself.

   From an archival standpoint, the potential for Web content to be
   provided in a self-contained form was viewed positively. Several
   improvements to the structure of Web Packaging were considered, such
   as providing complete sets of content and the use of Memento
   [MEMENTO].

   Though there were potential applications of a packaging scheme, many
   challenges were recognized as requiring additional work on the part
   of content producers to be fully effective.
   For example, JavaScript
   is needed to render some archived content faithfully, but attributing
   that content to an origin in all scenarios is challenging.

   If packaging were to be widely deployed it might improve the
   situation for archival replay. In particular, the speculation is
   that there would be less "live leakage" as packaged content might be
   less likely to refer to live resources that currently tend to "leak"
   into views of archives. It was also noted that subresources might
   be more likely to be packaged, especially those that are needed
   for deferred representations (i.e., after JavaScript execution on the
   page or some user interactions). Other potential applications and
   enhancements are discussed in [ALAM].

   Participants discussed the use of a signature for non-repudiation at
   some length. In one case related to the Internet Archive, a public
   figure disputed the accuracy of archived content, asserting that
   the content had been modified, either at the source or in the
   archive.

   Some participants initially saw digital signatures as a way to
   address such issues of provenance. As similar problems exist in
   other areas, such as in book publication, medical research, and news,
   a solution to this problem was considered to have broad
   applicability.

   However, the discussion ultimately concluded that providing
   non-repudiation in retrospect is challenging. Signing keys are not
   expected to remain secure for long periods. If keys are leaked
   afterwards, an attacker could retroactively generate fraudulent
   signatures. Alternative solutions were discussed, such as providing
   independent archives for the same data, using consensus protocols, or
   using an append-only construct like a Haber-Stornetta log [AOLOG],
   all of which can be used to increase the difficulty of altering or
   misrepresenting established archives.

3. Interactions Between Web Publishers and Aggregators

   A significant motivation for holding the Workshop was to provide a
   forum where publishers could discuss the impact of Web Packaging on
   the online publishing ecosystem. Of primary interest was whether Web
   Packages might effectively enable a transfer of power from publishers
   to aggregators.

   Both publishers and aggregators at the workshop expressed the
   importance of maintaining a positive relationship. Publishers in
   particular expressed the need to be able to trust that aggregators
   won't misrepresent their work, or de-emphasize it for reasons
   unrelated to quality and perceived value to the user.

   One key question from [BERJON] was discussed:

      Web Packaging has other uses, but it is primarily seen by a large
      proportion of its stakeholders as a solution to problems that AMP
      created. Before we agree to solve those issues, should we not ask
      if AMP was a useful approach in the first place - and useful to
      whom?

   In examining this issue, discussion focused on the current incentive
   model offered by aggregators. The costs that publishers incur for
   participation in that system were considered. Considerable time was
   spent on AMP; a summary of that discussion can be found in Section 5.

   We also considered the question of whether standardizing Web
   Packaging confers credibility to aggregators exercising unwelcome
   control over publisher content, or whether the technical safeguards
   Web Packaging provides could allow aggregators to relax their
   restrictions on the kinds of content they're willing to cache and
   serve. No conclusions were drawn.

3.1. Incentives for Web Packages

   Submissions to the Workshop indicated that the use of inducements
   involving better placement and formatting of links to publisher
   content had a significant effect on the uptake of related technology.
   For example, in [DEPUYDT-NELSON]:

      [...]
      The Washington Post has always placed a great deal of trust
      in Google to represent its content--and their reward for doing so
      is more traffic, which positively impacts the business.

   During the Workshop, several online publishers indicated that if it
   weren't for the privileged position in the Google Search carousel
   given to AMP content, they would not publish in that format.

   Publishers that do produce AMP said they see a non-trivial increase
   in traffic as a result of deploying AMP content. For example, Yahoo
   Japan reported a 60% increase in traffic as a result of deploying AMP
   on Yahoo Travel [OTSU]. There was no data presented as to whether
   this increase was due to better placement in Google Search results,
   the inherent benefits of the AMP cache, or the use of the AMP
   format.

   Anecdotal evidence was offered by another large publisher that saw a
   10% drop in traffic as a result of accidentally disabling AMP
   content. However, increases in traffic might not result in similarly
   proportioned increases in revenue, as observed in [BREWSTER].

3.2. Operational Costs

   Several participants pointed out that introducing a new, parallel
   format for Web content incurs operational costs. In particular,
   supporting any new format - such as Web Packaging, Apple News, or
   Facebook Instant Articles - requires not only initial development of
   tooling (some generic, some specific to a site's requirements) but
   also an ongoing investment in maintaining its operability. Some
   participants expressed concern about the impact upon small publishers
   with limited technical and financial resources, especially in the
   current publishing climate.

   Increased exposure from new formats might not always justify the
   added expense of providing articles in that format [BREWSTER].
   However, a standardized format might help publishers reduce the cost
   of maintaining multiple formats.

3.3. Content Regulation

   The use of Web Packaging as a tool for avoiding censorship was not a
   significant topic of discussion, except to note that publishers often
   have regulatory requirements regarding removal or correction of
   content.

   Reference was made to the desire to remove videos of a recent
   shooting [CHRISTCHURCH] and the potential difficulty in doing so if
   content were available as Web Packages. Legal requirements to remove
   content come from multiple angles: copyright violations, illegal
   content, editorial corrections or errors, and right to erasure
   provisions in the European Union General Data Protection Regulation
   [GDPR] were mentioned. One participant speculated that making it
   more difficult to remove material in this way might discourage
   regulators from censoring content.

   In this context, participants observed that it would be difficult to
   create mechanisms to track and control content served as a Web
   Package without compromising the stated goal of censorship
   resistance.

3.4. Web Performance

   Understanding the effect that Web Packaging might have on web
   performance was a matter of some contention.

   Some informal analysis from the Google Search deployment was
   presented (later published in [AMP-PERF]) that showed significant
   performance improvements in metrics related to navigation time
   resulting from the combination of prefetch, prerendering, and the AMP
   format. These results suggest that Web Packaging could provide some
   of that improvement on its own, but no data was presented that
   apportioned the improvement among the three components.

   Though data was presented to demonstrate potential rather than be a
   definitive result, discussions raised a number of questions that
   suggest the need for further study.
   Attendees suggested that future
   measurements consider the effect of signed bundles separately from
   the enhancements derived from the AMP format. Future research in
   this area might also consider the effectiveness of different
   strategies on devices with varying capabilities, bandwidth, power
   consumption requirements, or network conditions.

   Of particular interest is the additional work required to fetch and
   render multiple web pages in preparation for navigation. This might
   ultimately use fewer connections, but comes with an increased network
   and CPU cost for clients. Some participants pointed out that
   different clients or applications might require different tuning; for
   example, when users have limited (or expensive) bandwidth, or for
   sites with less clear knowledge about the use of outbound links.

   Workshop participants also expressed interest in learning about the
   effect of Web Packages on subsequent navigations within the target
   site.

   In discussion, some participants suggested that their experience
   supported a theory that operating a cache at the linking site was
   most effective and the additional work done prior to navigation in
   terms of fetching and preparing content was what provided the most
   gains; others suggested that the benefits inherent in the AMP format
   were a dominant factor.

   Understanding the complete effect of Web Packaging on web performance
   will require further work.

4. Systemic Effects

   It is not straightforward to estimate how a proposed technology
   change might affect all of the parts of a system - including not only
   other components but also things like end-user rights and the balance
   of power between parties - ahead of time. To date, when evaluating
   proposals, the IETF has generally focused on more immediate concerns,
   such as interoperability and security.

   Moreover, people often find new uses for successful standards
   [SUCCESS] after they are deployed. It is rarely possible to
   accurately predict all applications of a protocol or format, whether
   they are harmful or beneficial. Refusing standardization only
   impedes both outcomes.

   With the understanding that predictions are difficult to make, there
   was considerable speculation at the Workshop about the possible
   effect of Web Packaging on the Web. Some of that speculation is
   informed by experience, but that experience is necessarily limited in
   scope. This section attempts to capture that discussion.

4.1. Consolidation

   Concerns about the consolidation of power on the Internet have
   significantly increased lately, as a result of several factors.
   While the IAB, the Internet Society, and others are examining this
   phenomenon to understand it better, it is nevertheless prudent to
   consider whether proposals for changes to how the Internet works
   favor or counter consolidation. Favoring entities with existing
   advantages - like resources, size, or market share - is not
   necessarily a factor that disqualifies a new proposal, but it needs
   to be considered as a cost of enabling that technology.

   While it isn't clear what all of the outcomes of adopting Web
   Packaging would be, the Workshop revealed several concerns about
   consolidation risks for all involved parties: users, publisher sites,
   linking sites, and services they each rely on.

4.1.1. Consolidation of Power in Linking Sites

   Several participants noted that Web Packaging's enablement of instant
   navigation (Section 2.1) might advantage larger linking sites - such
   as social networks or search engines - over smaller ones in the same
   industry because doing so requires careful selection of which links
   to optimize, so as not to create unneeded traffic.
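
   To make the selection problem concrete, the Python sketch below picks
   links to prefetch greedily by estimated click probability per byte,
   within a byte budget, and reports how many prefetched bytes are
   expected to go unused. It is only an illustration under invented
   assumptions: the names (Link, choose_prefetch), the greedy rule, and
   the idea of a fixed byte budget are hypothetical and do not describe
   how any linking site is known to operate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Link:
    """Hypothetical outbound link with an estimated click probability."""
    url: str
    click_probability: float  # assumption: estimated by the linking site
    package_bytes: int        # size of the packaged copy, in bytes


def choose_prefetch(links: List[Link], budget_bytes: int) -> Tuple[List[Link], float]:
    """Greedily pick links by click probability per byte, within a budget.

    Returns the chosen links and the expected number of prefetched bytes
    that are never used because the user does not follow the link.
    """
    ranked = sorted(
        links,
        key=lambda link: link.click_probability / link.package_bytes,
        reverse=True,
    )
    chosen: List[Link] = []
    used = 0
    for link in ranked:
        if used + link.package_bytes <= budget_bytes:
            chosen.append(link)
            used += link.package_bytes
    expected_waste = sum(
        (1.0 - link.click_probability) * link.package_bytes for link in chosen
    )
    return chosen, expected_waste
```

   The quality of the result depends entirely on the click probability
   estimates, which is where the data collection and engineering effort
   noted by participants would be spent.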

   For example, a news article often has many links, but not all of them
   are equally likely to be followed. Deciding which ones to pre-fetch
   requires considerable data collection and engineering, so this
   technique might not be feasible for smaller entities. Additionally,
   some participants noted that this technique favors sites that have a
   linear set of ranked links, like search results; it is more difficult
   to apply to a page of news (for example) because predicting which
   link a user will follow is less obvious.

   This technique also requires access to a cache with terms of use
   compatible with the requirements of the site. It was pointed out
   that the Google AMP Cache has policies that might be acceptable to
   many, and there are other caches. Sites operated by entities other
   than Google already use this cache, though it was observed that a
   site that does not host its own cache suffers a minor performance
   degradation.

4.1.2. Consolidation of Power in Publishers

   Participants seemed to agree that if performance is a strong enough
   differentiator, the effective use of Web Packaging might turn out to
   be a condition for success for online publishers. Google Search's
   choice to privilege content that is served using HTTPS was pointed
   out as showing that this sort of influence can be effective.
   Equally, it is not necessarily the case that standardization of new
   capabilities will affect such policies materially, as noted in
   [YASSKIN]:

      It seems unlikely that any decisions we make in a packaging or
      distribution system will affect the considerations aggregators use
      when deciding how to rank recommendations or the power this gives
      them over publishers.

   The most common concern raised in the discussion was the effect of
   this technology on smaller publishers who might be less able to
   optimize the packages they produce, where their primary
   differentiation in the market has previously been the quality of
   their content.

4.1.3. Consolidation of User Preferences

   In typical operation of the Web, servers have an opportunity to
   tailor content to the needs of their users. In contrast, a static
   Web Package has few options for individualization, as the content is
   generated once and used by many.

   As a result, publishers noted that AMP provides less opportunity to
   customize content for their customers. Their concerns included not
   only personalizing content based on what they know about the user but
   also optimizing the package for specific browsers. Other
   participants observed in relation to this that Web Packaging might
   also have a consolidating effect in the browser market.

   Some participants brought up the possibility of customization by
   providing multiple packages, including multiple variants of resources
   in a single package, or performing customization after the package
   was loaded. However, other participants pointed out that all of
   these options have negative side effects, either in complexity or
   reduced performance arising from larger bundles or delayed
   customization.

4.2. Effect on Web Security

   One session explored the impact of introducing a new security model
   for the Web. Currently, sites rely on connection-oriented security
   (provided by TLS [TLS]), but Web Packaging adds a limited form of
   object security. That is, the package protects the integrity of a
   message, rather than providing integrity and confidentiality for its
   delivery. Object security is not a new concept in the context of the
   Web; designs like SHTTP [SHTTP] are as old as HTTPS.
Though the 595 intent is for Web Packaging to have a far more narrow applicability, 596 it provides fewer security guarantees than HTTPS, since it provides 597 only authentication, no confidentiality with respect to the cache, 598 and no assurance of liveness. 600 Object-based security - such as proposed in Web Packaging - allows 601 the use of content regardless of how it is obtained; some 602 participants noted that third parties gain greater control over the 603 distribution of content, reducing the ability of publishers to 604 retract or alter content over the validity period of signed content. 606 Another topic of discussion was composition attacks. In its proposed 607 form, Web Packaging only provides authentication of independent 608 resources, not a web page as a single unit, allowing an attacker to 609 control the composition of resources. This weakness was acknowledged 610 as a known shortcoming of the current proposal that would be 611 addressed. 613 The issue of managing the trade-off between control and performance 614 in caches arose. While participants recognized that problems with 615 resource composition already occur by accident - for example, when a 616 cache stores different versions of resources - Web Packaging allows 617 an attacker more direct control over what resources are available to 618 clients. 620 For example, an attacker might be able to cause content with a 621 security flaw to be used up to a week past the time that the defect 622 was fixed. 624 As an example of how Web Packaging might change the risk profile for 625 sites, participants discussed recovery from cross-site scripting 626 attacks. It is already the case that a brief exposure to this class 627 of attack can result in an attacker gaining persistent access, but 628 mechanisms exist that can be used to avoid or correct issues, like 629 cache validation and Clear Site Data [CLEAR-DATA]. These measures 630 are not available to clients unless they connect to the site. 
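The earlier example of flawed content remaining usable for up to a week can be made concrete with a short sketch. It is only an illustration under stated assumptions: the seven-day cap mirrors the "up to a week" figure in the text, and the names MAX_VALIDITY, signature_usable, and exposure_after_fix are hypothetical and are not taken from any Web Packaging specification.

```python
from datetime import datetime, timedelta, timezone

# Assumption: signatures may not be valid for longer than a week,
# matching the "up to a week" figure used in the discussion above.
MAX_VALIDITY = timedelta(days=7)


def signature_usable(signed_at, expires_at, now):
    """Return True if a client would still accept the signed content at `now`."""
    if expires_at - signed_at > MAX_VALIDITY:
        # An over-long validity period is rejected outright in this sketch.
        return False
    return signed_at <= now < expires_at


def exposure_after_fix(signed_at, expires_at, fixed_at):
    """Return how long an already-signed, flawed package stays usable
    after the publisher has fixed the defect at `fixed_at`."""
    if not signature_usable(signed_at, expires_at, fixed_at):
        return timedelta(0)
    # Without contacting the site, a client has no way to learn of the fix,
    # so the old package remains acceptable until its signature expires.
    return expires_at - fixed_at
```

For a package signed at midnight UTC on 2019-07-18 with a seven-day validity period, a fix deployed one hour after signing would leave the flawed package usable for a further 6 days and 23 hours in this model.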
632 The discussion pointed out that these concerns are not new or 633 uniquely enabled by Web Packaging. However, participants also noted that 634 new features are routinely subject to higher security and privacy 635 expectations. In an example unrelated to Web Packaging but with 636 similar tradeoffs, shared compression of multiple resources has 637 significant performance benefits. The risk that shared compression 638 creates is the potential for exposing encrypted information through 639 side channels. Though sites can use shared compression without this 640 exposure, shared compression will likely only be enabled once it is 641 clear that measures to prevent accidental information exposure are 642 understood to be effective in a broad set of deployments. 644 The discussion also addressed the question of whether concerns might 645 equally apply to the typical use of a Content Distribution Network 646 (CDN) as a third-party provider of the content. Some participants 647 concluded that CDNs are typically in a contractual relationship with 648 the sites they serve and so are more likely to have their interests 649 aligned. 651 4.3. Privacy of Content 653 Discussion and submissions raised concerns regarding how serving 654 content using Web Packages might adversely affect the privacy of 655 individuals. There are challenges here, but the very narrow 656 applicability of Web Packaging to what is effectively static content 657 limits the privacy risk. The conclusion was that, provided sufficient 658 care is taken in implementation, use of Web Packages does not 659 substantially increase the information that an aggregator gains about 660 what content is consumed. 662 Concretely, an aggregator knows what content it serves in 663 anticipation of navigation. This is - at least in theory - 664 substantially the same as the content that the aggregator might 665 receive if it performed the navigation itself.
Assuming that content 666 is stripped of personalization, the aggregator gains no new 667 information. 669 5. AMP Issues Unrelated to Web Packaging 671 On multiple occasions, discussion at the Workshop concentrated on 672 problems that arise as a result of constraints on the AMP format or 673 details of its inclusion in Google Search. For instance, the 674 requirement that pages expose metadata about themselves is 675 unlikely to be affected by any standardization of a packaging format 676 as that requirement is independent of the process of delivering 677 content. 679 This section provides some detail on aspects of the discussion that 680 touched on AMP more generally in this way. Some treatment of these 681 points is considered relevant as some of the discussion at the 682 workshop, even under the remit of discussing Web Packaging, 683 concentrated on the effect of AMP on the ecosystem. 685 Note: Of the four formats mentioned in the workshop call for papers 686 [CFP], only AMP sent representatives to the workshop. The 687 discussion was therefore concentrated around AMP; this section 688 should not be read to imply anything about other formats. 690 Discussion and submissions referred to a commitment [AMP-LESSONS] to 691 allow publishers to use content that met specific criteria to access 692 privileged positions in search results, regardless of their adoption 693 of AMP. Participants felt that this approach might address some of 694 these concerns if it were adopted and durable. For instance, the use 695 of Web Packaging might be sufficient to remove some constraints on 696 active content on the basis that the active content would be 697 attributed to the publisher and not the AMP cache. 699 5.1. AMP Governance 701 There was interest from workshop participants in the governance model 702 used for AMP. In particular, participants raised the question of how independent the AMP 703 project would be of Google and Google Search.
705 Three of the seven members of the AMP Technical Steering Committee, 706 the body that governs AMP, are Google employees, which gives Google 707 considerable influence over the project. It was asserted that the 708 governance structure was intended to be more independent of Google 709 over time. The understanding was that any consumer of the format, 710 such as Google Search, would make an independent assessment about 711 whether to use or require different aspects of the AMP project 712 products. 714 5.2. Constraints on the AMP Format 716 Sites often implement AMP by creating a separate set of content in 717 parallel to their regular HTML content. Publishers noted this as a 718 high cost, particularly for smaller sites. It was pointed out that 719 websites can serve AMP-compliant content exclusively. However, 720 several publishers referred to limitations in the format that made it 721 unsuitable for their needs. 723 Many cited reasons for this duplication were related to the necessity 724 of running arbitrary active content (typically, JavaScript). For 725 example: 727 o AMP provides a framework for supporting user authentication, but 728 publishers asserted that using this framework was not considered 729 practical. 731 o AMP content does not support rendering of certain content, which 732 can affect the ability of publishers to innovate in how they 733 produce content. 735 o The AMP model for the implementation of paywalls (Section 5.4) was 736 claimed to be inimical to some publisher business models. 738 More broadly, they considered AMP's constraints on the use of active 739 content as problematic, since they prevent the use of capabilities 740 that are provided on equivalent non-AMP pages. Reference was made to 741 a proposed element - which has since been made fully 742 available - that seeks to provide limited access to some dynamic 743 content. 745 5.3. 
Performance 747 Publishers observed that using the AMP format does not provide any 748 guarantee of performance gains and in some cases could contribute to 749 performance degradation. It was suggested that this was most 750 problematic for sites that are already well-tuned for performance. 752 5.4. Implementation of Paywalls 754 The use of "paywalls" by Web publishers to control access to content 755 in return for payment is increasingly common. One popular approach 756 is to offer a limited number of articles without payment while 757 insisting on a paid subscription to access further articles. 759 On several occasions, participants expressed dissatisfaction with the 760 difficulty of integrating paywall authorization when using AMP. In 761 particular, they said AMP encourages publishers to include an 762 article's full content, hidden by default but easily accessible to 763 motivated users. The discussion extended to workarounds like cookie 764 syncing [COOKIE-SYNC] that are used as part of authorization, a 765 consequence of having cached content hosted on the linking site 766 rather than the target site. 768 The same topic came up concerning book publication, where publishers 769 indicated that having a means of enabling different methods of 770 distribution without also facilitating unconstrained copying of book 771 content was necessary. 773 This conflation of AMP issues with those addressed by Web Packaging 774 was recurrent in the discussion. As observed in [DAS], these 775 concerns might be addressed by linking to a signed bundle. 777 6. Venues for Future Discussion 779 Web Packaging work continues in multiple forums. Questions about the 780 core format and signatures are being discussed on the wpack@ietf.org 781 mailing list [1]. Changes to web browsers as proposed in [LOADING] 782 will be discussed on the Fetch specification repository [2]. 784 7.
Security Considerations 786 Proposals discussed at the Workshop might have a significant security 787 impact, and these topics were discussed in some depth; see 788 Section 4.2. 790 8. References 792 8.1. Informative References 794 [ALAM] Alam, S., Weigle, M., Nelson, M., Klein, M., and H. Van de 795 Sompel, "Supporting Web Archiving via Web Packaging", June 796 2019, . 799 [AMP-LESSONS] 800 Ubl, M., "Standardizing lessons learned from AMP", March 801 2018, . 804 [AMP-PERF] 805 Steinlauf, E., "The Speed Benefit of AMP Prerendering", 806 August 2019, . 809 [AOLOG] Haber, S. and W. Stornetta, "How to time-stamp a digital 810 document", Journal of Cryptology Vol. 3, 811 DOI 10.1007/bf00196791, 1991. 813 [BERJON] Berjon, R., "ESCAPE: The New York Times Position", July 814 2019, . 817 [BREWSTER] 818 Brewster, A., "ESCAPE Position / Patch.com", June 2019, 819 . 822 [BUNDLE] Yasskin, J., "Web Packaging", draft-yasskin-dispatch-web- 823 packaging-00 (work in progress), June 2017. 825 [CFP] IAB, ., "Exploring Synergy between Content Aggregation and 826 the Publisher Ecosystem Workshop 2019", May 2019, 827 . 830 [CHATHAM-HOUSE] 831 Chatham House, "Chatham House Rule", n.d., 832 . 834 [CHRISTCHURCH] 835 Stevenson, R. and J. Anthony, "'Thousands' of Christchurch 836 shootings videos removed from YouTube, Google says", March 837 2019, . 841 [CLEAR-DATA] 842 West, M., "Clear Site Data", W3C Working Draft, November 843 2017, . 845 [COOKIE-SYNC] 846 Acar, G., Eubank, C., Englehardt, S., Juarez, M., 847 Narayanan, A., and C. Diaz, "The Web Never Forgets", 848 Proceedings of the 2014 ACM SIGSAC Conference on Computer 849 and Communications Security - CCS '14, 850 DOI 10.1145/2660267.2660347, 2014. 852 [CRAMER] Cramer, D., "Packaging Books", June 2019, 853 . 856 [DAS] Das, S., "The Implication of Signed Exchanges on 857 E-Commerce", June 2019, . 861 [DEPUYDT-NELSON] 862 DePuydt, M. and M. 
Nelson, "Signed Exchanges and The 863 Importance of Trust in Aggregator/Publisher 864 relationships", June 2019, . 867 [GDPR] European Union, "General Data Protection Regulation", EU 868 Regulation 2016/679, April 2016, . 872 [HTTP] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 873 Protocol (HTTP/1.1): Message Syntax and Routing", 874 RFC 7230, DOI 10.17487/RFC7230, June 2014, 875 . 877 [LOADING] Yasskin, J., "Loading Signed Exchanges", September 2019, 878 . 880 [MEMENTO] Van de Sompel, H., Nelson, M., and R. Sanderson, "HTTP 881 Framework for Time-Based Access to Resource States -- 882 Memento", RFC 7089, DOI 10.17487/RFC7089, December 2013, 883 . 885 [ORIGIN] Barth, A., "The Web Origin Concept", RFC 6454, 886 DOI 10.17487/RFC6454, December 2011, 887 . 889 [OTSU] Ohtsu, S., "Deployment Experience of Signed HTTP Exchanges 890 with AMP as a Publisher", June 2019, . 893 [SHTTP] Rescorla, E. and A. Schiffman, "The Secure HyperText 894 Transfer Protocol", RFC 2660, DOI 10.17487/RFC2660, August 895 1999, . 897 [SUCCESS] Thaler, D. and B. Aboba, "What Makes for a Successful 898 Protocol?", RFC 5218, DOI 10.17487/RFC5218, July 2008, 899 . 901 [SXG] Yasskin, J., "Signed HTTP Exchanges", draft-yasskin-http- 902 origin-signed-responses-06 (work in progress), July 2019. 904 [TAG-DC] Betts, A., "Distributed and syndicated content", July 905 2017, . 908 [TLS] Rescorla, E., "The Transport Layer Security (TLS) Protocol 909 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 910 . 912 [YASSKIN] Yasskin, J., "Chrome's position on the ESCAPE workshop", 913 June 2019, . 916 8.2. URIs 918 [1] https://www.ietf.org/mailman/listinfo/wpack 920 [2] https://github.com/whatwg/fetch/issues/784 922 [3] https://amp.dev/ 924 [4] https://schema.org/ 926 [5] https://developers.google.com/amp/cache/ 928 Appendix A. About the Workshop 930 The ESCAPE Workshop was held on 2019-07-18 and the morning of 931 2019-07-19 at Cisco's facility in Herndon, Virginia USA. 
933 Attendees of the Workshop were asked to submit position papers. 934 These papers are published on the IAB website [CFP]. 936 The Workshop was conducted under the Chatham House Rule [CHATHAM-HOUSE], 937 meaning that statements cannot be attributed to individuals or 938 organizations without explicit authorization. 940 A.1. Agenda 942 This section outlines the broad areas of discussion on each day. 944 A.1.1. Thursday 2019-07-18 946 Web Packaging Overview: A technical summary 947 of Web Packaging was provided, plus a longer discussion of a range 948 of use cases. 950 Web Packaging and Aggregators: The use of Web Packaging from the 951 perspective of a content aggregator was presented. 953 Web Packaging and Publishers: After a break, presentations from web 954 publishers talked about the benefits and costs of Web Packaging. 955 This included some discussion of the effect of developing AMP- 956 conformant versions of content from a publisher perspective. 958 Web Packaging and Security: This session concentrated on how the Web 959 Packaging proposal might affect the Web security model. 961 Alternatives to Web Packaging: This session looked at alternative 962 technologies, including those that were attempted in the past and 963 some more recent ideas for addressing the use case of making web 964 navigations more performant. 966 A.1.2. Friday 2019-07-19 968 Web Archival: This session talked about the potential application of 969 a technology like Web Packaging in addressing some of the myriad 970 problems faced by web archival systems. 972 Book Publishing: A discussion of the effect of technologies for 973 bundling and distribution of books. 975 Conclusions: A wrap-up session attempted to capture key learnings 976 from the Workshop. 978 A.2. Workshop Attendees 980 Attendees of the Workshop are listed with their primary affiliation 981 as it appeared in submissions.
Attendees from the program committee 982 (PC), the Internet Architecture Board (IAB), and Internet Engineering 983 Steering Group (IESG) are also marked. 985 o Sawood Alam, Old Dominion University 987 o Jari Arkko, Ericsson (IAB) 989 o Richard Barnes, Cisco 991 o Robin Berjon, New York Times (PC) 993 o Zack Bloom, Cloudflare 995 o Abraham Brewster, Patch.com 997 o Alissa Cooper, Cisco (IESG, IAB) 999 o Dave Cramer, Hachette Book Group 1000 o Melissa DePuydt, Washington Post 1002 o Levi Durfee, AMP Advisory Committee 1004 o Rudy Galfi, Google 1006 o Joseph Lorenzo Hall, Center for Democracy & Technology (PC) 1008 o Matthew Nelson, Washington Post 1010 o Michael Nelson, Old Dominion University 1012 o Mark Nottingham, Fastly (IAB, PC) 1014 o Shigeki Ohtsu, Yahoo 1016 o Eric Rescorla, Mozilla 1018 o Adam Roach, Mozilla (IESG) 1020 o Rich Salz, Akamai Technologies 1022 o Wendy Seltzer, W3C 1024 o David Strauss, Pantheon (PC) 1026 o Chi-Jiun Su, Hughes 1028 o Ralph Swick, W3C 1030 o Martin Thomson, Mozilla (IAB, PC) 1032 o Jeffrey Yasskin, Google 1034 o Dan York, Internet Society 1036 o Benjamin Young, John Wiley & Sons 1038 Appendix B. Web Packaging Overview 1040 Web Packaging is comprised of two separate technologies: resource 1041 bundling [BUNDLE] and signed exchanges [SXG]. 1043 In both the submissions and Workshop discussion, the most 1044 controversial aspect of the technology is the use of signed exchanges 1045 as an alternative means of providing authority over a particular 1046 resource, for a few different reasons. 1048 This appendix explains how authority works on the Web and how Web 1049 Packaging proposes to change that. 1051 B.1. Authority in HTTPS 1053 The web currently uses HTTPS [HTTP] to establish a server's authority 1054 - that is, to give an assurance that the content came from where the 1055 URL implies. 
The combination of URI scheme (https), domain name (or 1056 host), and port number is formed into a single identifier, the 1057 origin [ORIGIN], to which content is attributed. 1059 Web browsers use the certificate offered as part of a TLS connection 1060 [TLS] to servers in determining whether a server is authoritative for 1061 that origin; see [ORIGIN] and Section 9.1 of [HTTP]. Content is 1062 attributed to a given URL only if it is received from a connection to 1063 a server that is authoritative for the associated origin. 1065 As an example, a web browser seeking to load "https://example.com/ 1066 index.html" makes a TLS connection to a server. As part of the TLS 1067 connection establishment, the server offers a certificate for the 1068 name "example.com". If the browser accepts the certificate, it will 1069 then make requests for URLs on the "https://example.com" origin on 1070 that connection and consider any answers from the server to be 1071 authoritative. 1073 This notion of authority is a crucial property of web security: only 1074 content that is attributed to the same web origin can access all 1075 information in that origin, including the content of most resources 1076 as well as state associated with the origin, such as cookies. This 1077 separation ensures that sites can keep secrets from each other, even 1078 when they are both loaded in the same browser. 1080 B.2. Authority in Web Packaging 1082 Web Packaging, through the use of signed exchanges, aims to provide 1083 an alternative means of establishing authority. A signed exchange is 1084 an expression of an HTTP request and response (an exchange) with 1085 certain information stripped and a digital signature applied. 1087 The signature is made with a similar certificate to the one a server 1088 might offer in HTTPS - that certificate can also be used for HTTPS - 1089 but it includes a special attribute that denotes its suitability for 1090 signed exchanges.
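Because both HTTPS and signed exchanges ultimately attribute content to an origin, it may help to see how the origin described in Section B.1 is derived from a URL. The following is a simplified sketch, not the full algorithm of [ORIGIN]: it ignores internationalized host names and non-hierarchical URLs, and the names DEFAULT_PORTS, origin, and same_origin are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports assumed when a URL does not state one explicitly.
DEFAULT_PORTS = {"https": 443, "http": 80}


def origin(url):
    """Return the (scheme, host, port) triple to which `url` is attributed."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """Return True if both URLs are attributed to the same origin."""
    return origin(url_a) == origin(url_b)
```

Under these assumptions, "https://example.com/index.html" and "https://example.com:443/other" share an origin, while "https://www.example.com/" does not, because the host differs.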
1092 A web browser that has been provided with a signed exchange can 1093 verify the signature, and - if the signature is valid and the 1094 certificate is acceptable - use the content from the signed exchange. 1096 Critically, the web browser does not make an HTTPS connection to a 1097 server to get the content or to verify the signature. 1099 In effect, Web Packaging moves from a model where authority is 1100 derived from the delivery method (i.e., TLS) to an object security 1101 model, where authority is derived from a signature on objects. In 1102 doing so, it aims to render the means of delivery irrelevant to 1103 determinations of security. 1105 B.3. Applicability 1107 Web Packaging does not claim to supplant the authority model of the 1108 Web completely, but to provide an alternative that might be used 1109 under certain narrow conditions. In particular, Web Packaging is 1110 intended for use with content that is not secret from an entity that 1111 is aware of the existence of that content. 1113 In aid of this goal, Web Packaging does not include information from 1114 exchanges that relates either to the process of acquiring content or 1115 to individual requests. For 1116 instance, use of the Set-Cookie header field is expressly forbidden, 1117 as it often contains information that is related to a particular 1118 user. 1120 B.4. The AMP Format, Google Search Results, and Web Packaging 1122 The relationship between the AMP Project https://amp.dev/ [3] and Web 1123 Packaging is complicated. The AMP Project, sponsored by Google, 1124 establishes a profile of HTML with a stated goal of providing support 1125 for the best practices for the format, with a strong emphasis on 1126 performance. The format tightly constrains the use of HTML features 1127 but also offers a library of components that provide sanitized 1128 implementations of many commonly used capabilities.
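Returning briefly to the restriction described in Section B.3, a publisher's tooling might check a response for per-user header fields before packaging it. The sketch below is illustrative only: Set-Cookie is the one field named in this document, and the names FORBIDDEN_RESPONSE_FIELDS, forbidden_fields, and can_package are hypothetical rather than part of any specification.

```python
# Only Set-Cookie is named in this document; any further entries added to
# this set would be a deployment's own policy choice (an assumption here).
FORBIDDEN_RESPONSE_FIELDS = frozenset({"set-cookie"})


def forbidden_fields(headers):
    """Return the sorted, lowercased names of header fields in `headers`
    (a list of (name, value) pairs) that must not appear in a package."""
    present = {name.lower() for name, _ in headers}
    return sorted(present & FORBIDDEN_RESPONSE_FIELDS)


def can_package(headers):
    """Return True if the response carries no forbidden header fields."""
    return not forbidden_fields(headers)
```

A response carrying only Content-Type would pass this check, whereas one that also sets a cookie would be reported and would need to be regenerated without per-user state before signing.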
1130 The connection to Web Packaging is bound up in the way that Google 1131 Search treats AMP content specially. AMP content provides two 1132 properties that Google Search exploits: metadata exposure and static 1133 analysis of active content. 1135 AMP content provides metadata in a form that can be reliably 1136 extracted, using the microformats defined by the Schema.org project 1137 https://schema.org/ [4]. This aspect of AMP has no effect on the 1138 discussion, except to the extent that this relates to Google Search 1139 and their use of this metadata in populating the carousel. 1141 Constrained use of active content - such as JavaScript - in AMP makes 1142 it possible to analyze content to verify that actions taken are 1143 narrowly limited. This static analysis assures that AMP content can 1144 be served without affecting other content on the same site. For 1145 Google Search, this is what enables the loading of AMP content 1146 alongside search content and other AMP resources. 1148 To provide preloading, Google operates an AMP Cache 1149 https://developers.google.com/amp/cache/ [5], from which AMP content 1150 is served. As a consequence, browsers attribute the content to the 1151 origin [ORIGIN] of the AMP Cache and not the publisher, creating some 1152 confusion about how content is attributed, as discussed in the W3C 1153 finding on distributed content [TAG-DC]. 1155 An important goal of Web Packaging is to attribute content loaded 1156 from a cache, such as the AMP cache, to the publisher that created 1157 that content. For more on this see Section 2.1. 1159 Authors' Addresses 1161 Martin Thomson 1163 Email: mt@lowentropy.net 1165 Mark Nottingham 1167 Email: mnot@mnot.net