Network Working Group                                         M. Thomson
Internet-Draft
Intended status: Informational                             M. Nottingham
Expires: March 13, 2020                               September 10, 2019

   Report from the IAB Workshop on Exploring Synergy between Content
            Aggregation and the Publisher Ecosystem (ESCAPE)
                     draft-thomson-escape-report-00

Abstract

   The Exploring Synergy between Content Aggregation and the Publisher
   Ecosystem (ESCAPE) Workshop was convened by the Internet Architecture
   Board (IAB) in July 2019. This report summarizes its significant
   points of discussion and identifies topics that may warrant further
   consideration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 13, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Mention of Specific Entities
   2. Use Cases
     2.1. Instant Navigation
     2.2. Offline Content Sharing
     2.3. Other Use Cases
       2.3.1. Book Publishing
       2.3.2. Web Archiving
   3. Interactions Between Web Publishers and Aggregators
     3.1. Incentives for Web Packages
     3.2. Operational Costs
     3.3. Content Regulation
     3.4. Web Performance
   4. Systemic Effects
     4.1. Consolidation
       4.1.1. Consolidation of Power in Linking Sites
       4.1.2. Consolidation of Power in Publishers
       4.1.3. Consolidation of User Preferences
     4.2. Effect on Web Security
     4.3. Privacy of Content
   5. AMP Issues Unrelated to Web Packaging
     5.1. AMP Governance
     5.2. Constraints on the AMP Format
     5.3. Performance
     5.4. Implementation of Paywalls
   6. Security Considerations
   7. Informative References
   Appendix A. About the Workshop
     A.1. Agenda
       A.1.1. Thursday 2019-07-18
       A.1.2. Friday 2019-07-19
     A.2. Workshop Attendees
   Appendix B. Web Packaging Overview
     B.1. Authority in HTTPS
     B.2. Authority in Web Packaging
     B.3. Applicability
     B.4. The AMP Format, Google Search Results, and Web Packaging
   Authors' Addresses

1. Introduction

   The IAB convened this workshop to examine some proposed changes to
   the Internet and the Web, and their potential effects on the Internet
   publishing landscape. Of particular interest was the Web Packaging
   proposal from Google, under consideration in the IETF, the W3C's Web
   Incubator Community Group (WICG), and the Web Hypertext Application
   Technology Working Group (WHATWG).

   In considering these proposals, we heard about both positive effects
   of Web Packaging, and concerns that it could have significant effects
   on the relationship between publishers (e.g., news Web sites) and
   content aggregators (e.g., search engines and social networks). As
   such, our focus was primarily on this relationship, rather than being
   a technical discussion.

   Online publishers do not regularly participate in standards
   activities directly. A Workshop format was used to solicit input
   from them. The workshop had 27 participants from a diverse set of
   backgrounds, including a small number of attendees from publishers,
   one aggregator (Google), plus representatives from browsers, the AMP
   community, CDNs, network operators, academia, and standards bodies.
   See the Workshop Call for Participation [CFP] for more information
   and a complete listing of submissions.

   As intended, the Workshop was primarily a forum for discussion, so it
   did not reach definite conclusions. Instead, this report is the
   primary output of the Workshop, as a record of that discussion.

   This report documents the use cases discussed in Section 2 and, in
   Section 3, explains the interactions between publishers and
   aggregators that Web Packaging might affect. Appendix A includes more
   details about the Workshop itself. For those unfamiliar with Web
   Packaging, Appendix B provides a summary as background material.

1.1. Mention of Specific Entities

   Participants agreed to conduct the Workshop under the Chatham House
   Rule [CHATHAM-HOUSE], so this report does not attribute statements to
   individuals or organizations without express permission. Submissions
   to the Workshop were public, and thus attributable; they are used
   here to provide substance and context.

2. Use Cases

   Much of the Workshop concentrated on discussion of the validity and
   relative merits of the use cases that might be enabled by Web
   Packaging. See Appendix B for an overview of what Web Packaging is.

2.1. Instant Navigation

   The largest use of Web Packaging so far is in Google Search, where
   packages are intended to improve the perceived performance of
   navigation to pages that are linked from search results when
   "clicked".

   To enable this, when a linking (or referring) web page includes links
   to pages on another site, it also provides the browser with a
   packaged copy of the target content, signed by the origin of the
   target content. In effect, the referring page provides a cache for
   the target page's content.
   If navigation to one of those links
   occurs, having the Web Package gives a browser the assurance that the
   cache didn't change the content, so it can treat that content as if
   it were acquired directly from the server for the target page - even
   though it came from a different server. In many cases, this results
   in significantly lower perceived delay in displaying the target page.

   A vital characteristic of this technique is that the browser does not
   contact the target site before navigation. The browser does not make
   any requests to sites until after navigation occurs, and only then if
   the site requires additional content or makes a request directly.

   Similar improvements could also be realized by downloading content
   (packaged or otherwise) directly from the target site through a
   technique called prefetching. However, doing so would reveal
   information about the user's activity on the linking page to those
   sites - even when the user never actually navigates to them.

      Note: This technique that uses Web Packaging is also referred to as
      "privacy-preserving prefetch". This document avoids that term as
      there was some contention at the workshop about what aspects of
      privacy might be preserved by the technique.

   Sites bundled with Web Packaging can additionally be constructed in a
   way that ensures that they render without needing any additional
   network access. This makes it possible to provide near-instantaneous
   navigation. The proposed changes to web navigation in support of
   loading Web Packages are designed to support this use case.

   Workshop participants recognized the value of web performance for
   usability, as well as for business metrics like retention and bounce
   rates.
   Such improvements were seen as a valuable goal, but
   publishers raised questions about whether they justified the cost of
   supporting an additional format, while others raised concerns about
   different aspects of the Web Packaging proposal.

2.2. Offline Content Sharing

   Another primary use case discussed was the ability to share Web
   content between devices where neither has an active connection to the
   Internet. One of the stated goals of Web Packaging is to enable
   sharing of content offline.

   Several participants reported that in areas where Internet access is
   expensive, slow, or intermittent, the use of direct peer-to-peer file
   exchange (e.g., "saving a Web site and sharing it on a USB stick") is
   commonplace. Most Web browsers already have some affordances for
   this, but these are recognized as needing improvement.

   In the discussion, several participants rejected an assumed
   requirement of this use case - that there be no difference between
   the treatment of a "normal" Web page and that of one loaded from an
   offline Web Package.

   The ability for a Web Package to provide clear attribution for
   content was seen as valuable by some participants for a range of
   reasons. However, reservations were expressed about the subtleties
   of the properties that signatures provide and the effect of this on
   Web security; see also Section 4.2 and Section 2.3.2.

   Many participants pointed out that using "unsigned bundles" - that
   is, Web Packages without Signed Exchanges - could be adequate for
   this use case, since most users don't need cryptographic proof of the
   site's identity. However, some expressed concerns that this might
   worsen the propagation of falsehood.

   Some suggested that the value of Signed Exchanges was not realized in
   small-scale interpersonal exchange of information, but in the
   building of systems for content delivery that might include
   capabilities like discovery and automated distribution. The
   contention here was that effective use of digital signatures in
   offline distribution of content implied considerably more
   infrastructure than was described in current proposals.

   No definite conclusions about offline sharing were reached during the
   workshop.

2.3. Other Use Cases

   A session on the second morning concentrated on two other significant
   potential use cases for Web Packages: book publishing and Web
   archiving. These were not seen as "primary" by the proponents of Web
   Packaging; the original intent was not to spend significant time on
   these subjects, but there was considerable interest from attendees.

2.3.1. Book Publishing

   The potential application of a packaging format to book publishing
   was discussed, with particular reference to ways that books differ
   from web content. Specialists from that industry pointed out that
   book delivery can vary greatly from typical web content delivery.

   Workshop participants briefly explored existing solutions. PDF was
   seen as particularly challenging for this use case, due to its
   limitations, and EPUB has constraints that also make it challenging
   for publishers.

   Although Web Packaging might help to address this use case, the
   question of how to identify book content was not resolved. The use
   of Signed Exchanges in this context might offer means of tying
   content in books to a Web site, but several limitations inherent in
   doing that were identified.

   In particular, book publication specialists represented that books
   don't have the same requirements for timeliness or currency as web
   pages.
   For instance, Dave Cramer's submission [CRAMER] observed that
   Moby Dick was published over 61,000 days ago, which is considerably
   longer than the proposed limit of 7 days for Signed Exchanges. The
   limited length of time that a Web Package can be considered valid was
   discussed at some length.

   Additionally, the risk of a publisher going out of business during
   the lifetime of a book is significant, because books - at least
   successful ones - often span generations in their applicability. To
   that end, having a means of attributing content to a publisher was
   considered less practical, and potentially undesirable (much like the
   discussion above regarding "unsigned bundles").

   There were other aspects of book publication that participants saw as
   challenging for packaging. For example, it is not currently clear
   what it means to refer to distinct parts of a book.
   Participants saw this as an area where providing stable references
   for bundles of content might offer possibilities, but nothing
   concrete came from that discussion.

   The potential for active content in a bundle to use Web APIs to
   enrich content or enable new features was considered valuable.
   Models for enabling paywalls were discussed at some length (see
   Section 5.4).

2.3.2. Web Archiving

   Web archiving is a complicated discipline that is made more difficult
   by the complex nature of the web itself.

   From an archival standpoint, the potential for Web content to be
   provided in a self-contained form was viewed positively. Several
   improvements to the structure of Web Packaging were considered, such
   as providing complete sets of content and the use of Memento
   [MEMENTO].

   Though there were potential applications of a packaging scheme, many
   challenges were recognized as requiring additional work on the part
   of content producers to be fully effective.
   For example, JavaScript
   is needed to render some archived content faithfully, but attributing
   that content to an origin in all scenarios is challenging.

   If packaging were to be widely deployed it might improve the
   situation for archival views. In particular, the speculation is that
   there would be less "live leakage" as packaged content might be less
   likely to refer to live resources that currently tend to "leak" into
   views of archives. It was also noted that subresources might be
   more likely to be packaged, especially those that are needed for
   pre-rendering pages. Other potential applications and enhancements
   are discussed in [ALAM].

   Participants discussed the use of a signature for non-repudiation at
   some length. In one case related to attendees, a public figure
   disputed the accuracy of archived content, asserting that either the
   original content was modified at the source, or in the archive.

   Some participants initially saw digital signatures as a way to
   address such issues of provenance. As similar problems exist in
   other areas, such as in book publication, medical research, and news,
   a solution to this problem was considered to have broad
   applicability.

   However, the discussion ultimately concluded that providing
   non-repudiation in retrospect is challenging. Signing keys are not
   expected to remain secure for long periods. If keys are leaked
   afterwards, an attacker could retroactively generate fraudulent
   signatures. Alternative solutions were discussed, such as providing
   independent archives for the same data, using consensus protocols, or
   using an append-only construct like a Haber-Stornetta log [AOLOG],
   all of which can be used to increase the difficulty of altering or
   misrepresenting established archives.

3. Interactions Between Web Publishers and Aggregators

   A significant motivation for holding the Workshop was to provide a
   forum where publishers could discuss the impact of Web Packaging on
   the online publishing ecosystem. Of primary interest was whether Web
   Packages might effectively enable a transfer of power from publishers
   to aggregators.

   Both publishers and aggregators at the workshop expressed the
   importance of maintaining a positive relationship. Publishers in
   particular expressed the need to be able to trust that aggregators
   won't misrepresent their work, or de-emphasize it for reasons
   unrelated to quality and perceived value to the user.

   One key question from [BERJON] was discussed:

      Web Packaging has other uses, but it is primarily seen by a large
      proportion of its stakeholders as a solution to problems that AMP
      created. Before we agree to solve those issues, should we not ask
      if AMP was a useful approach in the first place - and useful to
      whom?

   In examining this issue, discussion focused on the current incentive
   model offered by aggregators. The costs that publishers incur for
   participation in that system were considered. Considerable time was
   spent on AMP; a summary of that discussion can be found in Section 5.

   We also considered the question of whether standardizing Web
   Packaging confers credibility to aggregators exercising unwelcome
   control over publisher content, or whether the technical safeguards
   Web Packaging provides could allow aggregators to relax their
   restrictions on the kinds of content they're willing to cache and
   serve. No conclusions were drawn.

3.1. Incentives for Web Packages

   Submissions to the Workshop indicated that the use of inducements
   involving better placement and formatting of links to publisher
   content had a significant effect on the uptake of related technology.
   For example, in [DEPUYDT-NELSON]:

      [...]
      The Washington Post has always placed a great deal of trust
      in Google to represent its content--and their reward for doing so
      is more traffic, which positively impacts the business.

   During the Workshop, several online publishers indicated that if it
   weren't for the privileged position in the Google Search carousel
   given to AMP content, they would not publish in that format.

   Publishers that do produce AMP said they see a non-trivial increase
   in traffic as a result of deploying AMP content. For example, Yahoo
   Japan reported a 60% increase in traffic as a result of deploying AMP
   on Yahoo Travel [OTSU]. There was no data presented as to whether
   this increase was due to better placement in Google Search results,
   from the inherent benefits of the AMP cache, or the use of the AMP
   format.

   Anecdotal evidence was offered by another large publisher that saw a
   10% drop in traffic as a result of accidentally disabling AMP
   content. However, increases in traffic might not result in similarly
   proportioned increases in revenue, as observed in [BREWSTER].

3.2. Operational Costs

   Several participants pointed out that introducing a new, parallel
   format for Web content incurs operational costs. In particular,
   supporting any new format - such as Web Packaging, Apple News, or
   Facebook Instant Articles - requires not only initial development of
   tooling (some generic, some specific to a site's requirements) but
   also an ongoing investment in maintaining its operability. Some
   participants expressed concern about the impact upon small publishers
   with limited technical and financial resources, especially in the
   current publishing climate.

   Increased exposure from new formats might not always justify the
   added expense of providing articles in that format [BREWSTER].
   However, a standardized format might help publishers reduce the cost
   of maintaining multiple formats.

3.3. Content Regulation

   The use of Web Packaging as a tool for avoiding censorship was not a
   significant topic of discussion, except to note that publishers often
   have regulatory requirements regarding removal or correction of
   content.

   Reference was made to the desire to remove videos of a recent
   shooting [CHRISTCHURCH] and the potential difficulty in doing so if
   content were available as Web Packages. Legal requirements to remove
   content come from multiple angles: copyright violations, illegal
   content, editorial corrections or errors, and right to erasure
   provisions in the European Union General Data Protection Regulation
   [GDPR] were mentioned. One participant speculated that making it
   more difficult to remove material in this way might discourage
   regulators from censoring content.

   In this context, participants observed that it would be difficult to
   create mechanisms to track and control content served as a Web
   Package without compromising the stated goal of censorship
   resistance.

3.4. Web Performance

   Understanding the effect that Web Packaging might have on web
   performance was a matter of some contention.

   Some informal analysis from the Google Search deployment was
   presented (later published in [AMP-PERF]) that showed significant
   performance improvements in metrics related to navigation time
   resulting from the combination of prefetch, prerendering, and the AMP
   format. These results are suggestive of a possibility that Web
   Packaging could provide some of that improvement on its own, but no
   data was presented that apportioned the improvement among the three
   components.

   Though data was presented to demonstrate potential rather than be a
   definitive result, discussions raised a number of questions that
   suggest the need for further study.
   Attendees suggested that future
   measurements consider the effect of signed bundles distinct from the
   enhancements derived from the AMP format. Future research in this
   area might also consider the effectiveness of different strategies on
   devices with varying capabilities, bandwidth, power consumption
   requirements, or network conditions.

   Of particular interest is the additional work required to fetch and
   render multiple web pages in preparation for navigation. This might
   ultimately use fewer connections, but comes with an increased network
   and CPU cost for clients. Some participants pointed out that
   different clients or applications might require different tuning; for
   example, when users have limited (or expensive) bandwidth, or for
   sites with less clear knowledge about the use of outbound links.

   Workshop participants also expressed interest in learning about the
   effect of Web Packages on subsequent navigations within the target
   site.

   In discussion, some participants suggested that their experience
   supported a theory that operating a cache at the linking site was
   most effective and the additional work done prior to navigation in
   terms of fetching and preparing content was what provided the most
   gains; others suggested that the benefits inherent in the AMP format
   were a dominant factor.

   Understanding the complete effect of Web Packaging on web performance
   will require further work.

4. Systemic Effects

   It is not straightforward to estimate how a proposed technology
   change might affect all of the parts of a system - including not only
   other components but also things like end-user rights and the balance
   of power between parties - ahead of time. To date, when evaluating
   proposals, the IETF has generally focused on more immediate concerns,
   such as interoperability and security.

   Moreover, people often find new uses for successful standards
   [SUCCESS] after they are deployed. It is rarely possible to
   accurately predict all applications of a protocol or format, whether
   they are harmful or beneficial. Refusing standardization only
   impedes both outcomes.

   With the understanding that predictions are difficult to make, there
   was considerable speculation at the Workshop about the possible
   effect of Web Packaging on the Web. Some of that speculation is
   informed by experience, but that experience is necessarily limited in
   scope. This section attempts to capture that discussion.

4.1. Consolidation

   Concerns about the consolidation of power on the Internet have
   significantly increased lately, as a result of several factors.
   While the IAB, the Internet Society, and others are examining this
   phenomenon to understand it better, it is nevertheless prudent to
   consider whether proposals for changes to how the Internet works
   favor or counter consolidation. Favoring entities with existing
   advantages - like resources, size, or market share - is not
   necessarily a factor that disqualifies a new proposal, but it needs
   to be considered as a cost of enabling that technology.

   While it isn't clear what all of the outcomes of adopting Web
   Packaging would be, the Workshop revealed several concerns for
   consolidation risks for all involved parties: users, publisher sites,
   linking sites, and services they each rely on.

4.1.1. Consolidation of Power in Linking Sites

   Several participants noted that Web Packaging's enablement of instant
   navigation (Section 2.1) might advantage larger linking sites - such
   as social networks or search engines - over smaller ones in the same
   industry because doing so requires careful selection of which links
   to optimize, so as not to create unneeded traffic.
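
   The selection problem described above can be sketched in a few lines.
   This is only an illustration under stated assumptions, not a
   description of how any linking site actually works: the Link type,
   the click_probability estimates, and the byte budget are all
   hypothetical, and the greedy ranking is one simple heuristic among
   many.

```python
from dataclasses import dataclass


@dataclass
class Link:
    url: str
    click_probability: float  # hypothetical estimate in [0, 1]
    package_bytes: int        # size of the packaged target; assumed > 0


def choose_prefetch(links, byte_budget):
    """Greedily pick links with the best click probability per byte."""
    ranked = sorted(
        links,
        key=lambda link: link.click_probability / link.package_bytes,
        reverse=True,
    )
    chosen, used = [], 0
    for link in ranked:
        # Skip any package that would push the total over the budget.
        if used + link.package_bytes <= byte_budget:
            chosen.append(link)
            used += link.package_bytes
    return chosen


def expected_wasted_bytes(chosen):
    """Expected bytes fetched for links the user does not follow."""
    return sum((1 - link.click_probability) * link.package_bytes
               for link in chosen)
```

   The quality of the result depends entirely on the probability
   estimates, which is where the data collection and engineering effort
   noted by participants comes in.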

   For example, a news article often has many links, but not all of them
   are equally likely to be followed. Deciding which ones to pre-fetch
   requires considerable data collection and engineering, so this
   technique might not be feasible for smaller entities. Additionally,
   some participants noted that this technique favors sites that have a
   linear set of ranked links, like search results; it is more difficult
   to apply to a page of news (for example) because predicting what link
   a user will follow is less obvious.

   This technique also requires access to a cache with terms of use
   compatible with the requirements of the site. It was pointed out
   that the Google AMP Cache has policies that might be acceptable to
   many, and there are other caches. Sites operated by entities other
   than Google already use this cache, though it was observed that a
   site that does not host its own cache suffers a minor performance
   degradation.

4.1.2. Consolidation of Power in Publishers

   Participants seemed to agree that if performance is a strong enough
   differentiator, the effective use of Web Packaging might turn out to
   be a condition for success for online publishers. Google Search's
   choice to privilege content that is served using HTTPS was pointed
   out as showing that this sort of influence can be effective.
   Equally, it is not necessarily the case that standardization of new
   capabilities will affect such policies materially, as noted in
   [YASSKIN]:

      It seems unlikely that any decisions we make in a packaging or
      distribution system will affect the considerations aggregators use
      when deciding how to rank recommendations or the power this gives
      them over publishers.

   The most common concern raised in the discussion was the effect of
   this technology on smaller publishers who might be less able to
   optimize the packages they produce, where their primary
   differentiation in the market has previously been the quality of
   their content.

4.1.3. Consolidation of User Preferences

   In typical operation of the Web, servers have an opportunity to
   tailor content to the needs of their users. In contrast, a static
   Web Package has few options for individualization, as the content is
   generated once and used by many.

   As a result, publishers noted that AMP provides less opportunity to
   customize content for their customers. Their concerns included not
   only personalizing content based on what they know about the user but
   also optimizing the package for specific browsers. Other
   participants observed in relation to this that Web Packaging might
   also have a consolidating effect in the browser market.

   Some participants brought up the possibility of customization by
   providing multiple packages, including multiple variants of resources
   in a single package, or performing customization after the package was
   loaded. However, other participants pointed out that all of these
   options have negative side effects, either in complexity or reduced
   performance arising from larger bundles or delayed customization.

4.2. Effect on Web Security

   One session explored the impact of introducing a new security model
   for the Web. Currently, sites rely on connection-oriented security
   (provided by TLS [TLS]), but Web Packaging adds a limited form of
   object security. That is, the package protects the integrity of a
   message, rather than providing integrity and confidentiality for its
   delivery. Object security is not a new concept in the context of the
   Web; designs like SHTTP [SHTTP] are as old as HTTPS.
Though the 589 intent is for Web Packaging to have a far more narrow applicability, 590 it provides fewer security guarantees than HTTPS, since it provides 591 only authentication, no confidentiality with respect to the cache, 592 and no assurance of liveness. 594 Object-based security - such as proposed in Web Packaging - allows 595 the use of content regardless of how it is obtained; some 596 participants noted that third parties gain greater control over the 597 distribution of content, reducing the ability of publishers to 598 retract or alter content over the validity period of signed content. 600 Another topic of discussion was composition attacks. In its proposed 601 form, Web Packaging only provides authentication of independent 602 resources, not a web page as a single unit, allowing an attacker to 603 control the composition of resources. This weakness was acknowledged 604 as a known shortcoming of the current proposal that would be 605 addressed. 607 The issue of managing the trade-off between control and performance 608 in caches arose. While participants recognized that problems with 609 resource composition already occur by accident - for example, when a 610 cache stores different versions of resources - Web Packaging allows 611 an attacker more direct control over what resources are available to 612 clients. 614 For example, an attacker might be able to cause content with a 615 security flaw to be used up to a week past the time that the defect 616 was fixed. 618 As an example of how Web Packaging might change the risk profile for 619 sites, participants discussed recovery from cross-site scripting 620 attacks. It is already the case that a brief exposure to this class 621 of attack can result in an attacker gaining persistent access, but 622 mechanisms exist that can be used to avoid or correct issues, like 623 cache validation and Clear Site Data [CLEAR-DATA]. These measures 624 are not available to clients unless they connect to the site. 
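As a rough illustration of the stale-content example earlier in this section, the following Python sketch models a signature validity window and the time for which a flawed but validly signed resource could still be served after a fix. It is not taken from any Web Packaging specification: the names signed_at, expires_at, is_acceptable, exposure_after_fix, and MAX_VALIDITY are hypothetical, and the seven-day cap is an assumption chosen to match the "up to a week" figure given above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a one-week cap, mirroring the "up to a week" figure in the text.
MAX_VALIDITY = timedelta(days=7)


def is_acceptable(signed_at: datetime, expires_at: datetime, now: datetime) -> bool:
    """Return True if a signature's validity window is acceptable at `now`.

    The check uses only the signed timestamps; it never contacts the
    publisher, so the publisher cannot signal that content was withdrawn.
    """
    if expires_at - signed_at > MAX_VALIDITY:
        return False  # window longer than the assumed cap
    return signed_at <= now < expires_at


def exposure_after_fix(signed_at: datetime, expires_at: datetime,
                       fixed_at: datetime) -> timedelta:
    """Time for which content signed before a fix can still be replayed."""
    if signed_at >= fixed_at:
        return timedelta(0)  # signed after the fix, so not the flawed copy
    return max(expires_at - fixed_at, timedelta(0))


signed = datetime(2019, 7, 18, tzinfo=timezone.utc)
expires = signed + MAX_VALIDITY
fixed = signed + timedelta(hours=1)  # defect fixed one hour after signing
print(exposure_after_fix(signed, expires, fixed))
```

Under these assumptions, a copy signed an hour before the fix remains acceptable for almost the whole week, which is the gap that connection-based measures such as cache validation would otherwise close.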
626 The discussion pointed out that these concerns are not new or 627 uniquely enabled by Web Packaging. However, it was pointed out that 628 new features are routinely subject to higher security and privacy 629 expectations. In an example unrelated to Web Packaging but with 630 similar tradeoffs, shared compression of multiple resources has 631 significant performance benefits but comes with a risk of exposing 632 encrypted information through side-channels. Though it is possible 633 that sites can use shared compression without this exposure, shared 634 compression will likely only be enabled once it is clear that 635 measures to prevent accidental information exposure are understood to 636 be effective in a broad set of deployments. 638 The discussion also addressed the question of whether concerns might 639 equally apply to the typical use of a Content Distribution Network 640 (CDN) as a third-party provider of the content. Some participants 641 concluded that CDNs are typically in a contractual relationship with 642 the sites they serve and so are more likely to have their interests 643 aligned. 645 4.3. Privacy of Content 647 Discussion and submissions raised concerns regarding how serving 648 content using Web Packages might adversely affect privacy of 649 individuals. There are challenges here, but the very narrow 650 applicability of Web Packaging to what is effectively static content 651 limits the privacy risk. The conclusion was that provided sufficient 652 care is taken in implementation, use of Web Packages does not 653 substantially increase the information that an aggregator gains about 654 what content is consumed. 656 Concretely, an aggregator knows what content it serves in 657 anticipation of navigation. This is - at least in theory - 658 substantially the same as the content that the aggregator might 659 receive if it performed the navigation itself. 
Assuming that content 660 is stripped of personalization, the aggregator gains no new 661 information. 663 5. AMP Issues Unrelated to Web Packaging 665 On multiple occasions, discussion at the Workshop concentrated on 666 problems that arise as a result of constraints on the AMP format or 667 details of its inclusion in Google Search. For instance, the 668 requirement that pages expose metadata about themselves is 669 unlikely to be affected by any standardization of a packaging format 670 as that requirement is independent of the process of delivering 671 content. 673 This section provides some detail on aspects of the discussion that 674 touched on AMP more generally in this way. Some treatment of these 675 points is considered relevant as some of the discussion at the 676 workshop, even under the remit of discussing Web Packaging, 677 concentrated on the effect of AMP on the ecosystem. 679 Note: Of the four formats mentioned in the workshop call for papers 680 [CFP], only AMP sent representatives to the workshop. The 681 discussion was therefore concentrated around AMP; this section 682 should not be read to imply anything about other formats. 684 Discussion and submissions referred to a commitment [AMP-LESSONS] to 685 allow publishers to use content that met specific criteria to access 686 privileged positions in search results, regardless of their adoption 687 of AMP. Participants felt that this approach might address some of 688 these concerns if it were adopted and durable. For instance, the use 689 of Web Packaging might be sufficient to remove some constraints on 690 active content on the basis that the active content would be 691 attributed to the publisher and not the AMP cache. 693 5.1. AMP Governance 695 There was interest from workshop participants in the governance model 696 used for AMP. In particular, participants asked how independent the AMP 697 project would be of Google and Google Search.
699 Three of the seven members of the AMP Technical Steering Committee, 700 the body that governs AMP, are Google employees, which gives Google 701 considerable influence over the project. It was asserted that the 702 governance structure was intended to be more independent of Google 703 over time. The understanding was that any consumer of the format, 704 such as Google Search, would make an independent assessment about 705 whether to use or require different aspects of the AMP project 706 products. 708 5.2. Constraints on the AMP Format 710 Sites often implement AMP by creating a separate set of content in 711 parallel to their regular HTML content. Publishers noted this as a 712 high cost, particularly for smaller sites. It was pointed out that 713 websites can serve AMP-compliant content exclusively. However, 714 several publishers referred to limitations in the format that made it 715 unsuitable for their needs. 717 Many cited reasons for this duplication were related to the necessity 718 of running arbitrary active content (typically, JavaScript). For 719 example: 721 o AMP provides a framework for supporting user authentication, but 722 publishers asserted that using this framework was not considered 723 practical. 725 o AMP content does not support rendering of certain content, which 726 can affect the ability of publishers to innovate in how they 727 produce content. 729 o The AMP model for the implementation of paywalls (Section 5.4) was 730 claimed to be inimical to some publisher business models. 732 More broadly, they considered AMP's constraints on the use of active 733 content as problematic, since they prevent the use of capabilities 734 that are provided on equivalent non-AMP pages. Reference was made to 735 a proposed element - which has since been made fully 736 available - that seeks to provide limited access to some dynamic 737 content. 739 5.3. 
Performance 741 Publishers observed that using the AMP format does not provide any 742 guarantee of performance gains and in some cases could contribute to 743 performance degradation. It was suggested that this was most 744 problematic for sites that are already well-tuned for performance. 746 5.4. Implementation of Paywalls 748 The use of "paywalls" by Web publishers to control access to content 749 in return for payment is increasingly common. One popular approach 750 is to offer a limited number of articles without payment while 751 insisting on a paid subscription to access further articles. 753 On several occasions, participants expressed dissatisfaction with the 754 difficulty of integrating paywall authorization when using AMP. In 755 particular, they said AMP encourages publishers to include an 756 article's full content, hidden by default but easily accessible to 757 motivated users. The discussion extended to workarounds like cookie 758 syncing [COOKIE-SYNC] that is used as part of authorization, a 759 consequence of having cached content hosted on the linking site 760 rather than the target site. 762 The same topic came up concerning book publication, where publishers 763 indicated that having a means of enabling different methods of 764 distribution without also facilitating unconstrained copying of book 765 content was necessary. 767 This conflation of AMP issues with those addressed by Web Packaging 768 was recurrent in the discussion. As observed in [DAS], these 769 concerns might be addressed by linking to a signed bundle. 771 6. Security Considerations 773 Proposals discussed at the Workshop might have a significant security 774 impact, and these topics were discussed in some depth; see 775 Section 4.2. 777 7. References 779 7.1. Informative References 781 [ALAM] Alam, S., Weigle, M., Nelson, M., Klein, M., and H. Van de 782 Sompel, "Supporting Web Archiving via Web Packaging", June 783 2019, . 
786 [AMP-LESSONS] 787 Ubl, M., "Standardizing lessons learned from AMP", March 788 2018, . 791 [AMP-PERF] 792 Steinlauf, E., "The Speed Benefit of AMP Prerendering", 793 August 2019, . 796 [AOLOG] Haber, S. and W. Stornetta, "How to time-stamp a digital 797 document", Journal of Cryptology Vol. 3, 798 DOI 10.1007/bf00196791, 1991. 800 [BERJON] Berjon, R., "ESCAPE: The New York Times Position", July 801 2019, . 804 [BREWSTER] 805 Brewster, A., "ESCAPE Position / Patch.com", June 2019, 806 . 809 [BUNDLE] Yasskin, J., "Web Packaging", draft-yasskin-dispatch-web- 810 packaging-00 (work in progress), June 2017. 812 [CFP] IAB, "Exploring Synergy between Content Aggregation and 813 the Publisher Ecosystem Workshop 2019", May 2019, 814 . 817 [CHATHAM-HOUSE] 818 Chatham House, "Chatham House Rule", n.d., 819 . 821 [CHRISTCHURCH] 822 Stevenson, R. and J. Anthony, "'Thousands' of Christchurch 823 shootings videos removed from YouTube, Google says", March 824 2019, . 828 [CLEAR-DATA] 829 West, M., "Clear Site Data", W3C Working Draft, November 830 2017, . 832 [COOKIE-SYNC] 833 Acar, G., Eubank, C., Englehardt, S., Juarez, M., 834 Narayanan, A., and C. Diaz, "The Web Never Forgets", 835 Proceedings of the 2014 ACM SIGSAC Conference on Computer 836 and Communications Security - CCS '14, 837 DOI 10.1145/2660267.2660347, 2014. 839 [CRAMER] Cramer, D., "Packaging Books", June 2019, 840 . 843 [DAS] Das, S., "The Implication of Signed Exchanges on 844 E-Commerce", June 2019, . 848 [DEPUYDT-NELSON] 849 DePuydt, M. and M. Nelson, "Signed Exchanges and The 850 Importance of Trust in Aggregator/Publisher 851 relationships", June 2019, . 854 [GDPR] European Union, "General Data Protection Regulation", EU 855 Regulation 2016/679, April 2016, . 859 [HTTP] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 860 Protocol (HTTP/1.1): Message Syntax and Routing", 861 RFC 7230, DOI 10.17487/RFC7230, June 2014, 862 . 864 [MEMENTO] Van de Sompel, H., Nelson, M., and R.
Sanderson, "HTTP 865 Framework for Time-Based Access to Resource States -- 866 Memento", RFC 7089, DOI 10.17487/RFC7089, December 2013, 867 . 869 [ORIGIN] Barth, A., "The Web Origin Concept", RFC 6454, 870 DOI 10.17487/RFC6454, December 2011, 871 . 873 [OTSU] Ohtsu, S., "Deployment Experience of Signed HTTP Exchanges 874 with AMP as a Publisher", June 2019, . 877 [SHTTP] Rescorla, E. and A. Schiffman, "The Secure HyperText 878 Transfer Protocol", RFC 2660, DOI 10.17487/RFC2660, August 879 1999, . 881 [SUCCESS] Thaler, D. and B. Aboba, "What Makes for a Successful 882 Protocol?", RFC 5218, DOI 10.17487/RFC5218, July 2008, 883 . 885 [SXG] Yasskin, J., "Signed HTTP Exchanges", draft-yasskin-http- 886 origin-signed-responses-06 (work in progress), July 2019. 888 [TAG-DC] Betts, A., "Distributed and syndicated content", July 889 2017, . 892 [TLS] Rescorla, E., "The Transport Layer Security (TLS) Protocol 893 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 894 . 896 [YASSKIN] Yasskin, J., "Chrome's position on the ESCAPE workshop", 897 June 2019, . 900 7.2. URIs 902 [1] https://amp.dev/ 904 [2] https://schema.org/ 906 [3] https://developers.google.com/amp/cache/ 908 Appendix A. About the Workshop 910 The ESCAPE Workshop was held on 2019-07-18 and the morning of 911 2019-07-19 at Cisco's facility in Herndon, Virginia USA. 913 Attendees to the Workshop were asked to submit position papers. 914 These papers are published on the IAB website [CFP]. 916 The Workshop was conducted under Chatham House rule [CHATHAM-HOUSE], 917 meaning that statements cannot be attributed to individuals or 918 organizations without explicit authorization. 920 A.1. Agenda 922 This section outlines the broad areas of discussion on each day. 924 A.1.1. Thursday 2019-07-18 926 Web Packaging Overview: A technical summary 927 of Web Packaging was provided, plus a longer discussion of a range 928 of use cases.
930 Web Packaging and Aggregators: Web Packaging was presented from the 931 perspective of a content aggregator. 933 Web Packaging and Publishers: After a break, presentations from web 934 publishers covered the benefits and costs of Web Packaging. 935 This included some discussion of the effect of developing AMP- 936 conformant versions of content from a publisher perspective. 938 Web Packaging and Security: This session concentrated on how the Web 939 Packaging proposal might affect the Web security model. 941 Alternatives to Web Packaging: This session looked at alternative 942 technologies, including those that were attempted in the past and 943 some more recent ideas for addressing the use case of making web 944 navigations more performant. 946 A.1.2. Friday 2019-07-19 948 Web Archival: This session talked about the potential application of 949 a technology like Web Packaging in addressing some of the myriad 950 problems faced by web archival systems. 952 Book Publishing: A discussion of the effect of technologies for 953 bundling and distribution of books. 955 Conclusions: A wrap-up session attempted to capture key lessons 956 from the Workshop. 958 A.2. Workshop Attendees 960 Attendees to the Workshop are listed with their primary affiliation 961 as it appeared in submissions. Attendees from the program committee 962 (PC), the Internet Architecture Board (IAB), and Internet Engineering 963 Steering Group (IESG) are also marked.
965 o Sawood Alam, Old Dominion University 967 o Jari Arkko, Ericsson (IAB) 969 o Richard Barnes, Cisco 971 o Robin Berjon, New York Times (PC) 973 o Zack Bloom, Cloudflare 975 o Abraham Brewster, Patch.com 977 o Alissa Cooper, Cisco (IESG, IAB) 979 o Dave Cramer, Hachette Book Group 981 o Melissa DePuydt, Washington Post 983 o Levi Durfee, AMP Advisory Committee 985 o Rudy Galfi, Google 987 o Joseph Lorenzo Hall, Center for Democracy & Technology (PC) 989 o Matthew Nelson, Washington Post 990 o Michael Nelson, Old Dominion University 992 o Mark Nottingham, Fastly (IAB, PC) 994 o Shigeki Ohtsu, Yahoo 996 o Eric Rescorla, Mozilla 998 o Adam Roach, Mozilla (IESG) 1000 o Rich Salz, Akamai Technologies 1002 o Wendy Seltzer, W3C 1004 o David Strauss, Pantheon (PC) 1006 o Chi-Jiun Su, Hughes 1008 o Ralph Swick, W3C 1010 o Martin Thomson, Mozilla (IAB, PC) 1012 o Jeffrey Yasskin, Google 1014 o Dan York, Internet Society 1016 o Benjamin Young, John Wiley & Sons 1018 Appendix B. Web Packaging Overview 1020 Web Packaging comprises two separate technologies: resource 1021 bundling [BUNDLE] and signed exchanges [SXG]. 1023 In both the submissions and Workshop discussion, the most 1024 controversial aspect of the technology is the use of signed exchanges 1025 as an alternative means of providing authority over a particular 1026 resource, for a few different reasons. 1028 This appendix explains how authority works on the Web and how Web 1029 Packaging proposes to change that. 1031 B.1. Authority in HTTPS 1033 The web currently uses HTTPS [HTTP] to establish a server's authority 1034 - that is, to give an assurance that the content came from where the 1035 URL implies. The combination of URI scheme (https), domain name (or 1036 host), and port number is formed into a single identifier, the 1037 origin [ORIGIN], to which content is attributed.
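As a minimal sketch of the origin concept described above, the following Python fragment derives the (scheme, host, port) triple from a URL and compares two URLs. The names origin, same_origin, and DEFAULT_PORTS are illustrative assumptions, and only the http and https schemes are handled; the normative rules are those in [ORIGIN].

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: only the two common schemes are handled in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, int]:
    """Return the (scheme, host, port) triple to which a URL is attributed."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError("unsupported URL for this sketch: " + url)
    # urlsplit lowercases the hostname; an absent port means the scheme default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return (scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if all three components match."""
    return origin(a) == origin(b)


print(origin("https://example.com/index.html"))
print(same_origin("https://example.com/index.html", "http://example.com/"))
```

Changing any one component (for instance, the scheme from https to http, or the port) yields a different origin, which is why attribution of content to an origin matters so much to the security model discussed in the rest of this appendix.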
1039 Web browsers use the certificate offered as part of a TLS connection 1040 [TLS] to servers in determining whether a server is authoritative for 1041 that origin; see [ORIGIN] and Section 9.1 of [HTTP]. Content is 1042 attributed to a given URL only if it is received from a connection to 1043 a server that is authoritative for the associated origin. 1045 As an example, a web browser seeking to load "https://example.com/ 1046 index.html" makes a TLS connection to a server. As part of the TLS 1047 connection establishment, the server offers a certificate for the 1048 name "example.com". If the browser accepts the certificate, it will 1049 then make requests for URLs on the "https://example.com" origin on 1050 that connection and consider any answers from the server to be 1051 authoritative. 1053 This notion of authority is a crucial property of web security: only 1054 content that is attributed to the same web origin can access all 1055 information in that origin, including the content of most resources 1056 as well as state associated with the origin, such as cookies. This 1057 separation ensures that sites can keep secrets from each other, even 1058 when they are both loaded in the same browser. 1060 B.2. Authority in Web Packaging 1062 Web Packaging, through the use of signed exchanges, aims to provide 1063 an alternative means of establishing authority. A signed exchange is 1064 an expression of an HTTP request and response (an exchange) with 1065 certain information stripped and a digital signature applied. 1067 The signature is made with a similar certificate to the one a server 1068 might offer in HTTPS - that certificate can also be used for HTTPS - 1069 but it includes a special attribute that denotes its suitability for 1070 signed exchanges. 1072 A web browser that has been provided with a signed exchange can 1073 verify the signature, and - if the signature is valid and the 1074 certificate is acceptable - use the content from the signed exchange.
1075 Critically, the web browser does not make an HTTPS connection to a 1076 server to get the content or to verify the signature. 1078 In effect, Web Packaging moves from a model where authority is 1079 derived from the delivery method (i.e., TLS) to an object security 1080 model, where authority is derived from a signature on objects. In 1081 doing so, it aims to render the means of delivery irrelevant to 1082 determinations of security. 1084 B.3. Applicability 1086 Web Packaging does not claim to supplant the authority model of the 1087 Web completely, but to provide an alternative that might be used 1088 under certain narrow conditions. In particular, Web Packaging is 1089 intended for use with content that is not secret from an entity that 1090 is aware of the existence of that content. 1092 In aid of this goal, Web Packaging does not include information from 1093 exchanges that relates either to the process of acquiring content or 1094 to individual requests. For 1095 instance, use of the Set-Cookie header field is expressly forbidden, 1096 as it often contains information that is related to a particular 1097 user. 1099 B.4. The AMP Format, Google Search Results, and Web Packaging 1101 The relationship between the AMP Project https://amp.dev/ [1] and Web 1102 Packaging is complicated. The AMP Project, sponsored by Google, 1103 establishes a profile of HTML with a stated goal of providing support 1104 for the best practices for the format, with a strong emphasis on 1105 performance. The format tightly constrains the use of HTML features 1106 but also offers a library of components that provide sanitized 1107 implementations of many commonly used capabilities. 1109 The connection to Web Packaging is bound up in the way that Google 1110 Search treats AMP content specially. AMP content provides two 1111 properties that Google Search exploits: metadata exposure and static 1112 analysis of active content.
1114 AMP content provides metadata in a form that can be reliably 1115 extracted, using the microformats defined by the Schema.org project 1116 https://schema.org/ [2]. This aspect of AMP has no effect on the 1117 discussion, except to the extent that this relates to Google Search 1118 and their use of this metadata in populating the carousel. 1120 Constrained use of active content - such as JavaScript - in AMP makes 1121 it possible to analyze content to verify that actions taken are 1122 narrowly limited. This static analysis assures that AMP content can 1123 be served without affecting other content on the same site. For 1124 Google Search, this is what enables the loading of AMP content 1125 alongside search content and other AMP resources. 1127 To provide preloading, Google operates an AMP Cache 1128 https://developers.google.com/amp/cache/ [3], from which AMP content 1129 is served. As a consequence, browsers attribute the content to the 1130 origin [ORIGIN] of the AMP Cache and not the publisher, creating some 1131 confusion about how content is attributed, as discussed in the W3C 1132 finding on distributed content [TAG-DC]. 1134 An important goal of Web Packaging is to attribute content loaded 1135 from a cache, such as the AMP cache, to the publisher that created 1136 that content. For more on this see Section 2.1. 1138 Authors' Addresses 1140 Martin Thomson 1142 Email: mt@lowentropy.net 1144 Mark Nottingham 1146 Email: mnot@mnot.net