2 Network Working Group G. Deen 3 Internet-Draft NBCUniversal 4 Intended status: Informational L. Daigle 5 Expires: April 27, 2017 Thinking Cat Enterprises LLC 6 October 24, 2016 8 Glass to Glass Internet Ecosystem Introduction 9 draft-deen-daigle-ggie-02 11 Abstract 13 This document introduces the Glass to Glass Internet Ecosystem 14 (GGIE). GGIE's purpose is to improve how the Internet is used to create 15 and consume video, both amateur and professional, reflecting that the 16 line between amateur and professional video technology is 17 increasingly blurred.
Glass to Glass refers to the entire video 18 ecosystem, from the camera lens to the viewing screen. As the name 19 implies, GGIE's scope is the entire video ecosystem from capture, 20 through the steps of editing, packaging, distributing, and searching, 21 and finally viewing. GGIE is not a complete end to end architecture 22 or solution; it provides foundational elements that can serve as 23 building blocks for new Internet video innovation. 25 This is a companion effort to the GGIE W3C Taskforce in the W3C Web 26 and TV Interest Group. 28 This document is being discussed on the ggie@ietf.org mailing list. 30 Status of This Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on April 27, 2017. 47 Copyright Notice 49 Copyright (c) 2016 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document.
Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 65 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 66 3. Motivation: Video is filling up the pipes . . . . . . . . . . 4 67 4. Video is different . . . . . . . . . . . . . . . . . . . . . 5 68 5. Historical Approaches to supporting Video on the Internet . . 6 69 5.1. Video as an application . . . . . . . . . . . . . . . . . 6 70 5.2. Video as a network problem . . . . . . . . . . . . . . . 7 71 5.3. Video Ecosystem Encapsulation . . . . . . . . . . . . . . 7 72 6. Problem Statement and Solution Criteria . . . . . . . . . . . 8 73 7. The Glass to Glass Internet Ecosystem: GGIE . . . . . . . . . 8 74 7.1. Related work: W3C GGIE Taskforce . . . . . . . . . . . . 9 75 8. GGIE work of relevance to the IETF . . . . . . . . . . . . . 9 76 8.1. Affected IETF work areas . . . . . . . . . . . . . . . . 9 77 8.2. Example use cases . . . . . . . . . . . . . . . . . . . . 9 78 8.3. Core GGIE elements . . . . . . . . . . . . . . . . . . . 11 79 9. Conclusion and Next Steps . . . . . . . . . . . . . . . . . . 15 80 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15 81 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 82 12. Security Considerations . . . . . . . . . . . . . . . . . . . 15 83 12.1. Privacy Concerns . . . . . . . . . . . . . . . . . . . . 15 84 13. Normative References . . . . . . . . . . . . . . . . . . . . 16 85 Appendix A. Overview of the details of the video lifecycle . . . 16 86 A.1. Media Lifecycle . . . . . . . . . . . . . . . . . . . . . 16 87 A.2. Video is not like other Internet data . . . . . . . . . . 19 88 A.3. Video Transport . . . . . . . . . . . . . . . . . . . . . 
21 89 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21 91 1. Introduction 93 In terms of sheer bandwidth, the Internet's largest use, without any 94 close second competitor, is video. This is thanks to the 95 proliferation of Internet connected devices capable of capturing and/ 96 or watching streamed video. As of 2015 there are reports that 97 YouTube users upload over 500 hours of video every minute, and that 98 during evening hours NetFlix accounts for a staggering 50+% of 99 Internet traffic. The number of users using the Internet for both 100 ends of the video create-view lifecycle grows daily worldwide, and 101 this is creating an enormous strain on the underlying Internet 102 infrastructure at nearly every point from the core to the edge. 104 While video is one of the most conceptually simple uses of the 105 Internet, it is perhaps one of the most complex technically, built 106 from standards created by a large number of organizations and groups, 107 some dating from before the modern Internet even existed. Many 108 critical parts of this complex ecosystem were not created with either 109 video's particular characteristics or vast scale of popularity in 110 mind. This has led to both the degradation of the viewer experience 111 and many Internet policy issues around access to bandwidth for video 112 and the needed infrastructure to support the continued explosion in 113 video transport on the Internet. 115 The pace of video growth has been faster than new bandwidth for the 116 past several years, and all indicators are that, instead of abating, it 117 is actually accelerating as new users, new ways of sharing video, and 118 new types of video continue to be added. The Cisco Visual Networking 119 Index is an excellent source of detail on this subject.
121 The combined current high levels of bandwidth consumed by video, plus 122 the accelerating pace of video's growth mean that to meet users' 123 demand for video, we must do more than simply rely on adding more 124 bandwidth. While other traditional improvements such as more 125 efficient codecs with better compression ratios are expected to 126 contribute to keep video flowing on the Internet, many in the 127 Internet video technology world have explored options to see if any 128 new approaches could be added to the mix to help the problem. That 129 was the motivation behind the creation of the GGIE Taskforce within 130 the W3C in 2014 with the charter to examine the end to end video 131 ecosystem and identify new areas of opportunity to improve video's 132 use of the Internet. 134 The W3C GGIE taskforce explored ways that video uses the Internet and 135 developed a series of use cases detailing specific scenarios ranging 136 from video capture, the editing and production cycle, through to 137 delivery to viewers. Out of these use cases there emerged a 138 recognition that there might be a new opportunity to improve Internet 139 video by enabling edge devices and the underlying network to more 140 actively participate in making delivery optimization choices beyond 141 the simple ways they do currently. 143 The GGIE approach is to apply and evolve existing technologies to the 144 task of optimizing Internet video transport to permit applications, 145 video devices, and the network to more actively participate in making 146 smart access and transport decisions. This approach recognizes that 147 there are already extensively-deployed video infrastructure elements 148 that need to continue to work and be part of the optimized video 149 ecosystem.
These deployed devices, applications, players, and tools 150 are responsible for the already high levels of video bandwidth 151 consumption, and to only address new devices would not be solving the 152 larger, most important problem. This is why GGIE is an evolution of 153 how video uses the Internet, and not a revolution involving wholesale 154 replacement of existing architecture or protocols. 156 GGIE is not a complete solution to the video problem. It provides 157 foundational building blocks that are intended to be used by 158 innovators in their work to create new optimizations and novel 159 techniques to help address the video problem in the long term. 161 GGIE initially proposes a simple framework of three components that 162 will permit improved playback device identification of viewing 163 sources and enable network level awareness of video transport and new 164 cache selection choices. GGIE proposes: Using existing content 165 identifiers as a means to identify a work, or title; Data level 166 identifiers to identify the encoded video data for a particular 167 manifestation of the title; A mapping service that permits bi- 168 directional resolution of these identifiers. 170 This document outlines the basic proposal for these three base GGIE 171 components and introduces the overall GGIE approach to evolving the 172 current video ecosystem by introducing basic standardized building 173 blocks for innovators to build upon: the Glass to Glass Internet 174 Ecosystem. 176 2. Terminology 178 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 179 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 180 document are to be interpreted as described in [RFC2119]. 182 3. Motivation: Video is filling up the pipes 184 The growth in video bandwidth need is exceeding the growth in 185 bandwidth provisioning.
This trend is in fact accelerating, meaning 186 the growth rate of video is growing faster than the growth rate of 187 provisioning. Traditional techniques of caching, higher efficiency 188 codecs, etc., are all being used to help address the problem and have 189 helped the Internet to continue to support the growth of video thus 190 far. 192 Video has been the top use of Internet bandwidth for several years 193 and is larger than the bandwidth used by all other applications 194 combined. This trend is unlikely to ease or reverse itself as users 195 of the Internet continue to make Internet transported video one of 196 their top uses of the Internet, either for uploading and sharing 197 video they create, or as a primary source for viewing video on a 198 wide variety of viewing devices: computers, tablets, phones, 199 connected televisions, game consoles, and AV receivers. 201 Adding to user demand, video itself is continually experiencing 202 innovation introducing ever higher resolutions (SD, HD, 4K, 8K...), 203 higher video quality, new distribution services (live one to many 204 streaming), and new uses. The Cisco Visual Networking Index 205 projects that by 2019 there will be nearly a million minutes of video 206 per second transported by the Internet, making up 80-90 percent of 207 all IP traffic. 209 The motivation behind GGIE is to help find new methods that can be 210 brought to bear, in addition to all the existing ones, to help manage 211 the explosion in Internet video. 213 4. Video is different 215 Video is different from other uses of the network due to its combined 216 high bandwidth demands and high sensitivity to latency and dropped 217 packets. Streaming of basic high-definition 1080p requires bandwidth 218 in the low Mbps translating into Gigabytes for each hour of video, 219 all transported with consistent low latency and very little packet 220 loss in order to deliver a suitable watching experience to the viewer.
221 This differentiates video from other Internet applications as some 222 have low latency and packet loss requirements but don't need high 223 bandwidth, while others demand high bandwidth but will tolerate 224 high latency and dropped packets. An email user can tolerate an 225 extra moment to retransmit dropped packets, and a web page user can 226 tolerate a slow DNS lookup, but a video viewer sees latency and 227 dropped packets as jittery playback and low bandwidth as a 228 fundamental barrier to streaming at all. From the user's perspective 229 the network has failed to meet their need. (Audio has similar 230 challenges in terms of intolerance of delay and jitter, but the data 231 sizes are significantly smaller). 233 Video data sizes continue to grow at roughly 4x per format iteration 234 as cameras and playback devices are able to capture and display 235 higher quality images. Early digital video was often captured at 236 either 320x240 pixel resolution or 640x480 standard definition 237 resolution. High definition or HD video at 1920x1080 became possible 238 on some parts of the Internet after 2011, although even in 2016 it 239 remains unavailable or unreliable through many connections such as 240 DSL and many mobile networks. Camera and player technologies are 241 currently expanding again to permit 4K or 3840x2160 pixel resolution 242 reflecting a 4x data increase over HD. 244 Streaming is very demanding, requiring consistent frame to frame 245 playback in constant time. Advanced features such as 246 pause, fast forward, rewind, slow motion, and fine scrubbing are 247 considered by users as standard player features that the network 248 must support, and they serve to further the challenge facing the Internet.
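The bandwidth arithmetic above (low-Mbps 1080p streams amounting to gigabytes per hour, and a roughly 4x jump in pixel data per format generation) can be checked with a short back-of-the-envelope sketch. The 5 Mbps bit-rate is an illustrative assumption, since the text says only "low Mbps":

```python
# Back-of-the-envelope check of the bandwidth figures discussed above.
# The 5 Mbps bit-rate is an illustrative assumption, not a value from
# this draft.

def gigabytes_per_hour(mbps: float) -> float:
    """Data volume of one hour of continuous streaming at a given bit-rate."""
    bits = mbps * 1_000_000 * 3600       # bits transferred in one hour
    return bits / 8 / 1_000_000_000      # bits -> bytes -> gigabytes

# Pixels per frame for the formats named in the text.
pixels = {
    "SD (640x480)":   640 * 480,
    "HD (1920x1080)": 1920 * 1080,
    "4K (3840x2160)": 3840 * 2160,
}

# A 1080p stream at 5 Mbps works out to 2.25 GB per hour.
print(f"{gigabytes_per_hour(5):.2f} GB/hour at 5 Mbps")

# The HD -> 4K step is exactly a 4x increase in raw pixel data.
print(pixels["4K (3840x2160)"] / pixels["HD (1920x1080)"])   # 4.0
```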
250 New video abilities such as live streaming by users (both one to one 251 and one to many) bring what has traditionally been done by 252 professional broadcasters with dedicated broadcast infrastructure 253 into the realm of everyday users with connected smartphones using 254 the Internet as a real-time global broadcast infrastructure. 256 5. Historical Approaches to supporting Video on the Internet 258 5.1. Video as an application 260 Internet video engineering began by adapting preexisting standards 261 used for over the air broadcast (OTA) and physical media. Video 262 encodings, such as AVI and MPEG2, originally designed for playback 263 from local storage connected to the player, were added to the data 264 types carried by existing protocols like HTTP, and new protocols such 265 as RTSP and HLS. Early use of the Internet for video was a copy-and- 266 play model replacing the use of OTA broadcast and physical media to 267 copy video between systems. 269 As Internet bandwidth became sufficient to allow delivery of video 270 data at the same rate it was being decoded, it became possible to 271 stream video originally at very low resolutions such as 160x120 272 pixels (19.2 kilopixels), eventually permitting standard definition 273 (SD) 640x480 pixels (0.3 megapixels), and later high definition of 274 1920x1080 pixels (2 megapixels). This trend continues with some 275 providers beginning to offer 4K or 3840x2160 pixels (8.3 megapixels) 276 requiring a very reliable and generous end to end Internet 277 connection between the viewer and source. 279 Unlike the Web, email, and network file sharing which have been 280 engineered and standardized in Internet focused organizations such as 281 the W3C and IETF, video is dependent on standards developed by a very 282 large number of groups, companies, and organizations which include 283 the IETF and W3C but also MPEG, SMPTE, CTA, IEEE, ANSI, ISO, networking 284 and technology companies, and many others.
In contrast to the extensive 285 end to end expert knowledge and engineering done to create the Web 286 and email, Internet video has largely been an evolved cobbling and 287 adaptation exercise done by engineers with their focus on a few, or 288 one, particular aspect or problem at a time, and little interaction 289 with other parts of the Internet video ecosystem. While it is 290 very much possible to deliver video over the Internet, this 291 uncoordinated cobbling has resulted in many areas of inefficiency 292 where engineering done from an end to end perspective could provide 293 the opportunity to vastly improve how video uses the Internet, which 294 offers the hope of improving the quality of video and increasing the 295 amount of video which can be delivered. 297 5.2. Video as a network problem 299 Network, video, and application engineers have constructed elaborate 300 solutions for dealing with bandwidth and processing limitations, 301 network congestion, lossy transport protocols, and the ever growing 302 size of video data. These solutions commonly fall into one of 303 several solution types: 305 1. Reducing data sizes through resolution changes, compression, and 306 more efficient encodings 308 2. Downloading before playing instead of real-time streaming 310 3. Positioning the data close to the viewer via caches, typically on 311 the network edge 313 4. Fetching of video data at a rate faster than playback 315 5. Transport protocols that attempt to deliver video data such that 316 the data arrives as if it had been sent over a congestion-free, lossless 317 network 319 6. Dynamic reselection of sources and transport routes, either in 320 real time or at frequent intervals (10-15 seconds), using player 321 feedback mechanisms or network telemetry 323 5.3. Video Ecosystem Encapsulation 325 The current delivery ecosystem for video has been primarily developed 326 at the higher application layers of the stack.
While there has been 327 some video work done at lower levels such as general-purpose 328 transport improvements, caching protocols in CDNi, various 329 multicasting approaches, and other efforts, the majority of video- 330 specific work has previously been done by groups such as ISO's Moving 331 Picture Experts Group (MPEG) which have focused on codecs and codec 332 transport optimized for use on the Internet. These efforts have made 333 video possible on the Internet, but they have done so largely while 334 treating the underlying network as a basic transporter of data. This 335 has resulted in little information being exposed to the network, 336 information that could be used to optimize delivery of the video, and 337 in an architecture that pushes more and more of the intelligence into 338 an ever more complex and isolated core. 340 The current video model benefits from a significant amount of 341 operational, feature, and protocol encapsulation that has come about 342 due to different groups working independently on the components that 343 make it up. Like any system in which distinct pieces are well 344 encapsulated from one another, this means it is possible to engage in 345 improvements at the networking layer without the need to coordinate 346 with higher levels of the video architecture. 348 6. Problem Statement and Solution Criteria 350 At its most basic the problem to be solved for video delivery is how 351 to simultaneously maximize all of the following conditions: The number 352 of viewing devices simultaneously supported by the network; The 353 quality of video as measured by bit-rate and resolution; The number 354 of distinct streams that can be delivered. 356 Solution Constraints 358 1. Bandwidth growth alone is not a solution 360 2. Codec efficiency improvements alone are not a solution 362 3. Existing devices, infrastructure, and video delivery techniques must, 363 as much as possible, continue to be supported and benefit from new 364 solutions.
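The tension among the conditions in the problem statement can be illustrated with a toy model: for a fixed amount of provisioned bandwidth, raising per-stream quality (bit-rate) directly lowers the number of simultaneously supported viewers. The link capacity and per-stream bit-rates below are illustrative assumptions, not figures from this draft.

```python
# Toy model of the problem statement: viewer count, per-stream quality,
# and distinct-stream count all compete for the same fixed provisioned
# bandwidth. All figures here are illustrative assumptions.

def max_concurrent_streams(link_capacity_mbps: float, stream_mbps: float) -> int:
    """Simultaneous streams of a given bit-rate a fixed link can carry."""
    return int(link_capacity_mbps // stream_mbps)

LINK = 10_000  # hypothetical 10 Gbps aggregation link

for label, rate in [("SD ~3 Mbps", 3), ("HD ~5 Mbps", 5), ("4K ~15 Mbps", 15)]:
    print(f"{label:12s} -> {max_concurrent_streams(LINK, rate)} viewers")
```

This is also why constraints 1 and 2 hold: more bandwidth or better codecs alone change only the scale of the trade-off, not its shape.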
366 7. The Glass to Glass Internet Ecosystem: GGIE 368 GGIE is an effort to improve video's use of the Internet by examining 369 the end to end video ecosystem from the glass lens of the camera 370 through to the glass of the screen, and to identify areas of 371 simplification, standardization, and reengineering to make better 372 use of bandwidth and enable smarter network use by video creators, 373 distributors, and viewers. GGIE is focused on how video uses the 374 Internet, and not on how it is encoded or compressed. Likewise GGIE 375 does not deal with content protection. GGIE's scope, however, does 376 include creator and viewer privacy, content identification and 377 recognition as a means to enable smarter network usage, edge caching, 378 and discoverability. 380 GGIE benefits from the encapsulation of the video ecosystem elements, 381 enabling it to introduce evolutionary features to elements without 382 disrupting other distinct encapsulated parts. 384 GGIE is intended to work with a wide variety of video encoding 385 codecs, and video distribution and transport protocols. While 386 examples using MPEG-DASH are used due to its pervasive use, GGIE is 387 not limited to MPEG-DASH or any other video distribution system or 388 codec. 390 Beyond improving the simple experience of a viewer using the Internet 391 to watch linear video, it is hoped that a set of improved Internet 392 video infrastructure standards will provide a foundation that permits 393 innovators to create the next generation of Internet video content 394 (such as multisource personalized composite experiences, interactive 395 stories, and live personal broadcasting, to name a few). 397 Due to the very diverse and large deployment of existing video 398 playback devices and infrastructure, it is viewed as essential that 399 any evolved ecosystem continues to work with the majority of the 400 legacy deployment without the need for updates or changes to the 401 existing ecosystem. 403 7.1.
Related work: W3C GGIE Taskforce 405 A companion effort ran through 2015 in the W3C Web and TV Interest 406 Group's GGIE Taskforce. The W3C GGIE group developed a series of 407 use-cases on discovery, search, delivery, identity, and metadata 408 which can be found at https://www.w3.org/2011/webtv/wiki/GGIE_TF 410 8. GGIE work of relevance to the IETF 412 This section assumes a working familiarity with the video creation and 413 consumption "life cycle". For reference, an overview has been 414 provided in the Appendix. 416 8.1. Affected IETF work areas 418 It is expected that significant improvement is possible in the video 419 transport ecosystem by modest evolution and adaptation of existing 420 standards for addressing, transporting, and routing of video data 421 flows between sources and display. 423 8.2. Example use cases 425 The following example use cases help illustrate the use of the GGIE 426 core elements. 428 8.2.1. Alternate Source Discovery 430 Description: A video player is streaming a movie from a CDN cache in 431 the core of the network. This use case illustrates the use of a 432 media identifier to query a media address resolution service to 433 locate additional alternate sources that offer the same movie. 435 1. The video player user selects a movie to watch from a list using 436 the player application UI. 438 2. The video player application has the media identifier of the 439 movie in the metadata description of the movie. This identifier 440 is passed to the playback device when the movie is selected. 442 3. The playback device sends a search query to the Media Address 443 Resolution Service (MARS) which includes the media identifier, 444 and additional query parameters used to filter the results 445 returned. 447 4. The MARS server searches its database and returns all the Media 448 Encoding Networks matching the media identifier and filters the 449 results using the additional parameters submitted in the query.
450 Each Media Encoding Network represents a different encoding of 451 the video. 453 5. The player then examines the returned list of media encoding 454 networks and selects, from its perspective, the optimal source 455 for the title. 457 6. The player then directs its streaming requests to the selected 458 Media Encoding Network addresses to obtain the video data for the 459 movie. 461 7. The video data is decoded and displayed on the screen. 463 8.2.2. Alternate Format Discovery 465 Description: A video player is streaming a movie, and wants to send 466 the audio to another device for playback. However, the current video 467 data being streamed does not contain any audio that matches the 468 codecs the audio device can play. The audio device uses the core 469 GGIE services to locate an alternate encoding of the movie that 470 contains audio it can decode. 472 1. The user directs the video player to send the audio portion of 473 the playing video to an external audio device. 475 2. The video player application passes the media identifier for the 476 video to the audio device as well as the media encoding network 477 address the video player is using. 479 3. The audio device begins streaming from the media encoding network 480 it was given, but discovers the data does not include audio that 481 it is able to decode. 483 4. The audio device sends a search query to the Media Address 484 Resolution Service (MARS) which includes the media identifier, 485 and additional query parameters including the list of audio 486 codecs and language choice it is able to decode. 488 5. The MARS server searches its database and returns all the Media 489 Encoding Networks matching the media identifier and filters the 490 results to only those matching the language and audio codec 491 supplied in the search. 493 6. The audio player examines the returned list of media encoding 494 networks, selects a media encoding network and begins streaming 495 data from it. 497 7.
The external audio player decodes the returned movie data and 498 plays it for the user. 500 8.3. Core GGIE elements 502 GGIE proposes three initial fundamental pieces: 504 1. Media Identifiers which identify the video at the title, or work 505 level; 507 2. Media Encoding Networks which are subnets used to reference the 508 encoded video data; 510 3. Media Address Resolution Service which maps Media Identifiers for 511 a title to the Media Encoding Networks containing the encoded 512 video versions of the title. 514 These three foundational elements help by exposing information that 515 can be used in selection in a way that is independent of the video 516 encoding and video data storage choice. They also enable more 517 sophisticated video use cases beyond the basic single device playing 518 a video stream from an origin server over a flow controlled protocol. 520 8.3.1. Media Identifiers 522 A Media Identifier is a URI that carries a content identifier system 523 declaration, and a content identifier from the system that refers 524 unambiguously to a work, or title. This can be any content 525 identification system; GGIE does not specify the system used. 527 For example, a media identifier for a title identified by an EIDR 528 value would include a declaration that the identifier is from EIDR, 529 and would additionally contain the EIDR value. 531 At the application level, such as UI program guide applications, 532 search engines, and metadata databases, it is the identification of 533 the work or identity of the video that is typically of interest and 534 not the encoding, bit-rate, or the location of CDN caches etc. For 535 example, a UI would indicate "the Minions movie" as opposed to 536 "a 15 megabit per second, HEVC encode with high dynamic range and 537 Dolby encoded 7.1 English audio of the Minions movie".
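A Media Identifier of the kind described above can be sketched as a two-part URI: a declaration of the content identification system, followed by that system's identifier for the title. The "x-mediaid" scheme name and the EIDR-style value below are hypothetical placeholders; this draft does not define a concrete syntax.

```python
# Minimal sketch of a GGIE Media Identifier: a content identifier
# system declaration plus an identifier from that system. The
# "x-mediaid" scheme and the sample EIDR-style value are hypothetical.

from typing import NamedTuple

class MediaIdentifier(NamedTuple):
    system: str   # content identification system, e.g. "eidr" or "ad-id"
    value: str    # title identifier, unambiguous only within that system

def parse_media_identifier(uri: str) -> MediaIdentifier:
    scheme, system, value = uri.split(":", 2)
    if scheme != "x-mediaid":
        raise ValueError(f"not a media identifier: {uri!r}")
    return MediaIdentifier(system, value)

mid = parse_media_identifier("x-mediaid:eidr:10.5240/EXAM-PLE0-0000")
print(mid.system)   # eidr
```

Note that the identifier names only the work; nothing in it describes codec, bit-rate, or source, which is deliberate.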
Those 538 additional technical details are important when choosing a particular 539 encoded manifestation of the movie for delivery, decode, and 540 playback, but they are not generally needed as information to be 541 presented to the user or used to make viewing choices. Such 542 technical information is used after the user has chosen the title to 543 watch, and it is used by the playback device, not the user, in selecting 544 the video. Media Identifiers in GGIE contain only title information, 545 and not encoding information. 547 There are many media identifiers in use for both personal and 548 professional content, with new ones being introduced seemingly 549 weekly. Trying to create a single identifier to either harmonize or 550 replace the others has repeatedly been proven in practice to be an 551 impossible task. Recognizing this, GGIE instead proposes to 552 standardize a URI which would contain at least two fields: 1) A 553 scheme identifier; 2) An unambiguous title identifier (note: this is 554 unambiguous only within the domain of the identified scheme). 556 For professional content, titles are increasingly identified with a 557 scheme called EIDR that can identify both master versions of works 558 and edit level versions. Likewise, advertisements use a scheme called 559 AD-ID. 561 8.3.2. Media Address Resolution Service (MARS) 563 The media address resolution service (MARS) provides bidirectional 564 mapping of Media Identifiers to Media Encoding Networks. It is 565 queryable using a query protocol which returns any results matching 566 the terms of the query parameters. 568 A Media Identifier alone isn't sufficient to connect a device to a 569 video data source. The media identifier distinguishes the work, but 570 not the technical details of an instance of the work such as codec, 571 bit-rate, resolution, high dynamic range video, audio encoding, nor 572 does it include information about available streaming sources etc.
573 The Media Address Resolution Service (MARS) provides this 574 association. It can be queried with the Media Identifier, and 575 optional filtering parameters, and will return Media Encoding Network 576 addresses for instances of matching encodings of the work. 578 This translation is used commonly in video streaming services today. 579 The link provided in the program guide UI will include a unique 580 identifier for the work which is then mapped by the streaming service 581 backend into a URI containing a network identifier and other info 582 which point to a caching server and the media data files in the 583 cache. MARS generalizes this and makes it available via query over 584 the network. 586 8.3.3. Media Encoding Networks (MEN) 588 Media Encoding Networks are arrangements of encoded video 589 data that are assigned addresses under a shared prefix and subnet 590 following a scheme appropriate for the encoding used by the video 591 data. Each Media Encoding Network instance represents a distinct 592 instance of a set of associated encodings for a work. Different 593 Media Encoding Network address assignment schemes would be defined 594 under GGIE to handle different encoded data formats such as MPEG-DASH and HLS. 596 For example, a single MEN instance would hold each of the different 597 variable bit-rate encodes for a single encoding of a video. If a new 598 encoding instance of the video was prepared, it would have a separate 599 and distinct MEN assigned to it. 601 8.3.3.1. Example: Using Media Encoding Networks with MPEG-DASH 603 A very basic form of video delivery uses a persistent connection from a 604 player to a video file source which then streams the video by 605 transmitting the video file data, byte by byte in sequence, from the 606 first byte of the file until the last. This trivial approach 607 requires the device to know the server IP address and port number to 608 connect to.
Essentially, this involves simply transporting the file from the source to the playback device in byte order.

In practice, simple file streaming is not used beyond local device-to-device playing in home networks, as it doesn't permit dynamic bit-rate selection, source or session failover, or trick play (pause, skip forward, skip backward). Instead, manifest files contain lists of available servers holding MPEG-DASH encodings of the larger video file, divided into fragments containing short portions (e.g. 2-15 seconds) of the video, called chunks by MPEG-DASH. (GGIE generalizes the MPEG-DASH term "chunk" to the more general "shard".) Each shard is a distinct file, typically named to reflect the video encoding it belongs to and its sequence position.

For example, the shards for MY-VIDEO might be named MY-VIDEO-001, MY-VIDEO-002, ... MY-VIDEO-nnn. The player then requests the shards in the order it wants them over a data transport protocol such as HTTP, with the translation of the actual data sent in response to requests for the named shards being handled by the data server.

So under MPEG-DASH the player is sent a manifest file containing the address of the data server and the shard names to request. The player then iterates over the available shards in the order desired by the user. The manifest thus contains URIs with the SERVER-ADDRESS and the CHUNK name. This file can be sent once per video play, or, more commonly, is sent at an interval of ~15 seconds to permit the sending CDN to customize for each player and to respond quickly to changes in network delivery performance and availability.

Each shard request by the device involves a network-level server IP address and port number, and an application-level shard name.
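The shard naming and manifest iteration just described can be sketched as follows; the three-digit zero-padded numbering and the URL layout are illustrative assumptions, not part of MPEG-DASH or GGIE:

```python
def shard_names(title: str, count: int) -> list:
    """Generate shard names in sequence order, e.g.
    MY-VIDEO-001 ... MY-VIDEO-nnn (zero-padding is an assumption)."""
    return [f"{title}-{i:03d}" for i in range(1, count + 1)]

def manifest_urls(server: str, title: str, count: int) -> list:
    """A manifest pairs a SERVER-ADDRESS with CHUNK names; the player
    iterates these URIs in the order the user wants playback."""
    return [f"http://{server}/{name}" for name in shard_names(title, count)]

# The player would then fetch, in order, something like:
#   manifest_urls("cdn.example.net", "MY-VIDEO", 3)
```

The data server resolves each named shard to the actual bytes; the network sees only the server address and port, which motivates the MEN proposal below.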
The network is thus able to manage the routing of the request to the server, and the routing of the response, but it lacks the information needed to do anything else to help optimize the video data transport.

GGIE proposes using Media Encoding Networks as an evolution of this that has the benefit of being backward compatible with manifest files, while giving the transport network and video ecosystem more information about the video traffic flowing over it.

Using Media Encoding Networks for MPEG-DASH will be described in another Internet-Draft, but the basic proposal is to assign the shards to a sequence of IP addresses organized to reflect the same ordering association that the chunk names followed in the MPEG-DASH scheme. These shard addresses form a Media Encoding Network, and they expose to the network layer knowledge of the specific video data being transported between the requesting device and the file server holding the data.

In practice this means that Media Encoding Network addresses refer to the shard and not the server holding the shard. This permits the network to be involved in the routing of the request for the shard, as opposed to the CDN preparing the manifest file. Among other benefits, this permits the network to provide path failover functionality beyond the CDN manifest.

This enables the network to be involved in shard source selection. Consider the use case wherein the network becomes aware of a local cache that holds the requested shard and is closer to the device than another cache deeper in the network. The network can direct the request to the local cache and save the transit cost and bandwidth of exchanging the request and response with the deeper cache. This can reduce network congestion as well as deliver faster transport of the shard to the playback device.

8.3.4.
Media Encoding Network Gateways

In this new approach, the server providing the shard data is possibly better viewed as acting as a gateway to the shard addresses rather than being just a file server. In practical terms, existing CDN caches can perform this role by mapping the requested shard address to the on-disk file containing the shard. However, new CDN caches can be developed to work directly with the Media Encoding Network scheme, and can act as smart caches proactively provisioning data within the Media Encoding Network address space.

9. Conclusion and Next Steps

GGIE seeks to help address this problem by establishing standards-based foundational building blocks that innovators can build upon to create smarter delivery and transport architectures, instead of relying on raw bandwidth growth to satisfy video's growth.

Next steps will include describing the working prototypes of the GGIE core elements and more extended use cases addressed by GGIE, many of which were defined in the W3C GGIE Taskforce.

10. Acknowledgements

Contributions to this document came from Bill Rose, Gaurav Naik, and John Brzozowski.

11. IANA Considerations

None (yet).

12. Security Considerations

12.1. Privacy Concerns

The assignment of persistent IPv6 prefixes to MENs permits the video being streamed to be identified at the network level by observing the destination addresses sent from the player to the media gateway. In situations where the user wishes to prevent this level of observation, it is necessary to obscure the true MEN prefix of the video being streamed.

12.1.1. Privacy via VPN

One remediation is the use of a VPN that will encapsulate and hide the traffic between the player and the streaming cache, or at least between the trusted network the player resides on and the streaming cache network.
This prevents identification of the actual video title from the open Internet during transit.

12.1.2. Session Prefix Renumbering

Another technique is to have the player and streaming cache remap the IPv6 prefix for the streaming session to a new prefix. Under such a renumbering, the cache will advertise to the routing layer and respond to requests sent from the player to the session prefix just as it would to the original video MEN prefix.

13. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

Appendix A. Overview of the details of the video lifecycle

This section outlines the details of the video lifecycle -- from creation to consumption -- including the key handholds for building applications and services around this complex data. The section also provides more detail about the scope and requirements of video (scale of data, real-time requirements).

Note: this document only deals with streaming video as used by movies, TV shows, news broadcasts, sports events, music concert broadcasts, product videos, personal videos, etc. It does not deal with video conferencing or WebRTC-style video transport.

A.1. Media Lifecycle

The complex workflow of creating media and consuming it is decomposable into a series of distinct common phases.

A.1.1. Capture

The capture phase involves the original recording of the elements which will be edited together to make the final work. Captured media elements can be static images, images with audio, audio only, video only, or video with audio. In sophisticated capture scenarios more than one device may be simultaneously recording.

A.1.1.1. Capture Metadata

The creation of metadata for the element, and for the final video, begins at capture.
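For illustration, a capture-phase metadata record of the kind this section describes might be represented as below; the field names and values are assumptions for the sketch, not a GGIE-defined schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

@dataclass
class CaptureMetadata:
    """Hypothetical capture-phase metadata record. The field names are
    illustrative assumptions, not a GGIE-defined schema."""
    camera_id: str
    exposure: str
    encoder: str
    capture_time: datetime
    capture_format: str
    gps: Optional[Tuple[float, float]] = None  # recorded only by some systems

record = CaptureMetadata(
    camera_id="CAM-07",
    exposure="1/60 f/2.8 ISO 400",
    encoder="h264",
    capture_time=datetime(2016, 10, 24, tzinfo=timezone.utc),
    capture_format="1920x1080p30",
)
```

Later lifecycle phases (store, package, distribute) would extend such a record rather than replace it.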
Typical basic capture metadata includes camera ID, exposure, encoder, capture time, and capture format. Some systems record GPS location data, assigned asset IDs, assigned camera name, and camera spatial location and orientation.

A.1.2. Store

The storage phase involves the transport and storage of captured element data. During the capture phase, an element is typically captured into memory in the capture device and is then stored onto persistent storage such as disk, SD card, or memory card. Storage can involve network transport from the recording device to an external storage system using either storage-over-IP protocols such as iSCSI, a data transport such as FTP, or encapsulated data transport over a protocol such as HTTP.

Storage systems can range from basic disk block storage to sophisticated media asset libraries.

A.1.2.1. Storage Metadata

Storage systems add to the metadata associated with media elements. For basic block storage, a file name and file size are typical, as are a hierarchical grouping, creation date, and last-access date. For library systems, an identifier unique to the library is typical, as well as grouping by one or more attributes, a time stamp recording the addition to the library, and a last-access time.

A.1.3. Edit

Editing is the phase where one or more elements are combined and modified to create the final video work. In the case of live streaming, the edit phase may be bypassed.

A.1.4. Package

Packaging is the phase in which the work is encoded in one or more video and audio codecs. This may produce multiple data files, or they may be combined into a single file container. Typically, creation or registration of a unique work identifier, for example an Entertainment Identifier from EIDR, is assigned in the packaging phase.

A.1.4.1. Package Metadata

A.1.5.
Distribute

The distribute phase is publishing or sharing the packaged work to viewers. Often it involves uploading to a site such as YouTube or Facebook for social media, or sending the packaged media to streaming sites such as Hulu.

It is common for the distribution site to repackage the video, often transcoding it to codecs and bit rates chosen by the distributor as more efficient for their needs. Distribution of content expected to be widely viewed often includes prepositioning of the content on a CDN (Content Distribution Network).

Distribution involves delivery of the video data to the viewer.

A.1.5.1. Distribution Metadata

Distribution often adds or changes considerable amounts of metadata. The distributor typically assigns a Content Identifier to the work that is unique to the distributor and their content management system (CMS). Additional actions by the distributor, such as repackaging and transcoding to new codecs or bit rates, can require significant changes to the media metadata.

A secondary use of distribution metadata is enabling easy discovery of the content, either through a library catalog, EPG (electronic program guide), or search engine. This phase often includes significant new metadata generation involving tagging the work by genre (sci-fi, drama, comedy), sub-genre (space opera, horror, fantasy), actors, director, release date, similar works, rating level (PG, PG-13), language level, etc.

A.1.6. Discovery

The discovery phase is the precursor to viewing the work. It is where the viewer locates the work, either through a library catalog, a playlist, an EPG, or a search. The discovery phase connects interested viewers with distribution sources.

A.1.6.1. Discovery Metadata

It is typical for discovery systems to parse media metadata to use the information as part of the discovery process.
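A toy sketch of that process: a discovery system narrowing a catalog by the distributor-assigned genre and rating metadata described above (the titles and tags here are invented for illustration):

```python
def discover(catalog: list, **criteria) -> list:
    """Return titles whose metadata matches every given criterion,
    the way a catalog, EPG, or search engine narrows results."""
    return [entry["title"] for entry in catalog
            if all(entry.get(key) == value for key, value in criteria.items())]

# Invented catalog entries with distribution-phase metadata:
catalog = [
    {"title": "Moon Harbor", "genre": "sci-fi", "rating": "PG-13"},
    {"title": "Quiet Field", "genre": "drama",  "rating": "PG"},
    {"title": "Star Ledger", "genre": "sci-fi", "rating": "PG"},
]

# e.g. discover(catalog, genre="sci-fi", rating="PG")
```

Real discovery systems add many more axes (sub-genre, actors, release date, similar works), but the matching idea is the same.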
Discovery systems may parse the content to extract imagery and audio as additional new metadata for the work, to ease the viewer's navigation of the discovery process, perhaps as UI elements. The system may import new externally generated metadata about the work and associate it in its search system, such as viewer reviews and metadata cross-reference indices.

A.1.7. Viewing

The viewing phase encompasses the consumption of the work from the distributor. For Internet-delivered video it is typical for delivery to involve a CDN to perform the actual delivery.

A.2. Video is not like other Internet data

Video is distinctly different from other Internet data. There are many characteristics that contribute to video's unique Internet needs. The most significant characteristics are:

1. large size of video data (gigabytes per hour of video)

2. high bandwidth demands (Mbps to Gbps)

3. low latency demands of streamed video

4. responsiveness to trick play requests by the user (stop, fast forward, fast reverse, jump ahead, jump back)

5. multiplicity of formats and encodings/bit rates that are acceptable substitutes for one another

A.2.1. Data Sizes

Simply put, compared to all other common Internet data, video is huge. A still image often ranges from 100KB to 10MB. A video file can commonly range from 100MB to 50GB. Encoding and compression options permit streaming videos using bandwidth ranging from 700Kbps for extremely compressed SD video, to 1.5-3.0 Mbps for SD video, to 2.5-6.0 Mbps for HD video, and 11-30 Mbps for 4K video.

Still images have four properties that affect their data size:

1. number of horizontal X pixels

2. number of vertical Y pixels

3. bytes per pixel

4. compression factor for the image encoding.

Video adds to this:

1. frames per second playback rate

2.
visual continuity between frames (meaning users notice when frames are skipped or played out of order)

3. discontiguous jumps between frames, such as skipping forward or backward, or inserting frames from other sources between contiguous frames (advertisement placement)

Each video format roughly quadruples the data needs of the previous resolution: (1) SD is 640x480 pixels; (2) HD is 1920x1080 pixels; (3) 4K is 3840x2160 pixels.

Video, like still images, assigns a number of bits per pixel to store color and luminance information. This is currently evolving alongside resolutions after being stagnant for many years. The introduction of high dynamic range (HDR) video has changed the color gamut for video and increased the number of bits needed to carry luminance from 8 to 10, and in some formats more.

Compression is often misunderstood by viewers. Compression does not change the video resolution: SD is still 640x480 pixels, HD is still 1920x1080 pixels. What changes is the quality of the detail in each frame, and between frames.

Video is, in its simplest form, a series of still images shown sequentially over time, adding an additional attribute to manage.

A.2.2. Low Latency Transport

Viewers demand that video plays back without any stutter, skips, or pauses, which translates into low latency, high reliability transport of the video data.

A.2.3. Multiplicity of Acceptable Formats

One of the unique aspects of video viewing is that there can exist multiple different encodings/versions of the same video, many of which are acceptable substitutes for one another. This differentiates video delivery from other data transports.

Other application data types don't have or leverage the concept of semantic equivalence to the same extent as video.
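The substitutability idea can be sketched as a player choosing the highest-bit-rate encoding its available bandwidth allows; the bit-rate ladder below is a hypothetical example whose values loosely follow the ranges given in Section A.2.1:

```python
def pick_encoding(available_kbps: int, ladder: list) -> str:
    """Choose the highest-bit-rate encoding that fits the available
    bandwidth; every rung of the ladder is an acceptable substitute for
    the others. Ladder values are illustrative, not a GGIE specification."""
    fitting = [(name, kbps) for name, kbps in ladder if kbps <= available_kbps]
    if not fitting:
        # Nothing fits: fall back to the lowest-bit-rate rung.
        name, _ = min(ladder, key=lambda rung: rung[1])
        return name
    name, _ = max(fitting, key=lambda rung: rung[1])
    return name

# Hypothetical ladder: (encoding name, required kbps)
ladder = [("SD-low", 700), ("SD", 2500), ("HD", 5000), ("4K", 16000)]
```

This is exactly the kind of swap no email client makes between MIME parts; for video it happens continuously during playback.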
Even email, which supports multiple encodings in a multipart MIME message, has a finite number of representations of "the message", shipped as one unit, whereas video often has many distinct encodings, each a separate file or container of files managed as a distinct entity from the others.

A.3. Video Transport

A.3.1. File vs Stream

There are two common ways of transporting video on the Internet: 1) file-based; 2) streaming. File-based transport can use any file transport protocol, with FTP and BitTorrent being two popular choices.

File-based playback involves copying a file and then playing it. There are schemes which permit playing portions of the file while it is progressively copied, but these schemes still involve moving the file from A to B and then playing on B. FTP and BitTorrent are examples of file copy protocols.

Streaming playback is most similar to traditional cable or OTA viewing of a video. The video is delivered from the streaming service to the playback device in real time, enabling the playback device to receive, decode, and display the video data in real time. Communication between the player and the source enables pausing, fast forward, and rewind by managing the data blocks which are sent to the player device.

Authors' Addresses

Glenn Deen
NBCUniversal

Email: rgd.ietf@gmail.com

Leslie Daigle
Thinking Cat Enterprises LLC

Email: ldaigle@thinkingcat.com