2 Network Working Group G. Deen 3 Internet-Draft NBCUniversal 4 Intended status: Informational L. Daigle 5 Expires: April 27, 2017 Thinking Cat Enterprises LLC 6 October 24, 2016 8 Glass to Glass Internet Ecosystem Introduction 9 draft-deen-daigle-ggie-02 11 Abstract 13 This document introduces the Glass to Glass Internet Ecosystem 14 (GGIE). GGIE's purpose is to improve how the Internet is used to create 15 and consume video, both amateur and professional, reflecting that the 16 line between amateur and professional video technology is 17 increasingly blurred.
Glass to Glass refers to the entire video 18 ecosystem, from the camera lens to the viewing screen. As the name 19 implies, GGIE's scope is the entire video ecosystem from capture, 20 through the steps of editing, packaging, distributing, and searching, 21 and finally viewing. GGIE is not a complete end to end architecture 22 or solution; it provides foundational elements that can serve as 23 building blocks for new Internet video innovation. 25 This is a companion effort to the GGIE W3C Taskforce in the W3C Web 26 and TV Interest Group. 28 This document is being discussed on the ggie@ietf.org mailing list. 30 Status of This Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on April 27, 2017. 47 Copyright Notice 49 Copyright (c) 2016 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document.
Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 65 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 66 3. Motivation: Video is filling up the pipes . . . . . . . . . . 4 67 4. Video is different . . . . . . . . . . . . . . . . . . . . . 5 68 5. Historical Approaches to supporting Video on the Internet . . 6 69 5.1. Video as an application . . . . . . . . . . . . . . . . . 6 70 5.2. Video as a network problem . . . . . . . . . . . . . . . 7 71 5.3. Video Ecosystem Encapsulation . . . . . . . . . . . . . . 7 72 6. Problem Statement and Solution Criteria . . . . . . . . . . . 8 73 7. The Glass to Glass Internet Ecosystem: GGIE . . . . . . . . . 8 74 7.1. Related work: W3C GGIE Taskforce . . . . . . . . . . . . 9 75 8. GGIE work of relevance to the IETF . . . . . . . . . . . . . 9 76 8.1. Affected IETF work areas . . . . . . . . . . . . . . . . 9 77 8.2. Example use cases . . . . . . . . . . . . . . . . . . . . 9 78 8.3. Core GGIE elements . . . . . . . . . . . . . . . . . . . 11 79 9. Conclusion and Next Steps . . . . . . . . . . . . . . . . . . 15 80 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15 81 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 82 12. Security Considerations . . . . . . . . . . . . . . . . . . . 15 83 12.1. Privacy Concerns . . . . . . . . . . . . . . . . . . . . 15 84 13. Normative References . . . . . . . . . . . . . . . . . . . . 16 85 Appendix A. Overview of the details of the video lifecycle . . . 16 86 A.1. Media Lifecycle . . . . . . . . . . . . . . . . . . . . . 16 87 A.2. Video is not like other Internet data . . . . . . . . . . 19 88 A.3. Video Transport . . . . . . . . . . . . . . . . . . . . . 
21 89 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21 91 1. Introduction 93 In terms of sheer bandwidth, the Internet's largest use, without any 94 close second competitor, is video. This is thanks to the 95 proliferation of Internet connected devices capable of capturing and/ 96 or watching streamed video. As of 2015 there are reports that 97 YouTube users upload over 500 hours of video every minute, and that 98 during evening hours NetFlix accounts for a staggering 50+% of 99 Internet traffic. The number of users using the Internet for both 100 ends of the video create-view lifecycle grows daily worldwide, and 101 this is creating an enormous strain on the underlying Internet 102 infrastructure at nearly every point from the core to the edge. 104 While video is one of the most conceptually simple uses of the 105 Internet, it is perhaps one of the most complex technically, built 106 from standards created by a large number of organizations and groups, 107 some dating from before the modern Internet even existed. Many 108 critical parts of this complex ecosystem were not created with either 109 video's particular characteristics or vast scale of popularity in 110 mind. This has led to both the degradation of the viewer experience 111 and many Internet policy issues around access to bandwidth for video 112 and the needed infrastructure to support the continued explosion in 113 video transport on the Internet. 115 The pace of video growth has been faster than new bandwidth for the 116 past several years, and all indicators are that, instead of abating, it 117 is actually accelerating as new users, new ways of sharing video, and 118 new types of video continue to be added. The Cisco Visual Networking 119 Index is an excellent source of detail on this subject.
121 The combined current high levels of bandwidth consumed by video, plus 122 the accelerating pace of video's growth mean that to meet users' 123 demand for video, we must do more than simply rely on adding more 124 bandwidth. While other traditional improvements such as more 125 efficient codecs with better compression ratios are expected to 126 contribute to keep video flowing on the Internet, many in the 127 Internet video technology world have explored options to see if any 128 new approaches could be added to the mix to help the problem. That 129 was the motivation behind the creation of the GGIE Taskforce within 130 the W3C in 2014 with the charter to examine the end to end video 131 ecosystem and identify new areas of opportunity to improve video's 132 use of the Internet. 134 The W3C GGIE taskforce explored ways that video uses the Internet and 135 developed a series of use cases detailing specific scenarios ranging 136 from video capture, the editing and production cycle, through to 137 delivery to viewers. Out of these use cases there emerged a 138 recognition that there might be a new opportunity to improve Internet 139 video by enabling edge devices and the underlying network to more 140 actively participate in making delivery optimization choices beyond 141 the simple ways they do currently. 143 The GGIE approach is to apply and evolve existing technologies to the 144 task of optimizing Internet video transport to permit applications, 145 video devices, and the network to more actively participate in making 146 smart access and transport decisions. This approach recognizes that 147 there are already extensively-deployed video infrastructure elements 148 that need to continue to work and be part of the optimized video 149 ecosystem.
These deployed devices, applications, players, and tools 150 are responsible for the already high levels of video bandwidth 151 consumption, and to only address new devices would not be solving the 152 larger, most important problem. This is why GGIE is an evolution of 153 how video uses the Internet, and not a revolution involving wholesale 154 replacement of existing architecture or protocols. 156 GGIE is not a complete solution to the video problem. It provides 157 foundational building blocks that are intended to be used by 158 innovators in their work to create new optimizations and novel 159 techniques to help address the video problem in the long term. 161 GGIE initially proposes a simple framework of three components that 162 will permit improved playback device identification of viewing 163 sources and enable network level awareness of video transport and new 164 cache selection choices. GGIE proposes: Using existing content 165 identifiers as a means to identify a work, or title; Data level 166 identifiers to identify the encoded video data for a particular 167 manifestation of the title; A mapping service that permits bi- 168 directional resolution of these identifiers. 170 This document outlines the basic proposal for these three base GGIE 171 components and introduces the overall GGIE approach to evolving the 172 current video ecosystem by introducing basic standardized building 173 blocks for innovators to build upon: the Glass to Glass Internet 174 Ecosystem. 176 2. Terminology 178 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 179 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 180 document are to be interpreted as described in [RFC2119]. 182 3. Motivation: Video is filling up the pipes 184 The growth in video bandwidth need is exceeding the growth in 185 bandwidth provisioning.
This trend is in fact accelerating, meaning 186 the growth rate of video is growing faster than the growth rate of 187 provisioning. Traditional techniques of caching, higher efficiency 188 codecs, etc., are all being used to help address the problem and have 189 helped the Internet to continue to support the growth of video thus 190 far. 192 Video has been the top use of Internet bandwidth for several years 193 and is larger than the bandwidth used by all other applications 194 combined. This trend is unlikely to ease or reverse itself as users 195 of the Internet continue to make Internet transported video one of 196 their top uses of the Internet, either for uploading and sharing 197 video they create, or as a primary source for viewing video on a 198 wide variety of viewing devices: computers, tablets, phones, 199 connected televisions, game consoles, and AV receivers. 201 Adding to user demand, video itself is continually experiencing 202 innovation introducing ever higher resolutions (SD, HD, 4K, 8K...), 203 higher video quality, new distribution services (live one to many 204 streaming), and new uses. The Cisco Visual Networking Index 205 projects that by 2019 there will be nearly a million minutes of video 206 per second transported by the Internet, making up 80-90 percent of 207 all IP traffic. 209 The motivation behind GGIE is to help find new methods that can be 210 brought to bear, in addition to all the existing ones, to help manage 211 the explosion in Internet video. 213 4. Video is different 215 Video is different from other uses of the network due to its combined 216 high bandwidth demands and high sensitivity to latency and dropped 217 packets. Streaming of basic high-definition 1080p requires bandwidth 218 in the low Mbps translating into Gigabytes for each hour of video, 219 all transported with consistent low latency and very little packet 220 loss in order to deliver a suitable watching experience to the viewer.
221 This differentiates video from other Internet applications as some 222 have low latency and packet loss requirements but don't need high 223 bandwidth, while others demand high bandwidth but will tolerate 224 high latency and dropped packets. An email user can tolerate an 225 extra moment to retransmit dropped packets, and a web page user can 226 tolerate a slow DNS lookup, but a video viewer sees latency and 227 dropped packets as jittery playback and low bandwidth as a 228 fundamental barrier to streaming at all. From the user's perspective 229 the network has failed to meet their need. (Audio has similar 230 challenges in terms of intolerance of delay and jitter, but the data 231 sizes are significantly smaller). 233 Video data sizes continue to grow at roughly 4x per format iteration 234 as cameras and playback devices are able to capture and display 235 higher quality images. Early digital video was often captured at 236 either 320x240 pixel resolution or 640x480 standard definition 237 resolution. High definition or HD video at 1920x1080 became possible 238 on some parts of the Internet after 2011, although even in 2016 it 239 remains unavailable or unreliable through many connections such as 240 DSL and many mobile networks. Camera and player technologies are 241 currently expanding again to permit 4K or 3840x2160 pixel resolution 242 reflecting a 4x data increase over HD. 244 Streaming is very demanding, requiring consistent frame to frame 245 playback in constant time. Advanced features such as 246 pause, fast forward, rewind, slow motion, and fine scrubbing are 247 considered by users as standard player features that the network 248 must support, and they serve to further the challenge facing the Internet.
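The bandwidth arithmetic above (low-Mbps 1080p streams amounting to gigabytes per hour, and a roughly 4x jump in pixel data per format generation) can be checked with a short back-of-the-envelope sketch. The 5 Mbps bit-rate is an illustrative assumption, since the text says only "low Mbps":

```python
# Back-of-the-envelope check of the bandwidth figures discussed above.
# The 5 Mbps bit-rate is an illustrative assumption, not a value from
# this draft.

def gigabytes_per_hour(mbps: float) -> float:
    """Data volume of one hour of continuous streaming at a given bit-rate."""
    bits = mbps * 1_000_000 * 3600       # bits transferred in one hour
    return bits / 8 / 1_000_000_000      # bits -> bytes -> gigabytes

# Pixels per frame for the formats named in the text.
pixels = {
    "SD (640x480)":   640 * 480,
    "HD (1920x1080)": 1920 * 1080,
    "4K (3840x2160)": 3840 * 2160,
}

# A 1080p stream at 5 Mbps works out to 2.25 GB per hour.
print(f"{gigabytes_per_hour(5):.2f} GB/hour at 5 Mbps")

# The HD -> 4K step is exactly a 4x increase in raw pixel data.
print(pixels["4K (3840x2160)"] / pixels["HD (1920x1080)"])   # 4.0
```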
250 New video abilities such as live streaming by users (both one to one 251 and one to many) bring what has traditionally been done by 252 professional broadcasters with dedicated broadcast infrastructure 253 into the realm of everyday users with connected smartphones using 254 the Internet as a real-time global broadcast infrastructure. 256 5. Historical Approaches to supporting Video on the Internet 258 5.1. Video as an application 260 Internet video engineering began by adapting preexisting standards 261 used for over the air broadcast (OTA) and physical media. Video 262 encodings, such as AVI and MPEG2, originally designed for playback 263 from local storage connected to the player, were added to the data 264 types carried by existing protocols like HTTP, and new protocols such 265 as RTSP and HLS. Early use of the Internet for video was a copy-and- 266 play model replacing the use of OTA broadcast and physical media to 267 copy video between systems. 269 As Internet bandwidth became sufficient to allow delivery of video 270 data at the same rate it was being decoded, it became possible to 271 stream video originally at very low resolutions such as 160x120 272 pixels (19.2 kilopixels), eventually permitting standard definition 273 (SD) 640x480 pixels (0.3 megapixels), and later high definition of 274 1920x1080 pixels (2 megapixels). This trend continues with some 275 providers beginning to offer 4K or 3840x2160 pixels (8.3 megapixels) 276 requiring a very reliable and generous end to end Internet 277 connection between the viewer and source. 279 Unlike the Web, email, and network file sharing which have been 280 engineered and standardized in Internet focused organizations such as 281 the W3C and IETF, video is dependent on standards developed by a very 282 large number of groups, companies, and organizations which include 283 the IETF and W3C but also MPEG, SMPTE, CTA, IEEE, ANSI, ISO, networking 284 and technology companies, and many others.
In contrast to the extensive 285 end to end expert knowledge and engineering done to create the Web 286 and email, Internet video has largely been an evolved cobbling and 287 adaptation exercise done by engineers with their focus on a few, or 288 one, particular aspect or problem at a time, and little interaction 289 with other parts of the Internet video ecosystem. While it is 290 very much possible to deliver video over the Internet, this 291 uncoordinated cobbling has resulted in many areas of inefficiency 292 where engineering done from an end to end perspective could provide 293 the opportunity to vastly improve how video uses the Internet, which 294 offers the hope of improving the quality of video and increasing the 295 amount of video which can be delivered. 297 5.2. Video as a network problem 299 Network, video, and application engineers have constructed elaborate 300 solutions for dealing with bandwidth and processing limitations, 301 network congestion, lossy transport protocols, and the ever growing 302 size of video data. These solutions commonly fall into one of 303 several solution types: 305 1. Reducing data sizes through resolution changes, compression, and 306 more efficient encodings 308 2. Downloading before playing instead of real-time streaming 310 3. Positioning the data close to the viewer via caches, typically on 311 the network edge 313 4. Fetching of video data at a rate faster than playback 315 5. Transport protocols that attempt to deliver video data such that 316 the data arrives as if it had been sent over a congestion-free, lossless 317 network 319 6. Dynamic reselection of sources and transport routes, either in 320 real time or at frequent intervals (10-15 seconds), using player 321 feedback mechanisms or network telemetry 323 5.3. Video Ecosystem Encapsulation 325 The current delivery ecosystem for video has been primarily developed 326 at the higher application layers of the stack.
While there has been 327 some video work done at lower levels such as general-purpose 328 transport improvements, caching protocols in CDNi, various 329 multicasting approaches, and other efforts, the majority of video- 330 specific work has previously been done by groups such as ISO's Moving 331 Picture Experts Group (MPEG) which have focused on codecs and codec 332 transport optimized for use on the Internet. These efforts have made 333 video possible on the Internet, but they have done so largely while 334 treating the underlying network as a basic transporter of data. This 335 has resulted in little information being exposed to the network, 336 information that could be used to optimize delivery of the video, and 337 in an architecture that pushes more and more of the intelligence into 338 an ever more complex and isolated core. 340 The current video model benefits from a significant amount of 341 operational, feature, and protocol encapsulation that has come about 342 due to different groups working independently on the components that 343 make it up. Like any system in which distinct pieces are well 344 encapsulated from one another, this means it is possible to engage in 345 improvements at the networking layer without the need to coordinate 346 with higher levels of the video architecture. 348 6. Problem Statement and Solution Criteria 350 At its most basic the problem to be solved for video delivery is how 351 to simultaneously maximize all of the following conditions: The number 352 of viewing devices simultaneously supported by the network; The 353 quality of video as measured by bit-rate and resolution; The number 354 of distinct streams that can be delivered. 356 Solution Constraints 358 1. Bandwidth growth alone is not a solution 360 2. Codec efficiency improvements alone are not a solution 362 3. Existing devices, infrastructure, and video delivery techniques must, 363 as much as possible, continue to be supported and benefit from new 364 solutions.
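The tension among the conditions in the problem statement can be illustrated with a toy model: for a fixed amount of provisioned bandwidth, raising per-stream quality (bit-rate) directly lowers the number of simultaneously supported viewers. The link capacity and per-stream bit-rates below are illustrative assumptions, not figures from this draft.

```python
# Toy model of the problem statement: viewer count, per-stream quality,
# and distinct-stream count all compete for the same fixed provisioned
# bandwidth. All figures here are illustrative assumptions.

def max_concurrent_streams(link_capacity_mbps: float, stream_mbps: float) -> int:
    """Simultaneous streams of a given bit-rate a fixed link can carry."""
    return int(link_capacity_mbps // stream_mbps)

LINK = 10_000  # hypothetical 10 Gbps aggregation link

for label, rate in [("SD ~3 Mbps", 3), ("HD ~5 Mbps", 5), ("4K ~15 Mbps", 15)]:
    print(f"{label:12s} -> {max_concurrent_streams(LINK, rate)} viewers")
```

This is also why constraints 1 and 2 hold: more bandwidth or better codecs alone change only the scale of the trade-off, not its shape.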
366 7. The Glass to Glass Internet Ecosystem: GGIE 368 GGIE is an effort to improve video's use of the Internet by examining 369 the end to end video ecosystem from the glass lens of the camera 370 through to the glass of the screen, and to identify areas of 371 simplification, standardization, and reengineering to make better 372 use of bandwidth and enable smarter network use by video creators, 373 distributors, and viewers. GGIE is focused on how video uses the 374 Internet, and not on how it is encoded or compressed. Likewise GGIE 375 does not deal with content protection. GGIE's scope, however, does 376 include creator and viewer privacy, content identification and 377 recognition as a means to enable smarter network usage, edge caching, 378 and discoverability. 380 GGIE benefits from the encapsulation of the video ecosystem elements, 381 enabling it to introduce evolutionary features to elements without 382 disrupting other distinct encapsulated parts. 384 GGIE is intended to work with a wide variety of video encoding 385 codecs, and video distribution and transport protocols. While 386 examples using MPEG-DASH are used due to its pervasive use, GGIE is 387 not limited to MPEG-DASH or any other video distribution system or 388 codec. 390 Beyond improving the simple experience of a viewer using the Internet 391 to watch linear video, it is hoped that a set of improved Internet 392 video infrastructure standards will provide a foundation that permits 393 innovators to create the next generation of Internet video content 394 (such as multisource personalized composite experiences, interactive 395 stories, and live personal broadcasting, to name a few). 397 Due to the very diverse and large deployment of existing video 398 playback devices and infrastructure, it is viewed as essential that 399 any evolved ecosystem continues to work with the majority of the 400 legacy deployment without the need for updates or changes to the 401 existing ecosystem. 403 7.1.
Related work: W3C GGIE Taskforce 405 A companion effort ran through 2015 in the W3C Web and TV Interest 406 Group's GGIE Taskforce. The W3C GGIE group developed a series of 407 use-cases on discovery, search, delivery, identity, and metadata 408 which can be found at https://www.w3.org/2011/webtv/wiki/GGIE_TF 410 8. GGIE work of relevance to the IETF 412 This section assumes a working familiarity with the video creation and 413 consumption "life cycle". For reference, an overview has been 414 provided in the Appendix. 416 8.1. Affected IETF work areas 418 It is expected that significant improvement is possible in the video 419 transport ecosystem by modest evolution and adaptation of existing 420 standards for addressing, transporting, and routing of video data 421 flows between sources and display. 423 8.2. Example use cases 425 The following example use cases help illustrate the use of the GGIE 426 core elements. 428 8.2.1. Alternate Source Discovery 430 Description: A video player is streaming a movie from a CDN cache in 431 the core of the network. This use case illustrates the use of a 432 media identifier to query a media address resolution service to 433 locate additional alternate sources that offer the same movie. 435 1. The video player user selects a movie to watch from a list using 436 the player application UI. 438 2. The video player application has the media identifier of the 439 movie in the metadata description of the movie. This identifier 440 is passed to the playback device when the movie is selected. 442 3. The playback device sends a search query to the Media Address 443 Resolution Service (MARS) which includes the media identifier, 444 and additional query parameters used to filter the results 445 returned. 447 4. The MARS server searches its database and returns all the Media 448 Encoding Networks matching the media identifier and filters the 449 results using the additional parameters submitted in the query.
450 Each Media Encoding Network represents a different encoding of 451 the video. 453 5. The player then examines the returned list of media encoding 454 networks and selects, from its perspective, the optimal source 455 for the title. 457 6. The player then directs its streaming requests to the selected 458 Media Encoding Network addresses to obtain the video data for the 459 movie. 461 7. The video data is decoded and displayed on the screen. 463 8.2.2. Alternate Format Discovery 465 Description: A video player is streaming a movie, and wants to send 466 the audio to another device for playback. However, the current video 467 data being streamed does not contain any audio that matches the 468 codecs the audio device can play. The audio device uses the core 469 GGIE services to locate an alternate encoding of the movie that 470 contains audio it can decode. 472 1. The user directs the video player to send the audio portion of 473 the playing video to an external audio device. 475 2. The video player application passes the media identifier for the 476 video to the audio device as well as the media encoding network 477 address the video player is using. 479 3. The audio device begins streaming from the media encoding network 480 it was given, but discovers the data does not include audio that 481 it is able to decode. 483 4. The audio device sends a search query to the Media Address 484 Resolution Service (MARS) which includes the media identifier, 485 and additional query parameters including the list of audio 486 codecs and language choice it is able to decode. 488 5. The MARS server searches its database and returns all the Media 489 Encoding Networks matching the media identifier and filters the 490 results to only those matching the language and audio codec 491 supplied in the search. 493 6. The audio player examines the returned list of media encoding 494 networks, selects a media encoding network and begins streaming 495 data from it. 497 7.
The external audio player decodes the returned movie data and 498 plays it for the user. 500 8.3. Core GGIE elements 502 GGIE proposes three initial fundamental pieces: 504 1. Media Identifiers which identify the video at the title, or work 505 level; 507 2. Media Encoding Networks which are subnets used to reference the 508 encoded video data; 510 3. Media Address Resolution Service which maps Media Identifiers for 511 a title to the Media Encoding Networks containing the encoded 512 video versions of the title. 514 These three foundational elements help by exposing information that 515 can be used in selection in a way that is independent of the video 516 encoding and video data storage choice. They also enable more 517 sophisticated video use cases beyond the basic single device playing 518 a video stream from an origin server over a flow controlled protocol. 520 8.3.1. Media Identifiers 522 A Media Identifier is a URI that carries a content identifier system 523 declaration, and a content identifier from the system that refers 524 unambiguously to a work, or title. This can be any content 525 identification system; GGIE does not specify the system used. 527 For example, a media identifier for a title identified by an EIDR 528 value would include a declaration that the identifier is from EIDR, 529 and would additionally contain the EIDR value. 531 At the application level, such as UI program guide applications, 532 search engines, and metadata databases, it is the identification of 533 the work or identity of the video that is typically of interest and 534 not the encoding, bit-rate, or the location of CDN caches etc. For 535 example, a UI would indicate "the Minions movie" as opposed to 536 "a 15 megabit per second, HEVC encode with high dynamic range and 537 Dolby encoded 7.1 English audio of the Minions movie".
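A Media Identifier of the kind described above can be sketched as a two-part URI: a declaration of the content identification system, followed by that system's identifier for the title. The "x-mediaid" scheme name and the EIDR-style value below are hypothetical placeholders; this draft does not define a concrete syntax.

```python
# Minimal sketch of a GGIE Media Identifier: a content identifier
# system declaration plus an identifier from that system. The
# "x-mediaid" scheme and the sample EIDR-style value are hypothetical.

from typing import NamedTuple

class MediaIdentifier(NamedTuple):
    system: str   # content identification system, e.g. "eidr" or "ad-id"
    value: str    # title identifier, unambiguous only within that system

def parse_media_identifier(uri: str) -> MediaIdentifier:
    scheme, system, value = uri.split(":", 2)
    if scheme != "x-mediaid":
        raise ValueError(f"not a media identifier: {uri!r}")
    return MediaIdentifier(system, value)

mid = parse_media_identifier("x-mediaid:eidr:10.5240/EXAM-PLE0-0000")
print(mid.system)   # eidr
```

Note that the identifier names only the work; nothing in it describes codec, bit-rate, or source, which is deliberate.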
Those 538 additional technical details are important when choosing a particular 539 encoded manifestation of the movie for delivery, decode, and 540 playback, but they are not generally needed as information to be 541 presented to the user or used to make viewing choices. Such 542 technical information is used after the user has chosen the title to 543 watch, and it is used by the playback device, not the user, in selecting 544 the video. Media Identifiers in GGIE contain only title information, 545 and not encoding information. 547 There are many media identifiers in use for both personal and 548 professional content, with new ones being introduced seemingly 549 weekly. Trying to create a single identifier to either harmonize or 550 replace the others has repeatedly been proven in practice to be an 551 impossible task. Recognizing this, GGIE instead proposes to 552 standardize a URI which would contain at least two fields: 1) A 553 scheme identifier; 2) An unambiguous title identifier (note: this is 554 unambiguous only within the domain of the identified scheme). 556 For professional content, titles are increasingly identified with a 557 scheme called EIDR that can identify both master versions of works 558 and edit level versions. Likewise, advertisements use a scheme called 559 AD-ID. 561 8.3.2. Media Address Resolution Service (MARS) 563 The media address resolution service (MARS) provides bidirectional 564 mapping of Media Identifiers to Media Encoding Networks. It is 565 queryable using a query protocol which returns any results matching 566 the terms of the query parameters. 568 A Media Identifier alone isn't sufficient to connect a device to a 569 video data source. The media identifier distinguishes the work, but 570 not the technical details of an instance of the work such as codec, 571 bit-rate, resolution, high dynamic range video, audio encoding, nor 572 does it include information about available streaming sources etc.
573 The Media Address Resolution Service (MARS) provides this 574 association. It can be queried with the Media Identifier, and 575 optional filtering parameters, and will return Media Encoding Network 576 addresses for instances of matching encodings of the work. 578 This translation is used commonly in video streaming services today. 579 The link provided in the program guide UI will include a unique 580 identifier for the work which is then mapped by the streaming service 581 backend into a URI containing a network identifier and other info 582 which point to a caching server and the media data files in the 583 cache. MARS generalizes this and makes it available via query over 584 the network. 586 8.3.3. Media Encoding Networks (MEN) 588 Media Encoding Networks are arrangements of encoded video 589 data that are assigned addresses under a shared prefix and subnet 590 following a scheme appropriate for the encoding used by the video 591 data. Each Media Encoding Network instance represents a distinct 592 instance of a set of associated encodings for a work. Different 593 Media Encoding Network address assignment schemes would be defined 594 under GGIE to handle different encoded data formats such as MPEG-DASH and HLS. 596 For example, a single MEN instance would hold each of the different 597 variable bit-rate encodes for a single encoding of a video. If a new 598 encoding instance of the video was prepared, it would have a separate 599 and distinct MEN assigned to it. 601 8.3.3.1. Example: Using Media Encoding Networks with MPEG-DASH 603 A very basic form of video delivery uses a persistent connection from a 604 player to a video file source which then streams the video by 605 transmitting the video file data, byte by byte in sequence, from the 606 first byte of the file until the last. This trivial approach 607 requires the device to know the server IP address and port number to 608 connect to.
Essentially, this involves simply transporting the file from the source to the playback device in byte order.

In practice, simple file streaming is not used beyond local device-to-device playing in home networks, as it doesn't permit dynamic bit-rate selection, source or session failover, or trick play (pause, skip forward, skip backward). Instead, manifest files contain lists of available servers holding MPEG-DASH encodings of the larger video file, divided into fragments containing short portions (e.g. 2-15 seconds) of the video, called chunks by MPEG-DASH. (GGIE generalizes the MPEG-DASH term "chunk" to the more general "shard".) Each shard is a distinct file, typically named to reflect the video encoding it belongs to and its sequence position.

For example, the shards for MY-VIDEO might be named MY-VIDEO-001, MY-VIDEO-002, ... MY-VIDEO-nnn. The player then requests the shards in the order it wants them over a data transport protocol such as HTTP, with the translation of the actual data sent in response to requests for the named shards being handled by the data server.

So under MPEG-DASH the player is sent a manifest file containing the address of the data server and the shard names to request. The player then iterates over the available shards in the order desired by the user. The manifest thus contains URIs with the SERVER-ADDRESS and the CHUNK name. This file can be sent once per video play, or, more commonly, is sent at an interval of ~15 seconds to permit the sending CDN to customize for each player and to respond quickly to changes in network delivery performance and availability.

Each shard request by the device involves a network-level server IP address and port number, and an application-level shard name.
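The shard naming and manifest iteration just described can be sketched as follows; the three-digit zero-padded numbering and the URL layout are illustrative assumptions, not part of MPEG-DASH or GGIE:

```python
def shard_names(title: str, count: int) -> list:
    """Generate shard names in sequence order, e.g.
    MY-VIDEO-001 ... MY-VIDEO-nnn (zero-padding is an assumption)."""
    return [f"{title}-{i:03d}" for i in range(1, count + 1)]

def manifest_urls(server: str, title: str, count: int) -> list:
    """A manifest pairs a SERVER-ADDRESS with CHUNK names; the player
    iterates these URIs in the order the user wants playback."""
    return [f"http://{server}/{name}" for name in shard_names(title, count)]

# The player would then fetch, in order, something like:
#   manifest_urls("cdn.example.net", "MY-VIDEO", 3)
```

The data server resolves each named shard to the actual bytes; the network sees only the server address and port, which motivates the MEN proposal below.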
The network is thus able to manage the routing of the request to the server, and the routing of the response, but it lacks the information needed to do anything else to help optimize the video data transport.

GGIE proposes using Media Encoding Networks as an evolution of this that has the benefit of being backward compatible with manifest files, while giving the transport network and video ecosystem more information about the video traffic flowing over it.

Using Media Encoding Networks for MPEG-DASH will be described in another Internet-Draft, but the basic proposal is to assign the shards to a sequence of IP addresses organized to reflect the same ordering association that the chunk names followed in the MPEG-DASH scheme. These shard addresses form a Media Encoding Network, and they expose to the network layer knowledge of the specific video data being transported between the requesting device and the file server holding the data.

In practice this means that Media Encoding Network addresses refer to the shard and not the server holding the shard. This permits the network to be involved in the routing of the request for the shard, as opposed to the CDN preparing the manifest file. Among other benefits, this permits the network to provide path failover functionality beyond the CDN manifest.

This enables the network to be involved in shard source selection. Consider the use case wherein the network becomes aware of a local cache that holds the requested shard and is closer to the device than another cache deeper in the network. The network can direct the request to the local cache and save the transit cost and bandwidth of exchanging the request and response with the deeper cache. This can reduce network congestion as well as deliver faster transport of the shard to the playback device.

8.3.4.
Media Encoding Network Gateways

In this new approach, the server providing the shard data is possibly better viewed as acting as a gateway to the shard addresses rather than being just a file server. In practical terms, existing CDN caches can perform this role by mapping the requested shard address to the on-disk file containing the shard. However, new CDN caches can be developed to work directly with the Media Encoding Network scheme, and can act as smart caches proactively provisioning data within the Media Encoding Network address space.

9. Conclusion and Next Steps

GGIE seeks to help address this problem by establishing standards-based foundational building blocks that innovators can build upon to create smarter delivery and transport architectures, instead of relying on raw bandwidth growth to satisfy video's growth.

Next steps will include describing the working prototypes of the GGIE core elements and more extended use cases addressed by GGIE, many of which were defined in the W3C GGIE Taskforce.

10. Acknowledgements

Contributions to this document came from Bill Rose, Gaurav Naik, and John Brzozowski.

11. IANA Considerations

None (yet).

12. Security Considerations

12.1. Privacy Concerns

The assignment of persistent IPv6 prefixes to MENs permits the video being streamed to be identified at the network level by observing the destination addresses sent from the player to the media gateway. In situations where the user wishes to prevent this level of observation, it is necessary to obscure the true MEN prefix of the video being streamed.

12.1.1. Privacy via VPN

One remediation is the use of a VPN that will encapsulate and hide the traffic between the player and the streaming cache, or at least between the trusted network the player resides on and the streaming cache network.
This prevents identification of the actual video title from the open Internet during transit.

12.1.2. Session Prefix Renumbering

Another technique is to have the player and streaming cache remap the IPv6 prefix for the streaming session to a new prefix. Under such a renumbering, the cache will advertise to the routing layer and respond to requests sent from the player to the session prefix just as it would to the original video MEN prefix.

13. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

Appendix A. Overview of the details of the video lifecycle

This section outlines the details of the video lifecycle -- from creation to consumption -- including the key handholds for building applications and services around this complex data. The section also provides more detail about the scope and requirements of video (scale of data, real-time requirements).

Note: this document only deals with streaming video as used by movies, TV shows, news broadcasts, sports events, music concert broadcasts, product videos, personal videos, etc. It does not deal with video conferencing or WebRTC-style video transport.

A.1. Media Lifecycle

The complex workflow of creating media and consuming it is decomposable into a series of distinct common phases.

A.1.1. Capture

The capture phase involves the original recording of the elements which will be edited together to make the final work. Captured media elements can be static images, images with audio, audio only, video only, or video with audio. In sophisticated capture scenarios more than one device may be simultaneously recording.

A.1.1.1. Capture Metadata

The creation of metadata for the element, and for the final video, begins at capture.
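For illustration, a capture-phase metadata record of the kind this section describes might be represented as below; the field names and values are assumptions for the sketch, not a GGIE-defined schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

@dataclass
class CaptureMetadata:
    """Hypothetical capture-phase metadata record. The field names are
    illustrative assumptions, not a GGIE-defined schema."""
    camera_id: str
    exposure: str
    encoder: str
    capture_time: datetime
    capture_format: str
    gps: Optional[Tuple[float, float]] = None  # recorded only by some systems

record = CaptureMetadata(
    camera_id="CAM-07",
    exposure="1/60 f/2.8 ISO 400",
    encoder="h264",
    capture_time=datetime(2016, 10, 24, tzinfo=timezone.utc),
    capture_format="1920x1080p30",
)
```

Later lifecycle phases (store, package, distribute) would extend such a record rather than replace it.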
Typical basic capture metadata includes camera ID, exposure, encoder, capture time, and capture format. Some systems record GPS location data, assigned asset IDs, assigned camera name, and camera spatial location and orientation.

A.1.2. Store

The storage phase involves the transport and storage of captured element data. During the capture phase, an element is typically captured into memory in the capture device and is then stored onto persistent storage such as disk, SD card, or memory card. Storage can involve network transport from the recording device to an external storage system using either storage-over-IP protocols such as iSCSI, a data transport such as FTP, or encapsulated data transport over a protocol such as HTTP.

Storage systems can range from basic disk block storage to sophisticated media asset libraries.

A.1.2.1. Storage Metadata

Storage systems add to the metadata associated with media elements. For basic block storage, a file name and file size are typical, as are a hierarchical grouping, creation date, and last-access date. For library systems, an identifier unique to the library is typical, as well as grouping by one or more attributes, a time stamp recording the addition to the library, and a last-access time.

A.1.3. Edit

Editing is the phase where one or more elements are combined and modified to create the final video work. In the case of live streaming, the edit phase may be bypassed.

A.1.4. Package

Packaging is the phase in which the work is encoded in one or more video and audio codecs. This may produce multiple data files, or they may be combined into a single file container. Typically, creation or registration of a unique work identifier, for example an Entertainment Identifier from EIDR, is assigned in the packaging phase.

A.1.4.1. Package Metadata

A.1.5.
Distribute

The distribute phase is publishing or sharing the packaged work to viewers. Often it involves uploading to a site such as YouTube or Facebook for social media, or sending the packaged media to streaming sites such as Hulu.

It is common for the distribution site to repackage the video, often transcoding it to codecs and bit rates chosen by the distributor as more efficient for their needs. Distribution of content expected to be widely viewed often includes prepositioning of the content on a CDN (Content Distribution Network).

Distribution involves delivery of the video data to the viewer.

A.1.5.1. Distribution Metadata

Distribution often adds or changes considerable amounts of metadata. The distributor typically assigns a Content Identifier to the work that is unique to the distributor and their content management system (CMS). Additional actions by the distributor, such as repackaging and transcoding to new codecs or bit rates, can require significant changes to the media metadata.

A secondary use of distribution metadata is enabling easy discovery of the content, either through a library catalog, EPG (electronic program guide), or search engine. This phase often includes significant new metadata generation involving tagging the work by genre (sci-fi, drama, comedy), sub-genre (space opera, horror, fantasy), actors, director, release date, similar works, rating level (PG, PG-13), language level, etc.

A.1.6. Discovery

The discovery phase is the precursor to viewing the work. It is where the viewer locates the work, either through a library catalog, a playlist, an EPG, or a search. The discovery phase connects interested viewers with distribution sources.

A.1.6.1. Discovery Metadata

It is typical for discovery systems to parse media metadata to use the information as part of the discovery process.
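A toy sketch of that process: a discovery system narrowing a catalog by the distributor-assigned genre and rating metadata described above (the titles and tags here are invented for illustration):

```python
def discover(catalog: list, **criteria) -> list:
    """Return titles whose metadata matches every given criterion,
    the way a catalog, EPG, or search engine narrows results."""
    return [entry["title"] for entry in catalog
            if all(entry.get(key) == value for key, value in criteria.items())]

# Invented catalog entries with distribution-phase metadata:
catalog = [
    {"title": "Moon Harbor", "genre": "sci-fi", "rating": "PG-13"},
    {"title": "Quiet Field", "genre": "drama",  "rating": "PG"},
    {"title": "Star Ledger", "genre": "sci-fi", "rating": "PG"},
]

# e.g. discover(catalog, genre="sci-fi", rating="PG")
```

Real discovery systems add many more axes (sub-genre, actors, release date, similar works), but the matching idea is the same.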
Discovery systems may parse the content to extract imagery and audio as additional new metadata for the work, to ease the viewer's navigation of the discovery process, perhaps as UI elements. The system may import new externally generated metadata about the work and associate it in its search system, such as viewer reviews and metadata cross-reference indices.

A.1.7. Viewing

The viewing phase encompasses the consumption of the work from the distributor. For Internet-delivered video it is typical for delivery to involve a CDN to perform the actual delivery.

A.2. Video is not like other Internet data

Video is distinctly different from other Internet data. There are many characteristics that contribute to video's unique Internet needs. The most significant characteristics are:

1. large size of video data (gigabytes per hour of video)

2. high bandwidth demands (Mbps to Gbps)

3. low latency demands of streamed video

4. responsiveness to trick play requests by the user (stop, fast forward, fast reverse, jump ahead, jump back)

5. multiplicity of formats and encodings/bit rates that are acceptable substitutes for one another

A.2.1. Data Sizes

Simply put, compared to all other common Internet data, video is huge. A still image often ranges from 100KB to 10MB. A video file can commonly range from 100MB to 50GB. Encoding and compression options permit streaming videos using bandwidth ranging from 700Kbps for extremely compressed SD video, to 1.5-3.0 Mbps for SD video, to 2.5-6.0 Mbps for HD video, and 11-30 Mbps for 4K video.

Still images have four properties that affect their data size:

1. number of horizontal X pixels

2. number of vertical Y pixels

3. bytes per pixel

4. compression factor for the image encoding.

Video adds to this:

1. frames per second playback rate

2.
visual continuity between frames (meaning users notice when frames are skipped or played out of order)

3. discontiguous jumps between frames, such as skipping forward or backward, or inserting frames from other sources between contiguous frames (advertisement placement)

Each video format roughly quadruples the data needs of the previous resolution: (1) SD is 640x480 pixels; (2) HD is 1920x1080 pixels; (3) 4K is 3840x2160 pixels.

Video, like still images, assigns a number of bits per pixel to store color and luminance information. This is currently evolving alongside resolutions after being stagnant for many years. The introduction of high dynamic range (HDR) video has changed the color gamut for video and increased the number of bits needed to carry luminance from 8 to 10, and in some formats more.

Compression is often misunderstood by viewers. Compression does not change the video resolution: SD is still 640x480 pixels, HD is still 1920x1080 pixels. What changes is the quality of the detail in each frame, and between frames.

Video is, in its simplest form, a series of still images shown sequentially over time, adding an additional attribute to manage.

A.2.2. Low Latency Transport

Viewers demand that video plays back without any stutter, skips, or pauses, which translates into low latency, high reliability transport of the video data.

A.2.3. Multiplicity of Acceptable Formats

One of the unique aspects of video viewing is that there can exist multiple different encodings/versions of the same video, many of which are acceptable substitutes for one another. This differentiates video delivery from other data transports.

Other application data types don't have or leverage the concept of semantic equivalence to the same extent as video.
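The substitutability idea can be sketched as a player choosing the highest-bit-rate encoding its available bandwidth allows; the bit-rate ladder below is a hypothetical example whose values loosely follow the ranges given in Section A.2.1:

```python
def pick_encoding(available_kbps: int, ladder: list) -> str:
    """Choose the highest-bit-rate encoding that fits the available
    bandwidth; every rung of the ladder is an acceptable substitute for
    the others. Ladder values are illustrative, not a GGIE specification."""
    fitting = [(name, kbps) for name, kbps in ladder if kbps <= available_kbps]
    if not fitting:
        # Nothing fits: fall back to the lowest-bit-rate rung.
        name, _ = min(ladder, key=lambda rung: rung[1])
        return name
    name, _ = max(fitting, key=lambda rung: rung[1])
    return name

# Hypothetical ladder: (encoding name, required kbps)
ladder = [("SD-low", 700), ("SD", 2500), ("HD", 5000), ("4K", 16000)]
```

This is exactly the kind of swap no email client makes between MIME parts; for video it happens continuously during playback.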
Even email, which supports multiple encodings in a multipart MIME message, has a finite number of representations of "the message", shipped as one unit, whereas video often has many distinct encodings, each a separate file or container of files managed as a distinct entity from the others.

A.3. Video Transport

A.3.1. File vs Stream

There are two common ways of transporting video on the Internet: 1) file-based; 2) streaming. File-based transport can use any file transport protocol, with FTP and BitTorrent being two popular choices.

File-based playback involves copying a file and then playing it. There are schemes which permit playing portions of the file while it is progressively copied, but these schemes still involve moving the file from A to B and then playing on B. FTP and BitTorrent are examples of file copy protocols.

Streaming playback is most similar to traditional cable or OTA viewing of a video. The video is delivered from the streaming service to the playback device in real time, enabling the playback device to receive, decode, and display the video data in real time. Communication between the player and the source enables pausing, fast forward, and rewind by managing the data blocks which are sent to the player device.

Authors' Addresses

Glenn Deen
NBCUniversal

Email: rgd.ietf@gmail.com

Leslie Daigle
Thinking Cat Enterprises LLC

Email: ldaigle@thinkingcat.com