2	T2TRG                                                         M. McBride
3	Internet-Draft                                               D. Kutscher
4	Intended status: Standards Track                                  Huawei
5	Expires: April 25, 2019                                      E. Schooler
6	                                                                   Intel
7	                                                           CJ. Bernardos
8	                                                                    UC3M
9	                                                        October 22, 2018

11	                    Overview of Edge Data Discovery
12	             draft-mcbride-edge-data-discovery-overview-00

14	Abstract

16	   This document describes the problem of distributed data discovery in
17	   edge computing.  Increasing numbers of IoT devices and sensors are
18	   generating a torrent of data that originates at the very edges of the
19	   network and that flows upstream, if it flows at all.  Sometimes that
20	   data must be processed or transformed (transcoded, subsampled,
21	   compressed, analyzed, annotated, combined, aggregated, etc.) on edge
22	   equipment along the way, particularly in places where multiple high
23	   bandwidth streams converge and where resources are limited.  Support
24	   for edge data analysis is critical to make local, low-latency
25	   decisions (e.g., regarding predictive maintenance, the dispatch of
26	   emergency services, identity, authorization, etc.).  In addition,
27	   (transformed) data may be cached, copied and/or stored at multiple
28	   locations in the network en route to its final destination.  Although
29	   the data might originate at the edge, for example in factories,
30	   automobiles, video cameras, wind farms, etc., as more and more
31	   distributed data is created, processed and stored, it becomes
32	   increasingly dispersed throughout the network and there needs to be a
33	   standard way to find it.  New and existing protocols will need to be
34	   identified/developed/enhanced for distributed data discovery at the
35	   network edge and beyond.

37	Status of This Memo

39	   This Internet-Draft is submitted in full conformance with the
40	   provisions of BCP 78 and BCP 79.

42	   Internet-Drafts are working documents of the Internet Engineering
43	   Task Force (IETF).  Note that other groups may also distribute
44	   working documents as Internet-Drafts.  The list of current Internet-
45	   Drafts is at https://datatracker.ietf.org/drafts/current/.

47	   Internet-Drafts are draft documents valid for a maximum of six months
48	   and may be updated, replaced, or obsoleted by other documents at any
49	   time.  It is inappropriate to use Internet-Drafts as reference
50	   material or to cite them other than as "work in progress."

52	   This Internet-Draft will expire on April 25, 2019.

54	Copyright Notice

56	   Copyright (c) 2018 IETF Trust and the persons identified as the
57	   document authors.  All rights reserved.

59	   This document is subject to BCP 78 and the IETF Trust's Legal
60	   Provisions Relating to IETF Documents
61	   (https://trustee.ietf.org/license-info) in effect on the date of
62	   publication of this document.  Please review these documents
63	   carefully, as they describe your rights and restrictions with respect
64	   to this document.  Code Components extracted from this document must
65	   include Simplified BSD License text as described in Section 4.e of
66	   the Trust Legal Provisions and are provided without warranty as
67	   described in the Simplified BSD License.

69	Table of Contents

71	   1.  Introduction
72	     1.1.  Requirements Language
73	     1.2.  Terminology
74	   2.  The Edge Data Discovery Scope
75	     2.1.  Types of Discovery
76	   3.  Protocols for Discovering Resources
77	   4.  Protocols for Discovering Functions
78	   5.  Naming the Data
79	   6.  Edge Data Discovery
80	   7.  Use Cases of edge data discovery
81	   8.  IANA Considerations
82	   9.  Security Considerations
83	   10. Acknowledgement
84	   11. Normative References
85	   Authors' Addresses

87	1.  Introduction

89	   Edge computing is an architectural shift that migrates Cloud
90	   functionality (compute, storage, networking, control, data
91	   management, etc.) out of the back-end data center to be more
92	   proximate to the IoT data being generated at the edges of the
93	   network.  Edge computing provides local compute, storage and
94	   connectivity services, often required for latency- and bandwidth-
95	   sensitive applications.  Thus, Edge Computing plays a key role in
96	   verticals such as Energy, Manufacturing, Automotive, Video Analytics,
97	   Gaming, Healthcare, Mining, Buildings and Smart Cities.

99	   Edge computing is motivated at least in part by the sheer volume of
100	   data that is being created by IoT devices (sensors, cameras, lights,
101	   vehicles, drones, wearables, etc.) at the very network edge and that
102	   flows upstream, in a direction for which the network was not
103	   originally provisioned.  In fact, in dense IoT deployments (e.g.,
104	   many video cameras are streaming high definition video), where
105	   multiple data flows collect or converge at edge nodes, data is likely
106	   to need transformation (transcoded, subsampled, compressed, analyzed,
107	   annotated, combined, aggregated, etc.) to fit over the next hop link,
108	   or even to fit in memory or storage.  Note also that the act of
109	   performing compute on the data creates yet another new data stream!
110	   In addition, (transformed) data may be cached, copied and/or stored
111	   at multiple locations in the network en route to its final
112	   destination.  With an increasing percentage of devices connecting to
113	   the Internet being mobile, support for in-the-network caching and
114	   replication is critical for continuous data availability, not to
115	   mention efficient network and battery usage for endpoint devices.
116	   Additionally, as mobile devices' memory/storage fill up, in an edge
117	   context they may have the ability to offload their data to other
118	   proximate devices or resources, leaving a bread crumb trail of data
119	   in their wakes.  Therefore, although data might originate at edge
120	   devices, as more and more data is continuously created, processed and
121	   stored, it becomes increasingly dispersed throughout the physical
122	   world (outside of or scattered across managed local data centers),
123	   increasingly isolated in separate local edge clouds or data silos.
124	   Thus there needs to be a standard way to find it.  New and existing
125	   protocols will need to be identified/developed/enhanced for these
126	   purposes.  Being able to discover distributed data, at the edge or in
127	   the middle of the network, will be an important component of Edge
128	   computing.

130	   An IRTF T2TRG Edge discussion was held, and a comparative study on
131	   the definition of Edge computing was presented in multiple T2TRG
132	   sessions over the past year.  An IETF BEC (beyond edge computing)
133	   effort has been evaluating potential gaps in existing edge computing
134	   architectures.  Edge Data Discovery is one potential gap that needs
135	   evaluation and a solution.

137	   Businesses, such as industrial companies, are starting to
138	   understand how valuable the data they have kept in silos is.
139	   Once this data can be aggregated on edge computing platforms,
140	   they will be able to monetize it.  But this will
141	   happen only if data can be discovered and searched among equipment in
142	   a standard way.  Discovering the data that is most useful to a
143	   given market segment will be extremely useful in building business
144	   revenues.  Providing a mechanism for this granular discovery is
145	   the problem that needs solving, with either existing or new
146	   protocols.

148	1.1.  Requirements Language

150	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
151	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
152	   document are to be interpreted as described in RFC 2119 [RFC2119].

154	1.2.  Terminology

156	   o  Edge: The device edge is the boundary between digital and physical
157	      entities in the last mile network.  Sensors, gateways, compute
158	      nodes are included.  The infrastructure edge includes equipment on
159	      the network operator side of the last mile network including cell
160	      towers, edge data centers, cable headends, etc.

162	   o  Edge Computing: distributed computation that is performed near the
163	      edge, where the nearness is determined by the system requirements.
164	      This includes high performance compute, storage and network
165	      equipment on either the device or infrastructure edge.

167	   o  Data Discovery: the process of finding required data in edge
168	      databases and consolidating it into a single source, perhaps a
169	      name, that can be evaluated.

171	   o  NDN: Named Data Networking.  Packets name information (content)
172	      at the network layer rather than only endpoints (IP addresses).

174	2.  The Edge Data Discovery Scope

176	   Edge Computing data will typically be found at the device or
177	   infrastructure edges.  This is where we are focusing our efforts in
178	   defining this edge data discovery problem space.  Edge data will also
179	   be sent to the cloud as needed.  Discovering data which has been sent
180	   to the cloud is out of scope of this document.

182	                     +-------------------------------+
183	                     |   Core Data Center            |
184	                     +-------------------------------+
185	                              ***   Backbone
186	                             *   *  Network
187	                              ***
188	                     +-------------------------------+
189	                     |   Regional Data Center        |
190	                     +-------------------------------+
191	                              ***   Metropolitan
192	                             *   *  Network
193	                              ***
194	                     +-------------------------------+
195	                     |   Infrastructure Edge         |
196	                     +-------------------------------+
197	                              ***   Access
198	                             *   *  Network
199	                              ***
200	                     +-------------------------------+
201	                     |   Device Edge                 |
202	                     +-------------------------------+

204	                    Figure 1: Edge Data Discovery Scope

206	2.1.  Types of Discovery

208	   There are many aspects of discovery.

210	   Discovery of new devices added to an environment.  Discovery of their
211	   capabilities/services in client/server environments.  Discovery of
212	   these new devices automatically.  Discovering a device and then
213	   synchronizing the device inventory and configuration for edge
214	   services.  There are many existing protocols to help in this
215	   discovery: UPnP, mDNS, DNS-SD, SSDP, NFC, XMPP, W3C network service
216	   discovery, etc.

218	   Edge devices discover each other in a standard way.  We can use DHCP,
219	   SNMP, SMS, CoAP, LLDP, and routing protocols such as OSPF for devices
220	   to discover one another.

222	   Discovery of link state and traffic engineering data/services by
223	   external devices.  BGP-LS is one solution.

225	   There is discovery of aggregated data on edge compute devices, which
226	   is the focus of this draft.  How can we discover aggregated data on
227	   the edge and make use of it?

229	   Besides sensor data being aggregated on the edge computing
230	   infrastructure, there will also be streaming data (from a camera),
231	   meta data (about the data or about the device that generated the data
232	   or about the context, etc.), or control data regarding an event that
233	   triggered, or an executable that embodies a function, method or
234	   service, or other piece of code or algorithm.  And it could be new
235	   data that is created after (multiple) streams converge at the edge
236	   node and are processed/transformed in some manner.

238	   Discovery of functions in an SFC environment: Service function
239	   chaining (SFC) allows the instantiation of an ordered set of service
240	   functions and subsequent "steering" of traffic through them.  Service
241	   functions provide a specific treatment of received packets;
242	   therefore they need to be known so they can be used in a given
243	   service composition via SFC.  So far, how the SFs are discovered and
244	   composed has been out of the scope of discussions in the IETF.  While
245	   there are some mechanisms that can be used and/or extended to provide
246	   this functionality, work needs to be done.  An example of this can be
247	   found in "I-D.bernardos-sfc-discovery".

249	   Discovery of resources in an NFV environment: virtualized resources
250	   do not need to be limited to those available in traditional data
251	   centers, where the infrastructure is stable, static, typically
252	   homogeneous and managed by a single admin entity.  Computational
253	   capabilities are becoming more and more ubiquitous, with terminal
254	   devices getting extremely powerful, as well as other types of devices
255	   that are close to the end users at the edge (e.g., vehicular onboard
256	   devices for infotainment, micro data centers deployed at the edge,
257	   etc.).  It is envisioned that these devices would be able to offer
258	   storage, computing and networking resources to nearby network
259	   infrastructure, devices and things (the fog paradigm).  These
260	   resources can be used to host functions, for example to offload/
261	   complement other resources available at traditional data centers, but
262	   also to reduce the end-to-end latency or to provide access to
263	   specialized information (e.g., context available at the edge) or
264	   hardware.  As with the discovery of functions, while there are
265	   mechanisms that can be reused/extended, there is no complete solution
266	   yet defined.  An example of work in this area is
267	   "I-D.bernardos-intarea-vim-discovery".

269	3.  Protocols for Discovering Resources

271	   Mainly two types of situations need to be covered:

273	   1.  A set of resources appears (e.g., by a mobile node hosting them
274	       joining a network) and they have to be discovered by an existing
275	       virtualization infrastructure.

277	   2.  A mobile device wants to discover virtualization resources
278	       available at the current location.

280	   Different alternatives of protocols can be used for this: from
281	   approaches coupled with the access technology used, to solutions over
282	   the top such as UPnP, mDNS, DNS-SD, SSDP, also including solutions
283	   embedded into IP discovery/autoconfiguration, such as Neighbor
284	   Discovery or DHCP.
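
   The two situations above can be illustrated with a minimal in-memory
   sketch.  It is not tied to any of the protocols listed; the names
   Resource, ResourceRegistry, announce, withdraw and query are
   hypothetical and chosen only for illustration.  A real deployment
   would carry the same exchanges over a discovery protocol such as
   mDNS/DNS-SD or DHCP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A virtualization resource offered by a host (hypothetical model)."""
    host: str      # identifier of the node hosting the resource
    kind: str      # e.g. "compute", "storage", "network"
    location: str  # coarse location label, e.g. an access network name


class ResourceRegistry:
    """Stand-in for the discovery side of a virtualization infrastructure."""

    def __init__(self) -> None:
        self._by_location: dict = {}

    def announce(self, resource: Resource) -> None:
        # Situation 1: a newly arrived host advertises its resources.
        self._by_location.setdefault(resource.location, set()).add(resource)

    def withdraw(self, host: str) -> None:
        # A mobile host leaving the network removes everything it offered.
        for resources in self._by_location.values():
            resources.difference_update(
                {r for r in resources if r.host == host}
            )

    def query(self, location: str, kind: Optional[str] = None) -> list:
        # Situation 2: a mobile device asks what is available where it is.
        found = self._by_location.get(location, set())
        return sorted(
            (r for r in found if kind is None or r.kind == kind),
            key=lambda r: (r.host, r.kind),
        )
```

   In this sketch a mobile node joining a network would call announce()
   once per resource it hosts, and a device looking for nearby
   resources would call query() with its current location label.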

286	4.  Protocols for Discovering Functions

288	   In an SFC environment deployed at the edge, the discovery protocol
289	   may need to make available the following information per SF:

291	   o  Service Function Type, identifying the category of SF provided.

293	   o  SFC-aware: Yes/No.  Indicates if the SF is SFC-aware.

295	   o  Route Distinguisher (RD): IP address indicating the location of
296	      the SF(I).

298	   o  Pricing/costs details.

300	   o  Migration capabilities of the SF: whether a given function can be
301	      moved to another provider (potentially including information about
302	      compatible providers topologically close).

304	   o  Mobility of the device hosting the SF, with e.g. the following
305	      sub-options:

307	         Level: no, low, high; or a corresponding scale (e.g., 1 to 10).

309	         Current geographical area (e.g., GPS coordinates, post code).

311	         Target moving area (e.g., GPS coordinates, post code).

313	   o  Power source of the device hosting the SF, with e.g. the following
314	      sub-options:

316	         Battery: Yes/No.  If Yes, the following sub-options could be
317	         defined:

319	            Capacity of the battery (e.g., mWh).

321	            Charge status (e.g., %).

323	            Lifetime (e.g., minutes).
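
   As a rough illustration, the per-SF information listed above could be
   modelled as a record like the one below.  This is only a sketch: the
   class and field names (ServiceFunctionRecord, Mobility, Battery,
   validate, and so on) are assumptions made here, not an encoding
   defined by this document or by any SFC specification.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

MOBILITY_LEVELS = ("no", "low", "high")


@dataclass
class Mobility:
    level: str                           # one of MOBILITY_LEVELS
    current_area: Optional[str] = None   # e.g. GPS coordinates, post code
    target_area: Optional[str] = None    # e.g. GPS coordinates, post code


@dataclass
class Battery:
    capacity_mwh: Optional[float] = None
    charge_percent: Optional[float] = None
    lifetime_minutes: Optional[float] = None


@dataclass
class ServiceFunctionRecord:
    sf_type: str                         # category of SF provided
    sfc_aware: bool
    route_distinguisher: str             # IP address locating the SF
    pricing: Optional[str] = None
    migratable: bool = False
    mobility: Optional[Mobility] = None
    battery: Optional[Battery] = None    # None: host has no battery

    def validate(self) -> None:
        """Raise ValueError if a field is outside its expected range."""
        ipaddress.ip_address(self.route_distinguisher)
        if self.mobility and self.mobility.level not in MOBILITY_LEVELS:
            raise ValueError("unknown mobility level")
        if self.battery and self.battery.charge_percent is not None:
            if not 0 <= self.battery.charge_percent <= 100:
                raise ValueError("charge status must be 0..100 percent")
```

   Optional fields default to None so that a discovery protocol could
   omit information a host cannot or does not wish to provide.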

325	5.  Naming the Data

327	   Named Data Networking (NDN) is one of five research projects funded
328	   by the U.S. National Science Foundation under its Future Internet
329	   Architecture Program.  NDN has its roots in an earlier project,
330	   Content-Centric Networking (CCN), which Van Jacobson started at Xerox
331	   PARC around the time of his Google talk, to turn his architecture
332	   vision into a running prototype (see also his CoNEXT 2009 paper and
333	   especially Jacobson's ACM Queue interview).  The motivation is the
334	   mismatch between today's Internet architecture and its usage.  Today
335	   we build, support, and use Internet applications and services on top
336	   of an extremely capable architecture not designed to support them.
337	   What if we had an architecture designed to support them?  Specifically,
338	   today's IP packets can name only endpoints of conversations (IP
339	   addresses) at the network layer.  What if we generalize this layer to
340	   name any information (or content), not just endpoints?  Doing so would
341	   make it easier to develop, manage, secure, and use our networks.  NDN
342	   can be applied to edge data discovery to make it much easier to
343	   extract data by naming it.  If data were named, we would be able to
344	   discover the appropriate data simply by its name.
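
   To make the idea of discovering data by name concrete, the sketch
   below indexes data under hierarchical names and looks it up by name
   prefix.  It does not use the NDN packet format or any NDN library;
   the naming scheme ("/factory/line3/...") and the NameIndex class are
   assumptions introduced only for this example.

```python
def split_name(name: str) -> tuple:
    """Split a hierarchical name such as "/a/b/c" into its components."""
    return tuple(part for part in name.split("/") if part)


class NameIndex:
    """Maps hierarchical data names to the node that holds the data."""

    def __init__(self) -> None:
        self._holders: dict = {}

    def publish(self, name: str, holder: str) -> None:
        # Record that `holder` stores the data item called `name`.
        self._holders[split_name(name)] = holder

    def locate(self, name: str):
        # Exact-name lookup; returns None if the name is unknown.
        return self._holders.get(split_name(name))

    def discover(self, prefix: str) -> list:
        # Return every published name under `prefix`.  Matching is done
        # per component, so "/factory/line3" does not match "line30".
        want = split_name(prefix)
        return sorted(
            "/" + "/".join(parts)
            for parts in self._holders
            if parts[: len(want)] == want
        )
```

   A consumer that knows only a name prefix can call discover() to list
   what exists under it, then locate() to learn which node holds a
   given item.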

346	6.  Edge Data Discovery

348	   How can we discover aggregated data on the edge and make use of it?
349	   There are proprietary implementations of collecting data from various
350	   databases and consolidating it for evaluation.  We need a standard
351	   protocol set for doing this data discovery, on the device or
352	   infrastructure edge, in order to meet the requirements of many use
353	   cases.  We will have terabytes of data on the edge and need a way to
354	   identify its existence and find the desired data.  A user needs
355	   to search for specific data in a data set and evaluate it
356	   using their own tools.  The tools are outside the scope of this
357	   document, but the discovery of that data is in scope.
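
   One way to picture the search described above is a query over
   per-node catalogs of data set descriptions.  The sketch below is
   illustrative only: DatasetEntry, its fields, and find_datasets are
   hypothetical, and no existing protocol or catalog format is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    node: str   # edge node holding the data
    kind: str   # e.g. "vibration", "video", "temperature"
    start: int  # first timestamp covered (epoch seconds)
    end: int    # last timestamp covered (epoch seconds)


def find_datasets(catalogs, kind: str, t_from: int, t_to: int) -> list:
    """Return entries of `kind` whose time span overlaps [t_from, t_to].

    `catalogs` is an iterable of per-node catalogs, each an iterable of
    DatasetEntry.  Only descriptions are searched; the data itself
    stays where it is until the user fetches it with their own tools.
    """
    hits = [
        entry
        for catalog in catalogs
        for entry in catalog
        if entry.kind == kind and entry.start <= t_to and entry.end >= t_from
    ]
    return sorted(hits, key=lambda e: (e.start, e.node))
```

   The result tells the user which edge nodes hold matching data, which
   is the discovery step; retrieval and evaluation remain out of scope.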

359	7.  Use Cases of edge data discovery

361	   1.  Autonomous Vehicles

363	   Description: Autonomous vehicles rely on the processing of huge
364	   amounts of complex data in real-time for fast and accurate decisions.
365	   These vehicles will rely on high performance compute, storage and
366	   network resources to process the volumes of data they produce in a
367	   low latency way.  Various systems will need a standard way to
368	   discover the pertinent data for decision making.

370	   2.  Video Surveillance

371	   Description: The majority of the video surveillance footage will
372	   remain at the edge infrastructure (not sent to the cloud data
373	   center).  This footage is coming from vehicles, factories, hotels,
374	   universities, farms, etc.  Much of the video footage will not be
375	   interesting to those evaluating the data.  A mechanism, perhaps a
376	   set of protocols, is needed to identify the interesting data at the
377	   edge.  The data will be in storage systems or in flight in networking
378	   equipment.

380	   3.  Elevator Networks

382	   Description: Elevators are one of many industrial applications of
383	   edge computing.  Edge equipment receives data from hundreds of
384	   elevator sensors.  The data coming into the edge equipment includes
385	   vibration, temperature, speed, level, video, etc.  We need the
386	   ability to identify where the data we need to evaluate is located.

388	8.  IANA Considerations

390	   N/A

392	9.  Security Considerations

394	   Security considerations will be a critical component of edge data
395	   discovery particularly as intelligence is moved to the extreme edge
396	   where data is to be extracted.

398	10.  Acknowledgement

400	11.  Normative References

402	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
403	              Requirement Levels", BCP 14, RFC 2119,
404	              DOI 10.17487/RFC2119, March 1997,
405	              <https://www.rfc-editor.org/info/rfc2119>.

407	Authors' Addresses

409	   Mike McBride
410	   Huawei

412	   Email: michael.mcbride@huawei.com

414	   Dirk Kutscher
415	   Huawei

417	   Email: dirk.kutscher@huawei.com

418	   Eve Schooler
419	   Intel

421	   Email: eve.m.schooler@intel.com

423	   Carlos J. Bernardos
424	   Universidad Carlos III de Madrid
425	   Av. Universidad, 30
426	   Leganes, Madrid  28911
427	   Spain

429	   Phone: +34 91624 6236
430	   Email: cjbc@it.uc3m.es
431	   URI:   http://www.it.uc3m.es/cjbc/