T2TRG                                                         M. McBride
Internet-Draft                                               D. Kutscher
Intended status: Standards Track                                  Huawei
Expires: April 25, 2019                                      E. Schooler
                                                                   Intel
                                                           CJ. Bernardos
                                                                    UC3M
                                                        October 22, 2018


                    Overview of Edge Data Discovery
             draft-mcbride-edge-data-discovery-overview-00

Abstract

   This document describes the problem of distributed data discovery in
   edge computing.
   Increasing numbers of IoT devices and sensors are generating a
   torrent of data that originates at the very edges of the network and
   that flows upstream, if it flows at all. Sometimes that data must be
   processed or transformed (transcoded, subsampled, compressed,
   analyzed, annotated, combined, aggregated, etc.) on edge equipment
   along the way, particularly in places where multiple high-bandwidth
   streams converge and where resources are limited. Support for edge
   data analysis is critical for making local, low-latency decisions
   (e.g., regarding predictive maintenance, the dispatch of emergency
   services, identity, authorization, etc.). In addition, (transformed)
   data may be cached, copied and/or stored at multiple locations in
   the network en route to its final destination. Although the data
   might originate at the edge, for example in factories, automobiles,
   video cameras, wind farms, etc., as more and more distributed data
   is created, processed and stored, it becomes increasingly dispersed
   throughout the network, and there needs to be a standard way to find
   it. New and existing protocols will need to be identified,
   developed, or enhanced for distributed data discovery at the network
   edge and beyond.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements Language
     1.2. Terminology
   2. The Edge Data Discovery Scope
     2.1. Types of Discovery
   3. Protocols for Discovering Resources
   4. Protocols for Discovering Functions
   5. Naming the Data
   6. Edge Data Discovery
   7. Use Cases of edge data discovery
   8. IANA Considerations
   9. Security Considerations
   10. Acknowledgement
   11. Normative References
   Authors' Addresses

1. Introduction

   Edge computing is an architectural shift that migrates Cloud
   functionality (compute, storage, networking, control, data
   management, etc.)
   out of the back-end data center to be more proximate to the IoT data
   being generated at the edges of the network. Edge computing provides
   local compute, storage and connectivity services, often required for
   latency- and bandwidth-sensitive applications. Thus, Edge computing
   plays a key role in verticals such as Energy, Manufacturing,
   Automotive, Video Analytics, Gaming, Healthcare, Mining, Buildings
   and Smart Cities.

   Edge computing is motivated at least in part by the sheer volume of
   data that is being created by IoT devices (sensors, cameras, lights,
   vehicles, drones, wearables, etc.) at the very network edge and that
   flows upstream, in a direction for which the network was not
   originally provisioned. In fact, in dense IoT deployments (e.g.,
   many video cameras streaming high-definition video), where multiple
   data flows collect or converge at edge nodes, data is likely to need
   transformation (transcoding, subsampling, compression, analysis,
   annotation, combination, aggregation, etc.) to fit over the next-hop
   link, or even to fit in memory or storage. Note also that the act of
   performing compute on the data creates yet another new data stream.
   In addition, (transformed) data may be cached, copied and/or stored
   at multiple locations in the network en route to its final
   destination. With an increasing percentage of the devices connecting
   to the Internet being mobile, support for in-the-network caching and
   replication is critical for continuous data availability, not to
   mention efficient network and battery usage for endpoint devices.
   Additionally, as mobile devices' memory/storage fills up, in an edge
   context they may be able to offload their data to other proximate
   devices or resources, leaving a breadcrumb trail of data in their
   wake.
   Therefore, although data might originate at edge devices, as more
   and more data is continuously created, processed and stored, it
   becomes increasingly dispersed throughout the physical world
   (outside of or scattered across managed local data centers) and
   increasingly isolated in separate local edge clouds or data silos.
   Thus there needs to be a standard way to find it. New and existing
   protocols will need to be identified, developed, or enhanced for
   this purpose. Being able to discover distributed data, whether at
   the edge or in the middle of the network, will be an important
   component of Edge computing.

   A T2TRG (IRTF Thing-to-Thing Research Group) Edge discussion was
   held, and a comparative study on the definition of Edge computing
   was presented in multiple T2TRG sessions this past year. An IETF BEC
   (beyond edge computing) effort has been evaluating potential gaps in
   existing edge computing architectures. Edge Data Discovery is one
   potential gap that needs evaluation and a solution.

   Businesses, such as industrial companies, are starting to understand
   the value of the data that they have kept in silos. Once this data
   can be aggregated on edge computing platforms, they will be able to
   monetize it. But this will happen only if data can be discovered and
   searched among equipment in a standard way. Discovering the data
   that is most useful to a given market segment will be extremely
   valuable in building business revenues. Providing a mechanism for
   this granular discovery is the problem that needs solving, with
   either existing or new protocols.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.
Terminology

   o  Edge: The device edge is the boundary between digital and physical
      entities in the last mile network. Sensors, gateways, and compute
      nodes are included. The infrastructure edge includes equipment on
      the network operator side of the last mile network, including
      cell towers, edge data centers, cable headends, etc.

   o  Edge Computing: Distributed computation that is performed near the
      edge, where the nearness is determined by the system requirements.
      This includes high performance compute, storage and network
      equipment on either the device or infrastructure edge.

   o  Data Discovery: The process of finding required data from edge
      databases and consolidating it into a single source, perhaps a
      name, that can be evaluated.

   o  NDN: Named Data Networking. Packets name information (content)
      at the network layer, rather than only endpoints (IP addresses).

2. The Edge Data Discovery Scope

   Edge Computing data will typically be found at the device or
   infrastructure edges. This is where we are focusing our efforts in
   defining this edge data discovery problem space. Edge data will also
   be sent to the cloud as needed. Discovering data which has been sent
   to the cloud is out of scope of this document.

               +-------------------------------+
               |       Core Data Center        |
               +-------------------------------+
                             ***                 Backbone
                            *   *                Network
                             ***
               +-------------------------------+
               |     Regional Data Center      |
               +-------------------------------+
                             ***                 Metropolitan
                            *   *                Network
                             ***
               +-------------------------------+
               |      Infrastructure Edge      |
               +-------------------------------+
                             ***                 Access
                            *   *                Network
                             ***
               +-------------------------------+
               |          Device Edge          |
               +-------------------------------+

                  Figure 1: Edge Data Discovery Scope

2.1. Types of Discovery

   There are many aspects of discovery.

   Discovery of new devices added to an environment.
   Discovery of their capabilities/services in client/server
   environments. Discovery of these new devices automatically.
   Discovering a device and then synchronizing the device inventory and
   configuration for edge services. There are many existing protocols
   to help in this discovery: UPnP, mDNS, DNS-SD, SSDP, NFC, XMPP, W3C
   network service discovery, etc.

   Discovery of edge devices by one another in a standard way. DHCP,
   SNMP, SMS, CoAP, LLDP, and routing protocols such as OSPF can be
   used for devices to discover one another.

   Discovery of link state and traffic engineering data/services by
   external devices. BGP-LS is one solution.

   Discovery of aggregated data on edge compute devices, which is the
   focus of this draft: how can we discover aggregated data on the edge
   and make use of it?

   Besides sensor data being aggregated on the edge computing
   infrastructure, there will also be streaming data (e.g., from a
   camera), metadata (about the data, about the device that generated
   the data, about the context, etc.), control data regarding an event
   that was triggered, or an executable that embodies a function,
   method, service, or other piece of code or algorithm. It could also
   be new data that is created after (multiple) streams converge at the
   edge node and are processed/transformed in some manner.

   Discovery of functions in an SFC environment: Service function
   chaining (SFC) allows the instantiation of an ordered set of service
   functions and subsequent "steering" of traffic through them. Service
   functions (SFs) provide a specific treatment of received packets;
   therefore, they need to be known so they can be used in a given
   service composition via SFC. So far, how the SFs are discovered and
   composed has been out of the scope of discussions in the IETF. While
   there are some mechanisms that can be used and/or extended to provide
   this functionality, work needs to be done. An example of this can be
An example of this can be 247 found in "I-D.bernardos- sfc-discovery". 249 Discovery of resources in an NFV environment: virtualized resources 250 do not need to be limited to those available in traditional data 251 centers, where the infrastructure is stable, static, typically 252 homogeneous and managed by a single admin entity. Computational 253 capabilities are becoming more and more ubiquitous, with terminal 254 devices getting extremely powerful, as well as other types of devices 255 that are close to the end users at the edge (e.g., vehicular onboard 256 devices for infotainment, micro data centers deployed at the edge, 257 etc.). It is envisioned that these devices would be able to offer 258 storage, computing and networking resources to nearby network 259 infrastructure, devices and things (the fog paradigm). These 260 resources can be used to host functions, for example to offload/ 261 complement other resources available at traditional data centers, but 262 also to reduce the end-to- end latency or to provide access to 263 specialized information (e.g., context available at the edge) or 264 hardware. Similarly to the discovery of functions, while there are 265 mechanisms that can be reused/extended, there is no complete solution 266 yet defined. An example of work in this area is I-D.bernardos- 267 intarea-vim-discovery" 269 3. Protocols for Discovering Resources 271 Mainly two types of situations need to be covered: 273 1. A set of resources appears (e.g., by a mobile node hosting them 274 joining a network) and they have to be discovered by an existing 275 virtualization infrastructure. 277 2. A mobile device wants to discover virtualization resources 278 available at the current location. 

   Various protocols can be used for this, ranging from approaches
   coupled with the access technology used, to over-the-top solutions
   such as UPnP, mDNS, DNS-SD and SSDP, to solutions embedded into IP
   discovery/autoconfiguration, such as Neighbor Discovery or DHCP.

4. Protocols for Discovering Functions

   In an SFC environment deployed at the edge, the discovery protocol
   may need to make available the following information per SF:

   o  Service Function Type, identifying the category of SF provided.

   o  SFC-aware: Yes/No. Indicates if the SF is SFC-aware.

   o  Route Distinguisher (RD): IP address indicating the location of
      the SF(I).

   o  Pricing/costs details.

   o  Migration capabilities of the SF: whether a given function can be
      moved to another provider (potentially including information about
      compatible providers topologically close).

   o  Mobility of the device hosting the SF, with e.g. the following
      sub-options:

      Level: no, low, high; or a corresponding scale (e.g., 1 to 10).

      Current geographical area (e.g., GPS coordinates, post code).

      Target moving area (e.g., GPS coordinates, post code).

   o  Power source of the device hosting the SF, with e.g. the following
      sub-options:

      Battery: Yes/No. If Yes, the following sub-options could be
      defined:

         Capacity of the battery (e.g., mWh).

         Charge status (e.g., %).

         Lifetime (e.g., minutes).

5. Naming the Data

   Named Data Networking (NDN) is one of five research projects funded
   by the U.S. National Science Foundation under its Future Internet
   Architecture Program. NDN has its roots in an earlier project,
   Content-Centric Networking (CCN), which Van Jacobson started at Xerox
   PARC around the time of his Google talk, to turn his architecture
   vision into a running prototype (see also his CoNEXT 2009 paper and
   especially Jacobson's ACM Queue interview).
   The motivation is the mismatch between today's Internet architecture
   and its usage. Today we build, support, and use Internet
   applications and services on top of an extremely capable
   architecture not designed to support them. What if we had an
   architecture designed to support them? Specifically, today's IP
   packets can name only endpoints of conversations (IP addresses) at
   the network layer. What if we generalize this layer to name any
   information (or content), not just endpoints? Doing so would make it
   easier to develop, manage, secure, and use our networks. NDN can be
   applied to edge data discovery to make it much easier to extract
   data by naming it. If data were named, we would be able to discover
   the appropriate data simply by its name.

6. Edge Data Discovery

   How can we discover aggregated data on the edge and make use of it?
   There are proprietary implementations that collect data from various
   databases and consolidate it for evaluation. We need a standard
   protocol set for doing this data discovery, on the device or
   infrastructure edge, in order to meet the requirements of many use
   cases. We will have terabytes of data on the edge and need a way to
   identify its existence and find the desired data. A user needs to
   search for specific data in a data set and evaluate it using their
   own tools. The tools are outside the scope of this document, but the
   discovery of that data is in scope.

7. Use Cases of edge data discovery

   1. Autonomous Vehicles

   Description: Autonomous vehicles rely on the processing of huge
   amounts of complex data in real time for fast and accurate
   decisions. These vehicles will rely on high performance compute,
   storage and network resources to process the volumes of data they
   produce in a low-latency way. Various systems will need a standard
   way to discover the pertinent data for decision making.

   2.
   Video Surveillance

   Description: The majority of video surveillance footage will remain
   at the edge infrastructure (not sent to the cloud data center). This
   footage comes from vehicles, factories, hotels, universities, farms,
   etc. Much of the video footage will not be interesting to those
   evaluating the data. A mechanism, perhaps a set of protocols, is
   needed to identify the interesting data at the edge. The data will
   be in storage systems or in flight in networking equipment.

   3. Elevator Networks

   Description: Elevators are one of many industrial applications of
   edge computing. Edge equipment receives data from hundreds of
   elevator sensors. The data coming into the edge equipment includes
   vibration, temperature, speed, level, video, etc. We need the
   ability to identify where the data we need to evaluate is located.

8. IANA Considerations

   N/A

9. Security Considerations

   Security considerations will be a critical component of edge data
   discovery, particularly as intelligence is moved to the extreme edge
   where data is to be extracted.

10. Acknowledgement

11. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Mike McBride
   Huawei

   Email: michael.mcbride@huawei.com


   Dirk Kutscher
   Huawei

   Email: dirk.kutscher@huawei.com


   Eve Schooler
   Intel

   Email: eve.m.schooler@intel.com


   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   Leganes, Madrid 28911
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/