﻿<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="info" docName="draft-mcbride-data-discovery-use-cases-00"
     ipr="trust200902">
  <front>
    <title abbrev="Data Discovery Use Cases">Data Discovery Use Cases</title>

    <author fullname="Mike McBride" initials="M" surname="McBride">
      <organization>Futurewei</organization>

      <address>
        <email>michael.mcbride@futurewei.com</email>
      </address>
    </author>
    
    <author fullname="Jim Guichard" initials="J" surname="Guichard">
      <organization>Futurewei</organization>

      <address>
        <email>james.n.guichard@futurewei.com</email>
      </address>
    </author>   

   <author fullname="Yingzhen Qu" initials="Y" surname="Qu">
      <organization>Futurewei</organization>

      <address>
        <email>yingzhen.qu@futurewei.com</email>
      </address>
    </author>
    
        <author fullname="Thomas Hardjono" initials="T" surname="Hardjono">
      <organization>MIT</organization>
      <address>
        <email>hardjono@mit.edu</email>
      </address>
    </author>
    
        <author fullname="Carlos J. Bernardos" initials="CJ." surname="Bernardos">
      <organization abbrev="UC3M">Universidad Carlos III de Madrid</organization>
      <address>
        <postal>
          <street>Av. Universidad, 30</street>
          <city>Leganes, Madrid</city>
          <code>28911</code>
          <country>Spain</country>
        </postal>
        <phone>+34 91624 6236</phone>
        <email>cjbc@it.uc3m.es</email>
        <uri>http://www.it.uc3m.es/cjbc/</uri>
      </address>
    </author>

    <date day="19" month="February" year="2021"/>

    <abstract>

   <t>A standardized way of locating and capturing data is needed. Data 
   may be cached, copied, and/or stored at multiple locations in the network en route to its final destination.
   With an increasingly high volume of devices connecting to the Internet, support for network 
   caching and replication is critical for continuous data availability. Data repositories exist throughout
   a modern network, and a standardized way of locating those repositories and discovering 
   the desired data within them is needed.</t>
   <t>Several use cases illustrate the need for a data discovery solution. An application 
   might need to query the network to discover resources (a program, a service, or another resource) that can help the local 
   application perform a particular task. Additionally, there could be volumes of data that need to be 
   searched and discovered in order to provide a result for the application to act upon. These are 
   two of the use cases addressed in this document.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
     <t>An application might need to query the network to discover resources  
     that can help the local application perform a particular task. There could be 
     volumes of data that need to be searched and discovered in order to provide a result to 
     be acted upon.</t>
     <t>Data discovery might involve an application requesting data. It might involve a device looking to store 
     data, or to request processing from a data store and then gather the result. It could also involve the execution of a set of 
     instructions at an appropriate device in the network. Another possible area is service chaining,
     where an application needs to run its data through a firewall but the selected firewall must
     have a particular rule set applicable to that application. Perhaps the service function 
     has to be located within a particular environment (for example, one with a given security level). Or a particular device must 
     be found that is capable of executing a set of instructions provided in the data packet. 
     This document focuses on various data discovery use cases.</t>
    </section>

      <section anchor="requirements-language" title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>

      <section anchor="terminology" title="Terminology">
        <t><list style="symbols">
            <t>SFC: Service Function Chaining</t>
            <t>APN: Application-Aware Networking</t>
            <t>DLT: Distributed Ledger Technologies</t>
          </list></t>
      </section>

    <section title="Problem Statement">
      
      <t>As discussed in <xref target="I-D.mcbride-data-discovery-problem-statement"/>, there are 
      many proprietary and standardized ways of discovering networking devices and hosts, and there 
      are many solutions for discovering data within a database. However, only proprietary, non-standardized 
      ways exist for discovering the data that may be stored throughout an environment of networking devices.  
      We can discover information about the devices but cannot locate and capture stored data (resource, 
      program, service, etc.) in a standard way. With more networking devices storing collected data, there 
      needs to be a standard way of discovering the specific data needed among a potentially huge number of databases.</t>
      <t>This data discovery problem is particularly acute for use cases where it is important to be able to express a 
      data request within the data packets and have the network route the traffic accordingly. This might be an application requesting 
      data. It might be a device looking to store data, or to request processing from a data store and receive the result. It could be the 
      execution of a set of instructions at an appropriate device in the network. An application may need to run its data through a 
      firewall, but the selected firewall must have a particular rule set applicable to that application. Perhaps a service 
      function needs to be located within a particular environment (for example, one with a given security level). Or a particular device must be found that is 
      capable of executing a set of instructions provided in the data packet. This document focuses on data discovery use cases.</t>
     
    <section title="Types of Data">
    
    <t>Discoverable data can be a resource, a program, a service, etc. The amount and variety of data that can be made discoverable is effectively unbounded,
    including statistics, measurements, temperature, location, metadata, health, transactions, and so on.</t>
       <t><list style="">
       <t>Program: applets, graphics, games, spreadsheets, database systems, browsers, etc.</t>
       <t>Service: firewalls, load balancers, spam filters, header manipulators, etc.</t>
       <t>Resource: CPU, memory, etc.</t>
       </list></t>
     </section>
    </section>


    <section title="Use Cases">
      <t>The following use cases illustrate the need for data discovery:</t>
      
      <section title="Application-Aware Service Function Chaining">
      
      <t>Application-Aware Networking (APN), as described in <xref target="I-D.li-apn-problem-statement-usecases"/>, 
      allows applications to specify finer-granularity requirements to the network operator by providing application knowledge to the 
      network layer. This granularity includes the ability to convey the characteristics of an 
      application's traffic flow and to program the network infrastructure accordingly to provide 
      service assurance.</t>
     <t>An application might need to query the network to discover resources that can help the local application 
     perform a particular task. Additionally, there could be volumes of data that need to be searched and 
     discovered in order to provide a result for the application to act upon.</t>
      <t>End-to-end service delivery often needs to traverse various service functions, both physical and virtual, including 
      traditional network service functions such as firewalls and deep packet inspection (DPI), as well as new application-specific 
      functions. APN not only allows a given traffic flow to be assigned to a specific service function 
      chain (SFC) but also allows subsequent steering according to the application information 
      carried in the APN packets.</t>
      <t>When an application needs to run its data through a firewall, but the selected firewall must have a particular 
      rule set applicable to that application, the application can leverage data discovery functionality.
      The service function may be required to be located within a particular environment, such as one with a certain security 
      level. Data discovery is needed to find that particular rule set (among the various firewalls) and then steer the 
      packet accordingly. Or a particular device along the SFC may need to be found that is capable of executing 
      a set of instructions provided in the data packet. The data capabilities of devices need to be
      discoverable in order to steer the application packets towards them along an SFC.</t>
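      <t>As a purely illustrative sketch, and not a proposed mechanism, the selection step described above can be 
      viewed as a filter over whatever capabilities the service functions advertise. The record layout, field 
      names, rule set names, and the numeric security-level scale below are all hypothetical; this document 
      does not define them.</t>
      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass, field


# Hypothetical record describing what a service function advertises.
@dataclass(frozen=True)
class ServiceFunction:
    name: str
    kind: str                # e.g. "firewall"
    security_level: int      # assumed scale: higher is more restricted
    rule_sets: frozenset = field(default_factory=frozenset)


def discover(catalog, kind, rule_set, min_security_level):
    """Return the service functions that satisfy all three constraints."""
    return [
        sf for sf in catalog
        if sf.kind == kind
        and rule_set in sf.rule_sets
        and sf.security_level >= min_security_level
    ]


# Made-up catalog of discovered service functions.
catalog = [
    ServiceFunction("fw-a", "firewall", 1, frozenset({"web-default"})),
    ServiceFunction("fw-b", "firewall", 3,
                    frozenset({"web-default", "video-conf"})),
    ServiceFunction("lb-a", "load-balancer", 3, frozenset()),
]

matches = discover(catalog, "firewall", "video-conf",
                   min_security_level=2)
print([sf.name for sf in matches])
```
]]></artwork></figure>
      <t>In this sketch only the firewall that both holds the requested rule set and meets the security level is 
      returned; the packet would then be steered towards it along the SFC.</t>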
      </section>
      
      <section title="Available CPU and Memory Resources">
      <t>An application or service may need to discover the available server memory and compute resources from the network. 
      A certain amount of CPU resources may be required to support a particular application workload, and the application may 
      need to know the maximum CPU utilization threshold available on a compute device. Gathering information on available clock speeds 
      and the number of cores can help determine how quickly servers load and interact with a set of applications. The network can 
      make the necessary data (CPU, memory) discoverable so that applications can execute properly. A network planning 
      application can also use this information to help predict future resource demands in order to meet application performance requirements.</t>
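      <t>The following sketch, offered only as an illustration, shows how an application might use such 
      discovered resource data once it is available. The attribute names, units, and sample values are 
      assumptions made for this example and are not defined by this document.</t>
      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass


# Hypothetical view of the resource data a compute node exposes.
@dataclass(frozen=True)
class ComputeNode:
    name: str
    cores: int
    clock_ghz: float
    cpu_utilization: float   # current utilization, 0.0 to 1.0
    free_memory_gib: float


def find_nodes(nodes, min_cores, min_free_memory_gib,
               max_cpu_utilization):
    """Return the qualifying nodes, least-loaded first."""
    eligible = [
        n for n in nodes
        if n.cores >= min_cores
        and n.free_memory_gib >= min_free_memory_gib
        and n.cpu_utilization <= max_cpu_utilization
    ]
    return sorted(eligible, key=lambda n: n.cpu_utilization)


# Made-up discovery results.
nodes = [
    ComputeNode("srv-1", 8, 2.4, 0.85, 16.0),
    ComputeNode("srv-2", 16, 3.0, 0.40, 64.0),
    ComputeNode("srv-3", 32, 2.8, 0.20, 8.0),
    ComputeNode("srv-4", 16, 3.2, 0.10, 32.0),
]

chosen = find_nodes(nodes, min_cores=8, min_free_memory_gib=16.0,
                    max_cpu_utilization=0.70)
print([n.name for n in chosen])
```
]]></artwork></figure>
      <t>Here a node is excluded if it is above the CPU utilization threshold or has too little free memory, and the 
      remaining nodes are ordered by current load.</t>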
      </section>
 
      <section title="Data Dependency">
      <t>There may be scenarios where it is critical to find a specific type of data that a local application or service depends on to successfully perform a particular 
      task. Perhaps an industrial application needs real-time measurement data, such as temperature, in order to execute a process. 
      This required data may be cached, copied, and/or stored at multiple locations in the network en route to its final destination. With an 
      increasing percentage of devices connecting to the Internet being mobile, support for in-network caching and replication is critical for 
      continuous data availability, not to mention efficient network and battery usage for endpoint devices. For some applications to 
      execute properly, the network needs to provide support for data discovery.</t>
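      <t>To make the dependency concrete, the sketch below assumes a hypothetical index of the copies that 
      discovery has found, and picks the nearest copy that is still fresh enough for the application. The 
      record fields, location names, and the freshness bound are illustrative assumptions only.</t>
      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass


# Hypothetical description of one cached or stored copy of the data.
@dataclass(frozen=True)
class Replica:
    location: str
    data_type: str
    age_seconds: float   # time since this copy was last updated
    hops: int            # network distance from the requester


def locate(replicas, data_type, max_age_seconds):
    """Return the nearest sufficiently fresh copy, or None."""
    fresh = [
        r for r in replicas
        if r.data_type == data_type
        and r.age_seconds <= max_age_seconds
    ]
    # Prefer fewer hops; break ties with the more recent copy.
    return min(fresh, key=lambda r: (r.hops, r.age_seconds),
               default=None)


# Made-up discovery results.
replicas = [
    Replica("edge-cache-1", "temperature", 12.0, 1),
    Replica("edge-cache-2", "temperature", 0.5, 2),
    Replica("core-store", "temperature", 0.2, 6),
    Replica("edge-cache-1", "vibration", 0.1, 1),
]

best = locate(replicas, "temperature", max_age_seconds=1.0)
print(best.location if best else "no fresh copy found")
```
]]></artwork></figure>
      <t>The closest cache is skipped in this example because its copy is too old for a real-time process, which is 
      why discovery needs to expose more than just the location of the data.</t>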
      </section> 
      
    <section title="Distributed Ledgers">
      <t>DLT gateways, as discussed in <xref target="I-D.sardon-blockchain-gateways-usecases"/>, will be given a permissioned 
      view of the assets/transactions that they are requested to transfer within their attached DLT domain. Gateways (GWs) may also need to discover 
      assets/transactions within the DLT domain that are not explicitly provided. It may become necessary for the GW (or another network element, if 
      permitted) to discover the data (asset, resource, service, etc.) in order to transfer the required asset. Discovery of the data parts is also needed 
      to validate the transfer after the asset movement. The ledger in the DLT will not hold all the relevant information pertaining to a previous 
      asset transfer, so there need to be ways to search for and discover this information. The data parts to be discovered include:</t>
       <t><list style="">
       <t>Relevant DLT transaction public keys of the involved entities (i.e., the public keys (addresses) used on both DLTs).</t>
       <t>Relevant entity public keys and X.509 certificates (Originator, owner of gateway G1, owner of gateway G2, Beneficiary). This is similar 
       to the X.509 certificates and certificate profiles used in the SWIFT banking network.</t>
       <t>Relevant asset-related JSON documents (e.g. asset profiles).</t>
       </list></t>
      </section> 
      
    <section title="Edge Computing">
      <t>As described in <xref target="I-D.mcbride-edge-data-discovery-overview"/>, the required data may be distributed across 
      thousands of edge computing devices. Edge computing is motivated by the sheer volume of data that is being created 
      by endpoint devices (sensors, cameras, lights, vehicles, drones, wearables, etc.) at the very edge of the network. In dense IoT 
      deployments (e.g., many video cameras streaming high-definition video), where multiple data flows collect or converge at edge 
      nodes, data is likely to need transformation (transcoding, subsampling, compression, analysis, annotation, combination, aggregation, etc.) 
      to fit over the next-hop link, or even to fit in memory or storage. This data, distributed across the edge, will need to be discovered
      in order to perform any number of functions; for example, an IoT application may need elevator vibration data in order to execute a process.</t>
      </section> 
 
      </section>

    <section title="IANA Considerations">
      <t></t>
    </section>

    <section title="Security Considerations">
      <t/>
      <t></t>
    </section>

    <section title="Acknowledgements">
      <t/>

      <t></t>
    </section>
  </middle>

  <back>
     <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>
      <?rfc include='reference.I-D.mcbride-data-discovery-problem-statement'?>
      <?rfc include='reference.I-D.li-apn-problem-statement-usecases'?>
      <?rfc include='reference.I-D.mcbride-edge-data-discovery-overview'?>
      <?rfc include='reference.I-D.sardon-blockchain-gateways-usecases'?>     
    </references>

    
    
  </back>
</rfc>