idnits 2.17.1 draft-irtf-icnrg-flic-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 11 instances of too long lines in the document, the longest one being 24 characters in excess of 72. == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (November 4, 2019) is 1634 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Missing Reference: 'SecurityCtx' is mentioned on line 428, but not defined == Missing Reference: 'AuthTag' is mentioned on line 428, but not defined == Missing Reference: 'NodeData' is mentioned on line 434, but not defined == Missing Reference: 'SubtreeSize' is mentioned on line 459, but not defined == Missing Reference: 'SubtreeDigest' is mentioned on line 459, but not defined == Missing Reference: 'Locators' is mentioned on line 435, but not defined == Missing Reference: 'GroupData' is mentioned on line 449, but not defined == Missing Reference: 'LeafSize' is mentioned on line 459, but not defined == Missing Reference: 'LeafDigest' is mentioned on line 459, but not defined == Missing Reference: 'NsId' is mentioned on line 459, but not defined == Missing Reference: 'NIST 800-38D' is mentioned on line 607, but not defined == Missing Reference: 'Name' is mentioned on line 761, but not defined == Missing Reference: 'ExpiryTime' is mentioned on line 761, but not defined -- Looks like a reference, but probably isn't: '1' on line 1166 == Unused Reference: 'NDNTLV' is defined on line 1384, but no explicit reference was found in the text == Unused Reference: 'RFC3552' is defined on line 1392, but no explicit reference was found in the text == Unused Reference: 'RFC5226' is defined on line 1397, but no explicit reference was found in the text == Outdated reference: A later version (-08) exists of draft-irtf-icnrg-terminology-07 -- Obsolete informational reference (is this intentional?): RFC 5226 (Obsoleted by RFC 8126) Summary: 1 error (**), 0 flaws (~~), 19 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 ICNRG C. Tschudin 3 Internet-Draft University of Basel 4 Intended status: Experimental C. Wood 5 Expires: May 7, 2020 University of California Irvine 6 M. Mosko 7 PARC, Inc. 8 D. Oran, Ed. 9 Network Systems Research & Design 10 November 4, 2019 12 File-Like ICN Collections (FLIC) 13 draft-irtf-icnrg-flic-02 15 Abstract 17 This document describes a bare bones "index table"-approach for 18 organizing a set of ICN data objects into a large, File-Like ICN 19 Collection (FLIC). At the core of this collection is a so called 20 manifest which acts as the collection's root node. The manifest 21 contains an index table with pointers, each pointer being a hash 22 value pointing to either a final data block or another index table 23 node. 25 Status of This Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at https://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on May 7, 2020. 42 Copyright Notice 44 Copyright (c) 2019 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (https://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 60 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 61 2. Design Goals . . . . . . . . . . . . . . . . . . . . . . . . 5 62 3. FLIC Structure . . . . . . . . . . . . . . . . . . . . . . . 6 63 3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 64 3.2. Locators . . . . . . . . . . . . . . . . . . . . . . . . 7 65 3.3. Namespaces . . . . . . . . . . . . . . . . . . . . . . . 7 66 3.4. Manifest Metadata . . . . . . . . . . . . . . . . . . . . 8 67 3.5. Pointer Annotations . . . . . . . . . . . . . . . . . . . 8 68 3.6. Manifest Grammar (ABNF) . . . . . . . . . . . . . . . . . 9 69 3.7. Manifest Trees . . . . . . . . . . . . . . . . . . . . . 11 70 3.7.1. Traversal . . . . . . . . . . . . . . . . . . . . . . 11 71 3.8. Manifest Encryption . . . . . . . . . . . . . . . . . . . 13 72 3.8.1. Preshared Key Algorithm . . . . . . . . . . . . . . . 13 73 3.8.2. RSA Key Encapsulation . . . . . . . . . . . . . . . . 14 74 3.8.3. RSA KEM-DEM . . . . . . . . . . . . . . . . . . . . . 16 75 3.9. Protocol Encodings . . . . . . . . . . . . . . . . . . . 16 76 3.9.1. CCNx Encoding . . . . . . . . . . . . . . . . . . . . 16 77 3.9.1.1. CCNx Hash Naming . . . . . . . . . . . . . . . . 17 78 3.9.1.2. CCNx Single Prefix . . . . . . . . . . . . . . . 17 79 3.9.1.3. CCNx Segmented Prefix . . . . . . . . . . . . . . 18 80 3.9.1.4. CCNx Hybrid Schema . . . . . . . . . . . . . . . 19 81 3.9.2. NDN Encoding . . . . . . . . . . . . . . . . . . . . 19 82 3.9.2.1. NDN Hash Naming . . . . . . . . . . . . . . . . . 19 83 3.9.2.2. NDN Single Prefix . . . . . . . . . . . . . . . . 20 84 3.9.2.3. NDN Segmented Prefix . . . . . . . . . . . . . . 21 85 3.9.2.4. NDN Hybrid Schema . . . . . . . . . . . . . . . . 22 86 3.10. Example Structures . . . . . . . . . . . . . . . . . . . 22 87 3.10.1. Leaf-only data . . . . . . . . . . . . . . . . . . . 22 88 3.10.2. Linear . . . . . . . . . . . . . . . . . . . . . . . 22 89 4. Usage Examples . . . . . . . . . . . . . . . . . . . . . . . 23 90 4.1. Locating FLIC leaf and manifest nodes . . . . . . . . . . 23 91 4.2. Seeking . . . . . . . . . . . . . . . . . . . . . . . . . 24 92 4.3. Block-level de-duplication . . . . . . . . . . . . . . . 25 93 4.4. Growing ICN collections . . . . . . . . . . . . . . . . . 25 94 4.5. Re-publishing a FLIC under a new name . . . . . . . . . . 26 95 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 96 6. Security Considerations . . . . . . . . . . . . . . . . . . . 27 97 7. Appendix A: Building Trees . . . . . . . . . . . . . . . . . 27 98 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 99 8.1. Normative References . . . . . . . . . . . . . . . . . . 29 100 8.2. Informative References . . . . . . . . . . . . . . . . . 30 101 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 103 1. Introduction 105 ICN architectures such as Content-Centric Networking (CCN)[RFC8569] 106 and Named Data Networking [NDN] are well suited for static content 107 distribution. Each piece of possibly immutable) static content is 108 assigned a name by its producer. Consumers fetch this content using 109 said name. Optionally, consumers may specify the full name of 110 content, which includes its name and a unique (with overwhelming 111 probability) cryptographic digest of said content. (See 112 [I-D.irtf-icnrg-terminology] for a formal definition of "full name".) 114 To enable requests with full names, consumers need a priori knowledge 115 of content digests. Manifests, or catalogs, are data structures 116 commonly proposed to transport this information. Typically, 117 manifests are signed content objects (data) which carry a collection 118 of hash digests. However, as content objects, manifests themselves 119 may be fetched by full name. Thus, manifests may contain hash 120 digests of, or pointers to, other manifests or content objects. A 121 collection of manifests and content objects represents a large piece 122 of application data, e.g., one that cannot otherwise fit in a single 123 content object. 125 Structurally, this relationship between manifests and content objects 126 is reminiscent of the UNIX inode concept with index tables and memory 127 pointers. In this document, we specify a simple, yet extensible, 128 manifest data structure called FLIC - File-Like ICN Collection. FLIC 129 is suitable for ICNs such as CCN and NDN. We describe the FLIC 130 design, grammar, and various use cases, e.g., seeking, de- 131 duplication, extension, and variable-sized encoding. We also include 132 FLIC encoding examples for CCN and NDN. 134 The purpose of a manifest is to concisely name the constiuent pieces 135 of a larger object. A FLIC manifest does this by using a first 136 manifest to name and cryptographically sign the data structure and 137 then use more concise lists of hash-based names to indicate the 138 constituent pieces. This maintains strong security from a single 139 signature. A Manifest entry gives one enough information to create 140 an Interest for that entry, so it must specify the name, the hash 141 digest, and if needed the locators (routing hints). 143 FLIC is a distributed data structure best illustrated by the 144 following picture. 146 root manifest 147 .------------------------------------. 148 | optional name: | 149 | /icn/name/of/this/flic | 150 | | 151 | HashGroup (HG): | 152 | optional metadata: | 153 | overall digest, locator, etc. | .------. 154 | hash-valued data pointer -----------> | data | 155 | ... | `------' sub manifest 156 | hash-valued manifest pointer ------. .------------------. 157 | | `--> | -----> 158 | optional additional HashGroups .. | | -----> 159 | | `------------------' 160 | optional signature | 161 `------------------------------------' 163 Figure 1: A FLIC manifest and its directed acyclic graph 165 A key question is how one names the root manifest, the application 166 data, and other subsequent manifests. The question of namespaces is 167 specific to the names of each Content Object (CCNx) or Data (NDN), 168 and is separate from the question of Locators. FLIC allows one to 169 use a first namespace for the manifests and a second namespace for 170 the application data. A given namespace may use one of three 171 schemas: hash-based naming, single-prefix naming, or segmented 172 naming. We describe the allowed methods in Section 3.3. There are 173 also particulars of how to encode the name schema in a given ICN 174 protocol, which we describe in Section 3.9. 176 Locators are routing hints to find a Content Object / Data. They 177 exist in both CCNx and NDN, though the specifics differ. A FLIC 178 manifest encodes locators the same for both ICN protocols, though 179 they are encoded differently in the underlying protocol. See 180 Section 3.9 for encoding differences. 182 We follow the CCNx [RFC8569] terminology where a Content Object is 183 the data structure that holds application payload. It is made up of 184 an optional Name, a PayloadType, a Payload, and an optional 185 Signature. 187 FLIC has encodings for CCNx encoding (Section 3.9.1) as per RFC 8609 188 [RFC8609] and for NDN (Section 3.9.2). 190 An example implementation in Python may be found at 191 [FLICImplementation]. 193 1.1. Requirements Language 195 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 196 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 197 document are to be interpreted as described in RFC 2119 [RFC2119]. 199 2. Design Goals 201 The preferred FLIC structure copies the proven UNIX inode concept of 202 direct and indirect pointers, but without the specific structural 203 forms of direct versus indirect. 205 In FLIC terms, a direct pointer links to application-level data, 206 which is a Content Object with application data in the Payload. An 207 indirect pointer links to a Content Object with a FLIC Manifest in 208 the Payload. 210 Links in FLIC use hash-based naming of Content Objects, rather than 211 inode block numbers. Both CCNx and NDN support hash-based naming, 212 though the details vary. See Section 3.9.1 and Section 3.9.2. 213 Another advantage of using hash-based naming is it permits block- 214 level de-duplication of application data because two blocks with the 215 same payload will have the same hash name. 217 Because FLIC uses hash-based naming, FLIC graphs are inherently 218 acyclic. 220 The preferred FLIC structure includes a root manifest with a strong 221 cryptographic signature and then strong hash names to other manifests 222 (e.g. SHA256). The advantage of this structure is the single 223 signature in the root manifest covers the entire data structure no 224 matter how many additional manifests are in the data structure. 225 Another advantage of this structure is it removes the need to use 226 chunk (CCNx) or segment (NDN) name components for the subordinate 227 manifests. 229 FLIC supports manifest encryption separate from application payload 230 encryption. It has a flexible encryption envelope to support various 231 encryption algorithms and key discovery mechanisms. The byte layout 232 allows for in-place encryption and decryption. 234 A limitation of this approach is that one cannot construct a hash- 235 based name for a child until one knows the payload of that child. In 236 practical terms, this means that one must have the complete 237 application payload available at the time of manifest creation. 239 FLIC's design allows straightforward applications that just need to 240 traverse a linear set of related objects to do so simply, but FLIC 241 has two extensibility mechanisms that allow for more sophisticated 242 uses: manifest metadata, and pointer annotations. These are 243 described in Section 3.4 and Section 3.5 respectively. 245 3. FLIC Structure 247 3.1. Terminology 249 Data Object: a CCNx nameless Content Object that usually only has 250 Payload. It might also have an ExpiryTime to limit the lifetime 251 of the data. 253 Direct Pointer: borrowed from inode terminology, it is a CCNx link 254 using a content object hash restriction and a locator name to 255 point to a Data Object. 257 Indirect Pointer: borrowed from inode terminology, it is a CCNx link 258 using a content object hash restriction and a locator name to 259 point to a manifest content object. 261 Manifest: a CCNx ContentObject with PayloadType 'Manifest' and a 262 Payload of the encoded manifest. A leaf manifest only has direct 263 pointers. An internal manifest has a mixture of direct and 264 indirect manifests. 266 Leaf Manifest: all pointers are direct pointers. 268 Internal Manifest: some or all pointers are indirect. The order and 269 number of each is up to the manifest builder. By convention, all 270 the direct manifests come first, then the indirect. 272 Manifest Waste: a metric used to measure the amount of waste in a 273 manifest tree. Waste is the number of unused pointers. For 274 example, a leaf manifest might be able to hold 40 direct pointers, 275 but only 30 of them are used, so the waste of this node is 10. 276 Manifest tree waste is the sum of waste over all manifests in a 277 tree. 279 Root Manifest: A signed, named, manifest that points to nameless 280 manifest nodes. This structure means that the internal tree 281 structure of internal and leaf manifests have no names and thus 282 may be located anywhere in a namespace, while the root manifest 283 has a name to fetch it by. 285 Top Manifest: A preferred manifest structure is to use a Root 286 manifest that points to a single Internal manifest called the Top 287 Manifest. The Top manifest the begins the structure used to 288 organize manifests. 290 Namespace: The prefix and object name that goes inside a Content 291 Object. It may include typed name components specifying a version 292 and/or chunk/segment number. 294 Locator: A routing hint in an Interest used by forwarding to get the 295 Interest to where it can be matched based on its Namespace-derived 296 name. 298 3.2. Locators 300 Locators are routing hints used by forwarders to get an Interest to a 301 node in the network that can resolve the Interest's name. In some 302 naming conventions, the name might only be a hash-based name so the 303 Locator is the only available routing information. 305 A manifest Node may define one or more Locator prefixes that can be 306 used in the construction of Interests from the pointers in the 307 manifest. The Locators are inherited when walking a manifest tree, 308 so they do not need to be defined everywhere. It is RECOMMENDED that 309 only the Root manifest contain Locators so that a single operation 310 can update the locators. One usecase when storing application 311 payloads at different replicas is to replace the Root manifest with a 312 new one that contains locators for the current replicas. 314 3.3. Namespaces 316 A FLIC manifest may define zero or more namespaces. If none are 317 defined, FLIC uses the default Hash Naming approach. If using 318 namespaces, typically there are two defined: one for the manifest 319 namespace and one for the application data namespace. If the two are 320 the same, they can share a namepace. There may be more than two 321 namespaces. 323 A namespace follows a naming convention. The naming convention 324 governs how FLIC creates the ICN Name that goes in an Interest's Name 325 and must match a Content Object / Data Name. The conventions are: 326 (1) Hash Naming, (2) Single Prefix, and (3) Segmented Prefix. The 327 default is to use Hash Naming. Hash Naming does not include anything 328 besides a hash name in the Interest's name and relies on the Locator 329 to forward the Interest. Single Prefix uses the same name 330 differntiated only by a Content Object's implicit hash. Segmented 331 Prefix keeps a counter for the namespace, starting with 0, and 332 increments it after each use of the namespace. 334 The namespace definitions may be inherited from the Root manifest or 335 the Top manifest, or any prior manifest. It is RECOMMENDED that the 336 namespace definitions appear in the Root manifest so they can be 337 updated by a single operation. Because Segmented Prefix namespaces 338 use a counter, it is RECOMMENDED to only define them in the Root 339 manifest or Top manifest and not elsewhere, as it may confuse the 340 counters. 342 In the NodeData, there may be zero or more NSDef contains. Each 343 NSDef defines a namespace identifier (octet string) and its naming 344 convention. For the Hash Naming convention, no further information 345 is required. For the Single Prefix and Segmented Prefix conventions, 346 the NSDef specifies the ICN Name prefix used by the namespace. 348 A HashGroup may have an NSRef container that indicates which 349 namespace it is using, and by implication which naming convention and 350 the corresponding prefix. If there is no NSRef, the hash group uses 351 Hash Naming convention. 353 3.4. Manifest Metadata 355 The FLIC Manifest may be extended by defining TLVs that apply to the 356 Manifest as a whole, or alternatively, individually to every data 357 object pointed to by the Manifest. This basic specification does not 358 specify any, but metadata TLVs may be defined through additional RFCs 359 or via Vendor TLVs. FLIC uses a Vendor TLV structure similar to 360 [RFC8609] for vendor-specific annotations that require no 361 standardization process. 363 For example, some applications may find it useful to allow 364 specialized consumers such as _repositories_ (for example 365 [repository]) or enhanced forwarder caches to pre-place, or 366 adaptively pre-fetch data in order to improve robustness of delay 367 performance. We note in passing that FLICs use of separate 368 namespaces for the Manifest and the underlying Data allows different 369 encryption keys to be used, hence giving a element like a cache or 370 repository access to the Manifest data does not as a side effect 371 reveal the contents of the application data itself. 373 3.5. Pointer Annotations 375 FLIC allows each manifest pointer to be annotated with extra data. 376 Annotations allow applications to exploit metadata about each Data 377 Object pointed to without having to first fetch the corresponding 378 Content Object. This specification defines one such annotation. The 379 _SizeAnnotation_ specifies the number of application layer octets 380 covered by the pointer. 382 An annotation may, for example, give hints about a preferred 383 traversal order for fetching the data, or an importance/precedence 384 indication to aid applications that do not require every content 385 object pointed to in the manifest to be fetched. This can be very 386 useful for real-time or streaming media applications that can perform 387 error concealment when rendering the media. 389 Additional annotations may be defined through additional RFCs or via 390 Vendor TLVs. FLIC uses a Vendor TLV structure similar to [RFC8609] 391 for vendor-specific annotations that require no standardization 392 process. 394 3.6. Manifest Grammar (ABNF) 396 The manifest grammar is mostly independent of the transport ICN 397 protocol. The TLV encoding therefore follows the corresponding ICN 398 protocol, so for CCNx FLIC uses 2 octet length, 2 octet type and for 399 NDN uses the 1/3/5 octet types and lengths. There are also some 400 differences in how one structures and resolves links. [RFC8569] 401 defines HashValue and Link for CCNx encodings. The NDN 402 ImplicitSha256DigestComponent defines HashValue and NDN Delegation 403 (from Link Object) defines Link for NDN. The Section 3.9 section 404 below specifies these differences. 406 The basic structure of a FLIC manifest is a security context, a node, 407 and an authentication tag. The security context and authentication 408 tag are not needed if the node is unencrypted. A node is made up of 409 a set of metadata, the NodeData, that applies to the entire node, and 410 one or more HashGroups that contain pointers. 412 The NodeData element defines the namespaces used by the manifest. 413 There may be multiple namespaces, depending on how one names 414 subsequent manifests or data objects. Each HashGroup may reference a 415 single namespace to control how one forms Interests from the 416 HashGroup. If one is using separate namespaces for manifests and 417 application data, one needs at least two HashGroups. For a manifest 418 structure of "MMMDDD," (where M means manifest (indirect pointer) and 419 D means data (direct pointer)) for example, one would have a first 420 hash group for the child manifests with its namespace and a second 421 HashGroup for the data pointers with the other namespace. If one 422 used a structure like "MMMDDDMMM," then one would need three 423 HashGroups. 425 TYPE = 2OCTET / {1,3,5}OCTET ; As per CCNx or NDN TLV 426 LENGTH = 2OCTET / {1,3,5}OCTET ; As per CCNx or NDN TLV 428 Manifest = TYPE LENGTH [SecurityCtx] (EncryptedNode / Node) [AuthTag] 430 SecurityCtx = TYPE LENGTH AlgorithmCtx 431 AlgorithmCtx = PresharedKeyCtx / RsaKemCtx / RsaKemDemCtx 432 AuthTag = TYPE LENGTH *OCTET ; e.g. AEAD authentication tag 433 EncryptedNode = TYPE LENGTH *OCTET ; Encrypted Node 434 Node = TYPE LENGTH [NodeData] 1*HashGroup 435 NodeData = TYPE LENGTH [SubtreeSize] [SubtreeDigest] [Locators] 0*NSDef 436 SubtreeSize = TYPE LENGTH INTEGER 437 SubtreeDigest = TYPE LENGTH HashValue 438 NSDef = TYPE LENGTH NsId NsSchema 439 NsId = TYPE LENGTH INTEGER 440 NsSchema = HashSchema / SinglePrefixSchema / SegmentedPrefixSchema 441 HashSchema = TYPE 0 442 SinglePrefixSchema = TYPE LENGTH Name 443 SegmentedPrefixSchema = TYPE LENGTH Name 445 Locators = TYPE LENGTH 1*Link 446 HashValue = TYPE LENGTH *OCTET ; As per ICN Protocol 447 Link = TYPE LENGTH *OCTET ; As per ICN protocol 449 HashGroup = TYPE LENGTH [GroupData] (Ptrs / AnnotatedPtrs) 450 Ptrs = TYPE LENGTH *HashValue 451 AnnotatedPtrs = TYPE LENGTH *PointerBlock 452 PointerBlock = TYPE LENGTH *Annotation Ptr 453 Ptr = TYPE LENGTH HashValue 455 Annotation = SizeAnnotation / Vendor 456 SizeAnnotation = TYPE LENGTH Integer 457 Vendor = TYPE LENGTH PEN *OCTET 459 GroupData = TYPE LENGTH [LeafSize] [LeafDigest] [SubtreeSize] [SubtreeDigest] [NsId] 460 LeafSize = TYPE LENGTH INTEGER 461 LeafDigest = TYPE LENGTH HashValue 463 PresharedKeyCtx = TYPE LENGTH PresharedKeyData 464 PresharedKeyData = KeyNum IV Mode 465 KeyNum = TYPE LENGTH INTEGER 466 IV = TYPE LENGTH 1*OCTET 467 Mode = TYPE LENGTH (AES-GCM-128 / AES-GCM-256) 469 RsaKemCtx = 2 LENGTH RsaKemData 470 RsaKemData = KeyId IV Mode WrappedKey LocatorPrefix 471 KeyId = TYPE LENGTH HashValue; ID of Key Encryption Key 472 WrappedKey = TYPE LENGTH 1*OCTET 473 LocatorPrefix = TYPE LENGTH Link 475 RsaKemDemCtx = 3 LENGTH RsaKemDemData 476 RsaKemDemData = KeyId IV Mode WrappedKey LocatorPrefix 478 Figure 1: FLIC Grammar 480 SecurityCtx: information about how to decrypt an EncryptedNode. The 481 structure will depend on the specific encryption algorithm. 483 AlgorithmId: The ID of the encryption method (e.g. preshared key, a 484 broadcast encryption scheme, etc.) 486 AlgorithmData: The context for the encryption algorithm. 488 EncryptedNode: An opaque octet string with an optional 489 authentication tag (i.e. for AEAD authentication tag) 491 Node: A plain-text manifest node. The structure allows for in-place 492 encryption/decryption. 494 NodeData: the metadata about the Manifest node 496 SubtreeSize: The size of all application data at and below the Node 497 or Group 499 SubtreeDigest: The cryptographic digest of all application data at 500 and below the Node or Group 502 Locators: An array of routing hints to find the manifest components 504 HashGroup: A set of child pointers and associated metadata 506 Ptrs: A list of one or more Hash Values 508 GroupData: Metadata that applies to a HashGroup 510 LeafSize: Size of all application data immediately under the Group 511 (i.e. via direct pointers) 513 LeafDigest: Digest of all application data immediately under the 514 Group 516 Ptr: The ContentObjectHash of a child, which may be a data 517 ContentObject (i.e. with Payload) or another Manifest Node. 519 3.7. Manifest Trees 521 3.7.1. Traversal 523 FLIC manifests use a pre-order traversal. This means they are read 524 top to bottom, left to right. The algorithms in Figure 2 show the 525 in-order forward traversal code and the reverse-oder traversal code, 526 which we use below to construct such a tree. This document does not 527 mandate how to build trees. Appendix A provides a detailed example 528 of building inode-like trees. 530 If using Annotated Pointers, an annotation could influence the 531 traversal order. 533 preorder(node) 534 if (node = null) 535 return 536 visit(node) 537 for (i = 0, i < node.child.length, i++) 538 preorder(node.child[i]) 540 reverse_preorder(node) 541 if (node = null) 542 return 543 for (i = node.child.length - 1, i >= 0, i-- ) 544 reverse_preorder(node.child[i]) 545 visit(node) 547 Figure 2: Traversal Pseudocode 549 In terms of the FLIC grammar, one expands a node into its hash 550 groups, visiting each hash group in order. In each hash group, one 551 follows each pointer in order. Figure Figure 3 shows how hash groups 552 inside a manifest expand like virtual children in the tree. The in- 553 order traversal is M0, HG1, M1, HG3, D0, D1, D2, HG2, D3, D4. 555 M0 ____ 556 | \ 557 HG1 HG2 558 | \ | \ 559 M1 D2 D3 D4 560 | 561 HG3 562 | \ 563 D0 D1 565 Figure 3: Node Expansion 567 Using the example manifest tree shown in Figure Figure 9, the in- 568 order traversal would be: Root, M0, M1, D0, D1, D2, M2, D3, D4, D5, 569 M3, D6, D7, D8. 571 3.8. Manifest Encryption 573 This document specifies three encryption methods. The first is a 574 preshared key algorithm, where the parties are assumed to have the 575 decryption keys already. This is useful, for example, when using a 576 key agreement protocol such as CCNxKE. The second is an RSA key 577 encapsulation mechanisms (RsaKem). The third is a standard RSA KEM- 578 DEM mechanism that uses a shared group key (RsaKemDem). 580 For group key based encryption, RsaKem and RsaKemDem, this 581 specification only details the pertinent aspects of the encryption. 582 It does not specify aspects of a key manager which may or may not be 583 used as part of key distribution and management, nor does it specify 584 the protocol between a key manager and a publisher. In it's 585 simpliest form, the publisher could be the key manager, so there is 586 no extra protocol needed between the publisher and key manager. This 587 specification does describe how a consumer locates the appropriate 588 keys. 590 While the preshared key algorithm is limited in use, the AES 591 encryption mechanisms described apply to the group key mechanisms 592 too. They group key mechanisms simply facilitate distribution of the 593 shared key without an on-line key agreement protocol like CCNxKE. 595 A fourth encryption mechanism based on elliptic curve key 596 distribution is forthcoming. 598 3.8.1. Preshared Key Algorithm 600 The KeyNum identifies a key on the receiver. The key must be of the 601 correct length of the Mode used. If the key is longer, use the left 602 bits. Many receivers many have the same key with the same KeyId. A 603 publisher creates a signed root manifest with a security context. A 604 consumer must ensure that the root manifest signer is the expected 605 publisher for use with the pre-shared key, which may be shared with 606 many other consumers. The publisher may use either method 8.2.1 607 (deterministic IV) or 8.2.2 (RBG-based IV) [NIST 800-38D] for 608 creating the IV. 610 Each encrypted manifest node (root manifest or internal manifest) has 611 a full security context (KeyNum, IV, Mode). The AES-GCM decryption 612 is independent for each manifest so Manifest objects can be fetched 613 and decrypted in any order. This design also ensures that if a 614 manifest tree points to the same subtree repeatedly, such as for 615 deduplication, the decryptions are all idempotent. 617 The functions for authenticated encryption and authenticated 618 decryption are as given in Sections 7.1 and 7.2 of NIST 800-38D: GCM- 619 AE_K(IV, P, A) and GCM-AD_K(IV, C, A, T). 621 EncryptNode(SecurityCtx, Node, K, IV) -> GCM-AE_K(IV, P, A) -> (C, T) 622 Node: The wire format of the Node (P) 623 SecurityCtx: The wire format of the SecurityCtx as the Additional Authenticated Data (A) 624 K: the pre-shared key (128 or 256 bits) 625 IV: The initialization vector (usually 96 or 128 bits) 626 C: The cipher text 627 T: The authentication tag 629 Figure 4: Preshared Key Encrypt 631 The pair (C,T) is the OpaqueNode encoded as a TLV structure: 632 (OpaqueNode (CipherText C) (AuthTag T)). 634 DecryptNode(SecurityCtx, C, T, K, IV) -> GCM-AD_K (IV, C, A, T) -> (Node, FailFlag) 635 Node: The wire format of the decrypted Node 636 FailFlag: Indicates authenticated decryption failure (true or false) 638 Figure 5: Preshared Key Decrypt 640 If doing in-place decryption, the cipher text C will be enclosed in 641 an EncryptedNode TLV value. After decryption, change the TLV type to 642 Node. The length should be the same. After decryption the AuthTag 643 is no longer needed. The TLV type should be changed to T_PAD and the 644 value zeroed. The SecurityCtx could be changed to T_PAD and zeroed 645 or left as-is. 647 3.8.2. RSA Key Encapsulation 649 See also RFC 5990, NIST SP 800-56B Rev. 2 and 650 https://lists.w3.org/Archives/Public/public-xmlsec/2009May/att-0032/ 651 Key_Encapsulation.pdf 653 In this system, a key manager (KM) (which could be the publisher) 654 creates a symmetric Content Encryption Key (CEK) and a key wrapping 655 pair with a Key Encryption Key (KEK) and Key Decryption Key (KDK). 656 Each publisher and consumer has its own public/private key pair, and 657 the KM knows each publisher's and consumer's identity and its public 658 key (PK_x). 660 We do not describe the publisher-key manager protocol to request a 661 CEK. The publisher will obtain the (CEK, E_KEK(Z), KeyId, Locator), 662 where each element is: the content encryption key, the CEK precursor, 663 Z, encrypted with the KEK (an RSA operation), and the KeyId of the 664 corresponding KDK, and the Locator is the CCNx name prefix to fetch 665 the KDK (see below). The precursor Z is chosen randomly z < n-1, 666 where n is KEK's public modulus. Note that CEK = KDF(Z). Note that 667 the publisher does not see KEK or Z. 669 We use HKDF (RFC 5869) for the KDF. CEK = HKDF-Expand(HKDF- 670 Extract(0, Z), "CEK", KeyLen), where KenLen is usually 32 bytes (256 671 bits). 673 In the ABNF grammar, the RsaKemData includes a KeyId, IV, Mode, 674 WrappedKey, and LocatorPrefix. The KeyId is the ID (sha256) of the 675 KEK. The IV and Mode are as per preshared key, and describe how the 676 manifest is encrypted with AES-GCM. The WrappedKey is the AES key to 677 decrypt the manifest. The LocatorPrefix is used to construct an 678 Interest to fetch the KDK. 680 To fetch the KDK, a consumer with public key PK_c constructs an 681 Interest with name /LocatorPrefix/{KeyId}/{PK_c keyid} and a 682 KeyIdRestriction of the KM's KeyId (from the LocatorPrefix Link). It 683 should receive back a signed Content Object with the KDK wrapped for 684 the consumer, or a NAK from the KM. The payload of the ContentObject 685 will be RsaKemWrap(PK, KDK). The signed ContentObject must have a 686 KeyLocator to the KM's public key. The consumer will trust the KM's 687 public key because the publisher, whom the consumer trusts, relayed 688 that KeyId inside its own signed Manifest. 690 The signed Content Object should have an ExpiryTime, which may be 691 shorter than the Manifest's, but should not be substantially longer 692 than the Manifest's ExpiryTime. The KM may decide how to handle the 693 Recommended Cache Time, or if caching of the response is even 694 permissible. The KM may require on-line fetching of the response via 695 a CCNxKE encrypted transport tunnel. 697 RsaKemWrap(PK, K, KeyLen = 256): 698 choose z < n-1, where n is PK's public modulus 699 encrypt c = z^e mod n 700 prk = HKDF-Extract(0, Z) 701 kek = HKDF-Expand(prk, "RsaKemWrap", KeyLen) 702 WK = E_KEK(K) # [AES-WRAP, RFC 3394] 703 output (c, WK) 705 Figure 6: RSA KEM Wrap 707 A consumer must verify the signed content object's signature against 708 the Key Manager's public key. The consumer then unwraps the KDK from 709 the Content Object's payload using RsaKemUnwrap(). The KeyLen is 710 taken from the WrapMode parameter. 712 RsaKemUnwrap(SK, c, WK, KeyLen = 256): 713 Using the consumers private key SK, decrypt Z from c. 714 prk = HKDF-Extract(0, Z) 715 kek = HKDF-Expand(prk, "RsaKemWrap", KeyLen) 716 K = D_KEK(WK) # [AES-UNWRAP, RFC 33940] 717 output K 719 Figure 7: RSA KEM Unwrap 721 The consumer then unwraps the CEK precursor by using the KDK to 722 decrypt Z. It then derives CEK as above. Manifest encryption and 723 decryption proceed as with PresharedKey, but using the CEK. 725 3.8.3. RSA KEM-DEM 727 In this scheme a Key Manager (KM), who could be the publisher, 728 creates a Key Encryption Key (KEK) and Key Decryption Key (KDK) key 729 pair. The publisher obtains the KEK. The KM distributes the KDK to 730 each group member by encrypting it under each member's public key. 731 To encrypt data, the publisher generates a symmetric Content 732 Encryption Key (CEK), wraps it with the KEK, then encrypts the 733 manifest with the CEK. It places the wrapped CEK in the manifest. 735 The KM communicates the KEK to the publisher through an unspecified 736 means particular to the KM. 738 The KM distributes the KDK to each group member. It uses a name 739 /{km-prefix}/{publisher-prefix}/{KDK KeyId}/{member KeyId} to publish 740 the encrypted KDK under a member's public key. It uses RSA-OAEP for 741 the encryption. 743 The publisher creates a random symmetric CEK of an appropriate bit 744 length. It uses the KEK to wrap the CEK using RSA-OAEP. It places 745 the wrapped key in the manifest's RsaKemDemData along with the KeyId 746 set to the KDK's KeyId and the KeyLocator prefix /{km- 747 prefix}/{publisher-prefix}/. Each member appends the KDK KeyId and 748 their public key KeyId to the name to attemt to fetch the KDK. When 749 forming the Interest to fetch the key, a consumer should also use a 750 KeyIdRestriction of the KM's KeyId, which it can retrieve from the 751 KeyLocator. 753 3.9. Protocol Encodings 755 3.9.1. CCNx Encoding 757 In CCNx, all Manifest content objects use a PayloadType of 758 T_PYLDTYPE_MANIFEST, while all application data content objects use a 759 PayloadType of T_PYLDTYPE_DATA. 761 ManifestContentObject = TYPE LENGTH [Name] [ExpiryTime] PayloadType Payload 762 Name = TYPE LENGTH *OCTET ; As per RFC8569 763 ExpiryTime = TYPE LENGTH *OCTET ; As per RFC8569 764 PayloadType = TYPE LENGTH T_PYLDTYPE_MANIFEST ; Value TBD 765 Payload : TYPE LENGTH *OCTET ; the serialized Manifest object 767 Figure 8: CCNx Embedding Grammar 769 3.9.1.1. CCNx Hash Naming 771 The Hash Naming namespace uses CCNx nameless content objects. 773 It proceeds as follows: 775 o The Root Manifest content object has a name used to fetch the 776 manifest. It is signed by the publisher. It has a set of 777 Locators used to fetch the remainder of the manifest. It has a 778 single HashPointer that points to the Top Manifest. It may also 779 have cache control directives, such as ExpiryTime. 781 o The Root Manifest has an NsDef that specifies HashSchema. It's 782 GroupData uses that NsId. All internal and leaf manifests use the 783 same GroupData NsId. A Manifest Tree MAY omit the NsDef and NsId 784 elements and rely on the default being HashSchema. 786 o The Top Manifest is a nameless CCNx content object. It may have 787 cache control directies. 789 o Internal and Leaf manifests are nameless CCNx content objects, 790 possibly with cache control directives. 792 o The Data content objects are nameless CCNx content objects, 793 possibly with cache control directives. 795 o To form an Interest for a direct or indirect pointer, use a Name 796 from one of the Locators and put the pointer HashValue into the 797 ContentObjectHashRestriction. 799 3.9.1.2. CCNx Single Prefix 801 The Single Prefix schema uses the same name in all Content Objects 802 and distinguishes them via their ContentObjectHash. Note that in 803 CCNx, using a SinglePrefix name means that we do not use Locators. 805 It proceeds as follows: 807 o The Root Manifest content object has a name used to fetch the 808 manifest. It is signed by the publisher. It has a set of 809 Locators used to fetch the remainder of the manifest. It has a 810 single HashPointer that points to the Top Manifest. It may also 811 have cache control directives, such as ExpiryTime. 813 o The Root Manifest has an NsDef that specifies SinglePrefix and the 814 SinglePrefixSchema element specifies the SinglePrefixName. 816 o The Top Manifest has the name SinglePrefixName. It may have cache 817 control directies. It's GroupData elements must have an NsId that 818 references the NsDef. 820 o An Internal or Leaf manifest has the name SinglePrefixName, 821 possibly with cache control directives. It's GroupData elements 822 must have an NsId that references the NsDef. 824 o The Data content objects have the name SinglePrefixName, possibly 825 with cache control directives. 827 o To form an Interest for a direct or indirect pointer, use 828 SinglePrefixName as the Name and put the pointer HashValue into 829 the ContentObjectHashRestriction. 831 3.9.1.3. CCNx Segmented Prefix 833 The Segmented Prefix schema uses a different name in all Content 834 Objects and distinguishes them via their ContentObjectHash. Note 835 that in CCNx, using a SegmentedPrefixSchema means that we do not use 836 Locators. OPTIONAL: Use AnnotatedPointers to indicate the segment 837 number of each hash pointer to avoid needing to infer the segment 838 numbers. 840 It proceeds as follows: 842 o The Root Manifest content object has a name used to fetch the 843 manifest. It is signed by the publisher. It has a set of 844 Locators used to fetch the remainder of the manifest. It has a 845 single HashPointer that points to the Top Manifest. It may also 846 have cache control directives, such as ExpiryTime. 848 o The Root Manifest has an NsDef that specifies SegmentedPrefix and 849 the SegmentedPrefixSchema element specifies the 850 SegmentedPrefixName. 852 o The publisher will track the chunk number of each content object 853 within the NsId. Objects will be numbered in their traversal 854 order. Within each manifest, the name will be constructed from 855 the SegmentedPrefixName plus a Chunk name component. 857 o The Top Manifest has the name SegmentedPrefixName plus chunk 858 number. It may have cache control directies. It's GroupData 859 elements must have an NsId that references the NsDef. 861 o An Internal or Leaf manifest has the name SegmentedPrefixName plus 862 chunk number, possibly with cache control directives. It's 863 GroupData elements must have an NsId that references the NsDef. 865 o The Data content objects have the name SegmentedPrefixName plus 866 chunk number, possibly with cache control directives. 868 o To form an Interest for a direct or indirect pointer, use 869 SegmentedPrefixName plus chunk number as the Name and put the 870 pointer HashValue into the ContentObjectHashRestriction. A 871 consumer must track the chunk number in traversal order for each 872 SegmentedPrefixSchema NsId. 874 3.9.1.4. CCNx Hybrid Schema 876 A manifest may use multiple schemas. For example, the application 877 payload in data content objects might use SegmentedPrefix while the 878 manifest content objects might use HashNaming. 880 The Root Manifest should specify an NsDef with a first NsId (say 1) 881 as the HashNaming schema and a second NsDef with a second NsId (say 882 2) as the SegmentedPrefix schema along with the SegmentedPrefixName. 884 Each manifest (Top, Internal, Leaf) uses two or more HashGroups, 885 where eash HashGroup has only Direct (with the second NsId) or 886 Indirect (with the first NsId). The number of hash groups will 887 depend on how the publisher wishes to interleave direct and indirect 888 pointers. 890 Manifests and data objects are named as appropriate for their naming 891 schema. 893 3.9.2. NDN Encoding 895 In NDN, all Manifest Data objects use a ContentType of FLIC (1024), 896 while all application data content objects use a PayloadType of Blob. 898 3.9.2.1. NDN Hash Naming 900 In NDN Hash Naming, a Data Object has a 0-length name. This means 901 that an Interest will only have an ImplicitDigest name component in 902 it. This method relies on using NDN ForwardingHints. 904 It proceeds as follows: 906 o The Root Manifest Data has a name used to fetch the manifest. It 907 is signed by the publisher. It has a set of Locators used to 908 fetch the remainder of the manifest. It has a single HashPointer 909 that points to the Top Manifest. It may also have cache control 910 directives. 912 o The Root Manifest has an NsDef that specifies HashSchema. It's 913 GroupData uses that NsId. All internal and leaf manifests use the 914 same GroupData NsId. A Manifest Tree MAY omit the NsDef and NsId 915 elements and rely on the default being HashSchema. 917 o The Top Manifest has a 0-length Name. It may have cache control 918 directies. 920 o Internal and Leaf manifests has a 0-length Name, possibly with 921 cache control directives. 923 o The application Data use a 0-length name, possibly with cache 924 control directives. 926 o To form an Interest for a direct or indirect pointer, the name is 927 only the Implicit Digest name component derived from a pointer's 928 HashValue. The ForwardingHints come from the Locators. In NDN, 929 one may use one or more locators within a single Interest. 931 3.9.2.2. NDN Single Prefix 933 In Single Prefix, the Data name is a common prefix used between all 934 objects in that namespace, without a Segment or other counter. They 935 are distinguished via the Implicit Digest name component. The FLIC 936 Locators go in the ForwardingHints. 938 It proceeds as follows: 940 o The Root Manifest Data object has a name used to fetch the 941 manifest. It is signed by the publisher. It has a set of 942 Locators used to fetch the remainder of the manifest. It has a 943 single HashPointer that points to the Top Manifest. It may also 944 have cache control directives. 946 o The Root Manifest has an NsDef that specifies SinglePrefix and the 947 SinglePrefixSchema element specifies the SinglePrefixName. 949 o The Top Manifest has the name SinglePrefixName. It may have cache 950 control directies. It's GroupData elements must have an NsId that 951 references the NsDef. 953 o An Internal or Leaf manifest has the name SinglePrefixName, 954 possibly with cache control directives. It's GroupData elements 955 must have an NsId that references the NsDef. 957 o The Data content objects have the name SinglePrefixName, possibly 958 with cache control directives. 960 o To form an Interest for a direct or indirect pointer, use 961 SinglePrefixName as the Name and append the pointer's HashValue 962 into an ImplicitDigest name component. Set the ForwardingHints 963 from the FLIC locators. 965 3.9.2.3. NDN Segmented Prefix 967 In Segmented Prefix, the Data name is a common prefix plus a segment 968 number, so each manifest or application data object has a unique full 969 name before the implicit digest. This means the consumer must 970 maintain a counter for each SegmentedPrefix namespace. OPTIONAL: Use 971 AnnotatedPointers to indicate the segment number of each hash pointer 972 to avoid needing to infer the segment numbers. 974 It proceeds as follows: 976 o The Root Manifest Data object has a name used to fetch the 977 manifest. It is signed by the publisher. It has a set of 978 Locators used to fetch the remainder of the manifest. It has a 979 single HashPointer that points to the Top Manifest. It may also 980 have cache control directives. 982 o The Root Manifest has an NsDef that specifies SegmentedPrefix and 983 the SegmentedPrefixSchema element specifies the 984 SegmentedPrefixName. 986 o The publisher will track the segment number of each Data object 987 within a SegmentedPrefix NsId. Data will be numbered in their 988 traversal order. Within each manifest, the name will be 989 constructed from the SegmentedPrefixName plus a Segment name 990 component. 992 o The Top Manifest has the name SegmentedPrefixName plus segment 993 number. It may have cache control directies. It's GroupData 994 elements must have an NsId that references the NsDef. 996 o An Internal or Leaf manifest has the name SegmentedPrefixName plus 997 segment number, possibly with cache control directives. It's 998 GroupData elements must have an NsId that references the NsDef. 1000 o The Data content objects have the name SegmentedPrefixName plus 1001 chunk number, possibly with cache control directives. 1003 o To form an Interest for a direct or indirect pointer, use 1004 SegmentedPrefixName plus segment number as the Name and put the 1005 pointer HashValue into the ImplicitDigest name component. A 1006 consumer must track the segment number in traversal order for each 1007 SegmentedPrefixSchema NsId. 1009 3.9.2.4. NDN Hybrid Schema 1011 A manifest may use multiple schemas. For example, the application 1012 payload in data content objects might use SegmentedPrefix while the 1013 manifest content objects might use HashNaming. 1015 The Root Manifest should specify an NsDef with a first NsId (say 1) 1016 as the HashNaming schema and a second NsDef with a second NsId (say 1017 2) as the SegmentedPrefix schema along with the SegmentedPrefixName. 1019 Each manifest (Top, Internal, Leaf) uses two or more HashGroups, 1020 where eash HashGroup has only Direct (with the second NsId) or 1021 Indirect (with the first NsId). The number of hash groups will 1022 depend on how the publisher wishes to interleave direct and indirect 1023 pointers. 1025 Manifests and data objects are named as appropriate for their naming 1026 schema. 1028 3.10. Example Structures 1030 3.10.1. Leaf-only data 1032 Root 1033 | 1034 ______ M0 _____ 1035 / | \ 1036 M1 M2 M3 1037 / | \ / | \ / | \ 1038 D0 D1 D2 D3 D4 D5 D6 D7 D8 1040 Figure 9: Leaf-only manifest tree 1042 3.10.2. Linear 1044 Of special interest are "skewed trees" where a pointer to a manifest 1045 may only appear as last pointer of (sub-) manifests. Such a tree 1046 becomes a sequential list of manifests with a maximum of datapointers 1047 per manifest packet. Beside the tree shape we also show this data 1048 structure in form of packet content where D stands for a data pointer 1049 and M is the hash of a manifest packet. 1051 Root -> M0 ----> M1 ----> ... 1052 |->DDDD |->DDDD 1054 4. Usage Examples 1056 4.1. Locating FLIC leaf and manifest nodes 1058 The names of manifest and data objects are often missing or not 1059 unique, unless using specific naming conventions. In this example, 1060 we show how using manifest locators is used to generate Interests. 1061 Take for example the figure below where the root manifest is named by 1062 hash h0. It has nameless children with hashes with hashes h1 ... hN. 1064 Objects: 1065 manifest(name=/a/b/c, ptr=h1, ptr=hN) - has hash h0 1066 nameless(data1) - has hash h1 1067 ... 1068 nameless(dataN) - has hash hN 1070 Query for the manifest: 1071 interest(name=/a/b/c, implicitDigest=h0) 1073 Figure 10: Data Organization 1075 After obtaining the manifest, the client fetches the contents. In 1076 this first instance, the manifest does not provide any Locators data 1077 structure, so the client must continue using the name it used for the 1078 manifest. 1080 interest(name=/a/b/c, implicitDigest=h1) 1081 ... 1082 interest(name=/a/b/c, implicitDigest=hN) 1084 Figure 11: Data Interests 1086 Using the locator metadata entry, this behavior can be changed: 1088 Objects: 1089 manifest(name=/a/b/c, 1090 hashgroup(loc=/x/y/z, ptr=h1) 1091 hashgroup(ptr=h2) - has hash h0 1092 nameless(data1) - has hash h1 1093 nameless(data2) - has hash h2 1095 Queries: 1096 interest(name=/a/b/c, implicitDigest=h0) 1097 interest(name=/x/y/z, implicitDigest=h1) 1098 interest(name=/a/b/c, implicitDigest=h2) 1100 Figure 12: Using Locators 1102 4.2. Seeking 1104 Fast seeking (without having to sequentially fetch all content) works 1105 by skipping over entries for which we know their size. The following 1106 expression shows how to compute the byte offset of the data pointed 1107 at by pointer P_i, offset_i. In this formula, let P_i represent the 1108 Size value of the i-th pointer. 1110 offset_i = \sum_{i = 1}^{i - 1} P_i.size 1112 With this offset, seeking is done as follows: 1114 Input: seek_pos P, a FLIC manifest with a hash group having N entries 1115 Output: pointer index i and byte offset o, or out-of-range error 1116 Algo: 1117 offset = 0 1118 for i in 1..N do 1119 if (P < P_i.size) 1120 return (i, P - offset) 1121 offset += P_i.size 1122 return out-of-range 1124 Figure 13: Seeking Algorithm 1126 Seeking in a BlockHashGroup is different since offsets can be quickly 1127 computed. This is because the size of each pointer P_i except the 1128 last is equal to the SizePerPtr value. For a BlockHashGroup with N 1129 pointers, OverallByteCount D, and SizePerPointer L, the size of P_i 1130 is equal to the following: 1132 D - ((i - 1) * L) 1134 In a BlockHashGroup with k pointers, the size of P_k is equal to: 1136 D - L * (k - 1) 1138 Using these, the seeking algorithm can be thus simplified to the 1139 following: 1141 Input: seek_pos P, a FLIC manifest with a hash group having 1142 OverallByteCount S and SizePerPointer L. 1143 Output: pointer index i and byte offset o, or out-of-range error 1144 Algo: 1145 if (P > S) 1146 return out-of-range 1147 i = floor(P / L) 1148 if (i > N) 1149 return out-of-range # bad FLIC encoding 1150 o = P mod L 1151 return (i, o) 1153 Figure 14: Seeking Algorithm 1155 Note: In both cases, if the pointer at position i is a manifest 1156 pointer, this algorithm has to be called once more, seeking to 1157 seek_pos o inside that manifest. 1159 4.3. Block-level de-duplication 1161 Consider a huge file, e.g. an ISO image of a DVD or program in binary 1162 form, that had previously been FLIC-ed but now needs to be patched. 1163 In this case, all existing encoded ICN chunks can remain in the 1164 repository while only the chunks for the patch itself is added to a 1165 new manifest data structure, as is shown in the picture below. For 1166 example, the venti [1] archival file system of Plan9 uses this 1167 technique. 1169 old_mfst - - > h1 --> oldData1 <-- h1 < - - new_mfst 1170 \ - > h2 --> oldData2 <-- h2 < - - / 1171 \ replace3 <-- h5 < - -/ 1172 \- > h3 --> oldData3 / 1173 \ > h4 --> oldData4 <-- h4 < - / 1175 Figure 15: De-duplication 1177 4.4. Growing ICN collections 1179 A log file, for example, grows over time. Instead of having to re- 1180 FLIC the grown file it suffices to construct a new manifest with a 1181 manifest pointer to the old root manifest plus the sequence of data 1182 hash pointers for the new data (or additional sub-manifests if 1183 necessary). Note that this tree will not be skewed (anymore). 1185 old data < - - - mfst_old <-- h_old - - mfst_new 1186 / 1187 new data1 <-- h_1 - - - - - - - - -/ 1188 new data2 / 1189 ... / 1190 new dataN <-- h_N - - - - - - - -/ 1192 Figure 16: Growing A Collection 1194 4.5. Re-publishing a FLIC under a new name 1196 There are several usecases for republishing a collection under a new 1197 namespace, or having one collection exist under several namespaces: 1199 o It can happen that a publisher's namespace is part of a service 1200 provider's prefix. When switching provider, the publisher may 1201 want to republish the old data under a new name. 1203 o A publishes wishes to distribute its content to several caches and 1204 would like a local result to appear. For example, the publisher 1205 /alpha wishes to place content at /beta and /gamma, but routing to 1206 /alpha would not send a request to either of those sites. Each of 1207 /beta and /gamma could create a locally named and signed version 1208 of the root manifest with appropriate keys (or delegate that to 1209 /alpha) so the results are always local without having to change 1210 the bulk of the maniest tree. 1212 This can easily be achieved with a single nameless root manifest for 1213 the large FLIC plus arbitrarily many per-name manifests (which are 1214 signed by whomever wants to publish this data): 1216 data < - nameless_mfst() <-- h < - mfst(/com/example/east/the/flic) 1217 < - mfst(/com/example/west/old/the/flic) 1218 < - mfst(/internet/archive/flic234) 1220 Figure 17: Relocating A Collection 1222 Note that the hash computation (of h) only requires reading the 1223 nameless root manifest, not the entire FLIC. 1225 This example points out the problem of HashGroups having locator 1226 metadata elements: A retriever would be urged to follow these hints 1227 which are "hardcoded" deep inside the FLIC but might have become 1228 outdated. We therefore recommend to name FLIC manifests only at the 1229 highest level (where these names have no locator function). Child 1230 nodes in a FLIC manifest should not be named as these names serve no 1231 purpose except retrieving a sub-tree's manifest by name, if would be 1232 required. 1234 5. IANA Considerations 1236 TODO Need IANA actions: 1238 o Create a registry for Manifest Data and Annotation TLVs 1240 o Register the SizeAnnotation TLV 1242 Also TODO: If this document is submitted as an official RG draft, 1243 this section must be updated to reflect the IANA registries described 1244 in [RFC8609] 1246 6. Security Considerations 1248 TODO Need a discussion on: 1250 o signing and hash chaining security. 1252 o republishing under a new namespace. 1254 o encryption mechanisms. 1256 o encryption key distribution mechanisms. 1258 7. Appendix A: Building Trees 1260 This section describes one method to build trees. It constructs a 1261 pre-order tree in a single pass of the application data, going from 1262 the tail to the beginning. This allows us to work up the right side 1263 of the tree in a single pass, then work down each left branch until 1264 we exhaust the data. Using the reverse-order traversal, we create 1265 the right-most-child manifest, then its parent, then the indirect 1266 pointers of that parent, then the parent's direct pointers, then the 1267 parent of the parent (repeating). This process uses recursion, as it 1268 is the clearest way to show the code. A more optimized approach 1269 could do it in a true single pass. 1271 Because we're building from the bottom up, we use the term 'level' to 1272 be the distance from the right-most child up. Level 0 is the bottom- 1273 most level of the tree, such as where node 7 is: 1275 1 1276 2 3 1277 4 5 6 7 1278 preorder: 1 2 4 5 3 6 7 1279 reverse: 7 6 3 5 4 2 1 1281 The Python-like pseudocode build_tree(data, n, k, m) algorithm 1282 creates a tree of n data objects. The data[] array is an array of 1283 Content Objects that hold application payload; the application data 1284 has already been packetized into n Content Object packets. An 1285 interior manifest node has k direct pointers and m indirect pointers. 1287 build_tree(data[0..n-1], n, k, m) 1288 # data is an array of Content Objects (Data in NDN) with application payload. 1289 # n is the number of data items 1290 # k is the number of direct pointers per internal node 1291 # m is the number of indirect pointers per internal node 1293 segment = namedtuple('Segment', 'head tail')(0, n) 1294 level = 0 1296 # This bootstraps the process by creating the right most child manifest 1297 # A leaf manifest has no indirect pointers, so k+m are direct pointers 1298 root = leaf_manifest(data, segment, k + m) 1300 # Keep building subtrees until we're out of direct pointers 1301 while not segment.empty(): 1302 level += 1 1303 root = bottom_up_preorder(data, segment, level, k, m, root) 1305 return root 1307 bottom_up_preorder(data, segment, level, k, m, right_most_child=None) 1308 manifest = None 1309 if level == 0: 1310 assert right_most_child is None 1311 # build a leaf manifest with only direct pointers 1312 manifest = leaf_manifest(data, segment, k + m) 1313 else: 1314 # If the number of remaining direct pointers will fit in a leaf node, make one of those. 1315 # Otherwise, we need to be an interior node 1316 if right_most_child is None and segment.length() <= k + m: 1317 manifest = leaf_manifest(data, segment, k+m) 1318 else: 1319 manifest = interior_manifest(data, segment, level, k, m, right_most_child) 1320 return manifest 1322 leaf_manifest(data, segment, count) 1323 # At most count items, but never go before the head 1324 start = max(segment.head(), segment.tail() - count) 1325 manifest = Manifest(data[start:segment.tail]) 1326 segment.tail -= segment.tail() - start 1327 return manifest 1329 interior_manifest(data, segment, level, k, m, right_most_child) 1330 children = [] 1331 if right_most_child is not None: 1332 children.append(right_most_child) 1334 interior_indirect(data, segment, level, k, m, children) 1335 interior_direct(data, segment, level, k, m, children) 1337 manifest = Manifest(children) 1338 return manifest, tail 1340 interior_indirect(data, segment, level, k, m, children) 1341 # Reserve space at the head of the segment for this node's direct pointers before 1342 # descending to children. We want the top of the tree packed. 1343 reserve_count = min(m, segment.tail - segment.head) 1344 segment.head += reserve_count 1346 while len(children) < m and not segment.head == segment.tail: 1347 child = bottom_up_preorder(data, segment, level - 1, k, m) 1348 # prepend 1349 children.insert(0, child) 1351 # Pull back our reservation and put those pointers in our direct children 1352 segment.head -= reserve_count 1354 interior_direct(data, segment, level, k, m, children) 1355 while len(children) < k+m and not segment.head == segment.tail: 1356 pointer = data[segment.tail() - 1] 1357 children.insert(0, pointer) 1358 segment.tail -= 1 1360 8. References 1362 8.1. Normative References 1364 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1365 Requirement Levels", BCP 14, RFC 2119, 1366 DOI 10.17487/RFC2119, March 1997, 1367 . 1369 8.2. Informative References 1371 [FLICImplementation] 1372 Mosko, M., "FLIC Implementation in Python", various, 1373 . 1375 [I-D.irtf-icnrg-terminology] 1376 Wissingh, B., Wood, C., Afanasyev, A., Zhang, L., Oran, 1377 D., and C. Tschudin, "Information-Centric Networking 1378 (ICN): CCNx and NDN Terminology", draft-irtf-icnrg- 1379 terminology-07 (work in progress), November 2019. 1381 [NDN] "Named Data Networking", various, 1382 . 1384 [NDNTLV] "NDN Packet Format Specification.", 2016, 1385 . 1387 [repository] 1388 "Repo Protocol Specification", Various, 1389 . 1392 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 1393 Text on Security Considerations", BCP 72, RFC 3552, 1394 DOI 10.17487/RFC3552, July 2003, 1395 . 1397 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1398 IANA Considerations Section in RFCs", RFC 5226, 1399 DOI 10.17487/RFC5226, May 2008, 1400 . 1402 [RFC8569] Mosko, M., Solis, I., and C. Wood, "Content-Centric 1403 Networking (CCNx) Semantics", RFC 8569, 1404 DOI 10.17487/RFC8569, July 2019, 1405 . 1407 [RFC8609] Mosko, M., Solis, I., and C. Wood, "Content-Centric 1408 Networking (CCNx) Messages in TLV Format", RFC 8609, 1409 DOI 10.17487/RFC8609, July 2019, 1410 . 1412 Authors' Addresses 1413 Christian Tschudin 1414 University of Basel 1416 Email: christian.tschudin@unibas.ch 1418 Christopher A. Wood 1419 University of California Irvine 1421 Email: woodc1@uci.edu 1423 Marc Mosko 1424 PARC, Inc. 1426 Email: marc.mosko@parc.com 1428 David Oran (editor) 1429 Network Systems Research & Design 1431 Email: daveoran@orandom.net