idnits 2.17.1 draft-jin-cdni-content-deduplication-optimization-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 485 has weird spacing: '... Suffix chec...' == Line 486 has weird spacing: '...egistry for ...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: It is well known that CDNs have their own content naming mechanisms, most of which are independent and separated from one another due to the use of different algorithms such as Hash algorithms. It implies that for the same content distributed by two CDNs, the corresponding content identifiers are likely to be quite different. [I-D.ietf-cdni-requirements] treats the information regarding CDN content naming as intra-CDN information and the CDNI solution MUST not require intra-CDN information to be exposed to other CDNs for effective and efficient delivery of the content. Therefore, establishing a uniform content naming mechanism is urgently needed for CDNi network. This mechanism which can be implemented by CDNI Metadata Distribution Protocol may have the following properties. 
-- The document date (March 28, 2013) is 4009 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'RFC 6707' is mentioned on line 134, but not defined == Missing Reference: 'RFC 6770' is mentioned on line 136, but not defined == Missing Reference: 'I-D.ietf-cdni-framework-01' is mentioned on line 427, but not defined == Unused Reference: 'EIDR WHITEPAPER' is defined on line 809, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-cdni-framework' is defined on line 813, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-cdni-requirements' is defined on line 817, but no explicit reference was found in the text == Unused Reference: 'I-D.murray-cdni-triggers' is defined on line 821, but no explicit reference was found in the text == Unused Reference: 'I-D.narten-iana-considerations-rfc2434bis' is defined on line 825, but no explicit reference was found in the text == Unused Reference: 'RFC2629' is defined on line 831, but no explicit reference was found in the text == Unused Reference: 'RFC3552' is defined on line 834, but no explicit reference was found in the text == Unused Reference: 'RFC6707' is defined on line 838, but no explicit reference was found in the text -- No information found for draft-murray-cdni-triggers - is the name correct? -- Obsolete informational reference (is this intentional?): RFC 2629 (Obsoleted by RFC 7749) Summary: 0 errors (**), 0 flaws (~~), 15 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 CDNI W. Jin 3 Internet-Draft M. Li 4 Intended status: Informational B. 
Khasnabish 5 Expires: September 29, 2013 ZTE Corporation 6 March 28, 2013 8 Content De-duplication for CDNi Optimization 9 draft-jin-cdni-content-deduplication-optimization-04 11 Abstract 13 Recent explosive growth of content delivery/distribution networks 14 (CDNs) and their interconnection are causing unintended repetition of 15 content storage in the same dCDN. This can be avoided by using a 16 suitable de-duplication mechanism. This document explores the 17 scenarios which create the problem, and then discusses the approaches 18 to eliminate the duplicated transmission of the same content from 19 uCDN(s) to dCDN in CDNi networks. To implement the optimization, 20 some enhancement to the CDNi metadata model and interface is 21 required. 23 We realize that for business-specific purposes the same content may 24 be encrypted/packaged with different keys for different providers. 25 The impact of DRM (Digital Rights Management) technology on de- 26 duplication will be discussed in a future version of this draft. 28 Status of this Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at http://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on September 29, 2013. 45 Copyright Notice 47 Copyright (c) 2013 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 
50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 63 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 64 2. Deployment Scenarios . . . . . . . . . . . . . . . . . . . . . 4 65 2.1. Impact of Content Duplication on the Network System . . . 4 66 2.2. Current Data Deduplication Technologies . . . . . . . . . 5 67 2.3. Content Duplication Scenario Involved in this Draft . . . 5 68 2.3.1. Scenario 1 . . . . . . . . . . . . . . . . . . . . . . 5 69 2.3.2. Scenario 2 . . . . . . . . . . . . . . . . . . . . . . 6 70 2.3.3. Scenario 3 . . . . . . . . . . . . . . . . . . . . . . 7 71 3. Content Naming for CDNi . . . . . . . . . . . . . . . . . . . 8 72 3.1. Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . 8 73 3.2. Ownership . . . . . . . . . . . . . . . . . . . . . . . . 9 74 4. CDNi Content De-duplication Optimization Implementation . . . 9 75 4.1. Constant URL . . . . . . . . . . . . . . . . . . . . . . . 9 76 4.2. Content Naming Mechanism . . . . . . . . . . . . . . . . . 10 77 4.3. Content ID . . . . . . . . . . . . . . . . . . . . . . . . 11 78 4.4. Description of Content De-duplication . . . . . . . . . . 12 79 4.4.1. Pre-Positioned Content Acquisition . . . . . . . . . . 13 80 4.4.2. Dynamic Content Acquisition . . . . . . . . . . . . . 14 81 4.4.3. Content Purge and Invalidate . . . . . . . . . . . . . 16 82 5. Security Considerations . . . . . . . 
. . . . . . . . . . . . 18 83 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 84 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 18 85 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 86 8.1. Normative References . . . . . . . . . . . . . . . . . . . 18 87 8.2. Informative References . . . . . . . . . . . . . . . . . . 18 88 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 90 1. Introduction 92 In some CDNi deployments, the dCDN occasionally caches the same 93 content copy multiple times from the same Content Service Provider 94 (CSP). For example, the CSP may have agreements with two 95 authoritative CDNs, and both could be the upstream CDNs of the same 96 dCDN. Cascading of CDNs may result in a similar scenario as well. 97 The top-layer uCDN establishes connections with two intermediate- 98 layer uCDNs respectively, and both connect to the same bottom-layer 99 dCDN. In such scenarios, the dCDN may receive the content via 'push' 100 from one of the uCDNs via a pre-position procedure, and then may also 101 request to download the same content from another uCDN via 102 another pre-position procedure or upon a user's content request. 104 Copies of the same content may be transferred to one dCDN, which 105 wastes the dCDN's memory or storage as well as the transmission 106 bandwidth used to deliver the copies repeatedly. Therefore, 107 it is necessary to avoid delivering the same content from different 108 uCDNs to the dCDN repeatedly. In this draft, we list a set of scenarios 109 which may cause repeated delivery of the same content. A feasible 110 solution for content de-duplication is then discussed. 112 In order to address the content repetition problem, several issues 113 need to be considered. 115 * How to detect content repetition by dCDN. 117 * How to avoid content repetition, when one or more uCDNs select(s) 118 one dCDN to deliver the same content to multiple User Agents. 
120 This document provides a detailed analysis of the issues of content 121 repetition. We realize that there is a need to develop an optimized 122 mechanism to de-duplicate the content in a CDNi network. In order to 123 implement such optimization, enhancements to the CDNi metadata model and 124 interface may be required. 126 1.1. Terminology 128 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 129 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 130 document are to be interpreted as described in RFC 2119 [RFC2119]. 132 This document reuses the terminology defined in: 134 [RFC 6707], 136 [RFC 6770], 138 [draft-ietf-cdni-framework], and 140 [draft-ietf-cdni-requirements]. 142 Resource Id: a metadata object (e.g., partial or whole URL, or other 143 format) which is generated by the uCDN and identifies the storage of the 144 content in the uCDN. 146 Content Id: a metadata object (e.g., a URN) which uniquely identifies 147 the content in the scope of CDNi. 149 2. Deployment Scenarios 151 This section illustrates several CDNi deployment scenarios that 152 typically lead to duplicated content in the same server. 154 2.1. Impact of Content Duplication on the Network System 156 Along with the explosive growth of digital information, the storage 157 space required by data is increasing as well. In the past decade, 158 the capacity of storage systems deployed in many industries has 159 grown from dozens of gigabytes to hundreds of terabytes or even 160 more. With the exponential growth of data, enterprises face 161 much more frequent data backup and recovery. The cost of managing 162 and storing data, as well as the space and consumption of data centers, 163 is also becoming an increasingly serious problem. Surveys 164 indicate that up to 60% of the data stored in the application system 165 is redundant and that the proportion continues to grow over 166 time. 
This also leads to extra load on the network from 167 transmitting repeated copies of the same content. 169 For a CDN, the duplication of the content delivered will: 171 1. increase the complexity of CDN management. An increasing number 172 of content tables need to be maintained, which will reduce the 173 efficiency of content search; 175 2. demand larger CDN storage capacity, which would be a waste of 176 facility and investment; 178 3. lead to inaccurate statistics of hot content, which consequently 179 results in inaccurate hot content 180 dispatching in the CDN; 182 4. increase the latency for the CDN content access. Cases in which 183 local content cannot be hit and the same content has to be 184 acquired from other CDNs would occur more frequently. 186 2.2. Current Data Deduplication Technologies 188 At present, data deduplication technologies are widely used in 189 storage backup and archiving systems, in which the deduplication 190 module is responsible for comparing and analyzing the data content, 191 finding redundant data and sending corresponding metadata back to 192 the storage service interface. Finally, non-duplicated data is 193 stored into the storage medium. The main technologies can be divided 194 into: 196 1. Identical Data Detection: identical data includes identical files 197 and identical data blocks. Whole File Detection (WFD) technology uses 198 hashing for data mining. Fixed-sized partition (FSP) detection 199 technology, content-defined chunking (CDC) detection technology and 200 sliding block technology are used for duplicated data lookup and 201 deletion; 203 2. Resembling Data Detection: according to the resemblance 204 characteristics of the data itself, shingle technology, bloom filter 205 technology and mode match technology are used to find the 206 duplicated data that cannot be detected by Identical Data Detection 207 technologies. 
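As an illustration of the identical-data techniques in item 1, the following is a minimal Python sketch of whole-file detection and fixed-sized partition detection. It is not taken from the draft: SHA-256 is assumed as the hash function (the draft does not name one), and the names whole_file_digest and dedup_blocks are hypothetical. Note that both functions need the complete data in hand before they can compare anything.

```python
import hashlib


def whole_file_digest(data: bytes) -> str:
    """Whole File Detection: two files are treated as identical
    when their digests match (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def dedup_blocks(data: bytes, block_size: int = 4096):
    """Fixed-sized partition detection: split data into fixed-size
    blocks and store each distinct block only once.

    Returns (store, recipe): store maps digest -> block bytes,
    recipe is the ordered list of digests needed to rebuild data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep the first copy only
        recipe.append(digest)
    return store, recipe


if __name__ == "__main__":
    store, recipe = dedup_blocks(b"aaaabbbbaaaa", block_size=4)
    # Three blocks were seen, but only two distinct ones are stored.
    print(len(recipe), len(store))
```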
209 The above technologies can be applied only after the 210 content is completely downloaded and stored. However, for a CDN, 211 comparison of the content after downloading is a waste of 212 transmission traffic and computation. In addition, with widespread 213 deployment of CDNi systems, the content duplication issue does not result 214 from the above causes. Instead, it results from the redirection 215 procedures used in CDNi during which URLs pointing to the same content 216 are changed. In this case, the dCDN would download the same content 217 several times due to different URLs that in fact point to the same 218 content. Therefore, in CDNi, current data deduplication technologies 219 cannot be used to address the issue and scenarios proposed in this 220 draft. 222 2.3. Content Duplication Scenario Involved in this Draft 224 In this document, the content duplication results from the 225 redirection procedures used in CDNi during which URLs pointing to the 226 same content are changed. In this case, the dCDN would download the same 227 content several times due to different URLs that in fact point to the 228 same content. 230 2.3.1. Scenario 1 232 As depicted in Figure 1, both CDN-A and CDN-B establish 233 interconnections with CDN-C that acts as a dCDN. Thus, CDN-C will 234 cache the content for CDN-A and CDN-B. When both CDN provider A and 235 CDN provider B have agreements with the same CSP for content 236 delivery, CDN-C may be required by CDN-A and CDN-B separately to 237 retrieve and cache the same content from the CSP. CDN-C is likely to 238 suffer from content repetition problems. 240 As the location of the content in a CDN is normally assigned by the CDN 241 itself, the URLs of the same content are likely to be different 242 between CDNs. So the URLs of the content alone are not enough to determine whether the content 243 to be retrieved and cached is the same. 246 +-------+ 247 | CSP | 248 +-------+ 249 / \ 250 ,--,--,--./ \,--,--,--. 251 ,-' `-. ,-' `-. 
252 ( CDN Provider A ) ( CDN Provider B ) 253 `-. (CDN-A) ,-' `-. (CDN-B) ,-' 254 `--'--'--' `--'--'--' 255 \\ // 256 \\,--,--,--.// 257 ,-' `-. 258 ( CDN Provider C ) 259 `-. (CDN-C) ,-' 260 `--'--'--' 261 | 262 +------------+ 263 | User Agent | 264 +------------+ 265 === CDN Interconnect 267 Figure 1 Interconnected CDNs with the same CSP and with one dCDN 269 2.3.2. Scenario 2 271 Now we consider the case of cascaded CDNs, as depicted in Figure 2. 272 Note that the top-layer Upstream CDN-A has direct contract with CSP 273 and interconnects with two middle-layer CDNs (CDN-B and CDN-C) that 274 have the same bottom-layer Downstream CDN (CDN-D) interconnected to 275 them. Consequently, there are two possible delivery paths for CDN-D 276 to cache the content of CSP. One is CDN-A -> CDN-B -> CDN-D, and the 277 other is CDN-A -> CDN-C -> CDN-D. CDN-D may cache the same content 278 by upstream CDNs(CDN-B and CDN-C) on different paths. If the URL of 279 the content is changed by CDN-B or CDN-C, CDN-D is not able to be 280 aware of the content to be cached and therefore this may lead to 281 duplicated storage of the same content. 283 ,--,--,--. 284 ,-' `-. +-------+ 285 ( CDN Provider A )----| CSP | 286 `-. (CDN-A) ,-' +-------+ 287 //-'--'-\\ 288 ,--,--,--.// \\,--,--,--. 289 ,-' `-. ,-' `-. 290 ( CDN Provider B ) ( CDN Provider C ) 291 `-. (CDN-B) ,-' `-. (CDN-C) ,-' 292 `--'--'--' `--'--'--' 293 \\ // 294 \\,--,--,--.// 295 ,-' `-. 296 ( CDN Provider D ) 297 `-. (CDN-D) ,-' 298 `--'--'--' 299 | 300 +------------+ 301 | User Agent | 302 +------------+ 303 === CDN Interconnect 305 Figure 2 Cascaded CDNs 307 2.3.3. Scenario 3 309 As depicted in Figure 3, both two interconnected CDNs - CDN-A(uCDN) 310 and CDN-B(dCDN) - have contracts with CSP. CDN-B plays two roles at 311 the same time: downstream CDN of CDN-A and Authoritative CDN of CSP. 
312 When an end-user of CDN-A served by CDN-B initiates a content request 313 from the CSP, CDN-A decides that CDN-B should be the serving CDN. 314 Then CDN-A redirects the request to CDN-B. If CDN-B does not have a 315 local copy of the requested content (cache miss), CDN-B ingests the 316 content from CDN-A. When the CSP pushes the same content to CDN-B and if 317 CDN-B cannot identify the duplication, this same content will be 318 repeatedly retrieved and cached. 320 +-------+ 321 | CSP | 322 +-------+ 323 / \ 324 ,--,--,--./ \,--,--,--. 325 ,-' `-. ,-' `-. 326 ( CDN Provider A )=====( CDN Provider B ) 327 `-. (CDN-A) ,-' `-. (CDN-B) ,-' 328 `--'--'--' `--'--'--' 329 | 330 +------------+ 331 | User Agent | 332 +------------+ 333 === CDN Interconnect 335 Figure 3 Interconnected CDNs with the same CSP 337 3. Content Naming for CDNi 339 It is well known that CDNs have their own content naming mechanisms, 340 most of which are independent and separated from one another due to 341 the use of different algorithms such as Hash algorithms. It implies 342 that for the same content distributed by two CDNs, the corresponding 343 content identifiers are likely to be quite different. [I-D.ietf- 344 cdni-requirements] treats the information regarding CDN content 345 naming as intra-CDN information and the CDNI solution MUST NOT 346 require intra-CDN information to be exposed to other CDNs for 347 effective and efficient delivery of the content. Therefore, 348 establishing a uniform content naming mechanism is urgently needed 349 for CDNi networks. This mechanism, which can be implemented by the CDNI 350 Metadata Distribution Protocol, may have the following properties. 352 3.1. Uniqueness 354 The CDNi content naming mechanism must guarantee the uniqueness of the 355 content identification. 
Although URLs are widely used for identifying 356 network resources, they are not well suited for content identification 357 in a CDNi network where content de-duplication needs to be taken into 358 consideration. Although URL matching is commonly used by 359 many cache systems to detect repeated files with the same name and 360 avoid content repetition, it will probably fail in the CDNi case. 361 This is due to the fact that different forwarding mechanisms are used 362 in the different CDNs involved. The user-originated requests are always 363 snooped by the DPI (deep packet inspection) devices before 364 being transmitted to the original server, whereas the requests received by 365 the dCDN are always redirected by one or more uCDNs. Since there is no 366 guarantee that the URLs will not be changed through the redirection 367 process, a new type of object needs to be defined to represent the 368 uniqueness of the content identifier. 370 For a CSP's contents that are distributed into different 371 interconnected CDNs, the related metadata objects may be somewhat 372 different in many cases. For example, in Figure 2, when both CDN-A 373 and CDN-B delegate the delivery of the same CSP's content, the 374 content metadata such as (a) content description, (b) access policy, 375 and (c) security policy may not be exactly the same in both cases. 376 Such metadata information is not suitable for content identification. 377 Consequently, we need to define a new type of metadata object that 378 helps uniquely identify the same content. 380 3.2. Ownership 382 The CDNi content naming mechanism should embody the ownership of content 383 identification. Typically, a CDN provider may have contracts with 384 many CSPs for delivering their contents, as well as operate its own 385 Content delivery network. However, it may happen that a lot of 386 contents published by these CSPs may be very similar, and many of 387 them may even be exactly the same. 
Therefore, the problem is whether 388 these identical copies are from the same CSP or how the 389 interconnected CDNs can verify that these contents are identical. 390 (Note: Such copies are pointed to by the same content identification 391 only if they are from the same content source.) For a traditional 392 (non-interconnected) CDN, distinguishing them via 393 its internal content naming mechanism is not a problem. When a CDN interconnects with 394 other CDNs, the situation becomes more complicated due to the lack of 395 awareness of the CSP's content when the CDN acts as a dCDN. 397 4. CDNi Content De-duplication Optimization Implementation 399 4.1. Constant URL 401 In general, a URL-based mechanism can be used to implement content de- 402 duplication in a CDNi network. As noted in Section 2, the URL 403 description for a content item may be different from uCDN to uCDN and is 404 likely to be changed in the redirection process. An agreement to 405 configure a specific URL between a pair of interconnected CDNs is used 406 in the draft [I-D.ietf-cdni-framework-01]; however, this method is not 407 flexible enough for supporting de-duplication in a complex CDNi 408 network. So a feasible proposal is to specify a mechanism for the CDNi 409 network to guarantee that a CSP's same contents cached in different 410 uCDNs are identified by the same URL and that the URL is unchanged or 411 at least the path part remains the same in the redirection process. 412 The main problem of this mechanism is its lack of resilience and we 413 prefer an alternative mechanism as introduced in Section 4.2 below. 415 4.2. Content Naming Mechanism 417 This section provides a detailed description of the CDNi content naming 418 mechanism using the CDNi Metadata Protocol. 420 In general, a CSP's content as well as its copies cached in 421 interconnected CDNs are delivered to numerous End-Users. 
The draft 422 [I-D.ietf-cdni-framework-01] assigns each copy an identifier in the 423 form of a URL that is embedded with "CDN-Domain" which is used to 424 distinguish whether a download request is from an end-user or from a 425 uCDN. We use the term Resource Identifier to represent the storage 426 pointing to the content copies in interconnected CDNs. However, 427 unlike the usage in the draft [I-D.ietf-cdni-framework-01], in this 428 document the CDNi content naming mechanism specifies Resource Identifiers 429 as the one which is only related to contents in uCDNs. Taking the 430 example in Section 2.3, we use Resource Identifier A to point to 431 content originated through uCDN A. 433 Although the Resource Identifier is able to identify a content item, the 434 uniqueness of content identification cannot be guaranteed, as the Resource 435 Identifier is used to identify the storage of the content in the uCDN 436 and therefore will be changed during redirection processes between 437 different uCDNs. In order to resolve this, we introduce the term 438 Content Identifier, which is associated with the Resource 439 Identifier to uniquely identify the content and is similar to URN 440 usage. Note that the Content Identifier MUST be globally unique. 441 Figure 4 shows a metadata model that can be used for maintaining the 442 relationship between the two types of Identifiers. Using this model, 443 the dCDN is able to uniquely identify and route requests towards the 444 same targeted content. 446 +----------------------+ 447 | Content Identifier | 448 +----------------------+ 449 / | \ 450 / | \ 451 +--------------+ +--------------+ +--------------+ 452 | Resource | | Resource | | Resource | 453 | Identifier 1 | | Identifier 2 | ... 
| Identifier n | 454 +--------------+ +--------------+ +--------------+ 456 Figure 4 Metadata Model for Maintaining Relationship among Multi-IDs 458 Note: We need to develop an authoritative 'Entity' for creating and 459 maintaining the Content Identifier. 461 4.3. Content ID 463 EIDR (Entertainment Identifier Registry), an industry non- 464 profit organization, has already started the research and 465 standardization of the globally unique content identifier and its 466 generating mechanism. As stated in the whitepaper of [UNIVERSAL 467 UNIQUE IDENTIFIERS IN MOVIE AND TELEVISION SUPPLY CHAIN MANAGEMENT], 468 EIDR offers an inexpensive mechanism for uniquely identifying the 469 complete range of audiovisual assets relevant to commerce including 470 micro-assets such as clips and newer types of objects such as 471 encodings. The EIDR data model can be readily extended to cover new 472 and emerging objects and relationships as the industry evolves over 473 time. The EIDR naming system meets the requirements of coverage, 474 flexibility, extensibility, scalability, cost-effectiveness, 475 interoperability, etc. 477 The EIDR registry assigns a unique universal identifier for all 478 registered assets. EIDR is an opaque ID with all information about 479 the registered asset stored in the central registry. Its structure 480 consists of a standard registry prefix, the unique suffix for each 481 asset and a check digit. The suffix of an asset ID is of the form 482 XXXX-XXXX-XXXX-XXXX-XXXX-C, where X is a hexadecimal digit and C is 483 the ISO 7064 Mod 37, 36 check character. 485 Standard Prefix Unique Suffix check 486 for EIDR Registry for each asset digit 487 | | | ||| 488 10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C 490 Figure 5 Structure of Content ID 492 The content provider submits content items for registration to the 493 registry system, along with core metadata and information. 
The 494 system uses a sophisticated mechanism to ensure that the object 495 submitted to the registry has not already been registered and then 496 generates an EIDR for the content. All the above needs to be 497 done before the content is injected into the CDNi system. After that, 498 the content identifier, used as metadata of the content, is injected 499 into the CDNi system and transmitted between uCDN and dCDN via 500 interfaces such as the Control Interface and the Metadata Interface. 502 Detailed information on EIDR can be found in the whitepaper 503 of [UNIVERSAL UNIQUE IDENTIFIERS IN MOVIE AND TELEVISION SUPPLY CHAIN 504 MANAGEMENT]. 506 +------------------+ 507 .|Web User Interface| -,, 508 ,' +------------------+ `'., 509 ,-` | `'., 510 ,' +------------------+ ``-,, 511 .` .| Web Services API |-.,_ `'., 512 +---`-----+ .` +------------------+ `'-.,, +-----+ 513 | .` | `''. | | 514 +----------+.`| +------------------+ ,.-., `-+------+ | 515 | .`--+ | EIDR Registry | | | | |-+ 516 +----------+-` | | |----|`'-'`| +------+ | 517 |Registrant|---+ +------------------+ | | |Lookup |+ 518 | | | -,,__,. | Users | 519 +----------+ | EIDR Storage +-------+ 520 +------------------+ 521 | Deduplication | 522 +------------------+ 524 Figure 6 Entertainment Identifier Registry diagram 526 From the analysis above, the content identifier specified 527 in EIDR is in line with the requirements of content deduplication 528 proposed in this draft. In this document, it is suggested that we 529 introduce a content identifier format and a mechanism for generating it 530 into CDNi to address the content duplication issue. 532 4.4. Description of Content De-duplication 534 In this section the details of the solutions that use the CDNi Content 535 Naming mechanism for content de-duplication are discussed. 537 Content Identifier can be defined as a metadata object. 
The metadata 538 object can be distributed with/without the actual content item from 539 uCDN to dCDN for storage in the dCDN, if the uCDN wants to pre- 540 position the content item to the dCDN before the actual user request. 541 The Content Identifier metadata object binds with the Resource 542 Identifier to identify the storage of the content in the uCDN and 543 forms a content identification model for the content item. By using 544 this content identification model, an interconnected CDN is able to 545 detect content repetition. The content status must be synchronously 546 updated by the interconnected CDN. According to the content status, the 547 interconnected CDN can determine whether the resource copy is cached 548 or not. 550 We present several procedures for an optimized implementation that 551 prevents CDNi content repetition. 553 4.4.1. Pre-Positioned Content Acquisition 555 The following flow illustrates how the two uCDNs successively pre- 556 position the same content in the dCDN. In this flow, the content to 557 be pre-positioned in the dCDN is identified by different Resource 558 Identifiers corresponding to uCDN A and uCDN B. 
560 +--------+ +--------+ +--------+ 561 | dCDN | | uCDN A | | uCDN B | 562 +--------+ +--------+ +--------+ 563 | Pre-position Request | | 564 |<-------------------------| | 565 +--------------+ | | 566 | content | | | 567 | repetition | | |(1) 568 | check | | | 569 +--------------+ | | 570 | OK | | 571 |------------------------->| | 572 | | | 573 | Acquisition Request | | 574 |------------------------->| | 575 | | | 576 | Metadata/Content Data |(2) 577 |<-------------------------| | 578 +--------------+ | | 579 | Update cont- | | | 580 | ent status | | | 581 +--------------+ | | 582 | Pre-position Request | 583 |<-----------------------------------------------------| 584 +--------------+ | | 585 | content | | | 586 | repetition | | |(3) 587 | check | | | 588 +--------------+ | | 589 | OK | 590 |----------------------------------------------------->| 591 | | | 593 Figure 7 Acquisition of Pre-Positioned Content 595 The steps that are illustrated in the figure are as follows: 597 1. The uCDN A requests that the dCDN pre-positions a particular 598 content item identified by its Resource Identifier and Content 599 Identifier. This message is sent via Trigger Interface. On 600 reception of this message, the dCDN checks whether the same content 601 item is cached by looking up its stored content identification model 602 with the Content Identifier.If the metadata does not exist, the dCDN 603 replies to uCDN A with an OK message to notify that no such copy has 604 been cached and content pre-position is required. 606 2. The dCDN acquires metadata of the content or the content itself 607 from uCDN A. Once the content is pre-positioned, dCDN updates the 608 content status and maintains the binding between Resource Identifier 609 and Content Identifier metadata. 611 3. The uCDN B requests that dCDN pre-positions the same content item 612 identified by its Resource Identifier and Content Identifier. 
On 613 reception of this message, the dCDN looks up its stored content 614 identification model with the Content Identifier. As such metadata 615 exists, the dCDN determines that the content is already cached. Then, 616 the dCDN replies to uCDN B with an OK message to notify that the same 617 copy has already been cached and the pre-position request should be 618 cancelled. The dCDN locally binds the new Resource Identifier 619 provided by uCDN B with the Content Identifier. 621 4.4.2. Dynamic Content Acquisition 623 The following flows illustrate how the dCDN performs content de- 624 duplication in cases of a cache miss and a cache hit without content 625 pre-positioning. 627 +----------+ +------+ +------+ 628 | end-user | | dCDN | | uCDN | 629 +----------+ +------+ +------+ 630 | Content Request | | 631 |------------------------------------------------------->| 632 | Content Redirection | | 633 |<-------------------------------------------------------|(1) 634 | Content Request | | 635 |-------------------------->| | 636 | | Content id Acquisition | 637 | |<-------------------------->| 638 | +--------------+ | 639 | | content | | 640 | | repetition | |(2) 641 | | check | | 642 | +--------------+ | 643 | | Acquisition Request | 644 | |--------------------------->| 645 | | Content Data | 646 | |<---------------------------| 647 | +--------------+ |(3) 648 | | Update cont- | | 649 | | ent status | | 650 | +--------------+ | 651 | Content Data | | 652 |<--------------------------| | 653 | | | 655 Figure 8 Dynamic Content Acquisition (cache miss case) 657 The steps that are illustrated in the figure are as follows: 659 1. A content request originated from an end-user is received by 660 the uCDN. The uCDN processes the request and recognizes that the end- 661 user is best served by a dCDN. So the uCDN redirects the request to the 662 dCDN by sending a redirection response to the end-user who then 663 requests the content from the dCDN. 
   The uCDN encapsulates the Resource Identifier of the requested
   content item in the redirection response.

   2. On reception of this request, the dCDN fetches the corresponding
   Content Identifier from the uCDN by using the Resource Identifier
   that points to the requested resource. The dCDN checks whether the
   same content item is cached by looking up the Content Identifier in
   its stored content identification model. If no matching metadata
   exists, the dCDN determines that this is a cache miss, so the
   content needs to be downloaded from the uCDN before being delivered
   to the end-user.

   3. The dCDN acquires the requested content from the uCDN. Once the
   content is cached, the dCDN updates the content status and
   maintains the binding between the Resource Identifier and the
   Content Identifier metadata. The dCDN then delivers the content
   data to the end-user.

   +----------+                 +------+                     +------+
   | end-user |                 | dCDN |                     | uCDN |
   +----------+                 +------+                     +------+
        |      Content Request      |                            |
        |------------------------------------------------------->|
        |    Content Redirection    |                            |
        |<-------------------------------------------------------|(1)
        |      Content Request      |                            |
        |-------------------------->|                            |
        |                           |   Content ID Acquisition   |
        |                           |<-------------------------->|
        |                   +--------------+                     |
        |                   |   content    |                     |
        |                   |  repetition  |                     |(2)
        |                   |    check     |                     |
        |                   +--------------+                     |
        |       Content Data        |                            |
        |<--------------------------|                            |
        |                           |                            |

         Figure 9 Dynamic Content Acquisition (cache hit case)

   The steps illustrated in the figure are as follows:

   Steps 1 and 2 are the same as steps 1 and 2 of Figure 8, except
   that in this figure the dCDN determines that this is a cache hit
   because a matching record exists in the corresponding Content
   Identifier metadata.
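
   The hit/miss decision described for Figures 8 and 9 can be sketched
   as follows. This is only an illustrative sketch, not part of any
   CDNI interface: the class ContentIndex, its attributes, and the two
   callbacks fetch_content_id and fetch_content are hypothetical names
   that stand in for the dCDN's content identification model and for
   its requests to the uCDN.

```python
class ContentIndex:
    """Hypothetical stand-in for the dCDN's content identification model."""

    def __init__(self):
        self.by_content_id = {}  # Content Identifier -> cached content data
        self.bindings = {}       # Resource Identifier -> Content Identifier

    def serve(self, resource_id, fetch_content_id, fetch_content):
        """Return (content data, cache_hit) for a redirected request.

        fetch_content_id and fetch_content are assumed callbacks that
        query the uCDN; they are not defined by the draft.
        """
        # Step 2: obtain the Content Identifier for this Resource Identifier.
        content_id = fetch_content_id(resource_id)

        # Content repetition check: is a copy with this identifier cached?
        cache_hit = content_id in self.by_content_id
        if not cache_hit:
            # Step 3 (cache miss only): acquire the content from the uCDN.
            self.by_content_id[content_id] = fetch_content(resource_id)

        # Maintain the Resource Identifier / Content Identifier binding.
        self.bindings[resource_id] = content_id
        return self.by_content_id[content_id], cache_hit
```

   In this sketch, two different Resource Identifiers that resolve to
   the same Content Identifier share one cached copy, so the second
   request is served without a second acquisition.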

   This flow differs from that in Figure 8 only in that dynamic
   content acquisition (step 3) is not triggered, since the content
   has already been cached by the dCDN.

4.4.3.  Content Purge and Invalidate

   In general, the dCDN assigns a separate location to each of its
   uCDNs to store triggers. However, in a complex CDNI deployment such
   as the one discussed in this draft, the dCDN is likely to receive
   multiple trigger operations from different uCDNs for the same
   content. If one of the uCDNs requests invalidation of certain
   content, then after receiving the Invalidate Trigger the dCDN first
   identifies the content using the Content Identifier and marks it as
   Invalid in the Trigger Status Resource corresponding to the
   requesting uCDN. As a result, the content is unavailable to this
   uCDN until it is re-validated. Access to this content by other
   uCDNs is not affected.

   If one of the uCDNs requests a purge of certain content, then after
   receiving the Purge Trigger the dCDN identifies the content using
   the Content Identifier and marks it as Invalid in the Trigger
   Status Resource corresponding to the requesting uCDN. As a result,
   this uCDN cannot perform any further operation on the content, but
   the content itself is not deleted from the dCDN's cache. Access to
   this content by other uCDNs is not affected. The content is purged
   from the dCDN only when all the uCDNs bound to it have requested a
   purge of the same content and the dCDN has accepted all of these
   requests.
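
   The per-uCDN bookkeeping described above can be sketched as
   follows. This is a minimal sketch under stated assumptions, not a
   definition of the Trigger Status Resource: the names DcdnCache,
   CachedContent, bind, invalidate, purge and is_available are
   hypothetical, and the separate "purged" marker (instead of reusing
   Invalid) is an assumption made here so that purge requests can be
   told apart from invalidations.

```python
from dataclasses import dataclass, field


@dataclass
class CachedContent:
    """One cached copy in the dCDN, shared by every uCDN bound to it."""
    content_id: str
    # uCDN name -> per-uCDN status: "valid", "invalid" or "purged"
    status: dict = field(default_factory=dict)


class DcdnCache:
    def __init__(self):
        self.items = {}  # Content Identifier -> CachedContent

    def bind(self, content_id, ucdn):
        """Bind a uCDN to a cached copy, creating the entry if needed."""
        item = self.items.setdefault(content_id, CachedContent(content_id))
        item.status[ucdn] = "valid"

    def invalidate(self, content_id, ucdn):
        """Invalidate Trigger: affects only the requesting uCDN."""
        self.items[content_id].status[ucdn] = "invalid"

    def purge(self, content_id, ucdn):
        """Purge Trigger: delete the copy only when every bound uCDN
        has requested the purge. Returns True if the copy was deleted."""
        item = self.items[content_id]
        item.status[ucdn] = "purged"
        if all(state == "purged" for state in item.status.values()):
            del self.items[content_id]
            return True
        return False

    def is_available(self, content_id, ucdn):
        """True if the copy exists and is valid for this uCDN."""
        item = self.items.get(content_id)
        return item is not None and item.status.get(ucdn) == "valid"
```

   With two uCDNs bound to the same Content Identifier, the first
   purge only changes the requesting uCDN's status, and the cached
   copy is removed on the second purge.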

   +----------+         +------+        +-------+    +-------+
   | end-user |         | dCDN |        | uCDN1 |    | uCDN2 |
   +----------+         +------+        +-------+    +-------+
        |                   | content purge |            |
        |                   |<--------------|            |
        |                   |               |            |
        |         +-------------------+     |            |
        |         |Invalidate uCDN1's |     |            |
        |         |operation on the   |     |            |
        |         |content but still  |     |            | (1)
        |         |maintain it in     |     |            |
        |         |the cache          |     |            |
        |         +-------------------+     |            |
        |                   |      OK       |            |
        |                   |-------------->|            |
        |                   |       content purge        |
        |                   |<---------------------------|
        |         +------------------+      |            |
        |         |Remove the content|      |            |
        |         |       and        |      |            | (2)
        |         | associated data  |      |            |
        |         +------------------+      |            |
        |                   |             OK             |
        |                   |--------------------------->|
        |                   |               |            |

                      Figure 10 Removal of Content

   The premise is that a copy of the content to be purged has already
   been cached in the dCDN from a uCDN, and that the Content
   Identifier of the content is linked to two Resource IDs, one from
   uCDN1 and one from uCDN2. The steps illustrated in the figure are
   as follows:

   1. uCDN1 requests that the dCDN remove a content resource identified
   by the Content Identifier, due to deployment policy or expiration
   of the content's lifetime. Because uCDN1 is not the only uCDN bound
   to this content, the dCDN only invalidates uCDN1's operations on
   the requested content and keeps the content in the cache, so that
   other uCDNs bound to this content (e.g., uCDN2) can still operate
   on it. The dCDN then replies to uCDN1 with an OK response.

   2. If uCDN2 also requests that the dCDN remove the same content, the
   dCDN removes the content and all the associated metadata. The dCDN
   then replies to uCDN2 with an OK response.

5.  Security Considerations

   To be discussed/added later.

6.  IANA Considerations

   This document has no IANA Considerations.

7.  Acknowledgments

   The authors would like to thank Francois Le Faucheur, Kevin Ma,
   Theodore Zahariadis, Ben Niven-Jenkins, Ram Krishnan, and Marcin
   Pilarski for valuable inputs, suggestions, and discussions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [Data Deduplication Techniques]
              Ao, L., Shu, JW., and MQ. Li, "Data Deduplication
              Techniques", May 2010.

   [EIDR WHITEPAPER]
              EIDR, "UNIVERSAL UNIQUE IDENTIFIERS IN MOVIE AND
              TELEVISION SUPPLY CHAIN MANAGEMENT", October 2010.

   [I-D.ietf-cdni-framework]
              Peterson, L. and B. Davie, "Framework for CDN
              Interconnection", February 2013.

   [I-D.ietf-cdni-requirements]
              Leung, K. and Y. Lee, "Content Distribution Network
              Interconnection (CDNI) Requirements", December 2012.

   [I-D.murray-cdni-triggers]
              Murray, R., Niven-Jenkins, B., and Velocix, "CDN
              Interconnect Triggers", March 2013.

   [I-D.narten-iana-considerations-rfc2434bis]
              Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs",
              draft-narten-iana-considerations-rfc2434bis-09 (work in
              progress), March 2008.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

   [RFC6707]  Niven-Jenkins, B., Le Faucheur, F., and N. Bitar,
              "Content Distribution Network Interconnection (CDNI)
              Problem Statement", September 2012.

Authors' Addresses

   WeiYi Jin
   ZTE Corporation
   Nanjing, 210012
   China

   Phone: +86 025-52871364
   Email: jin.weiyi@zte.com.cn


   Mian Li
   ZTE Corporation
   Nanjing, 210012
   China

   Phone: +86 025-88014641
   Email: li.mian@zte.com.cn


   Bhumip Khasnabish
   ZTE Corporation
   New Jersey, 07960
   USA

   Phone: +001-781-752-8003
   Email: bhumip.khasnabish@zteusa.com, vumip1@gmail.com