Individual Submission                                       C. Dannewitz
Internet-Draft                                   University of Paderborn
Intended status: Informational                                 T. Rautio
Expires: April 25, 2011         VTT Technical Research Centre of Finland
                                                           O. Strandberg
                                                  Nokia Siemens Networks
                                                               B. Ohlman
                                                                Ericsson
                                                        October 22, 2010


        Secure naming structure and p2p application interaction
                 draft-dannewitz-ppsp-secure-naming-01

Abstract

Today, each P2P system typically uses its own way to identify data. The lack of a common naming scheme prevents P2P applications from benefiting from available copies of the same data distributed via different P2P systems. In addition, today's P2P naming schemes lack important security aspects that would allow the user to check data integrity and to build trust in data and data publishers. This is especially important in P2P applications, as data is received from untrusted peers. A generic naming scheme that lets multiple P2P systems use the same data, regardless of data location and P2P system, increases the efficiency and data availability of the overall data dissemination process. The proposed secure naming structure is one potential way to address these challenges, with a common naming structure that is flexible enough to support diverse needs. In addition, the secure naming scheme provides self-certification, so that the receiver can verify data integrity, i.e., that the correct data has been received, without requiring a trusted third party. It also enables owner authentication to build up trust in (potentially anonymous) data publishers. The secure naming structure could be a beneficial design principle for defining the two protocols identified as objectives in the PPSP charter.
This document enumerates a number of design considerations that may influence the design and implementation of the tracker-peer signaling and peer-peer streaming signaling protocols.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 25, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008.
The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1.  Introduction
   2.  Naming requirements
   3.  Basic Concepts for an Application-independent P2P Naming Scheme
     3.1.  Overview
     3.2.  ID Structure
     3.3.  Security Metadata Structure
   4.  Application use of secure naming structure
   5.  Conclusion
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Informative References
   Authors' Addresses

1. Introduction

Today's dominant naming schemes in the Internet, i.e., IP addresses and URLs, are host-centric in that they are bound to a location.
This kind of naming scheme is not suitable for P2P systems, which are based on information-centric thinking: they put the information in focus, whereas the source of this information changes constantly and may comprise several sources at once.

Numerous P2P applications use their own data model and protocol for keeping track of data and locations. This makes it difficult for several applications to use the same information. A common naming scheme and data model would be important to enable interconnectivity between different P2P systems. Building a common P2P infrastructure that can serve a multitude of applications requires a common, application-independent naming scheme. With such a naming scheme, different applications can use and refer to the same information/data objects.

It is possible to introduce false data into P2P systems that is only detected when the content is played out in the user application. Such false copies can be identified and discarded if the P2P system can verify the data received at a peer against the reference used in the tracker protocol. One option to achieve this is to secure the naming structure, i.e., to make the data reference depend on the data and related metadata.

For any type of caching solution (network-based or P2P) and for network-based storage, e.g., DECADE, a common application-independent naming scheme is essential to identify cached copies of information/data objects.

This document enumerates and explains the rationale for why a naming structure for information/data objects should be part of the specification of a PPSP protocol.
The main advantage probably lies in the definition of the protocol for signaling and control between trackers and peers (the PPSP "tracker protocol"), but the signaling and control protocol for communication among peers (the PPSP "peer protocol") might also benefit from a common and secure naming scheme.

2. Naming requirements

In the following, we discuss the requirements that a common naming scheme for P2P systems has to fulfill.

To enable efficient, large-scale data dissemination that can make use of any available data copy, identifiers (IDs) in P2P systems have to be location-independent. Identical data is then identified by the same ID independently of its storage location, so that data dissemination can benefit from all available copies. This has to be possible without compromising trust in the data, regardless of its network source.

Security in a P2P system needs to be implemented differently than in host-centric networks. In the latter, most security mechanisms are based on authenticating a host and then trusting the data that the host delivers. A P2P system cannot rely on host authentication without defeating one of its main advantages, i.e., benefiting from any available copy: authenticating a random, untrusted host that happens to have a copy does not establish the needed trust. Instead, security has to be attached directly to the data, which can be done via the scheme used to name the data.

Therefore, self-certification is a main requirement for the naming scheme. Self-certification ensures the integrity of data and securely binds the data to its ID. More precisely, this property means that any unauthorized change of data with a given ID is detectable without requiring a third party for verification.
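For static data, the simplest way to obtain this property is to derive the ID directly from a cryptographic hash of the content. The following Python sketch illustrates the check a receiver would perform; the "sha256:" prefix, the function names, and the choice of SHA-256 are assumptions made for illustration, not part of this draft.

```python
import hashlib
import hmac


def make_static_id(content: bytes) -> str:
    """Hypothetical content-derived ID: a type prefix plus the SHA-256 digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def is_self_certified(content: bytes, expected_id: str) -> bool:
    """Recompute the ID from the received bytes; no third party is consulted."""
    return hmac.compare_digest(make_static_id(content), expected_id)


if __name__ == "__main__":
    wanted = make_static_id(b"chunk 42 of some video")
    print(is_self_certified(b"chunk 42 of some video", wanted))  # True
    print(is_self_certified(b"chunk 42 of some vide0", wanted))  # False
```

Any modification of the data changes the recomputed ID, so the receiver detects it regardless of which peer delivered the bytes. The same property also means that every content update yields a new ID, which is why this approach on its own conflicts with the name persistence requirement discussed below.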
Self-certification presupposes that the user has the right ID in the first place, so secure retrieval of IDs (e.g., via search or as a link embedded in a Web page) is required beforehand. Secure ID retrieval can be achieved via recommendations, past experience, and specialized ID authentication services and mechanisms, which are out of the scope of this discussion.

Another important requirement is name persistence, not only with respect to changes of storage location as discussed above, but also with respect to changes of the owner and/or the owner's organizational structure, and to content changes that produce a new version of the information. Information should always be identifiable by the same ID as long as it remains essentially equivalent. The spread of persistent naming schemes like the Digital Object Identifier (DOI) [Paskin2010] also underlines the need for name persistence. However, name persistence and self-certification are partly contradictory, and achieving both simultaneously for dynamic content is not trivial.

From a user's perspective, persistent IDs ensure that links and bookmarks remain valid as long as the respective information exists somewhere in the network, reducing today's problem of "404 - file not found" errors triggered by renamed or moved content. From a content provider's perspective, name persistence simplifies data management, as content can, e.g., be moved between folders and servers as desired. Name persistence with respect to content changes makes it possible to identify different versions of the same information by the same ID. If it is important to differentiate between multiple versions, a dedicated versioning mechanism is required, and version numbers may be included as a special part of the ID.
The requirement of building trust in a P2P system, combined with the desire for anonymous publication as well as accountability (at least for some content), can be translated into two related naming requirements. The first is owner authentication, where the owner is recognized as the same entity that repeatedly acts as the object owner, but may remain anonymous. The second is owner identification, where the owner is also identified by a physically verifiable identifier, such as a personal name. This separation is important to allow anonymous publication of content, e.g., to support free speech, while at the same time building up trust in a (potentially anonymous) owner.

In general, the naming scheme should be able to adapt to future needs. Therefore, the naming scheme should be extensible, i.e., it should be possible to add new information (e.g., a chunk number for BitTorrent-like protocols) to it. The need for such extensions is underlined by today's variety of naming schemes (e.g., DOI or PermaLink) that have been added on top of the original Internet architecture to fulfill specialized needs that cannot be met by the common Internet naming schemes, i.e., IP addresses and URLs.

3. Basic Concepts for an Application-independent P2P Naming Scheme

In this section, we introduce an exemplary naming scheme that illustrates a possible way to fulfill the requirements posed on an application-independent naming scheme for P2P networks. The naming scheme integrates security deeply into the system architecture. Trust is based on the data's ID in combination with additional security metadata. Section 3.1 gives an overview of the naming scheme, Section 3.2 details the ID structure, and Section 3.3 describes the security metadata.

3.1. Overview

Building on an identifier/locator split, each data element, e.g., a file, is given a unique ID with cryptographic properties. Together with additional security metadata, the ID can be used to verify data integrity and to provide owner authentication and owner identification. The security metadata contains the information needed for the security functions of the naming scheme, e.g., public keys, content hashes, certificates, and a data signature authenticating the content. Compared with the security model of today's host-centric networks, this approach minimizes the need for trust in the infrastructure, especially in the host(s) providing the data.

In a P2P network, multiple copies of the same data element typically exist at different locations. Thanks to the ID/locator split and the application-independent naming scheme, these identical copies have the same ID and, hence, each P2P application can benefit from all available copies.

Data elements are manipulated (e.g., generated, modified, registered, and retrieved) by physical entities such as nodes (clients or hosts), persons, and companies. Physical entities capable of generating, i.e., creating or modifying, data elements are called owners here. Several security properties of this naming scheme are based on the fact that each ID contains the hash of a public key that is part of a public/secret key pair PK/SK. This PK/SK pair is conceptually bound to the data element itself and not directly to the owner, as in other systems like DONA [Koponen]. If desired, the PK/SK pair can be bound to the owner indirectly, via a certificate chain. This is important because it enables a change of owner while keeping IDs persistent. The key pair bound to the data is therefore denoted PK_D/SK_D. Making the (hash of the) public key part of the ID also enables self-certification of dynamic content while keeping IDs persistent.
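The core of this design, an ID that commits to a hash of PK_D rather than to the content or to a particular owner, can be sketched in Python as follows. The DataElement class, the placeholder Type value "T1", the random bytes standing in for a real public key, and the use of SHA-256 are all illustrative assumptions; Section 3.2 describes the actual ID fields.

```python
import hashlib
import os
from dataclasses import dataclass


def authenticator(pk_d: bytes) -> str:
    """A = Hash(PK_D); SHA-256 is assumed here as the hash function."""
    return hashlib.sha256(pk_d).hexdigest()


@dataclass
class DataElement:
    pk_d: bytes             # public key bound to the data element, not to a person
    label: str              # L: distinguishes data elements sharing the same PK_D
    content: bytes          # may change over time (dynamic content)
    owner: str              # whoever currently (legitimately) holds SK_D
    type_field: str = "T1"  # hypothetical Type value

    @property
    def id(self) -> str:
        # Only Type, Hash(PK_D) and L enter the ID; content and owner do not.
        return f"{self.type_field}|{authenticator(self.pk_d)}|{self.label}"


if __name__ == "__main__":
    placeholder_pk = os.urandom(32)  # stand-in for a real PK_D encoding
    doc = DataElement(pk_d=placeholder_pk, label="report", content=b"v1", owner="alice")
    original_id = doc.id
    doc.content = b"v2"  # new version of the information
    doc.owner = "bob"    # ownership handed over together with SK_D (or via a certificate)
    print(doc.id == original_id)  # True
```

Because neither the content nor the owner is an input, the ID survives both kinds of change. The price is that the content has to be bound to the ID by other means, namely a signature created with SK_D.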
Self-certification of static content can be achieved by simply including the hash of the content in the ID, but this would obviously result in non-persistent IDs for dynamic content. For dynamic content, the public key in the ID can instead be used to securely bind the hash of the content to the ID: the content hash is signed with the corresponding secret key rather than being made part of the ID.

Including Hash(PK_D) in the ID inherently provides owner authentication, since only an entity that knows SK_D can create signatures that verify under PK_D. If the public key is additionally bound to the owner's identity (i.e., to its real-world name) via a certificate from a trusted third party, this also allows owner identification. Without such a certificate, the owner can remain anonymous.

To support the potentially diverse requirements of different groups of P2P applications and to adapt to future changes, the naming scheme can provide flexibility and extensibility by supporting different name structures, differentiated via a Type field in the ID.

3.2. ID Structure

The naming scheme uses flat IDs to support self-certification and name persistence. In addition, flat IDs are advantageous for mobility, and they can be allocated without an administrative authority by relying on statistical uniqueness in a large namespace; the rare case of an ID collision is handled by the P2P system. Although IDs are not hierarchical, they have a specified basic structure, ID = (Type | A = Hash(PK_D) | L), where "|" separates the fields. The fields are described below.

The Authenticator field A = Hash(PK_D) binds the ID to a public key PK_D. Hash is a cryptographic hash function, which is required to be one-way and collision-resistant; it serves only to reduce the bit length of PK_D. PK_D is generated in accordance with a chosen public-key cryptosystem. The corresponding secret key SK_D should only be known to a legitimate owner.
In consequence, an owner of the data is defined as any entity that (legitimately) knows SK_D.

The pair (A, L) has to be globally unique. Hence, if the same PK_D is used for several data elements, the Label field L provides the global uniqueness by distinguishing them.

To build a flexible and extensible naming scheme, e.g., one that can adapt to future changes, the naming scheme supports different types of IDs, differentiated via a mandatory and globally standardized Type field in each ID. For example, the Type field specifies the hash function used to generate the ID. If a hash function in use becomes insecure, the P2P system can use the Type field to automatically mark all IDs generated with this hash function as invalid.

3.3. Security Metadata Structure

The security metadata is extensible and contains all information required to perform the security functions embedded in the naming scheme. The metadata (or selected parts of it) is signed with SK_D, the secret key corresponding to PK_D. This securely binds the metadata to the ID, i.e., to Hash(PK_D), which is part of the ID. For example, the security metadata may include:

   o  the specification of the hash function h and of the algorithm DSAlg used for the digital signature

   o  the complete PK_D (not only Hash(PK_D))

   o  the specification of the parts of the data that are self-certified, i.e., authenticated via the signature

   o  the hash of the self-certified data

   o  the signature of the self-certified data, created with SK_D

   o  all data required for owner authentication and identification

A detailed description and security analysis of this naming scheme and its security properties, especially self-certification, name persistence, owner authentication, and owner identification, can be found in Dannewitz et al. [Dannewitz_10].

4. Application use of secure naming structure

From an application perspective, the main advantage of a secure naming structure for a P2P infrastructure is that multiple applications can have common access to the same data elements. Another benefit of application-independent naming is that locally available and cached copies can easily be located. Secure naming also makes it possible to verify data even if it is received from an untrusted host.

For example, when an application like BitTorrent [WWWbittorrent] uses self-certifying names, the user is guaranteed that the data received is actually the data that has been requested, without having to trust any server in the network (e.g., the tracker) or the peers that provide the data.

This means that the presented secure naming structure can significantly improve BitTorrent's validation of data integrity. Currently, a standard BitTorrent system has no means to verify the integrity of the torrent file and, consequently, of the data. The torrent file (see Figure 1) contains the SHA-1 hashes of the content pieces ("pieces" in Figure 2). However, anyone can modify a torrent file to bind different content to it, and once the torrent file has been modified, the user no longer has any means to verify the integrity of the data. Modifying the torrent file only changes its info_hash (the SHA-1 hash of the torrent's info field, see Figure 2), which trackers and other software entities use to identify the torrent session. In practice, the modified torrent file simply refers to a different torrent session, one that carries forged content.
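The following Python sketch illustrates why this attack goes unnoticed by the client. It assumes the single-file info dictionary of Figure 2 and uses a minimal bencoder written only for illustration. Following the BitTorrent specification [WWWbittorrent], info_hash is the SHA-1 hash of the bencoded info dictionary, and "pieces" is the concatenation of the 20-byte SHA-1 digests of the pieces. A forged info dictionary is internally consistent with the forged data; it merely has a different info_hash.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for int, str/bytes, list, and dict values."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def piece_hashes(content: bytes, piece_length: int) -> bytes:
    """Concatenated 20-byte SHA-1 digests of consecutive pieces."""
    return b"".join(
        hashlib.sha1(content[i:i + piece_length]).digest()
        for i in range(0, len(content), piece_length)
    )


def make_info(name: str, content: bytes, piece_length: int) -> dict:
    return {
        "name": name,
        "piece length": piece_length,
        "length": len(content),
        "pieces": piece_hashes(content, piece_length),
    }


def info_hash(info: dict) -> str:
    return hashlib.sha1(bencode(info)).hexdigest()


def pieces_match(content: bytes, info: dict) -> bool:
    """The only check a standard client makes: data vs. hashes in the torrent."""
    return piece_hashes(content, info["piece length"]) == info["pieces"]


if __name__ == "__main__":
    genuine, forged = b"A" * 65536, b"B" * 65536
    real_info = make_info("movie.mp4", genuine, 16384)
    fake_info = make_info("movie.mp4", forged, 16384)    # same name, different data
    print(pieces_match(forged, fake_info))               # True: forged data passes
    print(info_hash(real_info) == info_hash(fake_info))  # False: only the session ID differs
```

Nothing in this check ties the torrent file to what the user actually intended to fetch; that is the gap the secure naming structure is meant to close.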
If, in addition, the tracker accepts several torrents with the same name and delivers forged data consistent with the forged torrent file, or if the torrent file points to another, "convenient" tracker, a user could effectively be tricked into downloading forged content that the BitTorrent client would falsely identify as correct. In other words, in the current BitTorrent system a user has no guarantee that the downloaded content actually matches the expected content.

   +---------------------------------+---------------------------------+
   |            announce             |               info              |
   +---------------------------------+---------------------------------+

          Figure 1: Basic structure of the BitTorrent torrent file

   +-----------+--------------+-------------+------------+-------------+
   |   name    | piece length |   pieces    |   length   | path (opt)  |
   +-----------+--------------+-------------+------------+-------------+

              Figure 2: Structure of info field in torrent file

   +-----------+--------------+-------------+------------+-------------+
   |   name    | piece length |   pieces    |   length   | path (opt)  |
   +-----------+--------------+-------------+------------+-------------+
   +----------------------+----------------------+---------------------+
   |          h           |        DSAlg         |        PK_D         |
   +----------------------+----------------------+---------------------+
   +----------------------+----------------------+---------------------+
   |   certified pieces   |      signature       |         ID          |
   +----------------------+----------------------+---------------------+

      Figure 3: Structure of the secure-naming-enabled info field in
                              the torrent file

The secure naming structure presented in this draft can provide a simple solution to this problem by securely binding the content of the torrent file to the name/ID of the torrent file. This can be done by extending the torrent file with the security metadata described in Section 3.3, as shown in Figure 3.
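A compact Python sketch of the extended info field of Figure 3, together with the client-side checks described in the rest of this section, follows. Several details are assumptions not fixed by this draft: SHA-256 as the hash function h, a JSON encoding of the signed fields (a real torrent file would use bencoding), a decimal "n:e" encoding of PK_D, the reading of "certified pieces" as the list of signed field names, and, most importantly, a toy textbook-RSA signature over two small Mersenne primes that merely stands in for a real DSAlg and offers no security.

```python
import hashlib
import json

# Toy stand-in for DSAlg: textbook RSA over two Mersenne primes.
# INSECURE, for illustration only; never use this for real signatures.
_P, _Q = 2**61 - 1, 2**89 - 1
N, E = _P * _Q, 65537
SK_D = pow(E, -1, (_P - 1) * (_Q - 1))  # secret exponent (needs Python 3.8+)
PK_D = f"{N}:{E}"                        # hypothetical encoding of PK_D

# Hypothetical reading of "certified pieces": names of the signed fields.
SIGNED_FIELDS = ["pieces", "h", "DSAlg", "PK_D", "certified pieces", "ID"]


def _digest(data: bytes, n: int) -> int:
    """SHA-256 of the data, reduced modulo the toy RSA modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def _signed_bytes(info: dict) -> bytes:
    """Deterministic encoding of the certified fields (JSON is an assumption)."""
    certified = {k: info[k] for k in info["certified pieces"]}
    return json.dumps(certified, sort_keys=True).encode("utf-8")


def make_secure_info(name: str, version: str, data: bytes, piece_length: int):
    """Owner side: build the extended info field of Figure 3 and sign it."""
    a_field = hashlib.sha256(PK_D.encode()).hexdigest()  # A = h(PK_D)
    torrent_id = f"SHA256-Name-Version_{a_field}_{name}-{version}"  # Type_A_L
    info = {
        "name": name,
        "piece length": piece_length,
        "length": len(data),
        # Piece hashes keep BitTorrent's SHA-1, hex-encoded for JSON.
        "pieces": "".join(hashlib.sha1(data[i:i + piece_length]).hexdigest()
                          for i in range(0, len(data), piece_length)),
        "h": "SHA256",
        "DSAlg": "toy-RSA",
        "PK_D": PK_D,
        "certified pieces": SIGNED_FIELDS,
        "ID": torrent_id,
    }
    info["signature"] = pow(_digest(_signed_bytes(info), N), SK_D, N)
    return torrent_id + ".torrent", info


def verify_secure_info(known_id: str, filename: str, info: dict) -> bool:
    """Client side: the checks described in this section."""
    if not filename.endswith(".torrent"):
        return False
    torrent_id = filename[: -len(".torrent")]
    # The file must carry the ID the user obtained securely beforehand.
    if torrent_id != known_id:
        return False
    parts = torrent_id.split("_", 2)
    if len(parts) != 3:
        return False
    a_field = parts[1]
    # Check 1: field A of the ID matches the hash of PK_D in the metadata.
    if hashlib.sha256(info["PK_D"].encode()).hexdigest() != a_field:
        return False
    # Check 2: the ID in the file name matches the ID in the metadata.
    if info["ID"] != torrent_id:
        return False
    # Reject signatures that do not cover the security-relevant fields.
    if not {"pieces", "PK_D", "ID"} <= set(info["certified pieces"]):
        return False
    # Check 3: the signature over the certified fields verifies under PK_D.
    n, e = (int(x) for x in info["PK_D"].split(":"))
    return pow(info["signature"], e, n) == _digest(_signed_bytes(info), n)


if __name__ == "__main__":
    fname, info = make_secure_info("movie", "1", b"x" * 50000, 16384)
    known_id = fname[: -len(".torrent")]  # in reality obtained securely beforehand
    print(verify_secure_info(known_id, fname, info))  # True
    info["pieces"] = "0" * 40 + info["pieces"][40:]   # forge the first piece hash
    print(verify_secure_info(known_id, fname, info))  # False
```

A client that passes all checks in this sketch can then download the pieces and compare them against the now-signed "pieces" hashes, exactly as an ordinary BitTorrent client would.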
In practice, when creating the torrent file, the object owner stores information about the algorithms used (h, the hash function, and DSAlg, the digital signature algorithm), the public key PK_D, the specification of the signed data, and the ID in the torrent's info field. The owner then signs this newly added security metadata, together with the piece hash values ("pieces" in the info field), with the secret key SK_D. The resulting signature is also included in the extension part of the info field ("signature").

The torrent file ID is generated according to the rules described in Section 3. Because the ID depends only on the Type field, on PK_D, and on the label, it can be computed before the info field is signed. As defined in Section 3.2, the ID contains three fields, namely Type, A, and L. In the case of BitTorrent, the Type field carries information about the hash function used to generate field A from PK_D, as well as about the structure of field L. If, for example, L contains the name and version of the distributed file, the Type field indicates this by including the strings "Name" and "Version". Field A contains the hash value of PK_D, computed with the method defined in Type. Finally, field L contains the name and version of the distributed file. Using "-" as the separator within a field and "_" as the separator between fields, the torrent file name could, for example, look like this:

   HashMethod-Name-Version_HashofPK_Filename-Fileversion.torrent

Consequently, whenever a user knows the ID of the content/torrent file and retrieves the torrent file, she or he can open it with a BitTorrent client that supports secure naming. The client verifies the integrity of the torrent file by checking that the hash of PK_D from the security metadata matches field A of the ID; in addition, it verifies that the ID in the torrent file name matches the ID in the metadata.
Finally, the client verifies the signature in the security metadata against the signed data. Once these three checks have succeeded, the client can download the data pieces and use the piece hashes included in the torrent file, which are now secured by the signature, to verify the integrity of the received data. As a result, the user can be sure that the correct content was retrieved.

5. Conclusion

The secure naming structure is proposed for consideration as a common reference ID structure in the PPSP WG. For P2P streaming applications to have fair and broad access to data, it is essential to have a common naming structure that is suitable for many different needs. The benefit of common naming is probably most evident for the tracker protocol, whereas its potential benefit for the actual streaming protocol has yet to be identified. The secure binding of the reference ID to the actual content enables the end-user peer to check that it has received the correct data for the ID used.

The naming structure has been implemented in the 4WARD project prototypes and released as open source (www.netinf.org). It is also available through a public NetInf registration service at www.netinf.org. Three NetInf-enabled applications have also been published: InFox (a Firefox plugin), InBird (a Thunderbird plugin), and a NetInf Information Object Management Tool, all available at www.netinf.org.

6. IANA Considerations

This document has no requests to IANA.

7. Security Considerations

Designing the naming structure in a secure way requires careful consideration of which public/private key and hash algorithms to use.

8. Acknowledgements

We would like to thank all the persons participating in the Network of Information work packages in the EU FP7 projects 4WARD and SAIL and in the Finnish ICT SHOK Future Internet 2 project for their contributions and feedback to this document.

9. Informative References

[Dannewitz_10]
   Dannewitz, C., Golic, J., Ohlman, B., and B. Ahlgren, "Secure Naming for a Network of Information", 13th IEEE Global Internet Symposium, 2010.

[Koponen]
   Koponen, T., Chawla, M., Chun, B., Ermolinskiy, A., Kim, K., Shenker, S., and I. Stoica, "A Data-Oriented (and beyond) Network Architecture", Proc. ACM SIGCOMM, 2007.

[Paskin2010]
   Paskin, N., "Digital Object Identifier (DOI(R)) System", Encyclopedia of Library and Information Sciences, 2010.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[WWWbittorrent]
   Cohen, B., "The BitTorrent Protocol Specification", http://www.bittorrent.org/beps/bep_0003.html, 2008.

Authors' Addresses

Christian Dannewitz
University of Paderborn
Paderborn
Germany

Email: cdannewitz@upb.de

Teemu Rautio
VTT Technical Research Centre of Finland
Oulu
Finland

Email: teemu.rautio@vtt.fi

Ove Strandberg
Nokia Siemens Networks
Espoo
Finland

Email: ove.strandberg@nsn.com

Borje Ohlman
Ericsson
Stockholm
Sweden

Email: Borje.Ohlman@ericsson.com