NFSv4 Working Group                                           Tom Talpey
Internet-Draft                                                    NetApp
Intended status: Informational                            Chet Juszczak
Expires: August 23, 2008                               February 21, 2008

                       NFS RDMA Problem Statement
            draft-ietf-nfsv4-nfs-rdma-problem-statement-08

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware have
been or will be disclosed, and any of which he or she becomes aware
will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 23, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2008).

Abstract

This document addresses enabling the use of Remote Direct Memory
Access (RDMA) by the Network File System (NFS) protocols.  NFS
implementations historically incur significant overhead due to data
copies on end-host systems, as well as other processing overhead.
The potential benefits of RDMA to these implementations are explored,
and the reasons why RDMA is especially well-suited to NFS and network
file protocols in general are evaluated.

Table of Contents

1.  Introduction
2.  Problem Statement
3.  File Protocol Architecture
4.  Sources of Overhead
4.1.  Savings from TOE
4.2.  Savings from RDMA
5.  Application of RDMA to NFS
6.  Conclusions
7.  Security Considerations
8.  IANA Considerations
9.  Acknowledgements
10.  Normative References
11.  Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
Acknowledgment

1.  Introduction

The Network File System (NFS) protocol (as described in [RFC1094],
[RFC1813], and [RFC3530]) is one of several remote file access
protocols used in the class of processing architecture sometimes
called Network Attached Storage (NAS).

Historically, remote file access has proven to be a convenient,
cost-effective way to share information over a network, a concept
proven over time by the popularity of the NFS protocol.  However,
there are issues in such a deployment.

As compared to a local (direct-attached) file access architecture,
NFS removes the overhead of managing the local on-disk filesystem
state and its metadata, but interposes at least a transport network
and two network endpoints between an application process and the
files it is accessing.  This tradeoff has to date usually resulted in
a net performance loss, due to reduced bandwidth, increased
application server CPU utilization, and other overheads.

Several classes of applications, including those directly supporting
enterprise activities in high performance domains such as database
applications and shared clusters, have therefore encountered issues
with moving to NFS architectures.  While this has been due
principally to the performance costs of NFS versus direct-attached
files, other reasons are relevant, such as the lack of strong
consistency guarantees provided by NFS implementations.
Replication of local file access performance on NAS using traditional
network protocol stacks has proven difficult, not because of protocol
processing overheads, but because of data copy costs in the network
endpoints.  This is especially true since host buses are now often
the main bottleneck in NAS architectures [MOG03] [CHA+01].

The External Data Representation [RFC4506] employed beneath NFS and
RPC [RFC1831bis] can add more data copies, exacerbating the problem.

Data copy-avoidance designs have not been widely adopted for a
variety of reasons.  [BRU99] points out that "many copy avoidance
techniques for network I/O are not applicable or may even backfire if
applied to file I/O."  Other designs that eliminate unnecessary
copies, such as [PAI+00], are incompatible with existing APIs and
therefore force application changes.

In recent years, an effort to standardize a set of protocols for
Remote Direct Memory Access (RDMA) over the standard Internet
Protocol Suite has been chartered [RDDP].  A complete IP-based RDMA
protocol suite is available in the published Standards Track
specifications.

RDMA is a general solution to the problem of CPU overhead incurred
due to data copies, primarily at the receiver.  Substantial research
has addressed this and has borne out the efficacy of the approach.
An overview of this work is provided by the RDDP "Remote Direct
Memory Access (RDMA) over IP Problem Statement" [RFC4297].

In addition to the per-byte savings of offloading data copies,
RDMA-enabled NICs (RNICs) offload the underlying protocol layers as
well (e.g., TCP), further reducing CPU overhead due to NAS
processing.

1.1.  Background

The RDDP Problem Statement [RFC4297] asserts:

   "High costs associated with copying are an issue primarily for
   large scale systems ... with high bandwidth feeds, usually
   multiprocessors and clusters, that are adversely affected by
   copying overhead.  Examples of such machines include all varieties
   of servers: database servers, storage servers, application servers
   for transaction processing, for e-commerce, and web serving,
   content distribution, video distribution, backups, data mining and
   decision support, and scientific computing.

   Note that such servers almost exclusively service many concurrent
   sessions (transport connections), which, in aggregate, are
   responsible for > 1 Gbits/s of communication.  Nonetheless, the
   cost of copying overhead for a particular load is the same whether
   from few or many sessions."

Note that each of the servers listed above could be accessing its
file data as an NFS client, serving the data to such clients as an
NFS server, or acting as both.

The CPU overhead of the NFS and TCP/IP protocol stacks (including
data copies or reduced-copy workarounds) becomes a significant matter
in these clients and servers.  File access using locally attached
disks imposes relatively low overhead due to the highly optimized I/O
path and direct memory access afforded to the storage controller.
This is not the case with NFS, which must pass data to, and
especially from, the network and network processing stack to the NFS
stack.  Frequently, data copies are imposed on this transfer, in some
cases several such copies in each direction.
Copies are potentially encountered in an NFS implementation
exchanging data to and from user address spaces, within kernel buffer
caches, in XDR marshalling and unmarshalling, and within network
stacks and network drivers.  Additional overheads, such as
serialization among multiple threads of execution sharing a single
NFS mount point and transport connection, are encountered as well.

Numerous upper layer protocols achieve extremely high bandwidth and
low overhead through the use of RDMA.  [MAF+02] shows that the
RDMA-based Direct Access File System (with a user-level
implementation of the file system client) can outperform even a
zero-copy implementation of NFS [CHA+01] [CHA+99] [GAL+99] [KM02].
Also, file data access implies the use of large ULP messages.  These
large messages tend to amortize any increase in per-message costs due
to the offload of protocol processing incurred when using RNICs,
while gaining the benefits of reduced per-byte costs.  Finally, the
direct memory addressing afforded by RDMA avoids many sources of
contention on network resources.

2.  Problem Statement

The principal performance problem encountered by NFS implementations
is the CPU overhead required to implement the protocol.  Primary
among the sources of this overhead is the movement of data from NFS
protocol messages to their eventual destination in user buffers or
aligned kernel buffers.  Due to the nature of the RPC and XDR
protocols, the NFS data payload arrives at arbitrary alignment,
necessitating a copy at the receiver, and NFS requests complete in an
arbitrary sequence.

The data copies consume system bus bandwidth and CPU time, reducing
the available system capacity for applications [RFC4297].  Achieving
zero-copy with NFS has, to date, required sophisticated,
version-specific "header cracking" hardware and/or extensive
platform-specific virtual memory mapping tricks.  Such approaches
become even more difficult for NFS version 4, due to the existence of
the COMPOUND operation and the presence of Kerberos and other
security information, which further reduce alignment and greatly
complicate ULP offload.

Furthermore, NFS is challenged by high-speed network fabrics such as
10 Gbits/s Ethernet.  Performing even raw network I/O such as TCP is
an issue at such speeds with today's hardware.  The problem is
fundamental in nature and has led the IETF to explore RDMA [RFC4297].

Zero-copy techniques benefit file protocols extensively, as they
enable direct user I/O, reduce the overhead of protocol stacks,
provide perfect alignment into caches, etc.  Many studies have
already shown the performance benefits of such techniques [SKE+01]
[DCK+03] [FJNFS] [FJDAFS] [KM02] [MAF+02].

RDMA is compelling here for another reason: hardware-offloaded
networking support in itself does not avoid data copies without
resorting to implementing part of the NFS protocol in the NIC.
Support of RDMA by NFS enables the highest performance at the
architecture level rather than by implementation; this enables
ubiquitous and interoperable solutions.
By providing file access performance equivalent to that of local file
systems, NFS over RDMA will enable applications running on a set of
client machines to interact through an NFS file system, just as
applications running on a single machine might interact through a
local file system.

3.  File Protocol Architecture

NFS runs as an ONC RPC [RFC1831bis] application.  Being a file access
protocol, NFS is very "rich" in data content (versus control
information).

NFS messages can range from very small (under 100 bytes) to very
large (from many kilobytes to a megabyte or more).  They are all
contained within an RPC message and follow a variable-length RPC
header.  This layout presents an alignment challenge for the data
items contained in an NFS call (request) or reply (response) message.

In addition to the control information in each NFS call or reply
message, there are sometimes large "chunks" of application file data,
for example, in read and write requests.  With NFS version 4 (due to
the existence of the COMPOUND operation), there can be several of
these data chunks interspersed with control information.

ONC RPC is a remote procedure call protocol that has been run over a
variety of transports.  Most implementations today use UDP or TCP.
RPC messages are defined in terms of an eXternal Data Representation
(XDR) [RFC4506], which provides a canonical data representation
across a variety of host architectures.  An XDR data stream is
conveyed differently on each type of transport.  On UDP, RPC messages
are encapsulated inside datagrams, while on a TCP byte stream, RPC
messages are delineated by a record marking protocol.  An RDMA
transport also conveys RPC messages in a unique fashion that must be
fully described if client and server implementations are to
interoperate.

The RPC transport is responsible for conveying an RPC message from a
sender to a receiver.  An RPC message is either an RPC call from a
client to a server, or an RPC reply from the server back to the
client.  An RPC message contains an RPC call header followed by
arguments if the message is an RPC call, or an RPC reply header
followed by results if the message is an RPC reply.  The call header
contains a transaction ID (XID) followed by the program and procedure
number as well as a security credential.  An RPC reply header begins
with an XID that matches that of the RPC call message, followed by a
security verifier and results.  All data in an RPC message is XDR
encoded.

The encoding of XDR data into transport buffers is referred to as
"marshalling", and the decoding of XDR data contained within
transport buffers into destination RPC procedure result buffers is
referred to as "unmarshalling".  Marshalling therefore takes place at
the sender of any particular message, be it an RPC request or an RPC
response; unmarshalling takes place at the receiver.

Normally, any bulk data is moved (copied) as a result of the
unmarshalling process, because the destination address is not known
until the RPC code receives control and subsequently invokes the XDR
unmarshalling routine.  In other words, XDR-encoded data is not
self-describing, and it carries no placement information.  This
results in a data copy in most NFS implementations.
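To make this concrete, here is a minimal C sketch of the
receiver-side copy.  It is illustrative only: the structure, field
names, and function are invented for this document and do not come
from any NFS implementation.  The point is that the payload offset
depends on variable-length header fields and is discovered only
during XDR decode, after the NIC has already placed the message in a
transport buffer.

   /*
    * Illustrative sketch (invented names): why stream-based NFS
    * receive paths copy bulk data.
    */
   #include <stdint.h>
   #include <string.h>

   struct rpc_reply_buf {
       uint8_t *data;   /* raw bytes as received from the transport */
       size_t   len;    /* total length of the received message */
   };

   /*
    * Deliver the opaque file data from a READ reply to the buffer
    * the application supplied.  payload_off is arbitrary; it depends
    * on the variable-length reply header, verifier, and (for NFSv4)
    * the preceding COMPOUND results, so the NIC could not have
    * placed the data at app_buf directly.  A copy is unavoidable
    * here.
    */
   size_t deliver_read_data(const struct rpc_reply_buf *msg,
                            size_t payload_off, uint32_t payload_len,
                            void *app_buf)
   {
       memcpy(app_buf, msg->data + payload_off, payload_len);
       return payload_len;
   }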
One mechanism by which the RPC layer may overcome this is for each
request to include placement information, to be used for direct
placement during XDR encode.  This "write chunk" can avoid sending
bulk data inline in an RPC message and generally results in one or
more RDMA Write operations.

Similarly, a "read chunk", placement information referring to bulk
data that may be fetched directly via one or more RDMA Read
operations during XDR decode, may be conveyed.  The "read chunk" is
therefore useful in both RPC calls and replies, while the "write
chunk" is used solely in replies.

These "chunks" are the key concept in an existing proposal [RPCRDMA].
They convey what are effectively pointers to remote memory across the
network.  They allow cooperating peers to exchange data outside of
XDR encodings, but still use XDR for describing the data to be
transferred.  And, finally, through use of XDR they maintain a large
degree of on-the-wire compatibility.

The central concept of the RDMA transport is to provide the
additional encoding conventions to convey this placement information
in transport-specific encoding, and to modify the XDR handling of
bulk data.
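The chunk concept can be rendered as a short C sketch.  The fields
below (a steering tag, a length, and a remote address, plus an XDR
position for read chunks) follow the general shape of the encoding
proposed in [RPCRDMA], but the names and layout here are illustrative
assumptions, not the normative definition.

   /* Illustrative rendering of an [RPCRDMA]-style chunk. */
   #include <stdint.h>

   struct rdma_segment {
       uint32_t handle;   /* steering tag naming a registered region */
       uint32_t length;   /* length of the region in bytes */
       uint64_t offset;   /* remote virtual address of the region */
   };

   /*
    * A read chunk also carries the XDR stream position at which the
    * bulk data logically belongs, so the receiver can fetch it with
    * RDMA Read during decode instead of receiving it inline.
    */
   struct rdma_read_chunk {
       uint32_t position;           /* XDR offset of the elided data */
       struct rdma_segment target;  /* where to fetch it from */
   };

A write chunk, by contrast, would carry no position: the responder
places the bulk data into the advertised segments with RDMA Write,
and the reply describes what was placed.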
Block Diagram

   +------------------------+-----------------------------------+
   |          NFS           |            NFS + RDMA             |
   +------------------------+----------------------+------------+
   |            Operations / Procedures            |            |
   +-----------------------------------------------+            |
   |                    RPC/XDR                    |            |
   +--------------------------------+--------------+            |
   |        Stream Transport        |      RDMA Transport       |
   +--------------------------------+---------------------------+

4.  Sources of Overhead

Network and file protocol costs can be categorized as follows:

o  Per-byte costs: data-touching costs such as checksum or data copy.
   Today's network interface hardware commonly offloads the checksum,
   which leaves the other major source of per-byte overhead, data
   copy.

o  Per-packet costs: interrupts and lower-layer processing.  Today's
   network interface hardware also commonly coalesces interrupts to
   reduce per-packet costs.

o  Per-message (request or response) costs: LLP and ULP processing.

Improvement from optimization becomes more important as the overhead
it targets becomes a larger share of the total cost.  As other
sources of overhead, such as the checksumming and interrupt handling
above, are eliminated, the remaining overheads (primarily data copy)
loom larger.

With each copy crossing the memory bus twice, network processing
overhead is high whenever network bandwidth is large in comparison to
CPU and memory bandwidths.  Generally, with today's end-systems, the
effects are observable at network speeds at or above 1 Gbits/s.

A common question is whether an increase in CPU processing power
alleviates the problem of high processing costs of network I/O.  The
answer is no; it is the memory bandwidth that is the issue.  Faster
CPUs do not help if the CPU spends most of its time waiting for
memory [RFC4297].

TCP offload engine (TOE) technology aims to offload the CPU by moving
TCP/IP protocol processing to the NIC.  However, TOE technology by
itself does nothing to avoid necessary data copies within upper layer
protocols.  [MOG03] provides a description of the role TOE can play
in reducing per-packet and per-message costs.  Beyond the offloads
commonly provided by today's network interface hardware, TOE alone
(without RDMA) helps in protocol header processing, but this has been
shown to be a minority component of the total protocol processing
overhead [CHA+01].

Numerous software approaches to the optimization of network
throughput have been made.  Experience has shown that network I/O
interacts with other aspects of system processing such as file I/O
and disk I/O [BRU99] [CHU96].  Zero-copy optimizations based on page
remapping [CHU96] can be dependent upon machine architecture and are
not scalable to multi-processor architectures.  Correct buffer
alignment and sizing together are needed to optimize the performance
of zero-copy movement mechanisms [SKE+01].  The NFS message layout
described above does not facilitate the splitting of headers from
data, nor does it facilitate providing correct data buffer alignment.

4.1.  Savings from TOE

The expected improvement from TOE specifically for NFS protocol
processing can be quantified and shown to be fundamentally limited.
[SHI+03] presents a set of "LAWS" parameters which serve to
illustrate the issues.  In the TOE case, the copy cost can be viewed
as part of the application processing "a".  Application processing
increases the LAWS "gamma", which the paper shows results in a
diminished benefit for TOE.

For example, if the overhead is 20% TCP/IP, 30% copy, and 50% real
application work, then gamma is 80/20, or 4, which means the maximum
benefit of TOE is 1/gamma, or only 25%.

For RDMA (with embedded TOE) and the same example, the "overhead" (o)
offloaded or eliminated is 50% (20% + 30%).  Therefore, in the RDMA
case, gamma is 50/50, or 1, and the inverse gives a potential benefit
of 1 (100%), a factor of two.

   CPU overhead reduction factor

   No Offload     TCP Offload     RDMA Offload
   -----------+---------------+---------------
      1.00x         1.25x           2.00x
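The arithmetic in this table is easy to reproduce.  The C fragment
below models only the simple bound used in the example above:
offloading a fraction f of the host processing yields a maximum CPU
overhead reduction of 1/(1 - f), which is the same as 1 + 1/gamma.
This is a deliberate simplification of the full LAWS analysis in
[SHI+03].

   /* Reproduce the 1.25x and 2.00x figures from the example. */
   #include <stdio.h>

   /* f is the fraction of host processing offloaded or eliminated. */
   static double max_speedup(double f)
   {
       return 1.0 / (1.0 - f);
   }

   int main(void)
   {
       double tcpip = 0.20;   /* TCP/IP overhead in the example */
       double copy  = 0.30;   /* copy overhead in the example */

       /* TOE offloads only TCP/IP: gamma = 0.80/0.20 = 4,
        * benefit = 1/gamma = 25%, i.e., 1.25x. */
       printf("TOE:  %.2fx\n", max_speedup(tcpip));

       /* RDMA (with embedded TOE) also eliminates the copies:
        * gamma = 0.50/0.50 = 1, benefit = 100%, i.e., 2.00x. */
       printf("RDMA: %.2fx\n", max_speedup(tcpip + copy));
       return 0;
   }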
The analysis in the paper shows that RDMA could improve throughput by
the same factor of two, even when the host is (just) powerful enough
to drive the full network bandwidth without RDMA.  It can also be
shown that the speedup may be higher if network bandwidth grows
faster than Moore's Law, although the higher benefits will apply to a
narrow range of applications.

4.2.  Savings from RDMA

Performance measurements directly comparing an NFS over RDMA
prototype with conventional network-based NFS processing are
described in [CAL+03].  Comparisons of Read throughput and CPU
overhead were performed on two types of Gigabit Ethernet adapters,
one a conventional adapter and the other RDMA-capable.  The prototype
RDMA protocol performed all transfers via RDMA Read.  The NFS layer
in the study was measured while performing read transfers, varying
the transfer size and readahead depth across ranges used by typical
NFS deployments.

In these results, conventional network-based throughput was severely
limited by the client's CPU being saturated at 100% for all
transfers.  Read throughput reached no more than 60 MBytes/s.

   I/O Type        Size    Read Throughput    CPU Utilization
   Conventional     2KB        20MB/s              100%
   Conventional    16KB        40MB/s              100%
   Conventional   256KB        60MB/s              100%

However, over RDMA, throughput rose to the theoretical maximum
throughput of the platform, while saturating the single-CPU system
only at maximum throughput.

   I/O Type        Size    Read Throughput    CPU Utilization
   RDMA             2KB        10MB/s               45%
   RDMA            16KB        40MB/s               70%
   RDMA           256KB       100MB/s              100%

The lower relative throughput of the RDMA prototype at the small
blocksize may be attributable to the RDMA Read imposed by the
prototype protocol, which reduced the operation rate by introducing
additional latency.  It may also reflect the relative increase of
per-packet setup costs within the DMA portion of the transfer.

5.  Application of RDMA to NFS

Efficient file protocols require efficient data positioning and
movement.  The client system knows the client memory address where
the application has data to be written or wants read data deposited.
The server system knows the server memory address where the local
filesystem will accept write data or has data to be read.  Neither
peer, however, is aware of the other's data destination in the
current NFS, RPC, or XDR protocols.  Existing NFS implementations
have struggled with the performance costs of data copies when using
traditional Ethernet transports.

With the onset of faster networks, the network I/O bottleneck will
worsen.  Fortunately, new transports that support RDMA have emerged.
RDMA excels at bulk transfer efficiency; it is an efficient way to
deliver direct data placement and remove a major part of the problem:
data copies.  RDMA also addresses other overheads, e.g., underlying
protocol offload, and offers separation of control information from
data.

The current NFS message layout provides the performance-enhancing
opportunity for an NFS over RDMA protocol that separates the control
information from data chunks while meeting the alignment needs of
both.  The data chunks can be copied "directly" between the client
and server memory addresses above (with a single occurrence on each
memory bus) while the control information can be passed "inline".
[RPCRDMA] describes such a protocol.
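The separation of inline control information from directly placed
data can be illustrated end to end.  The C sketch below walks through
one plausible client-side READ exchange in the style of [RPCRDMA].
Every name in it is invented for illustration, and the RNIC and RPC
services are stubbed out; a real client would call its verbs and RPC
libraries instead.

   /* Hedged sketch of an NFS READ using a write chunk (invented API). */
   #include <stddef.h>
   #include <stdint.h>
   #include <stdio.h>

   struct rdma_segment {
       uint32_t handle;   /* steering tag */
       uint32_t length;   /* region length */
       uint64_t offset;   /* remote address */
   };

   /* Stubs standing in for RNIC registration and RPC messaging. */
   static struct rdma_segment rdma_register(void *buf, size_t len)
   {
       struct rdma_segment s = { 0x1234u, (uint32_t)len,
                                 (uint64_t)(uintptr_t)buf };
       return s;
   }
   static void rdma_deregister(struct rdma_segment *s) { s->handle = 0; }
   static void rpc_send_read_call(const struct rdma_segment *s)
   {
       printf("READ call sent; write chunk {handle=%x, len=%u}\n",
              (unsigned)s->handle, (unsigned)s->length);
   }
   static void rpc_recv_reply(void) { /* small reply arrives inline */ }

   static void nfs_read_over_rdma(void *app_buf, size_t len)
   {
       /* 1. Register the application buffer, obtaining a steering
        *    tag the server may use as an RDMA Write target. */
       struct rdma_segment seg = rdma_register(app_buf, len);

       /* 2. Send the READ call inline, advertising app_buf as a
        *    write chunk for the bulk data. */
       rpc_send_read_call(&seg);

       /* 3. The server RDMA Writes the file data directly into
        *    app_buf: one pass over each memory bus, no receive-side
        *    copy, and no alignment problem. */

       /* 4. The small reply (status, attributes) arrives inline; the
        *    data is already in place when it is processed. */
       rpc_recv_reply();

       /* 5. Invalidate the registration so the remote peer can no
        *    longer access the buffer. */
       rdma_deregister(&seg);
   }

   int main(void)
   {
       char buf[8192];
       nfs_read_over_rdma(buf, sizeof buf);
       return 0;
   }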
6.  Conclusions

NFS version 4 [RFC3530] has been granted "Proposed Standard" status.
The NFSv4 protocol was developed along several design points,
important among them: effective operation over wide-area networks,
including the Internet itself; strong security integrated into the
protocol; extensive cross-platform interoperability, including
integrated locking semantics compatible with multiple operating
systems; and, crucially, protocol extension.

NFS version 4 is an excellent base on which to add the needed
performance enhancements and improved semantics described above.  The
minor versioning support defined in NFS version 4 was designed to
support protocol improvements without disruption to the installed
base.  Evolutionary improvement of the protocol via minor versioning
is a conservative and cautious approach to current and future
problems and shortcomings.

Many arguments can be made as to the efficacy of the file abstraction
in meeting the future needs of enterprise data service and the
Internet.  Fine-grained Quality of Service (QoS) policies (e.g., data
delivery, retention, availability, security, ...) are high among
them.

It is vital that the NFS protocol continue to provide these benefits
to a wide range of applications, without its usefulness being
compromised by concerns about performance and semantic inadequacies.
This can reasonably be addressed in the existing NFS protocol
framework.  A cautious evolutionary improvement of performance and
semantics allows building on the value already present in the NFS
protocol, while addressing new requirements that have arisen from the
application of networking technology.

7.  Security Considerations

The NFS protocol, in conjunction with its layering on RPC, provides a
rich and widely interoperable security model to applications and
systems.  Any layering of NFS over RDMA transports must address the
NFS security requirements, and additionally must ensure that no new
vulnerabilities are introduced.  For RDMA, the integrity, and any
privacy, of the data stream are of particular importance.

The core goals of an NFS-to-RDMA binding are to reduce overhead and
to enable high performance.  Supporting these goals while maintaining
the required NFS security protection presents a special challenge.
Historically, the provision of integrity and privacy has been
implemented within the RPC layer, and its operation requires local
processing of messages exchanged with the RPC peer.  This processing
imposes memory and processing overhead on a per-message basis,
exactly the overhead that RDMA is designed to avoid.

Therefore, it is a requirement that the RDMA transport binding
provide a means to delegate the integrity and privacy processing to
the RDMA hardware, in order to maintain the high level of performance
desired from the approach, while simultaneously providing the
existing highest levels of security required by the NFS protocol.
This in turn requires a means by which the RPC layer may invoke these
services from the RDMA provider, and for the NFS layer to negotiate
their use end-to-end.

The "Channel Binding" concept [RFC5056] provides a means by which the
RPC and NFS layers may delegate their session protection to the lower
RDMA layers.  An extension to the RPCSEC_GSS protocol [RPCSECGSSV2]
may then be specified to negotiate the use of these bindings, and to
establish the shared secrets necessary to protect the sessions.

The protocol described in [RPCRDMA] specifies the use of these
mechanisms, and they are required to implement the protocol.

An additional consideration is protection of the integrity and
privacy of local memory by the RDMA transport itself.  The use of
RDMA by NFS must not introduce any vulnerabilities to system memory
contents, or to memory owned by user processes.  These protections
are provided by the RDMA layer specifications, and specifically their
security models.  Any RDMA provider used for NFS transport is
required to conform to the requirements of [RFC5042] in order to
satisfy these protections.

8.  IANA Considerations

This document has no IANA considerations.

9.  Acknowledgements

The authors wish to thank Jeff Chase, who provided many useful
suggestions.

10.  Normative References

[RFC3530]  S. Shepler, et al., "NFS Version 4 Protocol", Standards
   Track RFC.

[RFC1831bis]  R. Thurlow, Ed., "RPC: Remote Procedure Call Protocol
   Specification Version 2", Standards Track RFC.

[RFC4506]  M. Eisler, Ed., "XDR: External Data Representation
   Standard", Standards Track RFC.

[RFC1813]  B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3
   Protocol Specification", Informational RFC.
[RPCSECGSSV2]  M. Eisler, "RPCSEC_GSS Version 2", Internet-Draft
   (work in progress), draft-ietf-nfsv4-rpcsec-gss-v2.

[RFC5056]  N. Williams, "On the Use of Channel Bindings to Secure
   Channels", Standards Track RFC.

[RFC5042]  J. Pinkerton, E. Deleganes, "Direct Data Placement
   Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP)
   Security", Standards Track RFC.

11.  Informative References

[BRU99]  J. Brustoloni, "Interoperation of copy avoidance in network
   and file I/O", in Proc. INFOCOM '99, pages 534-542, New York, NY,
   March 1999, IEEE.  Also available from
   http://www.cs.pitt.edu/~jcb/publs.html.

[CAL+03]  B. Callaghan, T. Lingutla-Raj, A. Chiu, P. Staubach, O.
   Asad, "NFS over RDMA", in Proceedings of ACM SIGCOMM Summer 2003
   NICELI Workshop.

[CHA+01]  J. S. Chase, A. J. Gallatin, K. G. Yocum, "Endsystem
   optimizations for high-speed TCP", IEEE Communications,
   39(4):68-74, April 2001.

[CHA+99]  J. S. Chase, D. C. Anderson, A. J. Gallatin, A. R. Lebeck,
   K. G. Yocum, "Network I/O with Trapeze", in 1999 Hot Interconnects
   Symposium, August 1999.

[CHU96]  H. K. Chu, "Zero-copy TCP in Solaris", in Proc. of the
   USENIX 1996 Annual Technical Conference, San Diego, CA, January
   1996.

[DCK+03]  M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck,
   T. Talpey, M. Wittle, "The Direct Access File System", in
   Proceedings of the 2nd USENIX Conference on File and Storage
   Technologies (FAST '03), San Francisco, CA, March 31 - April 2,
   2003.

[FJDAFS]  Fujitsu Prime Software Technologies, "Meet the DAFS
   Performance with DAFS/VI Kernel Implementation using cLAN",
   available from http://www.pst.fujitsu.com/english/dafsdemo/index.html,
   2001.

[FJNFS]  Fujitsu Prime Software Technologies, "An Adaptation of VIA
   to NFS on Linux", available from
   http://www.pst.fujitsu.com/english/nfs/index.html, 2000.

[GAL+99]  A. Gallatin, J. Chase, K. Yocum, "Trapeze/IP: TCP/IP at
   Near-Gigabit Speeds", in 1999 USENIX Technical Conference (Freenix
   Track), June 1999.

[KM02]  K. Magoutis, "Design and Implementation of a Direct Access
   File System (DAFS) Kernel Server for FreeBSD", in Proceedings of
   the USENIX BSDCon 2002 Conference, San Francisco, CA, February
   11-14, 2002.

[MAF+02]  K. Magoutis, S. Addetia, A. Fedorova, M. Seltzer, J. Chase,
   D. Gallatin, R. Kisley, R. Wickremesinghe, E. Gabber, "Structure
   and Performance of the Direct Access File System (DAFS)", in
   Proceedings of the 2002 USENIX Annual Technical Conference,
   Monterey, CA, June 9-14, 2002.

[MOG03]  J. Mogul, "TCP offload is a dumb idea whose time has come",
   in 9th Workshop on Hot Topics in Operating Systems (HotOS IX),
   Lihue, HI, May 2003, USENIX.

[NFSv4.1]  S. Shepler, Ed., "NFSv4 Minor Version 1", Internet-Draft
   (work in progress), draft-ietf-nfsv4-minorversion1.

[PAI+00]  V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified
   I/O buffering and caching system", ACM Trans. Computer Systems,
   18(1):37-66, February 2000.

[RDDP]  RDDP Working Group charter,
   http://www.ietf.org/html.charters/rddp-charter.html.

[RFC4297]  A. Romanow, J. Mogul, T. Talpey, S. Bailey, "Remote Direct
   Memory Access (RDMA) over IP Problem Statement", Informational
   RFC.

[RFC1094]  Sun Microsystems, "NFS: Network File System Protocol
   Specification".

[RPCRDMA]  T. Talpey, B. Callaghan, "RDMA Transport for ONC RPC",
   Internet-Draft (work in progress), draft-ietf-nfsv4-rpcrdma.
[SHI+03]  P. Shivam, J. Chase, "On the Elusive Benefits of Protocol
   Offload", in Proceedings of ACM SIGCOMM Summer 2003 NICELI
   Workshop; also available from
   http://issg.cs.duke.edu/publications/niceli03.pdf.

[SKE+01]  K.-A. Skevik, T. Plagemann, V. Goebel, P. Halvorsen,
   "Evaluation of a Zero-Copy Protocol Implementation", in
   Proceedings of the 27th Euromicro Conference - Multimedia and
   Telecommunications Track (MTT'2001), Warsaw, Poland, September
   2001.

Authors' Addresses

Tom Talpey
Network Appliance, Inc.
1601 Trapelo Road, #16
Waltham, MA 02451 USA

Phone: +1 781 768 5329
Email: thomas.talpey@netapp.com

Chet Juszczak
Chet's Boathouse Co.
P.O. Box 1467
Merrimack, NH 03054

Email: chetnh@earthlink.net

Intellectual Property and Copyright Statements

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights.  Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard.  Please address the information to the IETF at
ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).