idnits 2.17.1

draft-eisler-nfsv4-minorversion-2-requirements-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 19, 2009) is 5302 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 3530
     (Obsoleted by RFC 7530)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                           M. Eisler, Ed.
Internet-Draft                                                    NetApp
Intended status: Informational                          October 19, 2009
Expires: April 22, 2010


                        Requirements for NFSv4.2
           draft-eisler-nfsv4-minorversion-2-requirements-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 22, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.

Abstract

   This document proposes requirements for NFSv4.2.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Efficiency and Utilization Requirements
     2.1.  Capacity
     2.2.  Network Bandwidth and Processing
   3.  Flash Memory Requirements
   4.  Compliance
   5.  Incremental Improvements
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Address
1.  Introduction

   NFSv4.1 [I-D.ietf-nfsv4-minorversion1] is an approved specification.
   The NFSv4 [RFC3530] community has indicated a desire to continue
   innovating NFS, specifically via a new minor version of NFSv4,
   namely NFSv4.2.  The desire for future innovation is primarily
   driven by two trends in the storage industry:

   o  High efficiency and utilization of resources such as capacity,
      network bandwidth, and processors.

   o  Solid state flash storage, which promises faster throughput and
      lower latency than magnetic disk drives and lower cost than
      dynamic random access memory.

   Secondarily, innovation is being driven by the trend toward stronger
   compliance with information management requirements.  In addition,
   as might be expected with a complex protocol like NFSv4.1,
   implementation experience has shown that minor changes to the
   protocol would be useful to improve the end user experience.

   This document proposes requirements along these four themes, and
   attempts to strike a balance between stating the problem and
   proposing solutions.  With respect to the latter, some thinking
   among the NFS community has taken place, and a future revision of
   this document will reference embodiments of such thinking.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Efficiency and Utilization Requirements

2.1.  Capacity

   Despite the capacity of magnetic disks continuing to increase at
   exponential rates, the storage industry is under pressure to make
   the storage of data increasingly efficient, so that more data can be
   stored.  The driver for this counter-intuitive demand is that disk
   access times are not improving anywhere near as quickly as
   capacities.  The industry has responded to this development by
   increasing effective data density, limiting the number of times a
   unique pattern of data is stored on a storage device.  For example,
   some storage devices support de-duplication.  When storing two
   files, a storage device might compare them for shared patterns of
   data, store each shared pattern just once, and set the reference
   count on the blocks of the unique pattern to two.  With de-
   duplication, the number of times a storage device has to read a
   particular pattern from disk is reduced to just once, thus improving
   average access time.
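   The reference-counting behavior described above can be shown with a
   short sketch.  The following C fragment is purely illustrative and
   is not part of any proposed protocol: the structure and function
   names are invented for this example, and a real storage device would
   match blocks by content fingerprint against persistent metadata
   rather than by a memcmp() scan of an in-memory table.  It shows two
   identical blocks, written on behalf of two files, being stored once
   with a reference count of two.

   /*
    * Illustrative sketch of block de-duplication with reference
    * counting.  All names are hypothetical.
    */
   #include <stdio.h>
   #include <string.h>

   #define BLOCK_SIZE 8
   #define MAX_BLOCKS 16

   struct stored_block {
       unsigned char data[BLOCK_SIZE];
       int refcount;
   };

   static struct stored_block store[MAX_BLOCKS];
   static int nblocks;

   /* Store one block; return its index in the de-duplicated store. */
   static int store_block(const unsigned char *data)
   {
       for (int i = 0; i < nblocks; i++) {
           if (memcmp(store[i].data, data, BLOCK_SIZE) == 0) {
               store[i].refcount++;       /* pattern already present */
               return i;
           }
       }
       memcpy(store[nblocks].data, data, BLOCK_SIZE);
       store[nblocks].refcount = 1;       /* first reference */
       return nblocks++;
   }

   int main(void)
   {
       unsigned char a[BLOCK_SIZE] = "PATTERN";
       unsigned char b[BLOCK_SIZE] = "PATTERN";  /* identical content */

       int ia = store_block(a);   /* on behalf of file A */
       int ib = store_block(b);   /* on behalf of file B */

       printf("blocks stored: %d, refcount of shared block: %d\n",
              nblocks, store[ia].refcount);
       return ia == ib ? 0 : 1;
   }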
   For a file access protocol such as NFS, this capacity efficiency
   trend implies several requirements:

   o  The "space_used" attribute of NFSv4 does not report meaningful
      information.  Removing a file with a "space_used" value of X
      bytes does not mean that the file system will see an increase of
      X available bytes.  Providing more meaningful information is a
      requirement.

   o  Because it is probable, especially for applications such as
      hypervisors, that an NFSv4 client is accessing multiple files
      with shared blocks of data, it is in the interest of both the
      client and the server for the client to know which blocks are
      shared, so that they are not read multiple times and not cached
      multiple times.  Providing a block map of shared blocks is a
      requirement.

   o  If an NFSv4 client is aware of which patterns exist in which
      files, then when it wants to write pattern X to file B at offset
      J, and it knows that X already exists at offset I of file A, it
      can advise the server of its intent, and the server can arrange
      for pattern X to appear in file B as a zero-copy operation.  Even
      if the server does not support de-duplication, it can at least
      perform a local copy that saves network bandwidth and processor
      overhead on the client and server.

   o  File holes are patterns of zeros that in some file systems are
      represented as unallocated blocks.  In a sense, holes are the
      ultimate de-duplicated pattern.  While proposals to extend NFS to
      support hole punching have been around since the 1980s, until
      recently there have not been NFS clients that could make use of
      hole punching.  The Information Technology (IT) trend toward
      virtualizing operating environments via hypervisors has resulted
      in a need for hypervisors to translate a (virtual) disk command
      to free a block into an NFS request to free that block.  On the
      read side, if a file contains holes, then again, as the ultimate
      in de-duplication, it would be better for the client to be told
      that the region it wants to read has a hole, instead of being
      returned long arrays of zero bytes.  Even if a server does not
      support holes on write or read, avoiding the transmission of
      zeroes will save network bandwidth and reduce processor overhead.
      (A sketch of the corresponding client-side interfaces follows
      this list.)
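   As a minimal sketch of what the hole-related requirements look like
   from the client side, the fragment below uses existing Linux-
   specific local interfaces: fallocate() with FALLOC_FL_PUNCH_HOLE to
   free a block range, and lseek() with SEEK_DATA/SEEK_HOLE to discover
   where data actually exists.  These interfaces are shown only to
   illustrate the kind of request an NFS client would need to forward
   to the server; they are not NFSv4.2 operations, and the file name is
   a placeholder.

   /* Illustrative only: Linux-specific sparse-file interfaces. */
   #define _GNU_SOURCE
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   int main(void)
   {
       int fd = open("sparse.img", O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Free ("punch") a 1 MiB region instead of writing zeroes. */
       if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     0, 1024 * 1024) < 0)
           perror("fallocate");

       /* On the read side, learn where data and holes are so that
        * holes never have to be transferred or cached. */
       off_t data = lseek(fd, 0, SEEK_DATA);
       off_t hole = lseek(fd, 0, SEEK_HOLE);
       printf("first data at %lld, first hole at %lld\n",
              (long long)data, (long long)hole);

       close(fd);
       return 0;
   }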
2.2.  Network Bandwidth and Processing

   The computational capabilities of processors continue to grow at an
   exponential rate.  However, as noted previously, because disk access
   times are not showing a commensurate exponential decrease, disk
   performance is not tracking processor performance.  In addition,
   while network bandwidth is increasing exponentially, unlike disk
   capacities and processor bandwidth, the improvement is not seen on a
   1-2 year cycle, but happens on something closer to a 10 year cycle.
   The lag of disk and network performance behind processor performance
   means that there is often a discontinuity between the processing
   capabilities of NFS clients and the speed at which they can extract
   data from an NFS server.  For some use cases, much of the data that
   is read by one client from an NFS server also needs to be read by
   other clients.  Re-reading this data wastes network bandwidth and
   processing on the NFS server.  This same observation has driven the
   creation of peer-to-peer content distribution protocols, where data
   is read directly from peers rather than servers.  It is apparent
   that a similar technique could be used to offload primary storage.

   The pNFS protocol distributes the I/O to a set of files across a
   cluster of data servers.  Arguably, its primary value is in
   balancing load across storage devices, especially when it can
   leverage a back end file system or storage cluster with automatic
   load balancing capabilities.  In NFSv4.1, no consideration was given
   to metadata.  Metadata is critical to several workloads, to the
   point that, as defined in NFSv4.1, pNFS will not offer much value in
   those cases.  The load balancing capabilities of pNFS need to be
   brought to metadata.

   From an end user perspective, the operations performed on a file
   include creating, reading, writing, deleting, and copying.  NFSv4
   has operations for all but the last.  While file copy has been
   proposed for NFS in the past, it was always rejected because of the
   lack of Application Programming Interfaces (APIs) within existing
   operating environments to issue a copy operation.  The IT trend
   toward virtualization via hypervisors has changed the situation: the
   emerging use case is to copy a virtual disk.  The use of a copy
   operation will save network bandwidth on the client and server, and
   where the server supports it, intra-server file copy has the
   potential to avoid all physical data copy.
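   To make the copy requirement concrete, the sketch below expresses a
   whole-file copy through the Linux copy_file_range() interface.  This
   interface is an assumption of the illustration, not part of any
   NFSv4.2 proposal: it simply shows the shape of an application-
   visible copy call that lets the data movement be offloaded to the
   kernel, and potentially to the server, instead of being read by the
   client and written back over the network.  The file names are
   placeholders.

   /* Illustrative only: offloaded copy via Linux copy_file_range(). */
   #define _GNU_SOURCE
   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/stat.h>
   #include <unistd.h>

   int main(void)
   {
       int in = open("vmdisk.img", O_RDONLY);
       int out = open("vmdisk-clone.img",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
       struct stat st;

       if (in < 0 || out < 0 || fstat(in, &st) < 0) {
           perror("open/fstat");
           return 1;
       }

       off_t remaining = st.st_size;
       while (remaining > 0) {
           /* No user-space data buffer: the kernel (or the file
            * server, when the protocol supports it) moves the bytes. */
           ssize_t n = copy_file_range(in, NULL, out, NULL,
                                       remaining, 0);
           if (n <= 0) {
               perror("copy_file_range");
               return 1;
           }
           remaining -= n;
       }

       close(in);
       close(out);
       return 0;
   }

   The property of interest for NFS is that, with such an interface, no
   file data needs to traverse the network when the server can perform
   the copy itself.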
3.  Flash Memory Requirements

   Flash memory is rapidly filling the wide gap between expensive but
   fast Dynamic Random Access Memory (DRAM) and inexpensive but slow
   magnetic disk.  The cost per bit of flash is between DRAM and disk.
   The access time per bit of flash is between DRAM and disk.  This has
   resulted in the File access Operations Per Second (FOPS) per unit of
   cost of flash exceeding that of both DRAM and disk.  Flash can be
   easily added as another storage medium to NFS servers, and this does
   not require a change to the NFS protocol.  However, the value of
   flash's superior FOPS is best realized when flash is closest to the
   application, i.e., on the NFS client.  One approach would be to
   forgo the use of network storage and revert to Direct Attached
   Storage (DAS).  However, this would require that the data protection
   found in modern storage devices be brought to DAS, and this is not
   always convenient or cost effective.  A less disruptive way to
   leverage the full FOPS of flash would be for NFSv4 clients to use
   flash for caching of data.

   Today NFSv4 supports whole file delegations for enabling caching.
   Such a granularity is useful for applications like user home
   directories where there is little file sharing.  However, NFS is
   used for many more workloads, which include file sharing.  In these
   workloads, files are shared, whereas individual blocks might not be.
   This drives a requirement for sub-file caching.

4.  Compliance

   New regulations for the IT industry limit who can view what data.
   NFSv4 has Access Control Lists (ACLs), but the ACL can be changed by
   the nominal file owner.  In practice, the end user that owns the
   file (essentially, has the right to delete the file or give
   permissions to other users) is not the legal owner of the file.  The
   legal owner of the file wants to control not just who can access
   the file, but also to whom the content of the file can be passed.
   The IT industry has addressed this need in the past with the notion
   of security labeling.  Labels are attached to devices, files, users,
   applications, network connections, etc.  When the labels of two
   objects match, data can be transferred from one to the other.  For
   example, a label called "Secret" on a file results in only users
   with a "Secret" security clearance being allowed to view the file,
   regardless of what the ACL says.

   Attaching a label to a file requires that the label be created
   atomically with the file, which means that a new RECOMMENDED
   attribute for a security label is needed.

5.  Incremental Improvements

   Implementation experience with NFSv4.1 and related protocols, such
   as SMB2, has shown a number of areas where the protocol can be
   improved.

   o  Hints for the type of file access, such as sequential read.
      While traditionally NFS servers have been able to detect read-
      ahead patterns, with the introduction of pNFS this will be
      harder.  Since NFS clients can detect patterns of access, they
      can advise servers.  In addition, the UNIX/Linux madvise() API is
      an example of how applications can provide direct advice that
      could be relayed to the NFS server.  (A sketch follows this
      list.)

   o  Head of line blocking.  Consider a client that wants to send
      three operations: a file creation, a read for one megabyte, and a
      write for one megabyte.  Each of these might be sent on a
      separate slot.  The client determines that it is not desirable
      for the read operation to wait for the write operation to be
      sent, so it sends the create first.  However, it does not want to
      serialize the read and write behind the create, so the read is
      sent next, followed by the write.  On the reply side, the server
      does not know that the client wants the create satisfied first,
      so the read and write operations are processed first.  By the
      time the create is performed on the server, the response to the
      read may still be filling the reply channel.  While NFSv4.1 could
      solve this problem by associating two connections with the
      session, using one connection for the create and the other for
      the read and write, multiple connections come at a cost.  The
      requirement is to solve this head of line blocking problem.
      Tagging a request as one that should go to the head of the line
      for request and response processing is one possible way to
      address it.

   o  pNFS connectivity/access indication.  If a pNFS client is given a
      layout that directs it to a storage device it cannot access due
      to connectivity or access control issues, it has no way in
      NFSv4.1 to indicate the problem to the metadata server.

   o  RPCSEC_GSS sequence window size on the backchannel.  The NFSv4.1
      specification does not provide a way for the client to tell the
      server what window size to use on the backchannel.  The
      specification says that the window size will be the same as the
      one the server uses.  Potentially, a server could use a very
      large window size that the client does not want.

   o  Trunking discovery.  The NFSv4.1 specification is long on how a
      client verifies whether trunking is available between two
      connections, but short on how a client can discover destination
      addresses that can be trunked.  It would be useful if there were
      a method (such as an operation) to get a list of destinations
      that can be session or client ID trunked, as well as a
      notification when the set of destinations changes.
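   As a minimal sketch of the access-hint item in the list above, the
   fragment below uses the existing POSIX posix_fadvise() interface to
   declare sequential access to a file.  Whether and how such a hint
   would be carried to an NFS server is exactly the open protocol
   question; the fragment only illustrates the application-side advice
   that exists today, and the file name is a placeholder.

   /* Illustrative only: application-level sequential-access hint. */
   #define _POSIX_C_SOURCE 200112L
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   int main(void)
   {
       int fd = open("dataset.bin", O_RDONLY);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Declare that the whole file will be read sequentially, so
        * that read-ahead can be scheduled ahead of demand. */
       int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
       if (err != 0)
           fprintf(stderr, "posix_fadvise: error %d\n", err);

       close(fd);
       return 0;
   }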
6.  IANA Considerations

   None.

7.  Security Considerations

   None.

8.  Acknowledgements

   Thanks to Dave Noveck for reviewing this document and providing
   valuable feedback.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [I-D.ietf-nfsv4-minorversion1]
              Shepler, S., Eisler, M., and D. Noveck, "NFS Version 4
              Minor Version 1", draft-ietf-nfsv4-minorversion1-29 (work
              in progress), December 2008.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
              Beame, C., Eisler, M., and D. Noveck, "Network File
              System (NFS) version 4 Protocol", RFC 3530, April 2003.

Author's Address

   Michael Eisler (editor)
   NetApp
   5765 Chase Point Circle
   Colorado Springs, CO 80919
   US

   Phone: +1 719 599 8759
   Email: mike@eisler.com