NFSv4 Working Group                                      David L. Black
Internet Draft                                          Stephen Fridella
Expires: December 2005                                   EMC Corporation
                                                            June 3, 2005

                        pNFS Block/Volume Layout
                      draft-black-pnfs-block-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This Internet-Draft will expire in December 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).  All Rights Reserved.

Abstract

   Parallel NFS (pNFS) extends NFSv4 to allow clients to directly access file data on the storage used by the NFSv4 server.  This ability to bypass the server for data access can increase both performance and parallelism, but requires additional client functionality for data access, some of which is dependent on the class of storage used.  The main pNFS operations draft specifies storage-class-independent extensions to NFS; this draft specifies the additional extensions (primarily data structures) for use of pNFS with block and volume based storage.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Background and Architecture
      2.1. Data Structures: Extents and Extent Lists
           2.1.1. Layout Requests and Extent Lists
           2.1.2. Extents Have Lock-like Behavior
      2.2. Volume Identification
   3. Operations Issues
      3.1. Ordering Issues
      3.2. Crash Recovery Issues
      3.3. Additional Features - Not Needed or Recommended
   4. Security Considerations
   5. Conclusions
   6. Acknowledgments
   7. References
      7.1. Normative References
      7.2. Informative References
   Author's Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Copyright Statement
   Acknowledgment
NOTE: This is an early stage draft.  It's still rough in places, with significant work to be done.

1. Introduction

   Figure 1 shows the overall architecture of a pNFS system:

   +-----------+
   |+-----------+                              +-----------+
   ||+-----------+                             |           |
   |||           |       NFSv4 + pNFS          |           |
   +||  Clients  |<--------------------------->|  Server   |
    +|           |                             |           |
     +-----------+                             |           |
          |||                                  +-----------+
          |||                                       |
          |||                                       |
          |||          +-----------+                |
          |||          |+-----------+               |
          ||+----------||+-----------+              |
          |+-----------|||           |              |
          +------------+||  Storage  |--------------+
                        +|  Systems  |
                         +-----------+

                    Figure 1 pNFS Architecture

   The overall approach is that pNFS-enhanced clients obtain sufficient information from the server to enable them to access the underlying storage (on the Storage Systems) directly.  See [WELCH-OPS] for more details.  This draft is concerned with access from pNFS clients to Storage Systems over storage protocols based on blocks and volumes, such as the SCSI protocol family (e.g., parallel SCSI, FCP for Fibre Channel, iSCSI, SAS).  This class of storage is referred to as block/volume storage.  While the Server to Storage System protocol is not of concern for interoperability here, it will typically also be a block/volume protocol when clients use block/volume protocols.

2. Background and Architecture

   The fundamental storage abstraction supported by block/volume storage is a storage volume consisting of a sequential series of fixed-size blocks.  This can be thought of as a logical disk; it may be realized by the Storage System as a physical disk, a portion of a physical disk, or something more complex (e.g., concatenation, striping, RAID, and combinations thereof) involving multiple physical disks or portions thereof.

   A pNFS layout for this block/volume class of storage is responsible for mapping from an NFS file (or portion of a file) to the blocks of storage volumes that contain the file.  The blocks are expressed as extents with 64-bit offsets and lengths using the existing NFSv4 offset4 and length4 types.  Clients must be able to perform I/O to the block extents without affecting additional areas of storage (especially important for writes); therefore, extents MUST be aligned to 512-byte boundaries, and SHOULD be aligned to the block size used by the NFSv4 server in managing the actual filesystem (4 kilobytes and 8 kilobytes are common block sizes).

   OPEN ISSUE: Client ability to ask server for block size - if block size is constant per filesystem (fsid), it can enable internal client optimizations.  Constant filesystem block size is probably the common case - an additional (required) FS attribute would suffice.
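   As a non-normative illustration of the alignment rules above, the following C sketch checks an extent's offsets and length against the mandatory 512-byte alignment and the recommended alignment to the server's filesystem block size.  The fs_block_size input is assumed to be known to the client, which (per the open issue above) is not yet guaranteed by the protocol.

      #include <stdbool.h>
      #include <stdint.h>

      #define PNFS_BLOCK_MIN_ALIGN 512   /* MUST-level alignment */

      /* Returns true if the extent meets the MUST-level 512-byte
       * alignment; *fs_aligned additionally reports the SHOULD-level
       * alignment to the server's filesystem block size. */
      static bool
      extent_alignment_ok(uint64_t file_offset, uint64_t storage_offset,
                          uint64_t extent_length, uint32_t fs_block_size,
                          bool *fs_aligned)
      {
          bool ok = (file_offset    % PNFS_BLOCK_MIN_ALIGN == 0) &&
                    (storage_offset % PNFS_BLOCK_MIN_ALIGN == 0) &&
                    (extent_length  % PNFS_BLOCK_MIN_ALIGN == 0);

          if (fs_aligned != NULL && fs_block_size != 0)
              *fs_aligned = (file_offset    % fs_block_size == 0) &&
                            (storage_offset % fs_block_size == 0) &&
                            (extent_length  % fs_block_size == 0);
          return ok;
      }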
   This draft draws extensively on the authors' familiarity with the mapping functionality and protocol in EMC's HighRoad system.  The protocol used by HighRoad is called FMP (File Mapping Protocol); it is an add-on protocol that runs in parallel with filesystem protocols such as NFSv3 to provide pNFS-like functionality for block/volume storage.  While drawing on HighRoad FMP, the data structures and functional considerations in this draft differ in significant ways, based on lessons learned and the opportunity to take advantage of NFSv4 features such as COMPOUND operations.

2.1. Data Structures: Extents and Extent Lists

   A pNFS layout is a list of extents with associated properties.  Each extent MUST be at least 512-byte aligned.

      struct extent {
          offset4         file_offset;     /* the logical location in the file */
          length4         extent_length;   /* the size of this extent in the
                                              file and on storage */
          pnfs_deviceid4  volume_ID;       /* the logical volume/physical device
                                              that this extent is on */
          offset4         storage_offset;  /* the logical location of this
                                              extent in the volume */
          extentState4    es;              /* the state of this extent */
      };

      enum extentState4 {
          VALID_DATA   = 0,  /* the data located by this extent is valid
                                for reading and writing. */
          INVALID_DATA = 1,  /* the location is valid; the data is invalid.
                                It may be overwritten with valid data.  It
                                is a newly (pre-)allocated extent.  There
                                is physical space. */
          NONE_DATA    = 2   /* the location is invalid.  It is a hole in
                                the file.  There is no physical space. */
      };

   The file_offset, extent_length, and es fields for an extent returned from the server are always valid.  The interpretation of the storage_offset field depends on the value of es as follows:

   o  VALID_DATA means that storage_offset is valid, and points to valid/initialized data which can and should be fetched from the disk to satisfy read requests (and partial-block write requests).

   o  INVALID_DATA means that storage_offset is valid, but points to invalid, uninitialized data.  This data must not be physically read from the disk until it has been initialized.  A read request for an INVALID_DATA extent must fill the user buffer with zeros.  Write requests must write whole blocks to the disk; bytes not initialized by the user must be set to zero.  INVALID_DATA extents are returned by requests for writeable extents; they are never returned if the request was only for reading.

   o  NONE_DATA means that storage_offset is not valid, and this extent may not be used to satisfy write requests.  Read requests may be satisfied by zero-filling as for INVALID_DATA.  NONE_DATA extents are returned by requests for readable extents; they are never returned if the request was for a writeable extent.

   The volume_ID field for an extent returned by the server is used to identify the logical volume on which this extent resides, and its interpretation depends on the volume-management protocol being used by the client and server.

   The extent list lists all relevant extents in increasing order of the file_offset of each extent.

      typedef extent extentList<MAX_EXTENTS>;   /* MAX_EXTENTS = 256 */
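   To make the es semantics above concrete, the following non-normative C sketch shows how a client might service a read that falls entirely within a single extent.  The c_extent structure simply mirrors the XDR definition above in C, and read_from_volume() is a hypothetical stand-in for the client's block/volume I/O path.

      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      enum extent_state { VALID_DATA = 0, INVALID_DATA = 1, NONE_DATA = 2 };

      struct c_extent {
          uint64_t          file_offset;     /* logical location in the file */
          uint64_t          extent_length;   /* size in the file and on storage */
          uint32_t          volume_id;       /* volume holding this extent */
          uint64_t          storage_offset;  /* location of the extent on the volume */
          enum extent_state es;              /* state of this extent */
      };

      /* Hypothetical helper: read 'count' bytes at 'offset' from the volume. */
      extern int read_from_volume(uint32_t volume_id, uint64_t offset,
                                  size_t count, void *buf);

      /* Service a read of 'count' bytes at file offset 'offset', already
       * known to fall entirely within 'ext'.  Returns 0 on success. */
      static int
      read_via_extent(const struct c_extent *ext, uint64_t offset,
                      size_t count, void *buf)
      {
          switch (ext->es) {
          case VALID_DATA:
              /* Initialized data: fetch it from the storage volume. */
              return read_from_volume(ext->volume_id,
                                      ext->storage_offset +
                                          (offset - ext->file_offset),
                                      count, buf);
          case INVALID_DATA:
          case NONE_DATA:
              /* Uninitialized or unallocated storage must not be read;
               * the read is satisfied by zero-filling the user buffer. */
              memset(buf, 0, count);
              return 0;
          }
          return -1;
      }

   The write path is analogous, except that when the target extent is INVALID_DATA the client must write whole blocks, with bytes not supplied by the application set to zero.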
2.1.1. Layout Requests and Extent Lists

   Each request for a layout specifies at least three parameters: offset, desired size, and minimum size (the desired size is missing from the operations draft - see Section 3).  If the status of a request indicates success, the extent list returned must meet the following criteria (a non-normative validation sketch follows the list):

   o  A request for a readable (but not writeable) layout returns only VALID_DATA or NONE_DATA extents (but not INVALID_DATA extents).

   o  A request for a writeable layout returns only VALID_DATA or INVALID_DATA extents (but not NONE_DATA extents).

   o  The first extent in the list MUST contain the starting offset.

   o  The total size of extents in the extent list MUST cover at least the minimum size and no more than the desired size.  One exception is allowed: the total size MAY be smaller if only readable extents were requested and EOF is encountered.

   o  Extents in the extent list MUST be logically contiguous and non-overlapping.
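   The following non-normative C sketch checks a returned extent list against these criteria, reusing the c_extent structure from the sketch in Section 2.1.  Coverage is measured from the requested starting offset; is_write distinguishes writeable from readable-only requests, and hit_eof indicates that a readable request encountered EOF.

      #include <stdbool.h>
      #include <stdint.h>

      static bool
      extent_list_valid(const struct c_extent *list, unsigned int count,
                        uint64_t req_offset, uint64_t min_size,
                        uint64_t desired_size, bool is_write, bool hit_eof)
      {
          uint64_t end, covered;
          unsigned int i;

          if (count == 0)
              return false;

          /* The first extent must contain the starting offset. */
          if (req_offset < list[0].file_offset ||
              req_offset >= list[0].file_offset + list[0].extent_length)
              return false;

          for (i = 0; i < count; i++) {
              /* Extent states must match the type of request. */
              if (is_write && list[i].es == NONE_DATA)
                  return false;
              if (!is_write && list[i].es == INVALID_DATA)
                  return false;

              /* Extents must be logically contiguous and non-overlapping. */
              if (i > 0 && list[i].file_offset !=
                           list[i - 1].file_offset + list[i - 1].extent_length)
                  return false;
          }

          /* Coverage must reach at least min_size and at most desired_size,
           * except that a readable request may be cut short by EOF. */
          end = list[count - 1].file_offset + list[count - 1].extent_length;
          covered = end - req_offset;
          if (covered > desired_size)
              return false;
          if (covered < min_size && !(hit_eof && !is_write))
              return false;

          return true;
      }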
2.1.2. Extents Have Lock-like Behavior

   Extents returned to pNFS clients function as locks in that they grant clients permission to read or write.  Both read/write and write/write conflicts must be controlled by the pNFS server, as a read/write conflict may cause a read to return a mixture of before-write and after-write data from a block-based storage system, and a write/write conflict may cause the result on the block-based storage system to be a mixture of data from the two write operations; both of these outcomes are unacceptable, as in the absence of pNFS, the NFSv4 server would have correctly sequenced the conflicting operations to avoid this mixing.  This is particularly nasty if the underlying storage is striped and the operations complete in different orders on the different stripes.

   A client that makes a layout request conflicting with an existing layout delegation will be rejected with the error NFS4ERR_LOCKED (OPEN ISSUE: New error code needed?).  This client is then expected to retry the request after a short interval.  During this interval the server needs to recall the conflicting portion of the layout delegation from the client that currently holds it.  It has been noted that this mode of reject/retry operation does not prevent a requesting client from being starved when there is contention for the layout of a particular file.  For this reason a pNFS server SHOULD implement a mechanism to prevent starvation.  One possibility is that the server can maintain a queue of rejected layout requests.  Each new layout request can be checked to see if it conflicts with a previously rejected request, and if so, the newer request can be rejected.  Once the original requesting client retries its request, its entry in the rejected request queue can be cleared, or the entry in the rejected request queue can be removed when it reaches a certain age.

   NFSv4 supports mandatory locks and share reservations.  These are mechanisms that clients can use to restrict the set of I/O operations that are permissible to other clients.  Since all I/O operations ultimately arrive at the NFSv4 server for processing, the server is in a position to enforce these restrictions.  However, with pNFS layout delegations, I/Os will be issued directly from the clients that hold the delegations to the storage devices that host the data.  These devices have no knowledge of files, mandatory locks, or share reservations, and are not in a position to enforce such restrictions.  For this reason the NFSv4 server must not grant layout delegations that conflict with mandatory locks or share reservations.  Furthermore, if a conflicting mandatory lock request or a conflicting open request arrives at the server, the server must recall the part of the layout delegation in conflict with the request before processing the request.

2.2. Volume Identification

   Storage Systems such as storage arrays can have multiple physical network ports that need not be connected to a common network, resulting in a pNFS client having simultaneous multipath access to the same storage volumes via different ports on different networks.  The networks may not even be the same technology - for example, access to the same volume via both iSCSI and Fibre Channel is possible - hence network addresses are difficult to use for volume identification.  For this reason, this pNFS block layout identifies storage volumes by content, for example providing the means to match (unique portions of) labels used by volume managers.  Any block pNFS system using this layout MUST support a means of content-based unique volume identification that can be employed via the data structure given here.

   A volume is content-identified by a disk signature made up of extents within blocks and contents that must match.

   block_device_addr_list - A list of the disk signatures for the physical volumes on which the file system resides.  This is a list of a variable number of diskSigInfo structures.  This is the device_addr_list<> as returned by GETDEVICELIST in [WELCH-OPS].

      typedef diskSigInfo block_device_addr_list<>;   /* disk signature info */

   where diskSigInfo is:

      struct diskSigInfo {                /* used in DISK_SIGNATURE */
          diskSig         ds;             /* disk signature */
          pnfs_deviceid4  volume_ID;      /* volume ID the server will use
                                             in extents */
      };

   where diskSig is defined as:

      typedef sigComp diskSig<>;

      struct sigComp {                    /* disk signature component */
          offset4          sig_offset;    /* byte offset of component */
          length4          sig_length;    /* byte length of component */
          sigCompContents  contents;      /* contents of this component of
                                             the signature (this is opaque) */
      };

   sigCompContents MUST NOT be interpreted as a zero-terminated string, as it may contain embedded zero-valued octets.  It contains sig_length octets.  There are no restrictions on alignment (e.g., neither sig_offset nor sig_length need to be multiples of 4).
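   As a non-normative illustration of content-based volume identification, the following C sketch checks whether a candidate block device matches a disk signature.  The c_sig_comp structure mirrors sigComp, and read_volume_bytes() is a hypothetical helper that reads raw bytes from the candidate device.

      #include <stdbool.h>
      #include <stdint.h>
      #include <string.h>

      struct c_sig_comp {
          uint64_t       sig_offset;  /* byte offset of component on the volume */
          uint64_t       sig_length;  /* byte length of component */
          const uint8_t *contents;    /* expected (opaque) contents */
      };

      /* Hypothetical helper: read 'len' raw bytes at 'off' from the device. */
      extern int read_volume_bytes(int device_fd, uint64_t off, uint64_t len,
                                   uint8_t *buf);

      /* Returns true if every signature component matches the on-disk
       * contents of the candidate device. */
      static bool
      volume_matches_signature(int device_fd, const struct c_sig_comp *sig,
                               unsigned int ncomps)
      {
          uint8_t buf[4096];   /* assumes each component fits in one buffer */
          unsigned int i;

          for (i = 0; i < ncomps; i++) {
              if (sig[i].sig_length > sizeof(buf))
                  return false;   /* oversized component: treat as no match */
              if (read_volume_bytes(device_fd, sig[i].sig_offset,
                                    sig[i].sig_length, buf) != 0)
                  return false;
              /* memcmp, not strcmp: contents may contain zero-valued octets. */
              if (memcmp(buf, sig[i].contents, (size_t)sig[i].sig_length) != 0)
                  return false;
          }
          return true;
      }

   A client would apply such a check to each device it can see; a device whose contents match all signature components is the volume to associate with the corresponding volume_ID.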
3. Operations Issues

   This section collects issues in the operations draft encountered in writing this block/volume layout draft.

   1. A request for a layout (LAYOUTGET) only conveys the minimum required size - for the block storage class, a desired size is also useful.  This allows the client to ask for a size that is good for performance, while allowing the server to reduce the size for conflict management when other clients are actively writing different areas of the file.

   2. The operations draft treats a layout returned by an operation as an indivisible object (at least for callback and return - commit seems to only be able to handle one extent).  For block storage layouts, it is important to be able to recall, commit, or return a portion of a layout.  The server needs to be in control of the conflict granularity to minimize the impact of false sharing, and the client needs to be able to manage its layout state in a flexible fashion.

   3. Need a callback to set EOF.  The underlying issue here is that block pNFS clients have to handle EOF enforcement because the Storage Systems have no concept of a file, let alone EOF.  Hence client interactions based on EOF changes (e.g., one client truncates a file, another tries to write beyond the new EOF) require updates to tell clients that the EOF has moved.  Calling back layouts beyond the new EOF to force the client to check for an EOF change is both inefficient and overkill.

   4. HighRoad supports three additional types of layout recalls - "everything in a file", "everything in a list of files", and "everything in a filesystem".  HighRoad also supports an "everything in a file" layout return.  The "everything in a file" type is very convenient to get rid of all state for a file.  The "everything in a filesystem" type is crucial to get unmount of a busy filesystem to actually work.  The "everything in a list of files" type turns out to be useful for quota situations, although it's a bit blunt - when a user is nearing her quota, recall her writeable layouts to force the commits needed to manage the quota.  OPEN ISSUE: This may not be the best way to handle approaching a quota limit.

   5. Access and Modify time behavior.  Any LAYOUTCOMMIT operation should implicitly set both the Access and Modify times.  LAYOUTRETURN needs flags saying whether to set the Access time, both the Access and Modify times, or neither.

   6. The disk signature approach to volume identification is noted in the [WELCH-OPS] draft, but the data structures in the -01 version of that draft do not support it.

3.1. Ordering Issues

   This deserves its own subsection because there is some serious subtlety here.  HighRoad uses two mechanisms for ordering:

   1. In contrast to NFSv4 callbacks that expect immediate responses, HighRoad layout callback responses may be delayed to allow a client to perform any required commits, etc. prior to responding to the callback.  This allows the reply to the callback to serve as an implicit return of the recalled range or ranges.  For a simple return case, this saves a round trip (the client replies to the callback and doesn't have to issue a separate return).  Another useful case is that the response to a set-EOF callback discards all layout info beyond the block containing the new EOF (a filesystem block size attribute is needed for this to work).  If NFSv4-style callbacks that expect immediate responses are used, the client has to perform an explicit LAYOUTRETURN.

   2. HighRoad uses a server message number for operation sequencing, which appears to correspond well to the layout stateid in [WELCH-OPS], except that the server message number has per-file rather than per-layout scope.  The pNFS layout stateid should probably have per-file scope in order to deal well with Issue 1 in Section 3 above.  The server message number serves to ensure that a pNFS client can process pNFS server replies (operation completions) and callbacks *in the same order* as the pNFS server (a non-normative sketch of per-file ordering appears at the end of this subsection).

   The delayed callback response creates an ordering issue in that the client may immediately issue a LAYOUTGET for the range that its callback reply returns - if that request crosses the callback reply on the wire, the server must detect this reordering and tell the client to retry.  This does not require a sequence number/stateid mechanism - the server must wait for the callback to finish before processing any conflicting LAYOUTGET from the same client.  With an NFSv4-style callback, the client must wait for its LAYOUTRETURN to complete before issuing the LAYOUTGET, so this issue does not arise.

   In the reverse direction, the same "cross on the wire" scenario applies, and requires a sequencing mechanism.  The server may issue a recall for a range covered by a LAYOUTGET immediately after returning the layout to the client.  If the recall arrives first, the client has to queue it until the LAYOUTGET result comes back and then process the callback against that new layout.  A variant on this that appears similar to the client but requires a different response occurs when the server issued the recall before processing the LAYOUTGET; in this case the server will reject the LAYOUTGET as having a stale sequence number/stateid (because that number/stateid was incremented by the recall callback) and the client needs to process the callback before retrying the LAYOUTGET.
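   The per-file ordering described in item 2 above can be realized with a simple resequencing loop.  The following non-normative C sketch assumes each server reply and callback carries a per-file message number, and that apply_message(), queue_message(), and dequeue_message() are hypothetical client helpers.

      #include <stdint.h>

      struct pnfs_file_ordering {
          uint64_t next_expected;   /* next per-file message number to apply */
      };

      extern void apply_message(void *msg);                 /* hypothetical */
      extern void queue_message(void *msg, uint64_t seq);   /* hypothetical */
      extern void *dequeue_message(uint64_t seq);           /* hypothetical */

      /* Called for every server reply or callback carrying 'seq' for a file. */
      static void
      handle_in_order(struct pnfs_file_ordering *ord, uint64_t seq, void *msg)
      {
          void *next;

          if (seq != ord->next_expected) {
              /* Arrived ahead of an outstanding reply/callback: hold it. */
              queue_message(msg, seq);
              return;
          }
          apply_message(msg);
          ord->next_expected++;

          /* Drain any messages that were queued while waiting for 'seq'. */
          while ((next = dequeue_message(ord->next_expected)) != NULL) {
              apply_message(next);
              ord->next_expected++;
          }
      }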
3.2. Crash Recovery Issues

   Client recovery for layout delegations works in much the same way as NFSv4 client recovery for other lock/delegation state.  When an NFSv4 client reboots, it will lose all information about the layout delegations that it previously owned.  There are two methods by which the server can reclaim these resources and begin providing them to other clients.  The first is through the expiry of the client's lock/delegation lease.  If the client recovery time is longer than the lease period, the client's lock/delegation lease will expire and the server will know to reclaim any state held by the client.  On the other hand, the client may recover in less time than it takes for the lease period to expire.  In such a case, the client will be required to contact the server through the standard SETCLIENTID protocol.  The server will find that the client's id matches the id of the previous client invocation, but that the verifier is different.  The server uses this as a signal to reclaim all the state associated with the client's previous invocation.

   The server recovery case is slightly more complex.  In general, the recovery process will again follow the standard NFSv4 recovery model: the client will discover that the server has rebooted when it receives an unexpected STALE_STATEID or STALE_CLIENTID reply from the server; it will then proceed to try to reclaim its previous delegations during the server's recovery grace period.  However, there is an important safety concern associated with layout delegations that does not come into play in the standard NFSv4 case.  If a standard NFSv4 client makes use of a stale delegation, the consequence could be to deliver stale data to an application.  However, a pNFS layout delegation enables the client to directly access the file system storage; if this access is not properly managed by the NFSv4 server, the client can potentially corrupt the file system data or metadata.

   Thus it is vitally important that the client discover that the server has rebooted as soon as possible, and that the client stop using stale layout delegations before the server gives the delegations away to other clients.  To ensure this, the client must be implemented so that layout delegations are never used to access the storage after the client's lease timer has expired.  This prohibition applies to all accesses, especially the flushing of dirty data to storage.  If the client's lease timer expires because the client could not contact the server for any reason, the client MUST immediately stop using the layout delegation until the server can be contacted and the delegation can be officially recovered or reclaimed.
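   A minimal non-normative sketch of the client-side check implied above follows; the lease bookkeeping fields and the use of the system clock are assumptions about the client implementation, not protocol elements.

      #include <stdbool.h>
      #include <time.h>

      struct pnfs_client_lease {
          time_t       last_renewal;  /* time of last successful lease renewal */
          unsigned int lease_time;    /* lease period granted by the server */
      };

      /* Every direct storage access under a layout delegation, including
       * flushing dirty data, is gated on this check; if it fails, the
       * client must stop using the layout until it has re-contacted the
       * server and recovered or reclaimed the delegation. */
      static bool
      layout_io_permitted(const struct pnfs_client_lease *lease)
      {
          return time(NULL) < lease->last_renewal + (time_t)lease->lease_time;
      }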
3.3. Additional Features - Not Needed or Recommended

   This subsection is a place to record things that existing SAN or clustered filesystems do that aren't needed or recommended for pNFS:

   o  Callback for write-to-read downgrade.  Writers tend to want to remain writers, so this feature isn't very useful.

   o  HighRoad FMP implements several frequently used operation combinations as single RPCs for efficiency; these can be effectively handled by NFSv4 COMPOUNDs.  One subtle difference is that a single RPC is treated as a single operation, whereas NFSv4 COMPOUNDs are not atomic in any sense.  This can cause operation ordering subtleties, such as having to set the new EOF *before* returning the layout extent that contains the new EOF, even within a single COMPOUND.

   o  Queued request support.  The HighRoad FMP protocol specification allows the server to return an "operation blocked" result code with a cookie that is later passed to the client in an "it's done now" callback.  This has not proven to be of great use vs. having the client retry with some sort of back-off.  Recommendations on how to back off should be added to the ops draft.

   o  Additional client and server crash detection mechanisms.  As a separate protocol, HighRoad FMP had to handle this on its own.  As an NFSv4 extension, NFSv4's SETCLIENTID, STALE_CLIENTID, and STALE_STATEID mechanisms combined with implicit lease renewal and (per-file) layout stateids should be sufficient for pNFS.

   o  The use of separate read and write layouts to enable client participation in copy-on-write (as in IBM's SAN.FS) does not seem to be important to pNFS; this may be an implementation approach that is unique to SAN.FS.

4. Security Considerations

   Certain security responsibilities are delegated to pNFS clients.  Block/volume storage systems generally control access at a volume granularity, and hence pNFS clients have to be trusted to only perform accesses allowed by the layout extents they currently hold (e.g., and not access storage for files on which a layout extent is not held).  This also has implications for some NFSv4 functionality outside pNFS.  For instance, if a file is covered by a mandatory read-only lock, the server can ensure that only read layout delegations for the file are granted to pNFS clients.  However, it is up to each pNFS client to ensure that a read layout delegation is used only to service read requests, and not to allow writes to the existing parts of the file.  Since block/volume storage systems are generally not capable of enforcing such file-based security, in environments where pNFS clients cannot be trusted to enforce such policies, block/volume-based pNFS SHOULD NOT be used.
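   As a non-normative illustration of the client-side enforcement described above, the following C sketch gates a direct write to storage on the client holding writeable extents for the whole byte range.  The held_extent wrapper (recording whether an extent came from a writeable layout) and layout_lookup_extent() are hypothetical client bookkeeping; c_extent is the structure from the sketch in Section 2.1.

      #include <stdbool.h>
      #include <stdint.h>

      struct held_extent {
          struct c_extent ext;        /* extent as returned by the server */
          bool            writeable;  /* obtained via a writeable layout */
      };

      /* Hypothetical helper: find the held extent covering file offset
       * 'off', or NULL if no layout extent held by the client covers it. */
      extern const struct held_extent *layout_lookup_extent(uint64_t off);

      /* Returns true only if the whole range [offset, offset + count) is
       * covered by held, writeable extents; otherwise the write must be
       * sent through the NFSv4 server rather than directly to storage. */
      static bool
      direct_write_permitted(uint64_t offset, uint64_t count)
      {
          uint64_t end = offset + count;

          while (offset < end) {
              const struct held_extent *h = layout_lookup_extent(offset);

              if (h == NULL || !h->writeable || h->ext.es == NONE_DATA)
                  return false;
              offset = h->ext.file_offset + h->ext.extent_length;
          }
          return true;
      }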
5. Conclusions

6. Acknowledgments

   This draft draws extensively on the authors' familiarity with the mapping functionality and protocol in EMC's HighRoad system.  The protocol used by HighRoad is called FMP (File Mapping Protocol); it is an add-on protocol that runs in parallel with filesystem protocols such as NFSv3 to provide pNFS-like functionality for block/volume storage.  While drawing on HighRoad FMP, the data structures and functional considerations in this draft differ in significant ways, based on lessons learned and the opportunity to take advantage of NFSv4 features such as COMPOUND operations.

7. References

7.1. Normative References

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [WELCH-OPS] Welch, B., et al., "pNFS Operations Summary", draft-welch-pnfs-ops-01.txt, Work in Progress, May 2005.

   TODO: Need to reference RFC 3530.

7.2. Informative References

   OPEN ISSUE: HighRoad and/or SAN.FS references?

Author's Addresses

   David L. Black
   EMC Corporation
   176 South Street
   Hopkinton, MA 01748

   Phone: +1 (978) 263-0937
   Email: black_david@emc.com

   Stephen Fridella
   EMC Corporation
   32 Coslin Drive
   Southboro, MA 01772

   Phone: +1 (508) 305-8512
   Email: fridella_stephen@emc.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.  Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard.  Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the Internet Society.