Network Working Group                                     Robert Thurlow
Internet Draft                                                  May 2003
Document: draft-ietf-nfsv4-repl-mig-proto-01.txt

           A Server-to-Server Replication/Migration Protocol

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   Discussion and suggestions for improvement are requested.  This
   document will expire in November, 2003.  Distribution of this draft is
   unlimited.

Abstract

   NFS Version 4 [RFC3530] provided support for client/server
   interactions to support replication and migration, but left
   unspecified how replication and migration would be done.  This
   document is an initial draft of a protocol which could be used to
   transfer filesystem data and metadata for use with replication and
   migration services for NFS Version 4.

Table of Contents

   1.  Introduction
   1.1.  Changes Since Last Revision
   1.2.  Shortcomings
   1.3.  Rationale
   1.4.  Basic structure
   2.  Common data types
   2.1.  Session, file and checkpoint IDs
   2.2.  Offset, length and cookies
   2.3.  General status
   2.4.  From NFS Version 4 [RFC3530]
   3.  Session Management
   3.1.  Capabilities negotiation
   3.2.  Security Negotiation
   3.3.  OPEN_SESSION call
   3.4.  CLOSE_SESSION call
   4.  Data transfer
   4.1.  Data transfer operations
   4.2.  Data transfer phase overview
   4.3.  SEND call
   4.4.  Data transfer operation description
   4.4.1.  SEND_METADATA operation
   4.4.2.  SEND_FILE_DATA operation
   4.4.3.  SEND_FILE_HOLE operation
   4.4.4.  SEND_LOCK_STATE operation
   4.4.5.  SEND_SHARE_STATE operation
   4.4.6.  SEND_DELEG_STATE operation
   4.4.7.  SEND_REMOVE operation
   4.4.8.  SEND_RENAME operation
   4.4.9.  SEND_LINK operation
   4.4.10.  SEND_SYMLINK operation
   4.4.11.  SEND_DIR_CONTENTS operation
   4.4.12.  SEND_CLOSE operation
   5.  IANA Considerations
   6.  Security Considerations
   7.  Appendix A: XDR Protocol Definition File
   8.  Normative References
   9.  Informative References
   10.  Author's Address

1.  Introduction

   This document describes a proposed protocol to perform the data
   transfer involved with replication and migration, as the problem was
   described in [DESIGN]; familiarity with that document is assumed.  It
   is not yet proven by implementation experience, but is presented for
   collective work and discussion.
   Though data replication and transfer are needed in many areas, this
   document will focus primarily on solving the problem of providing
   replication and migration support between NFS Version 4 servers.  It
   is assumed that the reader has familiarity with NFS Version 4
   [RFC3530].

1.1.  Changes Since Last Revision

   Since the -00 version of this draft, the following major changes have
   been made:

   o  The protocol no longer uses XDR-formatted messages sent via TCP;
      it now uses RPC calls and replies.

   o  The elements used to transfer data and metadata are now
      operation arguments to a unified SEND RPC, so that an array of
      information about a particular file may be sent in one RPC call.

   o  Session management has been simplified to a single OPEN_SESSION
      call and a single CLOSE_SESSION call.  Sessions may also be
      multiplexed over the same connection.

   o  The protocol should now work in a continuous replication mode,
      where a transfer session stays up indefinitely and changes can
      be passed rapidly to replicas.

   o  Support for transferring delegation state has been added.

   o  Support for transferring hard links and symbolic links has been
      added.

   o  Zero-filled regions or "holes" are now sent as separate
      operations, rather than being treated as a special case of data
      transfers.

   o  ACLs and the object type are handled as part of the RMattrs
      type, rather than being separate.

1.2.  Shortcomings

   This draft has the following known shortcomings:

   o  it does not deal with [RSYNC]-like behaviour, which can compare
      source and destination files

   o  it introduces a capabilities negotiation feature which is not
      complete enough to be useful

   o  it does not fully specify compression algorithms which can be
      used

   o  it does not specify how it works with minor revisions to NFS
      Version 4

1.3.  Rationale

   The protocol presented below is a simple bulk-data transfer protocol
   with minimal traffic in the reverse direction.  It is believed that
   the best performance is achieved by a well-implemented source
   server sending the smallest set of change information to the
   destination.  The advantages of this protocol over data formats such
   as tar/pax/cpio (as defined by IEEE 1003.1 or ISO/IEC 9945-1) are:

   o  NFSv4 Access Control Lists (ACLs) and named attributes can be
      transferred

   o  The richer NFSv4 metadata set can be transferred

   o  Restarting of transfers can be achieved

   o  The bandwidth requirements approach the smallest possible.

1.4.  Basic structure

   This replication/migration protocol is optimized for bulk data
   transfer with a minimum of overhead.  The ideal case is where the
   source server can stream filesystem data (or just the changes made)
   to the destination.  An alternate [RSYNC]-like mode, in which both
   servers compare files to determine differences, has been
   discussed, but is not present in this draft.

   Unlike the previous version of this draft, this version specifies
   RPC [RFC1831] rather than just XDR [RFC1832] formatted messages over
   TCP.  Implementations MUST support operation over TCP and MAY support
   UDP and other transports supported by RPC.

   The protocol permits multiple "sessions" per TCP connection by using
   session identifiers in each RPC.  Sessions can be terminated and
   restarted at a later time.  Sessions used to update replicas can also
   be left in place continuously, so that changes to the master can be
   reflected on the replicas in near-real-time.

   The SEND RPC has been optimized by permitting an array of data and
   metadata updates to be sent in one RPC call, while the response
   permits the source server to know how far the destination got in
   applying the updates.

2.  Common data types

2.1.  Session, file and checkpoint IDs

   RMsession_id permits multiplexing transfer sessions on a single
   authenticated connection; the value is chosen arbitrarily by the
   source server.  RMcheckpoint is used to track the last RPC known to
   the destination so that restart can be done; a timestamp is supplied
   to help choose the earliest checkpoint.  RMfile_id is intended to be
   identical to the NFSv4 fileid attribute.

      typedef uint64_t        RMsession_id;

      typedef uint64_t        RMfile_id;

      struct RMcheckpoint {
              nfstime4        time;
              uint64_t        id;
      };

2.2.  Offset, length and cookies

   These types are chosen for compatibility with NFSv4.

      typedef uint64_t        RMoffset;
      typedef uint64_t        RMlength;
      typedef uint64_t        RMcookie;

2.3.  General status

   OPEN_SESSION and SEND responses, and CLOSE_SESSION reasons, shall
   carry a status value from this set.

      enum RMstatus {
              RM_OK           = 0,
              RMERR_PERM      = 1,
              RMERR_IO        = 5,
              RMERR_EXISTS    = 17
      };

2.4.  From NFS Version 4 [RFC3530]

   The following definitions are imported from NFS Version 4.
      typedef uint32_t        bitmap4<>;
      typedef opaque          attrlist4<>;
      typedef opaque          utf8string<>;
      typedef opaque          utf8str_mixed<>;
      typedef opaque          utf8str_cis<>;

      struct nfstime4 {
              int64_t         seconds;
              uint32_t        nseconds;
      };

      enum nfs_ftype4 {
              NF4REG          = 1,    /* Regular File */
              NF4DIR          = 2,    /* Directory */
              NF4BLK          = 3,    /* Special File - block device */
              NF4CHR          = 4,    /* Special File - character device */
              NF4LNK          = 5,    /* Symbolic Link */
              NF4SOCK         = 6,    /* Special File - socket */
              NF4FIFO         = 7,    /* Special File - fifo */
              NF4ATTRDIR      = 8,    /* Attribute Directory */
              NF4NAMEDATTR    = 9     /* Named Attribute */
      };

      typedef uint32_t        acetype4;
      typedef uint32_t        aceflag4;
      typedef uint32_t        acemask4;
      struct nfsace4 {
              acetype4        type;
              aceflag4        flag;
              acemask4        access_mask;
              utf8str_mixed   who;
      };

      typedef nfsace4         fattr4_acl<>;
      struct fattr4 {
              bitmap4         attrmask;
              attrlist4       attr_vals;
      };

3.  Session Management

   Security flavors supported by the destination server may be known in
   advance, or may be discovered via an initial NULL RPC call which uses
   the SNEGO GSS-API pseudo-mechanism as defined in [RFC2478].  A
   security flavor normally does not change through the life of the
   session.

   A transfer session is created or resumed with the OPEN_SESSION call
   and terminated normally or abnormally with the CLOSE_SESSION call.
   This is simpler than the previous draft of this protocol.  The
   OPEN_SESSION call permits negotiation of capabilities and of the
   checkpoint to be used for a restart, while CLOSE_SESSION permits
   abnormal as well as normal termination.

3.1.  Capabilities negotiation

   Parameters in the OPEN_SESSION call express certain capabilities of
   the source server and provide an indication of properties of the data
   to be transferred.  The destination server is responsible for
   reacting to these capabilities.
   If the desired capabilities are not
   acceptable to the destination, the response can bid down capabilities
   by clearing capabilities bits, or reject the session by failing the
   RPC.  If the lowered capabilities bid by the destination server are
   not acceptable to the source server, the session should be terminated
   with CLOSE_SESSION.

   Currently, only three capabilities are specified; we expect to add
   more through working group effort.  Specified so far are the
   following:

   o  RM_UTF8NAMES - source server supports and expects to send
      filenames encoded in UTF-8 format.  If the destination server
      does not support UTF-8 filenames, it should convey that by
      clearing the flag.

   o  RM_FHPRESERVE - source server is willing to attempt to preserve
      filehandles by sending them as part of each SEND_METADATA
      operation.  If the destination can issue filehandles which it
      did not generate, and can work with the filehandle format used
      by the implementation identified by the RMimplementation field in
      the OPEN_SESSION arguments, it can accept this offer; otherwise
      it should clear the bit to indicate refusal.  Since the source
      server may be denied in attempting to preserve filehandles, it
      should either refuse to transfer data if the destination clears
      this flag, or should advise clients of the possibility that
      filehandles will change via the [RFC3530] FH4_VOL_MIGRATION bit.

   o  RM_FILEID - in combination with RM_FHPRESERVE, the source server
      is willing to attempt to preserve file_ids as well.  If the
      destination can issue file_ids which it did not generate, and
      can work with the file_id format used by the implementation
      identified by the RMimplementation field in the OPEN_SESSION
      arguments, it can accept this offer; otherwise it should clear
      the bit to indicate refusal.

3.2.  Security Negotiation

   Security for this protocol is provided by the RPCSEC_GSS mechanism,
   defined in [RFC2203], with the same mandatory-to-implement GSS-API
   mechanisms as [RFC3530], namely the Kerberos V5 and
   LIPKEY mechanisms defined in [RFC1964] and [RFC2847].  In the case of
   a client and server implementing more than one of these mechanisms,
   the first RPC call should be an RPC NULL procedure call with the
   RPCSEC_GSS auth flavor and the SNEGO GSS-API mechanism populated with
   the mechanisms acceptable to the client.  The server should respond
   with the preferred mechanism, if any, and this mechanism will be used
   for all sessions on this connection.

3.3.  OPEN_SESSION call

   SYNOPSIS

      OPEN_SESSIONargs -> OPEN_SESSIONres

   ARGUMENT

      struct RMnewsession {
              utf8string      src_path;
              utf8string      dest_path;
              uint64_t        fs_size;
              uint64_t        tr_size;
              uint64_t        tr_objs;
      };

      struct RMoldsession {
              RMcheckpoint    check_id;
              uint64_t        rem_size;
              uint64_t        rem_objs;
      };

      union RMopeninfo switch (bool new) {
              case TRUE:
                      RMnewsession    newinfo;
              case FALSE:
                      RMoldsession    oldinfo;
      };

      typedef uint64_t        RMcapability;
      typedef utf8str_cis     RMimplementation<>;

      struct OPEN_SESSIONargs {
              RMsession_id     session_id;
              RMcomp_type      comp_list<>;
              RMcapability     capabilities;
              RMimplementation implementation;
              RMopeninfo       info;
      };

   RESULT

      struct RMopenok {
              RMcheckpoint    check_id;
              RMcomp_type     comp_alg;
              RMcapability    capabilities;
      };

      union RMopenresp switch (RMstatus status) {
              case RM_OK:
                      RMopenok        info;
              default:
                      void;
      };

      struct OPEN_SESSIONres {
              RMsession_id    session_id;
              RMopenresp      response;
      };

   OPEN_SESSION is a request to create or resume a transfer session to
   send the full or incremental contents of one filesystem.
   For either
   new or resuming sessions, the source server supplies the following
   information:

   o  session_id - a unique number assigned by the source server to
      the transfer session, or the number of the session to be
      resumed.

   o  comp_list - a list of compression types the source server can
      use to compress data.

   o  capabilities - the bitmask used to negotiate as described in
      Section 3.1.

   o  implementation - a descriptor of the operating system and
      filesystem implementation, with version information, used by the
      source server; this is to permit preservation of filehandles and
      fileids if the destination server runs a compatible version.
      This field is constructed at the pleasure of the source server
      and need only be parsed properly by a destination server running
      the same operating system code.

   For new sessions, the source server supplies the following
   information:

   o  src_path - full path name to the filesystem on the source server

   o  dest_path - full path name to the filesystem on the destination
      server

   o  fs_size - total size of the filesystem data

   o  tr_size - amount of filesystem data to be sent during this
      transfer session

   o  tr_objs - number of objects to be sent or updated in this
      transfer session

   For resuming sessions, the source server supplies the following
   information:

   o  check_id - checkpoint ID for the last RPC believed sent

   o  rem_size - remaining amount of filesystem data to be sent

   o  rem_objs - remaining number of objects to be sent or updated

   The response from the destination server may reject the session
   proposal with an error code, may accept the proposal outright, or may
   bid down capabilities or state that it needs to start from an earlier
   checkpoint than that proposed by the source.  The destination will
   also choose a compression algorithm from the list the source
   provided.
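   As a rough illustration of the destination's side of this exchange,
   the following Python sketch (not part of the protocol) shows
   capabilities being bid down by clearing bits and a compression
   algorithm being picked from the source's list.  The constant values
   are those given in the XDR definitions of Appendix A; the function
   names, the notion of a "supported" set, and the policy of taking the
   first mutually supported algorithm are assumptions made only for
   this example.

```python
# Capability bits and compression types as given in Appendix A.
RM_UTF8NAMES = 0x00000001
RM_FHPRESERVE = 0x00000002
RM_NULLCOMP, RM_COMPRESS, RM_ZIP = 0, 1, 2


def bid_down(offered, supported):
    """Clear every offered capability bit the destination cannot honor."""
    return offered & supported


def choose_compression(comp_list, supported):
    """Hypothetical policy: take the first entry of the source's list
    that the destination also supports; None means no common choice."""
    for comp in comp_list:
        if comp in supported:
            return comp
    return None


def source_accepts(negotiated, required):
    """The source would issue CLOSE_SESSION if a required bit was cleared."""
    return (negotiated & required) == required


# Example: the destination supports UTF-8 names but cannot preserve
# filehandles, and can only handle uncompressed data.
negotiated = bid_down(RM_UTF8NAMES | RM_FHPRESERVE, RM_UTF8NAMES)
algorithm = choose_compression([RM_ZIP, RM_NULLCOMP], {RM_NULLCOMP})
```

   In this example the destination clears RM_FHPRESERVE, so a source
   that insists on filehandle preservation would go on to close the
   session rather than begin sending data.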
   The source may issue a CLOSE_SESSION call if capabilities
   negotiated down are not acceptable to it.  Once the OPEN_SESSION RPC
   has been completed, SEND RPCs with data transfer operations will be
   sent until a CLOSE_SESSION RPC is sent.

3.4.  CLOSE_SESSION call

   SYNOPSIS

      CLOSE_SESSIONargs -> CLOSE_SESSIONres

   ARGUMENT

      struct RMbadclose {
              RMcheckpoint    check_id;
              bool_t          restartable;
      };

      union RMcloseinfo switch (RMstatus status) {
              case RM_OK:
                      void;
              default:
                      RMbadclose      info;
      };

      struct CLOSE_SESSIONargs {
              RMsession_id    session_id;
              RMcloseinfo     info;
      };

   RESULT

      struct CLOSE_SESSIONres {
              RMsession_id    session_id;
              RMcheckpoint    check_id;
      };

   CLOSE_SESSION is used by the source server to terminate the session,
   either normally or abnormally.

   A normal close is handled by setting the RMcloseinfo status to RM_OK.
   Upon a normal close, a migration event is considered complete and the
   source will begin to refer clients to the destination server.

   An abnormal close is handled by setting the status to something other
   than RM_OK and supplying the last checkpoint the source server
   believes it sent plus an indication of whether it is possible to
   restart the transfer from that checkpoint.  The destination server
   responds with the last checkpoint it has successfully committed.  The
   destination server should attempt to save the state of the aborted
   session for a period of at least one hour.

4.  Data transfer

4.1.  Data transfer operations

   Data transfer is accomplished by the SEND RPC, which takes an array
   of unions to permit a variety of transfer operations to be sent in
   each RPC.  All operations must pertain to one filesystem object,
   since the RMfile_id is provided for each SEND RPC, not for each
   operation.
   Each operation in the array has an RMstatus in the
   response, so the source server can track how much was done if the
   call failed.  Processing stops at the first failure, and the SEND
   RPC response status is set to the first failure status.

   The following transfer operations are supported:

   o  SEND_METADATA - send metadata about object

   o  SEND_FILE_DATA - send file data

   o  SEND_FILE_HOLE - send a description of a hole in a file

   o  SEND_LOCK_STATE - send file lock state

   o  SEND_SHARE_STATE - send share modes state

   o  SEND_DELEG_STATE - send delegation state

   o  SEND_REMOVE - send an object removal transaction

   o  SEND_RENAME - send an object rename transaction

   o  SEND_LINK - send an object link transaction

   o  SEND_SYMLINK - send an object symlink transaction

   o  SEND_DIR_CONTENTS - send names of objects in a directory

   o  SEND_CLOSE - signal completion of object

4.2.  Data transfer phase overview

   The source server processes filesystem objects in some known order
   which will permit checkpointing and restarting in case of some
   problem or operator abort.  Full transfers should be done in order
   such that objects which are needed, such as directories and link
   targets, are present when referrals are made to them.  Incremental
   transfers should be done in the order changes were made on the source
   server, if possible; if not possible, the order described for full
   transfers is acceptable.

   For files which are to be created or updated, SEND_METADATA is sent
   first, then SEND_FILE_DATA operations will be sent.  If outstanding
   lock, share or delegation state for an object exists on the source
   server, it will be sent via SEND_LOCK_STATE, SEND_SHARE_STATE or
   SEND_DELEG_STATE operations after all data has been transferred.
   SEND_CLOSE is used to signal that all changes to a file are complete.
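   The ordering just described for a single regular file can be
   sketched as follows.  This is an illustrative Python fragment, not
   part of the protocol: the operation names are those of this draft,
   while the function ops_for_file and its parameters are hypothetical
   and model only the order of operations, not their arguments.

```python
def ops_for_file(n_data_blocks, has_lock=False, has_share=False,
                 has_deleg=False):
    """Return the names of the operations a source might send for one
    regular file, in the order described in Section 4.2."""
    ops = ["SEND_METADATA"]                     # metadata always comes first
    ops += ["SEND_FILE_DATA"] * n_data_blocks   # then the file data
    if has_lock:                                # state follows all data
        ops.append("SEND_LOCK_STATE")
    if has_share:
        ops.append("SEND_SHARE_STATE")
    if has_deleg:
        ops.append("SEND_DELEG_STATE")
    ops.append("SEND_CLOSE")                    # object is complete
    return ops


# Example: a file sent as two data blocks with one byte-range lock.
sequence = ops_for_file(2, has_lock=True)
```

   A file with no data blocks and no state reduces to SEND_METADATA
   followed directly by SEND_CLOSE.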
   Directories are created with SEND_METADATA, but are not populated
   until their objects are created, so the SEND_METADATA is followed by
   SEND_CLOSE.

   Ideally, the source server will track all filesystem changes via a
   mechanism such as [DMAPI], and will be able to reflect remove, rename
   and link changes via SEND_REMOVE, SEND_RENAME and SEND_LINK
   operations.  If the source server cannot capture all create and
   remove operations on a directory reliably, SEND_DIR_CONTENTS should
   be used.  This operation lists all directory entries on the source
   server, so that the destination server can compute what items should
   be removed.  This is less reliable than being able to send
   SEND_REMOVE, SEND_RENAME and SEND_LINK operations, and should be used
   only when the underlying filesystem cannot record changes as they
   happen.

   Named attributes for a filesystem object are handled with
   SEND_METADATA operations with file type NF4NAMEDATTR.  This will be
   "nested", i.e. it will be understood that the named attribute is
   associated with the parent object handled.  SEND_CLOSE is used to
   indicate that all data and metadata of the named attribute have been
   transferred, and must be issued before another named attribute can be
   handled and before the SEND_CLOSE for the parent object is issued.
   Named attributes may not themselves have named attributes.

4.3.  SEND call

   SYNOPSIS

      SENDargs -> SENDres

   ARGUMENT

      union RMsendargs switch (RMoptype sendtype) {
              case OP_SEND_METADATA:
                      SEND_METADATA           metadata;
              case OP_SEND_FILE_DATA:
                      SEND_FILE_DATA          data;
              case OP_SEND_FILE_HOLE:
                      SEND_FILE_HOLE          hole;
              case OP_SEND_LOCK_STATE:
                      SEND_LOCK_STATE         lock;
              case OP_SEND_SHARE_STATE:
                      SEND_SHARE_STATE        share;
              case OP_SEND_DELEG_STATE:
                      SEND_DELEG_STATE        deleg;
              case OP_SEND_REMOVE:
                      SEND_REMOVE             remove;
              case OP_SEND_RENAME:
                      SEND_RENAME             rename;
              case OP_SEND_LINK:
                      SEND_LINK               link;
              case OP_SEND_SYMLINK:
                      SEND_SYMLINK            symlink;
              case OP_SEND_DIR_CONTENTS:
                      SEND_DIR_CONTENTS       dirc;
              case OP_SEND_CLOSE:
                      void;
      };

      struct SEND1args {
              RMsession_id    session_id;
              RMcheckpoint    check_id;
              RMfile_id       file_id;
              RMsendargs      sendarray<>;
      };

   RESULT

      union RMsendres switch (RMoptype sendtype) {
              case OP_SEND_METADATA:
              case OP_SEND_FILE_DATA:
              case OP_SEND_FILE_HOLE:
              case OP_SEND_LOCK_STATE:
              case OP_SEND_SHARE_STATE:
              case OP_SEND_DELEG_STATE:
              case OP_SEND_REMOVE:
              case OP_SEND_RENAME:
              case OP_SEND_LINK:
              case OP_SEND_SYMLINK:
              case OP_SEND_DIR_CONTENTS:
              case OP_SEND_CLOSE:
                      RMstatus        status;
      };

      struct SEND1res {
              RMsession_id    session_id;
              RMcheckpoint    check_id;
              RMfile_id       file_id;
              RMsendres       resarray<>;
              RMstatus        status;
      };

   The SEND RPC batches data transfer operations together and sends them
   to the destination server to operate on one file and with one
   checkpoint.  The destination server may fail a call in the middle of
   the array by setting the return status for that operation to
   something other than RM_OK, and will not process further operations.
   The call will be failed with that status as well.

4.4.  Data transfer operation description

4.4.1.  SEND_METADATA operation

   SYNOPSIS

      struct SEND_METADATA {
              utf8string      obj_name;
              RMattrs         attrs;
      };

   SEND_METADATA announces that we are about to transfer information
   about a particular filesystem object.  If an object does not exist on
   the destination, it will be created with the given obj_name and
   attributes supplied.  If the object exists and is the correct
   type, its attributes will be updated.  If an object of the same name
   but a different type exists, it will be removed and recreated with
   this information.  If a SEND_METADATA has not followed a SEND_CLOSE,
   it may have the is_named_attr flag set, in which case the object is a
   named attribute of the most recent object identified by a
   SEND_METADATA.

4.4.2.  SEND_FILE_DATA operation

   SYNOPSIS

      struct SEND_FILE_DATA {
              RMoffset        offset;
              RMlength        length;
              opaque          data<>;
      };

   SEND_FILE_DATA sends a block of data for a regular file.  The range
   is identified by the offset, length pair as starting at seek position
   'offset' and extending through 'offset+length-1', inclusive.

4.4.3.  SEND_FILE_HOLE operation

   SYNOPSIS

      struct SEND_FILE_HOLE {
              RMoffset        offset;
              RMlength        length;
      };

   SEND_FILE_HOLE sends a description of a "hole", or a zero-filled and
   usually unallocated block of data.  A source server which does sparse
   allocation and which can learn via APIs what parts of a file are
   unallocated can use this to describe the hole without transferring
   the block of zeros.

4.4.4.  SEND_LOCK_STATE operation

   SYNOPSIS

      enum RMlocktype {
              RM_NOLOCK       = 0,
              RM_READLOCK     = 1,
              RM_WRITELOCK    = 2
      };

      struct SEND_LOCK_STATE {
              RMowner         owner;
              RMclientid      clientid;
              RMoffset        offset;
              RMlength        length;
              RMlocktype      type;
              RMstateid       id;
      };

   SEND_LOCK_STATE transfers ownership and range information about
   outstanding byte-range locks to the destination server.
   The lock
   stateid is transferred so that the client need not reestablish the
   lock after migration.  RM_NOLOCK is included to support continuous
   replication by permitting locks on replicas to be cleared.

4.4.5.  SEND_SHARE_STATE operation

   SYNOPSIS

      typedef uint32_t        RMaccess;
      typedef uint32_t        RMdeny;

      struct SEND_SHARE_STATE {
              RMowner         owner;
              RMclientid      client;
              RMaccess        accmode;
              RMdeny          denymode;
      };

   SEND_SHARE_STATE transfers ownership and mode information about
   outstanding share reservations to the destination server.

4.4.6.  SEND_DELEG_STATE operation

   SYNOPSIS

      enum RMdelegtype {
              RM_NODELEG      = 0,
              RM_READDELEG    = 1,
              RM_WRITEDELEG   = 2
      };

      struct SEND_DELEG_STATE {
              RMclientid      client;
              RMdelegtype     type;
              RMstateid       id;
      };

   SEND_DELEG_STATE transfers ownership and type information about
   outstanding file delegations to the destination server.  RM_NODELEG
   is included to support continuous replication by permitting
   delegations on replicas to be cleared.

4.4.7.  SEND_REMOVE operation

   SYNOPSIS

      struct SEND_REMOVE {
              utf8string      name;
      };

   SEND_REMOVE documents a remove event on the object identified; upon
   receipt, the destination server will remove the object as well.

4.4.8.  SEND_RENAME operation

   SYNOPSIS

      struct SEND_RENAME {
              utf8string      old_name;
              utf8string      new_name;
      };

   SEND_RENAME documents a rename event on the object identified by
   old_name; upon receipt, the destination server will rename the object
   in the destination filesystem.  Full paths may be used relative to
   the root of the source filesystem.

4.4.9.  SEND_LINK operation

   SYNOPSIS

      struct SEND_LINK {
              utf8string      old_name;
              utf8string      new_name;
      };

   SEND_LINK documents the creation of a hard link from the old_name to
   the new_name; upon receipt, the destination server will link the
   objects in the destination filesystem.
   Full paths may be used
   relative to the root of the source filesystem.

4.4.10.  SEND_SYMLINK operation

   SYNOPSIS

      struct SEND_SYMLINK {
              utf8string      old_name;
              utf8string      new_name;
      };

   SEND_SYMLINK documents the creation of a symbolic link from the
   old_name to the new_name; upon receipt, the destination server will
   symlink the objects in the destination filesystem.  The old_name
   value is not checked in any way and can be arbitrary textual data.

4.4.11.  SEND_DIR_CONTENTS operation

   SYNOPSIS

      struct SEND_DIR_CONTENTS {
              RMcookie        cookie;
              bool            eof;
              utf8string      names<>;
      };

   SEND_DIR_CONTENTS is used to account for removals and renames when
   source servers cannot record the events such that they may be sent
   with SEND_REMOVE and SEND_RENAME.  The contents are listed in no
   predictable order; the destination uses them to determine which
   entries it has that are no longer found on the source.  Each
   SEND_DIR_CONTENTS includes an opaque directory cookie to represent
   the starting location of the block on the source server, and the eof
   flag is set on the last block.  Any item existing on the destination
   that is not listed in a SEND_DIR_CONTENTS operation will be removed.

4.4.12.  SEND_CLOSE operation

   SYNOPSIS

      void;

   SEND_CLOSE is used to announce that all data and metadata changes for
   a particular object have been completed.

5.  IANA Considerations

   The replication/migration protocol will use a well-known RPC program
   number at which destination servers will register.  The author will
   acquire an RPC program number for this purpose.

6.  Security Considerations

   NFS Version 4 is the primary impetus behind a replication/migration
   protocol, so this protocol should mandate a strong security scheme in
   a manner comparable with NFS Version 4.
   Implementations of this protocol MUST support the RPCSEC_GSS security
   flavor as defined in [RFC2203] and MUST also support the Kerberos V5
   and LIPKEY mechanisms as defined in [RFC1964] and [RFC2847]. The
   particular mechanism chosen for sessions is determined by the use of
   SNEGO [RFC2478] on the initial call, which should be a NULL RPC.

7. Appendix A: XDR Protocol Definition File

   /*
    * Copyright (C) The Internet Society (1998,1999,2000,2001,2002).
    * All Rights Reserved.
    */

   /*
    * repl-mig.x
    */

   %#pragma ident "@(#)repl-mig.x 1.4 03/05/27"

   /*
    * From RFC3530
    */
   typedef uint32_t        bitmap4<>;
   typedef opaque          attrlist4<>;
   typedef opaque          utf8string<>;
   typedef opaque          utf8str_mixed<>;
   typedef opaque          utf8str_cis<>;

   struct nfstime4 {
       int64_t         seconds;
       uint32_t        nseconds;
   };

   enum nfs_ftype4 {
       NF4REG          = 1,    /* Regular File */
       NF4DIR          = 2,    /* Directory */
       NF4BLK          = 3,    /* Special File - block device */
       NF4CHR          = 4,    /* Special File - character device */
       NF4LNK          = 5,    /* Symbolic Link */
       NF4SOCK         = 6,    /* Special File - socket */
       NF4FIFO         = 7,    /* Special File - fifo */
       NF4ATTRDIR      = 8,    /* Attribute Directory */
       NF4NAMEDATTR    = 9     /* Named Attribute */
   };

   typedef uint32_t        acetype4;
   typedef uint32_t        aceflag4;
   typedef uint32_t        acemask4;

   struct nfsace4 {
       acetype4        type;
       aceflag4        flag;
       acemask4        access_mask;
       utf8str_mixed   who;
   };

   typedef nfsace4         fattr4_acl<>;

   struct fattr4 {
       bitmap4         attrmask;
       attrlist4       attr_vals;
   };

   /*
    * For session, message, file and checkpoint IDs
    */
   typedef uint64_t        RMsession_id;

   typedef uint64_t        RMfile_id;

   struct RMcheckpoint {
       nfstime4        time;
       uint64_t        id;
   };

   /*
    * For compression algorithm negotiation
    */
   enum RMcomp_type {
       RM_NULLCOMP     = 0,
       RM_COMPRESS     = 1,
       RM_ZIP          = 2
   };

   /*
    * For capabilities negotiation
    */
   typedef
   utf8str_cis             RMimplementation<>;
   typedef uint64_t        RMcapability;
   const RM_UTF8NAMES      = 0x00000001;
   const RM_FHPRESERVE     = 0x00000002;

   /*
    * For general status
    */
   enum RMstatus {
       RM_OK           = 0,
       RMERR_PERM      = 1,
       RMERR_IO        = 5,
       RMERR_EXISTS    = 17
   };

   /*
    * Attributes
    */
   struct RMattrs {
       fattr4          attr;
       nfs_ftype4      obj_type;
       fattr4_acl      obj_acl;
       bool            is_named_attr;
   };

   /*
    * Offset, length and cookies
    */
   typedef uint64_t        RMoffset;
   typedef uint64_t        RMlength;
   typedef uint64_t        RMcookie;

   /*
    * Owner
    */
   typedef utf8str_mixed   RMowner;

   /*
    * Lock and share supporting definitions
    */
   struct RMclientid {
       utf8string      name;
       opaque          address<>;
   };

   struct RMstateid {
       uint32_t        seqid;
       opaque          other[12];
   };

   enum RMlocktype {
       RM_NOLOCK       = 0,
       RM_READLOCK     = 1,
       RM_WRITELOCK    = 2
   };

   typedef uint32_t        RMaccess;
   typedef uint32_t        RMdeny;

   enum RMdelegtype {
       RM_NODELEG      = 0,
       RM_READDELEG    = 1,
       RM_WRITEDELEG   = 2
   };

   /*
    * Protocol elements - session control
    */
   struct RMnewsession {
       utf8string      src_path;
       utf8string      dest_path;
       uint64_t        fs_size;
       uint64_t        tr_size;
       uint64_t        tr_objs;
   };

   struct RMoldsession {
       RMcheckpoint    check_id;
       uint64_t        rem_size;
       uint64_t        rem_objs;
   };

   union RMopeninfo switch (bool new) {
       case TRUE:
           RMnewsession    newinfo;
       case FALSE:
           RMoldsession    oldinfo;
   };

   struct OPEN_SESSIONargs {
       RMsession_id    session_id;
       RMcomp_type     comp_list<>;
       RMcapability    capabilities;
       RMimplementation impl;
       RMopeninfo      info;
   };

   struct RMopenok {
       RMcheckpoint    check_id;
       RMcomp_type     comp_alg;
       RMcapability    capabilities;
   };

   union RMopenresp switch (RMstatus status) {
       case RM_OK:
           RMopenok        info;
       default:
           void;
   };

   struct
   OPEN_SESSIONres {
       RMsession_id    session_id;
       RMopenresp      response;
   };

   struct RMbadclose {
       RMcheckpoint    check_id;
       bool            restartable;
   };

   union RMcloseinfo switch (RMstatus status) {
       case RM_OK:
           void;
       default:
           RMbadclose      info;
   };

   struct CLOSE_SESSIONargs {
       RMsession_id    session_id;
       RMcloseinfo     info;
   };

   struct CLOSE_SESSIONres {
       RMsession_id    session_id;
       RMcheckpoint    check_id;
   };

   /*
    * Protocol elements - data transfer
    */
   enum RMoptype {
       OP_SEND_METADATA        = 1,
       OP_SEND_FILE_DATA       = 2,
       OP_SEND_FILE_HOLE       = 3,
       OP_SEND_LOCK_STATE      = 4,
       OP_SEND_SHARE_STATE     = 5,
       OP_SEND_DELEG_STATE     = 6,
       OP_SEND_REMOVE          = 7,
       OP_SEND_RENAME          = 8,
       OP_SEND_LINK            = 9,
       OP_SEND_SYMLINK         = 10,
       OP_SEND_DIR_CONTENTS    = 11,
       OP_SEND_CLOSE           = 12
   };

   /*
    * Data and metadata send items
    */
   struct SEND_METADATA {
       utf8string      obj_name;
       RMattrs         attrs;
   };

   struct SEND_FILE_DATA {
       RMoffset        offset;
       RMlength        length;
       opaque          data<>;
   };

   struct SEND_FILE_HOLE {
       RMoffset        offset;
       RMlength        length;
   };

   struct SEND_LOCK_STATE {
       RMowner         owner;
       RMclientid      client;
       RMoffset        offset;
       RMlength        length;
       RMlocktype      type;
       RMstateid       id;
   };

   struct SEND_SHARE_STATE {
       RMowner         owner;
       RMclientid      client;
       RMaccess        accmode;
       RMdeny          denymode;
   };

   struct SEND_DELEG_STATE {
       RMclientid      client;
       RMdelegtype     type;
       RMstateid       id;
   };

   struct SEND_REMOVE {
       utf8string      name;
   };

   struct SEND_RENAME {
       utf8string      old_name;
       utf8string      new_name;
   };

   struct SEND_LINK {
       utf8string      old_name;
       utf8string      new_name;
   };

   struct SEND_SYMLINK {
       utf8string      old_name;
       utf8string      new_name;
   };

   struct SEND_DIR_CONTENTS {
       RMcookie        cookie;
       bool            eof;
       utf8string      names<>;
   };

   /* no parameters for SEND_CLOSE */

   union RMsendargs switch (RMoptype sendtype) {
       case OP_SEND_METADATA:
           SEND_METADATA           metadata;
       case OP_SEND_FILE_DATA:
           SEND_FILE_DATA          data;
       case OP_SEND_FILE_HOLE:
           SEND_FILE_HOLE          hole;
       case OP_SEND_LOCK_STATE:
           SEND_LOCK_STATE         lock;
       case OP_SEND_SHARE_STATE:
           SEND_SHARE_STATE        share;
       case OP_SEND_DELEG_STATE:
           SEND_DELEG_STATE        deleg;
       case OP_SEND_REMOVE:
           SEND_REMOVE             remove;
       case OP_SEND_RENAME:
           SEND_RENAME             rename;
       case OP_SEND_LINK:
           SEND_LINK               link;
       case OP_SEND_SYMLINK:
           SEND_SYMLINK            symlink;
       case OP_SEND_DIR_CONTENTS:
           SEND_DIR_CONTENTS       dirc;
       case OP_SEND_CLOSE:
           void;
   };

   union RMsendres switch (RMoptype sendtype) {
       case OP_SEND_METADATA:
       case OP_SEND_FILE_DATA:
       case OP_SEND_FILE_HOLE:
       case OP_SEND_LOCK_STATE:
       case OP_SEND_SHARE_STATE:
       case OP_SEND_DELEG_STATE:
       case OP_SEND_REMOVE:
       case OP_SEND_RENAME:
       case OP_SEND_LINK:
       case OP_SEND_SYMLINK:
       case OP_SEND_DIR_CONTENTS:
       case OP_SEND_CLOSE:
           RMstatus        status;
   };

   struct SEND1args {
       RMsession_id    session_id;
       RMcheckpoint    check_id;
       RMfile_id       file_id;
       RMsendargs      sendarray<>;
   };

   struct SEND1res {
       RMsession_id    session_id;
       RMcheckpoint    check_id;
       RMfile_id       file_id;
       RMsendres       resarray<>;
       RMstatus        status;
   };

   program RM_PROGRAM {
       version RM_V1 {
           void
               RMPROC1_NULL(void) = 0;
           OPEN_SESSIONres
               RMPROC1_OPEN_SESSION(OPEN_SESSIONargs) = 1;
           CLOSE_SESSIONres
               RMPROC1_CLOSE_SESSION(CLOSE_SESSIONargs) = 2;
           SEND1res
               RMPROC1_SEND(SEND1args) = 3;
       } = 1;
   } = 100273;

8. Normative References

   [RFC1831]
   R. Srinivasan, "RPC: Remote Procedure Call Protocol Specification
   Version 2", RFC1831, August 1995.

   [RFC1832]
   R.
   Srinivasan, "XDR: External Data Representation Standard", RFC1832,
   August 1995.

   [RFC1964]
   J. Linn, "Kerberos Version 5 GSS-API Mechanism", RFC1964, June 1996.

   [RFC2203]
   M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol Specification",
   RFC2203, September 1997.

   [RFC2478]
   E. Baize, D. Pinkas, "The Simple and Protected GSS-API Negotiation
   Mechanism", RFC2478, December 1998.

   [RFC2847]
   M. Eisler, "LIPKEY - A Low Infrastructure Public Key Mechanism Using
   SPKM", RFC2847, June 2000.

   [RFC3530]
   S. Shepler, B. Callaghan, D. Robinson, R. Thurlow, C. Beame, M.
   Eisler, D. Noveck, "Network File System (NFS) Version 4 Protocol",
   RFC3530, April 2003.

9. Informative References

   [RDIST]
   MagniComp, Inc., "RDist Home Page", http://www.magnicomp.com/rdist.

   [RSYNC]
   The Samba Team, "rsync web pages", http://samba.anu.edu.au/rsync.

   [DESIGN]
   R. Thurlow, "Server-to-Server Replication/Migration Protocol Design
   Principles" (work in progress),
   http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-repl-mig-design-00.txt,
   December 2002.

   [DMAPI]
   P. Lawthers, "The Data Management Applications Programming
   Interface",
   http://www.computer.org/conferences/mss95/lawthers/lawthers.htm,
   July 1995.

10. Author's Address

   Address comments related to this memorandum to:

   nfsv4-wg@sunroof.eng.sun.com

   Robert Thurlow
   Sun Microsystems, Inc.
   500 Eldorado Boulevard, UBRM05-171
   Broomfield, CO 80021

   Phone: 877-718-3419
   E-mail: robert.thurlow@sun.com
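
Illustrative note (non-normative). The directory reconciliation described for SEND_DIR_CONTENTS in Section 4.4.11 could be sketched on the destination side roughly as follows. This is a minimal sketch, not part of the protocol definition: the names DirContentsBlock and stale_entries are hypothetical, the cookie is modeled as a plain integer, and the choice to refuse removal when no eof block has arrived is an assumption made here for safety rather than a rule stated in this document.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DirContentsBlock:
    """Hypothetical in-memory mirror of the SEND_DIR_CONTENTS fields."""
    cookie: int          # opaque starting location of this block on the source
    eof: bool            # set on the last block of the directory listing
    names: List[str]     # entry names carried in this block


def stale_entries(dest_names: Iterable[str],
                  blocks: Iterable[DirContentsBlock]) -> List[str]:
    """Return destination entries that no block listed.

    Assumption (not from the draft): if the last block received does not
    carry eof, the listing is treated as incomplete and an error is raised
    instead of reporting anything as removable.
    """
    seen = set()
    complete = False
    for block in blocks:
        # Order within and across blocks is not significant.
        seen.update(block.names)
        complete = block.eof
    if not complete:
        raise ValueError("directory listing incomplete: no eof block seen")
    return sorted(set(dest_names) - seen)


# Example: the source lists a, b and c in two blocks; "old" exists only
# on the destination and is therefore a candidate for removal.
example_blocks = [
    DirContentsBlock(cookie=0, eof=False, names=["b", "a"]),
    DirContentsBlock(cookie=2, eof=True, names=["c"]),
]
example_stale = stale_entries(["a", "b", "c", "old"], example_blocks)  # -> ['old']
```

A real destination server would act on the returned names by removing the corresponding objects, and would need to decide how a renamed object, which appears as one missing name and one new name, is handled.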