Network Working Group                                       B. Callaghan
INTERNET DRAFT                                    Sun Microsystems, Inc.
Category: Informational                                     October 1996
Expires in six months


                      WebNFS Client Specification


Abstract

   This document describes a lightweight binding mechanism that allows
   NFS clients to obtain service from WebNFS-enabled servers with a
   minimum of protocol overhead.  In removing this overhead, WebNFS
   clients see benefits in faster response to requests, easy transit
   of packet filter firewalls and TCP-based proxies, and better server
   scalability.

Status of this Memo

   This memo provides information for the Internet community.  This
   memo does not specify an Internet standard of any kind.
   Distribution of this memo is unlimited.

Table of Contents

   1.  Introduction
   2.  TCP vs UDP
   3.  Well-known Port
   4.  NFS Version 3
   4.1   Transfer Size
   4.2   Fast Writes
   4.3   READDIRPLUS
   5.  Public Filehandle
   5.1   NFS Version 2 Public Filehandle
   5.2   NFS Version 3 Public Filehandle
   6.  Multi-component Lookup
   6.1   Canonical Path vs. Native Path
   6.2   Symbolic Links
   6.2.1   Absolute Link
   6.2.2   Relative Link
   6.3   Filesystem Spanning Pathnames
   7.  Contacting the Server
   8.  Mount Protocol
   9.  Exploiting Concurrency
   9.1   Read-ahead
   9.2   Concurrent File Download
   10. Timeout and Retransmission
   11. Bibliography
   12. Security Considerations
   13. Acknowledgements
   14. Author's Address

1. Introduction

   The NFS protocol provides access to shared filesystems across
   networks.  It is designed to be machine, operating system, network
   architecture, and transport protocol independent.
   The protocol currently exists in two versions: version 2 [RFC1094]
   and version 3 [RFC1813], both built on Sun RPC [RFC1831] and its
   associated eXternal Data Representation (XDR) [RFC1832] and Binding
   Protocol [RFC1833].

   WebNFS provides additional semantics that can be applied to NFS
   versions 2 and 3 to eliminate the overhead of the PORTMAP and MOUNT
   protocols, make the protocol easier to use where firewall transit
   is required, and reduce the number of LOOKUP requests required to
   identify a particular file on the server.  WebNFS server
   requirements are described in RFC mmmm.

2. TCP vs UDP

   The NFS protocol is best known for its use of UDP, which performs
   acceptably on local area networks.  However, on wide area networks
   with error-prone, high-latency connections and bandwidth
   contention, TCP is well respected for its congestion control and
   superior error handling.  A growing number of NFS implementations
   now support the NFS protocol over TCP connections.

   Use of NFS version 3 is particularly well matched to the use of TCP
   as a transport protocol.  Version 3 removes the arbitrary 8k
   transfer size limit of version 2, allowing the READ or WRITE of
   very large streams of data over a TCP connection.  Note that NFS
   version 2 is also supported on TCP connections, though the benefits
   of TCP data streaming will not be as great.

   A WebNFS client must first attempt to connect to its server with a
   TCP connection.  If the server refuses the connection, the client
   should attempt to use UDP.

3. Well-known Port

   While Internet protocols are generally identified by registered
   port number assignments, RPC-based protocols register a 32-bit
   program number and a dynamically assigned port with the portmap
   service, which is registered on the well-known port 111.  Since the
   NFS protocol is RPC-based, NFS servers register their port
   assignments with the portmap service.

   NFS servers are constrained by a requirement to re-register at the
   same port after a server crash and recovery, so that clients can
   recover simply by retransmitting an RPC request until a response is
   received.  This is simpler than the alternative of having the
   client repeatedly check with the portmap service for a new port
   assignment.  NFS servers typically achieve this port invariance by
   registering a constant port assignment, 2049, for both UDP and TCP.

   To avoid the overhead of contacting the server's portmap service,
   and to facilitate transit through packet filtering firewalls,
   WebNFS clients optimistically assume that WebNFS servers register
   on port 2049.  Most NFS servers use this port assignment already,
   so this client optimism is well justified.  Refer to section 8 for
   further details on port binding.
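   As a non-normative illustration, the following Python sketch shows
   this binding strategy: optimistically open a TCP connection to port
   2049, and fall back to UDP on the same port if the connection
   fails.  The function name and timeout value are illustrative
   assumptions, not part of this specification.

      import socket

      NFS_PORT = 2049

      def bind_to_server(host, timeout=5.0):
          # Optimistically try a TCP connection to the well-known
          # NFS port.
          try:
              sock = socket.create_connection((host, NFS_PORT),
                                              timeout=timeout)
              return sock, "tcp"
          except OSError:
              # Connection refused or timed out: fall back to UDP on
              # the same port.  connect() on a UDP socket only fixes
              # the destination address.
              sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
              sock.connect((host, NFS_PORT))
              return sock, "udp"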
4. NFS Version 3

   NFS version 3 corrects deficiencies in version 2 of the protocol as
   well as providing a number of features suitable to WebNFS clients
   accessing servers over high-latency, low-bandwidth connections.

4.1 Transfer Size

   NFS version 2 limited the amount of data in a single request or
   reply to 8 kilobytes.  This limit was based on what was then
   considered a reasonable upper bound on the amount of data that
   could be transmitted in a UDP datagram across an Ethernet.  The 8k
   transfer size limitation affects READ, WRITE, and READDIR requests.
   When using version 2, a WebNFS client must not transmit any request
   that exceeds the 8k transfer size.  Additionally, the client must
   be able to adjust its requests to suit servers that limit transfer
   sizes to values smaller than 8k.

   NFS version 3 removes the 8k limit, allowing the client and server
   to negotiate whatever limit they choose.  Larger transfer sizes are
   preferred, since they require fewer READ or WRITE requests to
   transfer a given amount of data and utilize a TCP stream more
   efficiently.

   While the client can use the FSINFO procedure to request the
   server's maximum and preferred transfer sizes, in the interests of
   keeping the number of NFS requests to a minimum, WebNFS clients
   should optimistically choose a transfer size and make corrections
   if necessary based on the server's response.

   For instance, given that the file attributes returned with the
   filehandle from a LOOKUP request indicate that the file has a size
   of 50k, the client might transmit a READ request for 50k.  If the
   server returns only 32k, then the client can assume that the
   server's maximum transfer size is 32k and issue another READ
   request for the remaining data.  The server will indicate
   positively when the end of file is reached.

   A similar strategy can be used when writing to a file on the
   server, though the client should be more conservative in choosing
   write request sizes, so as to avoid transmitting large amounts of
   data that the server cannot handle.
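   The adaptive READ strategy described above can be sketched in a few
   lines of Python.  This is illustrative only; nfs_read() stands for
   a hypothetical helper that transmits a READ request and returns the
   data actually read together with the end-of-file flag from the
   reply.

      def download(fh, file_size, nfs_read):
          # Optimistically ask for the whole file, shrinking the
          # request size whenever the server returns less data than
          # was requested.
          data, offset, count = b"", 0, file_size
          while True:
              chunk, eof = nfs_read(fh, offset, count)
              data += chunk
              offset += len(chunk)
              if eof:
                  return data
              # A short reply reveals the server's transfer limit.
              if 0 < len(chunk) < count:
                  count = len(chunk)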
4.2 Fast Writes

   NFS version 2 requires the server to write client data to stable
   storage before responding to the client.  This avoids the
   possibility of the server crashing and losing the client's data
   after a positive response.  While this requirement protects the
   client from data loss, it requires that the server either direct
   client write requests straight to the disk or buffer client data in
   expensive non-volatile memory (NVRAM).  Either way, the effect is
   poor write performance, whether through inefficient synchronous
   writes to the disk or through the limited buffering available in
   NVRAM.

   NFS version 3 provides clients with the option of having the server
   buffer a series of WRITE requests in unstable storage.  A
   subsequent COMMIT request from the client has the server flush the
   data to stable storage and lets the client verify that the server
   lost none of the data.  Since fast writes benefit both the client
   and the server, WebNFS clients should use WRITE/COMMIT when writing
   to the server.

4.3 READDIRPLUS

   The NFS version 2 READDIR procedure is also supported in version 3.
   READDIR returns the names of the entries in a directory along with
   their fileids.  Browser programs that display directory contents as
   a list will usually display more than just the filename; a
   different icon may be displayed if the entry is a directory or a
   file.  Similarly, the browser may display the file size and date of
   last modification.

   Since this additional information is not returned by READDIR,
   version 2 clients must issue a series of LOOKUP requests, one per
   directory member, to retrieve the attribute data.  Clearly this is
   an expensive operation where the directory is large (perhaps
   several hundred entries) and the network latency is high.

   The version 3 READDIRPLUS request allows the client to retrieve not
   only the names of the directory entries, but also their file
   attributes and filehandles, in a single call.  WebNFS clients that
   require attribute information for directory entries should use
   READDIRPLUS in preference to READDIR.

5. Public Filehandle

   NFS filehandles are normally created by the server and used to
   uniquely identify a particular file or directory on the server.
   The client does not normally create filehandles or have any
   knowledge of the contents of a filehandle.

   The public filehandle is an exception.  It is an NFS filehandle
   with a reserved value and special semantics that allow an initial
   filehandle to be obtained.  A WebNFS client can use the public
   filehandle as an initial filehandle rather than using the MOUNT
   protocol.  Since NFS versions 2 and 3 have different filehandle
   formats, the public filehandle is defined differently for each.

   The public filehandle is a zero filehandle.  For NFS version 2 this
   is a filehandle with 32 zero octets.  A version 3 public filehandle
   has zero length.

5.1 NFS Version 2 Public Filehandle

   A version 2 filehandle is defined in RFC 1094 as an opaque value
   occupying 32 octets.  A version 2 public filehandle has a zero in
   each octet, i.e. all zeros.

       1                                                             32
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.2 NFS Version 3 Public Filehandle

   A version 3 filehandle is defined in RFC 1813 as a variable-length
   opaque value occupying up to 64 octets.  The length of the
   filehandle is indicated by an integer contained in a 4-octet value
   which describes the number of valid octets that follow.  A version
   3 public filehandle has a length of zero.

      +-+-+-+-+
      |   0   |
      +-+-+-+-+

6. Multi-component Lookup

   Normally the NFS LOOKUP request (version 2 or 3) takes a directory
   filehandle along with the name of a directory member, and returns
   the filehandle of the directory member.  If a client needs to
   evaluate a pathname that contains a sequence of components, then
   beginning with the directory filehandle of the first component it
   must issue a series of LOOKUP requests one component at a time.
   For instance, evaluation of the Unix path "a/b/c" will generate
   separate LOOKUP requests for each component of the pathname: "a",
   "b", and "c".

   A LOOKUP request that uses the public filehandle can provide a
   pathname containing multiple components.  The server is expected to
   evaluate the entire pathname and return a filehandle for the final
   component.  Both canonical (slash-separated) and server-native
   pathnames are supported.

   For example, rather than evaluate the path "a/b/c" as:

      LOOKUP FH=0x0 "a"     --->
                            <--- FH=0x1
      LOOKUP FH=0x1 "b"     --->
                            <--- FH=0x2
      LOOKUP FH=0x2 "c"     --->
                            <--- FH=0x3

   relative to the public filehandle these three LOOKUP requests can
   be replaced by a single multi-component lookup:

      LOOKUP FH=0x0 "a/b/c" --->
                            <--- FH=0x3

   Multi-component lookup is supported only for LOOKUP requests
   relative to the public filehandle.

6.1 Canonical Path vs. Native Path

   If the pathname in a multi-component LOOKUP request begins with an
   ASCII character, then it must be a canonical path.  A canonical
   path is a hierarchically-related, slash-separated sequence of
   components: <component1>/<component2>/.../<componentN>.
   Occurrences of the "/" character within a component must be escaped
   using the escape code %2f.  Non-ASCII characters within components
   must also be escaped using the "%" character to introduce a
   two-digit hexadecimal code.  Occurrences of the "%" character that
   do not introduce an encoded character must themselves be encoded
   with %25.

   If the first character of the path is a slash, then the canonical
   path will be evaluated relative to the server's root directory.  If
   the first character is not a slash, then the path will be evaluated
   relative to the directory with which the public filehandle is
   associated.

   Not all WebNFS servers can support arbitrary use of absolute paths.
   Clearly, the server cannot return a filehandle if the path
   identifies a file or directory that is not exported by the server.
   In addition, some servers will not return a filehandle if the path
   names a file or directory in an exported filesystem different from
   the one that is associated with the public filehandle.

   If the first character of the path is 0x80 (non-ASCII), then the
   following character is the first in a native path.  A native path
   conforms to the normal pathname syntax of the server.  For example:

      Lookup for Canonical Path:

         LOOKUP FH=0x0 "/a/b/c"

      Lookup for Native Path:

         LOOKUP FH=0x0 0x80 "a:b:c"
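   The escaping rules above amount to a small encoder.  The following
   sketch (again illustrative, not normative) produces the pathname
   string to be carried in a multi-component LOOKUP for either path
   style:

      def canonical_path(components):
          # Escape "%", "/", and non-ASCII octets within each
          # component, then join the components with slashes.
          def escape(component):
              out = []
              for octet in component.encode("utf-8"):
                  if octet == 0x25:              # "%"
                      out.append("%25")
                  elif octet == 0x2f:            # "/"
                      out.append("%2f")
                  elif octet < 0x80:
                      out.append(chr(octet))
                  else:                          # non-ASCII octet
                      out.append("%%%02x" % octet)
              return "".join(out)
          return "/".join(escape(c) for c in components)

      def native_path(path):
          # A native path is flagged by a leading 0x80 octet.
          return b"\x80" + path.encode("latin-1")

   For example, canonical_path(["a", "b/c"]) yields "a/b%2fc", and
   native_path("a:b:c") yields the octet 0x80 followed by "a:b:c".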
6.2 Symbolic Links

   On Unix servers, components within a pathname may be symbolic
   links.  The server will evaluate these symbolic links as a part of
   the normal pathname evaluation process.  If the final component is
   a symbolic link, the server will return its filehandle, rather than
   evaluate it.

   If the attributes returned with a filehandle indicate that it
   refers to a symbolic link, then it is the client's responsibility
   to deal with the link by fetching the contents of the link using
   the READLINK procedure.  What follows is determined by the contents
   of the link.

   Evaluation of symbolic links by the client is defined only if the
   symbolic link is retrieved via the multi-component lookup of a
   canonical path.

6.2.1 Absolute Link

   If the first character of the link text is a slash "/", then the
   following path can be assumed to be absolute.  The entire path must
   be evaluated by the server relative to the public filehandle:

      LOOKUP   FH=0x0 "a/b"   --->
                              <--- FH=0x1 (symbolic link)
      READLINK FH=0x1         --->
                              <--- "/x/y"
      LOOKUP   FH=0x0 "/x/y"  --->
                              <--- FH=0x2

   So in this case the client just passes the link text back to the
   server for evaluation.

6.2.2 Relative Link

   If the first character of the link text is not a slash, then the
   following path can be assumed to be relative to the location of the
   symbolic link.  To evaluate this correctly, the client must
   substitute the link text in place of the final pathname component
   that named the link, and issue another LOOKUP relative to the
   public filehandle.

      LOOKUP   FH=0x0 "a/b"   --->
                              <--- FH=0x1 (symbolic link)
      READLINK FH=0x1         --->
                              <--- "x/y"
      LOOKUP   FH=0x0 "a/x/y" --->
                              <--- FH=0x2

   By substituting the link text in the link path and having the
   server evaluate the new path, the server effectively gets to
   evaluate the link relative to the link's location.

   The client may also "clean up" the resulting pathname by removing
   redundant components, as described in Section 4 of RFC 1808.
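   A sketch of this substitution, including the RFC 1808 style
   clean-up of "." and ".." components (the function name is an
   illustrative assumption):

      def resolve_relative_link(lookup_path, link_text):
          # Replace the final component (the link itself) with the
          # link text, then remove redundant components as described
          # in Section 4 of RFC 1808.
          parts = lookup_path.split("/")[:-1] + link_text.split("/")
          out = []
          for p in parts:
              if p in (".", ""):
                  continue
              if p == ".." and out and out[-1] != "..":
                  out.pop()
              else:
                  out.append(p)
          return "/".join(out)

      # resolve_relative_link("a/b", "x/y")  == "a/x/y"
      # resolve_relative_link("a/b", "../x") == "x"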
For instance 390 if the server has the following export and mounts: 392 /export (exported) 394 /export/bigdata (mountpoint) 396 then an NFS LOOKUP for "bigdata" using the filehandle for 397 "/export" will return a "no file" error because the LOOKUP 398 request did not cross the mountpoint on the server. There 399 is a practical reason for this limitation: if the server 400 permitted the mountpoint crossing to occur, then a Unix client 401 might receive ambiguous fileid information inconsistent with 402 it's view of a single remote mount for "/export". It is 403 expected that the client resolve this by mirroring the additional 404 server mount, e.g. 406 Client Server 408 /mnt <--- mounted on --- /export 410 /mnt/bigdata <--- mounted on --- /export/bigdata 412 However, this semantic changes if the client issues 413 the filesystem spanning LOOKUP relative to the public 414 filehandle. If the following filesystems are exported: 416 /export (exported public) 418 /export/bigdata (exported mountpoint) 420 then an NFS LOOKUP for "bigdata" relative to the public 421 filehandle will cross the mountpoint - just as if the 422 client had issued a MOUNT request - but only if the 423 new filesystem is exported, and only if the server 424 supports Export Spanning Pathnames described in Section 6.3 425 of RFC [mmmm]. 427 7. Contacting the Server 429 WebNFS clients should be optimistic in assuming that the server 430 supports WebNFS, but should be capable of fallback to 431 conventional methods for server access if the server does not 432 support WebNFS. 434 The client should start with the assumption that the 435 server supports: 437 - NFS version 3. 439 - NFS TCP connections. 441 - Public Filehandles. 443 If these assumptions are not met, the client should 444 fall back gracefully with a minimum number of 445 messages. The following steps are recommended: 447 1. Attempt to create a TCP connection to the server's 448 port 2049. 450 If the connection fails then assume that a request 451 sent over UDP will work. Use UDP port 2049. 453 Do not use the PORTMAP protocol to determine the 454 server's port unless the server does not respond to 455 port 2049 for both TCP and UDP. 457 2. Assume WebNFS and V3 are supported. 458 Send an NFS version 3 LOOKUP with the public filehandle 459 for the requested pathname. 461 If the server returns an RPC PROG_MISMATCH error then 462 assume that NFS version 3 is not supported. Retry 463 the LOOKUP with an NFS version 2 public filehandle. 465 Note: The first call may not necessarily be a LOOKUP 466 if the operation is directed at the public filehandle 467 itself, e.g. a READDIR or READDIRPLUS of the directory 468 that is associated with the public filehandle. 470 If the server returns an NFS3ERR_STALE, NFS3ERR_INVAL, or 471 NFS3ERR_BADHANDLE error, then assume that the server does 472 not support WebNFS since it does not recognize the public 473 filehandle. The client must use the server's portmap 474 service to locate and use the MOUNT protocol to obtain an 475 initial filehandle for the requested path. 477 WebNFS clients can benefit by caching information about the 478 server: whether the server supports TCP connections (if TCP is 479 supported then the client should cache the TCP connection as 480 well), which protocol the server supports and whether the server 481 supports public filehandles. 
8. Mount Protocol

   If the server returns an error to the client that indicates no
   support for public filehandles, the client must use the MOUNT
   protocol to convert the given pathname to a filehandle.  Version 1
   of the MOUNT protocol is described in Appendix A of RFC 1094, and
   version 3 in Appendix I of RFC 1813.  Version 2 of the MOUNT
   protocol is identical to version 1 except for the addition of a
   procedure, MOUNTPROC_PATHCONF, which returns POSIX pathconf
   information from the server.

   At this point the client must already have some indication as to
   which version of the NFS protocol is supported on the server.
   Since the filehandle format differs between NFS versions 2 and 3,
   the client must select the appropriate version of the MOUNT
   protocol.  MOUNT versions 1 and 2 return only NFS version 2
   filehandles, whereas MOUNT version 3 returns NFS version 3
   filehandles.

   Unlike the NFS service, the MOUNT service is not registered on a
   well-known port.  The client must use the PORTMAP service to locate
   the server's MOUNT port before it can transmit a MOUNTPROC_MNT
   request to retrieve the filehandle corresponding to the requested
   path.

      Client                                           Server
      ------                                           ------

      -------------- MOUNT port ? -------------->  Portmapper
      <-------------- Port=984 ------------------

      ------- Filehandle for /export/foo ? ----->  Mountd @ port 984
      <--------- Filehandle=0xf82455ce0.. -------

   NFS servers commonly use a client's successful MOUNTPROC_MNT
   request as an indication that the client has "mounted" the
   filesystem, and may maintain this information in a file that lists
   the filesystems that clients currently have mounted.  This
   information is removed from the file when the client transmits a
   MOUNTPROC_UMNT request.  Upon receiving a successful reply to a
   MOUNTPROC_MNT request, a WebNFS client should send a
   MOUNTPROC_UMNT request to prevent an accumulation of "mounted"
   records on the server.

   Note that the additional overhead of the PORTMAP and MOUNT
   protocols will have an effect on the client's binding time to the
   server, and the dynamic port assignment of the MOUNT protocol may
   preclude easy firewall or proxy server transit.

   The client may regain some of this performance by utilizing a
   pathname prefix cache, as sketched below.  For instance, if the
   client already has a filehandle for the pathname "a/b", then there
   is a good chance that the filehandle for "a/b/c" can be recovered
   by a lookup of "c" relative to the filehandle for "a/b",
   eliminating the need to have the MOUNT protocol translate the
   pathname.  However, there are risks in doing this.  Since the
   LOOKUP response provides no indication of filesystem mountpoint
   crossing on the server, the relative LOOKUP may fail, since NFS
   requests do not normally cross mountpoints on the server.  The
   MOUNT service can be relied upon to evaluate the pathname
   correctly, including the crossing of mountpoints where necessary.
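   One illustrative (and purely hypothetical) shape for such a prefix
   cache: lookup(fh, name) stands for a single-component NFS LOOKUP
   that raises LookupError on failure, and mount(path) stands for the
   PORTMAP/MOUNT exchange shown above.

      def resolve(cache, path, lookup, mount):
          # Search backward through prefixes of the path for a cached
          # filehandle, then LOOKUP the remaining components.
          parts = path.split("/")
          for i in range(len(parts) - 1, 0, -1):
              prefix = "/".join(parts[:i])
              if prefix in cache:
                  try:
                      fh = cache[prefix]
                      for j, name in enumerate(parts[i:], start=i):
                          fh = lookup(fh, name)
                          cache["/".join(parts[:j + 1])] = fh
                      return fh
                  except LookupError:
                      break   # possibly a mountpoint crossing
          # No usable prefix, or the relative LOOKUP failed: let the
          # MOUNT service evaluate the path; it crosses mountpoints.
          fh = mount(path)
          cache[path] = fh
          return fh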
9. Exploiting Concurrency

   NFS servers are known for their high capacity and their
   responsiveness to clients transmitting multiple concurrent
   requests.  For best performance, a WebNFS client should take
   advantage of server concurrency.  The RPC protocol on which the NFS
   protocol is based provides transport-independent support for this
   concurrency via a unique transaction ID (XID) in every NFS request.

   There is no need for a client to open multiple TCP connections to
   transmit concurrent requests.  The RPC record marking protocol
   allows the client to transmit and receive a stream of NFS requests
   and replies over a single connection.

9.1 Read-ahead

   To keep the number of READ requests to a minimum, a WebNFS client
   should use the maximum transfer size that it and the server can
   support.  The client can often optimize utilization of the link
   bandwidth by transmitting concurrent READ requests.  The optimum
   number of READ requests needs to be determined dynamically, taking
   into account the available bandwidth, link latency, and I/O
   bandwidth of the client and server.  For example, the following
   series of READ requests shows a client using a single read-ahead
   to transfer a 128k file from the server with 32k READ requests:

      READ XID=77 offset=0   for 32k -->
      READ XID=78 offset=32k for 32k -->
                                     <-- Data for XID 77
      READ XID=79 offset=64k for 32k -->
                                     <-- Data for XID 78
      READ XID=80 offset=96k for 32k -->
                                     <-- Data for XID 79
                                     <-- Data for XID 80

   The client must be able to handle the return of data out of order.
   For instance, in the above example the data for XID 78 may be
   received before the data for XID 77.

   The client should be careful not to use read-ahead beyond the
   capacity of the server, network, or client to handle the data.
   This might be determined by a heuristic that measures throughput
   as the download proceeds.

9.2 Concurrent File Download

   A client may combine read-ahead with concurrent download of
   multiple files.  A practical example is that of Web pages that
   contain multiple images, or a Java Applet that imports multiple
   class files from the server.

   Omitting read-ahead for clarity, the download of multiple files,
   "file1", "file2", and "file3", might look something like this:

      LOOKUP XID=77 0x0 "file1"           -->
      LOOKUP XID=78 0x0 "file2"           -->
      LOOKUP XID=79 0x0 "file3"           -->
                                          <-- FH=0x01 for XID 77
      READ   XID=80 0x01 offset=0 for 32k -->
                                          <-- FH=0x02 for XID 78
      READ   XID=81 0x02 offset=0 for 32k -->
                                          <-- FH=0x03 for XID 79
      READ   XID=82 0x03 offset=0 for 32k -->
                                          <-- Data for XID 80
                                          <-- Data for XID 81
                                          <-- Data for XID 82

   Note that the replies may be received in a different order from the
   order in which the requests were transmitted.  This is not a
   problem, since RPC uses the XID to match requests with replies.  A
   benefit of the request/reply multiplexing provided by the RPC
   protocol is that the download of a large file that requires many
   READ requests will not delay the concurrent download of smaller
   files.

   Again, the client must be careful not to drown the server with
   download requests.
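   The multiplexing that these examples rely on is straightforward to
   express in code: tag each request with a unique XID, transmit the
   requests without waiting, and match replies to requests in whatever
   order they arrive.  The sketch below assumes hypothetical
   send_request() and recv_reply() helpers.

      import itertools

      xids = itertools.count(77)

      def read_file(fh, size, chunk, send_request, recv_reply):
          # Transmit every READ request up front...
          pending = {}
          for offset in range(0, size, chunk):
              xid = next(xids)
              send_request(xid, "READ", fh, offset, chunk)
              pending[xid] = offset
          # ...then collect the replies, which may arrive out of
          # order; the XID tells us where each chunk belongs.
          data = bytearray(size)
          while pending:
              xid, payload = recv_reply()
              offset = pending.pop(xid)
              data[offset:offset + len(payload)] = payload
          return bytes(data)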
10. Timeout and Retransmission

   A WebNFS client should follow the example of conventional NFS
   clients and handle server or network outages gracefully.  If a
   reply is not received within a given timeout, the client should
   retransmit the request with its original XID (described in Section
   8 of RFC 1831).  The XID can be used by the server to detect
   duplicate requests and avoid unnecessary work.

   While it would seem that retransmission over a TCP connection is
   unnecessary (since TCP is responsible for detecting and
   retransmitting lost data), at the RPC layer retransmission is still
   required for recovery from a lost TCP connection, perhaps due to a
   server crash, or because the server, due to resource limitations,
   has closed the connection.  When the TCP connection is lost, the
   client must re-establish the connection and retransmit pending
   requests.

   The client should set the request timeout according to the
   following guidelines:

      - A timeout that is too small may result in the wasteful
        transmission of duplicate requests.  The server may just be
        slow to respond, either because it is heavily loaded or
        because the link latency is high.

      - A timeout that is too large may harm throughput if the
        request is lost and the connection is idle waiting for the
        retransmission to occur.

      - The optimum timeout may vary with the server's responsiveness
        over time, and with the congestion and latency of the
        network.

      - The optimum timeout will vary with the type of NFS request.
        For instance, the response to a LOOKUP request will be
        received more quickly than the response to a READ request.

      - The timeout should be increased according to an exponential
        backoff until a limit is reached.  For instance, if the
        timeout is 1 second, the first retransmitted request should
        have a timeout of 2 seconds, the second retransmission 4
        seconds, and so on, until the timeout reaches a limit, say 30
        seconds.  This avoids flooding the network with retransmitted
        requests when the server is down or overloaded.

   As a general rule of thumb, the client should start with a long
   timeout until the server's responsiveness is determined.  The
   timeout can then be set to a value that reflects the server's
   responsiveness to previous requests.
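   The exponential backoff guideline translates directly to code.  A
   minimal, non-normative sketch, assuming a recv() that returns None
   on timeout and a retransmit() that resends the request under its
   original XID:

      def wait_for_reply(recv, retransmit, timeout=1.0, limit=30.0):
          # Double the timeout after each retransmission until the
          # limit is reached, then keep retrying at the limit.
          while True:
              reply = recv(timeout)
              if reply is not None:
                  return reply
              retransmit()
              timeout = min(timeout * 2, limit)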
11. Bibliography

   [RFC1808]     R. Fielding, "Relative Uniform Resource Locators,"
                 RFC 1808, June 1995.
                 http://www.internic.net/rfc/rfc1808.txt

   [RFC1831]     R. Srinivasan, "RPC: Remote Procedure Call Protocol
                 Specification Version 2," RFC 1831, August 1995.
                 http://www.internic.net/rfc/rfc1831.txt

   [RFC1832]     R. Srinivasan, "XDR: External Data Representation
                 Standard," RFC 1832, August 1995.
                 http://www.internic.net/rfc/rfc1832.txt

   [RFC1833]     R. Srinivasan, "Binding Protocols for ONC RPC
                 Version 2," RFC 1833, August 1995.
                 http://www.internic.net/rfc/rfc1833.txt

   [RFC1094]     Sun Microsystems, Inc., "Network Filesystem
                 Specification," RFC 1094, DDN Network Information
                 Center, SRI International, Menlo Park, CA.  NFS
                 version 2 protocol specification.
                 http://www.internic.net/rfc/rfc1094.txt

   [RFC1813]     Sun Microsystems, Inc., "NFS Version 3 Protocol
                 Specification," RFC 1813, DDN Network Information
                 Center, SRI International, Menlo Park, CA.  NFS
                 version 3 protocol specification.
                 http://www.internic.net/rfc/rfc1813.txt

   [RFCmmmm]     B. Callaghan, "WebNFS Server Specification,"
                 RFC mmmm, October 1996.
                 http://www.internic.net/rfc/rfcmmmm.txt

   [Sandberg]    Sandberg, R., D. Goldberg, S. Kleiman, D. Walsh, and
                 B. Lyon, "Design and Implementation of the Sun
                 Network Filesystem," USENIX Conference Proceedings,
                 USENIX Association, Berkeley, CA, Summer 1985.  The
                 basic paper describing the SunOS implementation of
                 the NFS version 2 protocol; it discusses the goals,
                 protocol specification, and trade-offs.

   [X/OpenNFS]   X/Open Company, Ltd., X/Open CAE Specification:
                 Protocols for X/Open Internetworking: XNFS, X/Open
                 Company, Ltd., Apex Plaza, Forbury Road, Reading,
                 Berkshire, RG1 1AX, United Kingdom, 1991.  An
                 indispensable reference for the NFS version 2
                 protocol and accompanying protocols, including the
                 Lock Manager and the Portmapper.

   [X/OpenPCNFS] X/Open Company, Ltd., X/Open CAE Specification:
                 Protocols for X/Open Internetworking: (PC)NFS,
                 Developer's Specification, X/Open Company, Ltd.,
                 Apex Plaza, Forbury Road, Reading, Berkshire, RG1
                 1AX, United Kingdom, 1991.  An indispensable
                 reference for the NFS version 2 protocol and
                 accompanying protocols, including the Lock Manager
                 and the Portmapper.

12. Security Considerations

   Since the WebNFS server features are based on NFS protocol
   versions 2 and 3, the RPC-based security considerations described
   in RFC 1094, RFC 1831, and RFC 1832 apply here also.

   Clients and servers may separately negotiate secure connection
   schemes for authentication, data integrity, and privacy.

13. Acknowledgements

   This specification was extensively reviewed by the NFS group at
   SunSoft and brainstormed by Michael Eisler.

14. Author's Address

   Address comments related to this document to:

      nfs@eng.sun.com

   Brent Callaghan
   Sun Microsystems, Inc.
   2550 Garcia Avenue
   Mailstop Mpk17-201
   Mountain View, CA 94043-1100

   Phone: 1-415-786-5067
   Email: brent.callaghan@eng.sun.com
   Fax:   1-415-786-5896