Internet Draft                                              R. Perlman
                                                Sun Microsystems, Inc.
                                                        6 January 1998

                      Folklore of Protocol Design

                   draft-iab-perlman-folklore-00.txt

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

This document is intended to set the tone as an IETF collaboration to
collect various tricks and "gotchas" in protocol design. It is not
intended to declare the "right" and "wrong" ways of doing things, but
rather "this practice has the following advantages and
disadvantages", or "here are several ways of solving the following
problem", with technical explanation of the pros and cons of the
various approaches.

Discussion will take place on the mailing list
folklore@external.cisco.com. To join, send a message to
folklore-request@external.cisco.com.

1 Simplicity vs Flexibility vs Optimality

Obviously a simpler protocol is better, all things being equal, but
other goals, such as making the protocol flexible enough to fit every
possible situation or always finding the theoretically optimal
solution, create a more complex protocol. The question to ask is
whether the tradeoff is worth it. Sometimes going after "the
optimal" solution makes a protocol many times as complex, when users
wouldn't actually be able to tell the difference between a "pretty
good" solution and an "optimal" solution. Also, sometimes designing
for every possible problem and every possible future technology
change makes a protocol too complicated for the added flexibility.
The simpler the protocol, the more likely it is to be successfully
implemented and deployed. If a protocol works in most situations, but
fails in some obscure case, such as a network in which there are 300
baud links or routers implemented on toasters, it might be worthwhile
to abandon those cases, either forcing users to upgrade their
equipment or designing a custom protocol for those networks.

Underspecification creates complexity. When the goal of flexibility
is carried too far, one can wind up with a protocol that is so
general that it is unlikely that two independent, conformant (to the
specification) implementations will interwork. Many of the ISO
protocols had this property. The specification was so general, and
left so many choices, that it was necessary to hold "implementor
workshops" to agree on what subsets to build and what choices to
make. The specification wasn't a specification of a protocol. Instead
it was a framework in which a protocol could be designed and
implemented. In other words, rather than specifying an algorithm for,
say, data compression, the standard would only specify "compression
type", and "type-specific data".
Often even the type codes would not
be defined in the specification, much less the specifics of each
choice. Choices are often the result of the inability of the
committee to reach consensus.

An interesting example is cryptographic algorithm choices. For
example, PGP specified "RSA for keys, IDEA for encryption". One
argument is that it is necessary to have a choice of algorithms, in
case an algorithm is broken or is only legal in some countries.
However, having a choice of algorithms means the protocol has to be
more complex in order to negotiate algorithms, and runs the risk of
non-interoperability because different nodes might implement
non-overlapping subsets. If simplicity is chosen instead of
flexibility, then a new protocol can be deployed if an algorithm is
broken, or in countries where the chosen algorithm is illegal. But
then it could be argued that a new protocol is needed in order to
negotiate which of the simple, non-flexible protocols to use, and the
result is similar to having designed a flexible protocol with
algorithm choices.

A middle ground for something like cryptographic algorithms, where
there is the possibility that one or more will be broken, is to
specify a set of algorithms, and have all implementations capable of
using any from that set. Then later, if an algorithm gets broken it
is simple to configure each implementation to no longer generate (or
accept) that algorithm.

2 Define the Problem

The first step to designing a good protocol is defining the problem.
What applications will use it? What are their "must have" needs, vs
their "desirable" features? One example is multicast. A protocol
reasonable for broadcasting IETF meetings to the majority of the
Internet might be very different from a protocol for a conference
call of several participants.
Is it better to design one general
protocol that will meet the needs of very different sorts of
multicast groups, or is it better to design multiple protocols? The
answer is "it depends", but before designing any protocol, it is good
to justify the choice. A justification for designing without
defining the problem is that one cannot imagine what applications
will develop. Design the tool and the applications will come. The
argument against is that a protocol designed without defining the
problem is likely to be more complex and expensive (bandwidth, etc.)
than necessary, and if an appli

Another example is "policy based routing". Dave Clark described the
general problem, from a theoretical point of view, in [Clark]. But
nobody ever described all the actual customer needs. BGP provides
some set of policies, but not the general case. For instance, a BGP
router chooses a single path to the destination, without taking into
account the source. Maybe some sources need to have data routed
differently from others.

Did BGP solve the important cases, or did the world adapt to what BGP
happened to solve? If the latter, would the world have been satisfied
with a more conveniently accommodated subset, or perhaps even without
policy-based routing at all?

3 Overhead/Scaling

One should calculate the overhead of the algorithm. For example, the
bandwidth used by source route bridging increases exponentially with
the number of nodes in a reasonably richly interconnected topology.
It is usually possible to choose an algorithm with less dramatic
growth, but most algorithms have some limit. Establish reasonable
bounds on the limits, and publish these in the specification.

Sometimes there is no reason to scale beyond a certain point.
For
example, a protocol that was n**2 or even exponential might be
reasonable if it's known that there would never be more than 5 nodes
participating.

4 Operation Above Capacity

If there are assumptions about the size of the problem to be solved,
either the limit should be so large that it would never in practice
be exceeded, or the protocol should be designed to gracefully degrade
if the limit is exceeded, or at the very least detect that the
topology is now illegal and complain (or disconnect a subset to bring
the topology within legal limits).

An example of a protocol that considered graceful operation beyond
expected limits was IS-IS, when a router's capacity for storing link
state information was exceeded. Routing depends on all routers making
decisions based on identical link state databases, so loops and other
disruption can form if a router attempts to continue making decisions
based on a subset of the information. The protocol was designed so
that:

* an overloaded router would not disrupt operations by being on any
  paths (except as a last resort)

* the router was still reachable on the network, so that it could be
  remotely managed

* if the router was on a cut set of the network, the nodes on the
  other side could (probably) still be reachable through that router

* if the routing database somehow got smaller, the router would
  return to normal operation without human intervention

This was accomplished by having the router report, in its own link
state information, that it was "overloaded". Other routers treated
links to that router as usable only as a "last resort". If some amount
of time elapsed without the router needing to discard link state
information, the router declared itself normal again by reissuing
its link state information.

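The "last resort" treatment of an overloaded router can be sketched as
a two-pass shortest path computation. This is only an illustration
under stated assumptions, not the IS-IS algorithm itself: the graph
representation and the names shortest_path_cost and route_cost are
hypothetical. The first pass refuses to forward through overloaded
routers (they may still be a destination); only if that fails does the
second pass allow transit through them.

```python
import heapq

def shortest_path_cost(graph, src, dst, overloaded, allow_overloaded_transit):
    """Dijkstra over graph (node -> {neighbor: cost}). Returns cost or None."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        # An overloaded router may be reached, but is not used as a
        # transit hop unless we are in "last resort" mode.
        if node != src and node in overloaded and not allow_overloaded_transit:
            continue
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None

def route_cost(graph, src, dst, overloaded):
    """Prefer paths avoiding overloaded routers; fall back as a last resort."""
    cost = shortest_path_cost(graph, src, dst, overloaded, False)
    if cost is None:
        cost = shortest_path_cost(graph, src, dst, overloaded, True)
    return cost

# B is overloaded: traffic from A to D detours via C, B stays reachable.
graph = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "D": 1},
    "C": {"A": 5, "D": 5},
    "D": {"B": 1, "C": 5},
}
print(route_cost(graph, "A", "D", {"B"}))
print(route_cost(graph, "A", "B", {"B"}))
```

If B were the only connection between A and D (a cut set), the first
pass would find no path and the second pass would route through B.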
5 Identifiers

Often a protocol contains a field identifying something, for
instance a protocol type. Most IETF standards have numbers assigned
by the IANA. This enables a field to be reasonably compact. An
alternative is an "object identifier" as in ASN.1. Object identifiers
are very large, but have the advantage that it is not necessary to
obtain one from the IANA, since the hierarchical structure of the
object identifier makes it possible to get a unique identifier
without central administration. There might also be cases in which
companies might want to deploy proprietary extensions without letting
anyone know that they are doing this. With an object identifier it is
not necessary to tell a central authority of your plans. And in some
cases the central authority might publicly divulge the assigned
numbers, and the recipient of each assigned number.

There are several disadvantages to object identifiers:

* the field is larger, and therefore consumes memory and
  bandwidth and CPU

* there is no central place to look up all the currently used
  object identifiers, so it might be difficult to debug a network

* sometimes the same protocol will wind up with multiple object
  identifiers, again because there is no central coordination so two
  different organizations might define an object identifier for the
  same protocol. Then it is possible that two implementations might
  be in theory interoperable, but since the object identifiers
  assigned to some field differ, the two implementations might refuse
  to interoperate.

6 Optimize for Most Common or Important Case

Huffman coding is an example of this principle. It might be
applicable to implementation or to protocol design.
An example of an
implementation that optimizes for the usual case is one in which a
"common" IP packet (no options, nothing else unusual) is switched in
hardware, whereas if there is anything unusual about the packet it is
sent to the dungeon of the central processor to be prodded and
pondered when the router finds it convenient. An example of this
principle in protocol design is encoding "unusual" requests, such as
source routing, as an option, which is less efficient in space and in
parsing overhead than having the capability encoded in a fixed
portion of the header.

7 Forward Compatibility

Protocols generally evolve, and it is good to design them with
provision for making minor or major changes. Some changes are
"incompatible", so that it is preferable for the later version node
to be aware that it is talking to an earlier version node, and switch
to speaking the earlier version of the protocol. Other changes are
"compatible", where later version protocol messages can be processed
without harm by earlier version nodes. There are various techniques.

7.1 Large Enough Fields

A common mistake is to make fields too small. It is better to
overestimate than to underestimate. It greatly expands the lifetime
of a protocol.
Examples of fields that one could argue should have
been larger are:

* IP address

* "packet identifier" in IP header (because it could wrap around
  within a packet lifetime)

* "fragment identifier" in IS-IS (because an LSP could be larger than
  256 fragments)

* packet size in IPv6 (though some might argue that the "optimize for
  most common case" is the reason for splitting the high order part
  into an option in the very unusual case where packets larger than
  64K bytes would be desired)

* date fields

7.2 Independence of Layers

It is desirable to design a protocol with as little as possible
dependence on other layers, so that in the future one layer can be
replaced without affecting other layers. An example of what to avoid
is having protocols above layer 3 make the assumption that addresses
are 4 bytes long.

The downside of this principle is that if you do not exploit the
special capabilities of a particular technology at layer n, then you
wind up with "least common denominator". For example, not all data
links provide multicast capability, yet it is very useful for routing
algorithms to use link level multicast for neighbor discovery,
efficient propagation of information to all LAN neighbors, etc. If
we adhered too strictly to the principle of not making special
assumptions about the data link layer, then we might not have allowed
layer 3 to exploit the multicast capability of some layer 2
technologies.

Another danger of exploiting special capabilities of layer n-1 is
that a new technology at layer n-1 might need to be altered in
unnatural ways to make it support the API designed for a different
technology. An example is attempting to make a technology like
Frame Relay or SMDS provide multicast so that it "looks like"
Ethernet.
For example, the way in which multicast was simulated in
SMDS was to have packets with a multicast destination address
transmitted to a special node that was manually configured with the
individual members, and that node individually addressed copies of
the "multicast" packet to each of the recipients.

7.3 Reserved Fields

Often there are spare bits. If they are carefully specified to be
transmitted as zero and ignored upon receipt, then they can later be
used for functions such as signaling that the transmitting node has
implemented later version features, or they can be used to encode
information such as priority that is safe for some nodes to not
understand. This is an excellent example of the maxim "Be
conservative in what you send, and liberal in what you accept",
because you should always set reserved bits to zero and ignore them
upon receipt.

7.4 Single Version Number Field

One method of expressing version is a single number. What should an
implementation do if the version number is different? Sometimes a
node might implement multiple previous versions. Sometimes later
versions are indeed compatible with previous versions.

It is generally good to specify that a node that receives a packet
with a larger version number simply drop it, or respond with an
earlier version packet, rather than logging an error, or crashing. If
two nodes attempt to communicate, and the one with the larger version
notices it is talking to a node with a smaller version, the later
version node simply switches to talking the older version of the
protocol, setting the version number to the one recognized by the
other side.

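The behavior described above can be sketched as follows. This is a
minimal illustration, assuming each node remembers the last version it
heard from each peer; the class Node and its method names are
hypothetical and not taken from any particular protocol.

```python
class Node:
    """A node that speaks every version up to max_version (an assumption)."""

    def __init__(self, max_version):
        self.max_version = max_version
        self.peer_version = {}  # peer id -> version last heard from that peer

    def version_for(self, peer):
        # Until we hear otherwise, speak our own latest version; once a
        # peer is known to be older, switch to the version it recognizes.
        known = self.peer_version.get(peer, self.max_version)
        return min(self.max_version, known)

    def receive(self, peer, version):
        if version > self.max_version:
            return "drop"  # drop quietly; no error log, no crash
        self.peer_version[peer] = version
        return "process"

new, old = Node(3), Node(2)
print(old.receive("new", new.version_for("old")))  # version 3 packet arrives
print(new.receive("old", old.version_for("new")))  # version 2 packet arrives
print(new.version_for("old"))                      # new node has downshifted
```

Note that the remembered peer version is the only state driving the
downshift, so it matters how and when that memory is refreshed.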
One problem that can result is that two new version nodes might get
tricked into talking the old version of the protocol to each other,
since any memory from one side that the other side is older will
cause it to talk the older version, and therefore cause the other
side to talk the older version. A method of solving this problem is
to use a reserved bit indicating "I could be speaking a later version
but I think this is the latest version you support". Another
possibility is to periodically probe with a later version packet.

7.5 Split Version Number Field

This strategy uses two or more subfields, sometimes referred to as
"major" and "minor" version numbers. The major subfield is
incremented if the protocol has been modified in an incompatible way
and it is dangerous for an old version node to attempt to process the
packet. The minor subfield is incremented if there are compatible
changes to the protocol. An example of a compatible change is where a
Transport layer protocol might have added the feature of delayed acks
to avoid silly window syndrome [Clark's paper].

The same result could be achieved with reserved bits (signalling that
you implement enhanced features that are compatible with this
version), but having a "minor" version field in addition to the
"major version" allows 2**n possible enhancements to be signalled
with an n-bit "minor version" field (assuming the enhancements were
added to the protocol in sequential order, so that announcing
enhancement 23 means you support all previous enhancements as well).

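The difference between the two encodings can be made concrete with a
small sketch. The numbering of enhancements from 1 and the function
names are assumptions made for illustration only.

```python
def enhancements_from_minor(minor):
    """Sequential rule: announcing minor version m implies 1..m."""
    return set(range(1, minor + 1))

def enhancements_from_bits(bits, width):
    """Reserved-bit rule: bit i (0-based) set means enhancement i+1."""
    return {i + 1 for i in range(width) if bits & (1 << i)}

def usable(mine, theirs):
    """Enhancements both ends implement, and so may safely use."""
    return mine & theirs

# Announcing minor version 23 implies every earlier enhancement as well.
print(sorted(enhancements_from_minor(23))[:3], len(enhancements_from_minor(23)))

# With reserved bits, an 8-bit field names at most 8 enhancements,
# but they need not be contiguous.
print(sorted(enhancements_from_bits(0b100110, 8)))

# Two peers at minor versions 23 and 7 share enhancements 1..7.
print(len(usable(enhancements_from_minor(23), enhancements_from_minor(7))))
```

The minor version field is compact because of the sequential-order
assumption; the bit encoding spends one bit per enhancement in
exchange for dropping that assumption.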
If you want to allow more flexibility than "all versions up to n",
then there are various possibilities:

* I support all capabilities between k and n (requires a "minor"
  version field twice as large)

* I support capabilities 2, 3, and 6 (probably better off with a
  bitmask)

With a version number field, care must be taken if it is allowed to
wrap around. It is far simpler not to face this issue by either
making the version number field very large or being conservative
about incrementing it.

7.6 Options

Another way of providing for future protocol evolution is to allow
appending "options". IP has option fields. It is desirable to encode
options so that an unknown option can be skipped, though
sometimes it is desirable for an unknown option to generate an error
rather than be ignored. The most flexible capability is to specify
for each option what a node that does not recognize the option should
do, whether it be "skip and ignore", "skip and log", or "stop parsing
and generate error".

To be able to skip unknown options, strategies are:

* have a special marker at the end of the option (requires linear scan
  of option to find the end)

* have options be TLV encoded, which means a "type" field, a "length"
  field, and a "value" field.

Note that the "L" has to always mean the same thing. Sometimes
protocols have L depend on T, for instance not having any L field if
the particular type is always fixed length, or having the L be
expressed in bits vs bytes. If L depends on T then an unknown option
cannot be skipped. Another way to make it impossible to parse an
unknown option is if L is the "usable length", and the actual length
is always padded to, say, a multiple of 8 bytes.
If the specification
is clear that all options interpret L that way, then options can be
parsed, but if some option types use L as "how much data to skip" and
others as "relevant information" to which padding is inferred
somehow, then it is not possible to parse unknown options.

To know what to do with unknown options there are various strategies:

* Specify the handling of all unknown types (e.g., skip and log, skip
  and ignore, generate error and ignore entire packet)

* Have a field present in all options that specifies the handling of
  the option (such as the "copy" flag in IPv4 that specifies whether
  an option should be copied into each fragment or just the initial
  fragment, so that a router can perform that even if the router does
  not understand the option).

* Have the handling implicit in the type number, for instance a range
  of T values that the specification says should be ignored and
  another range to be skipped and logged, etc. This is similar to
  considering a bit in the type field as a flag indicating the
  handling of the packet.

An example of an option that would make sense to ignore if unknown is
priority. An example of an option in which the packet should be
dropped is strict source routing.

8 Parameters

There are various reasons for having parameters, some good and some
bad.

* the protocol designers could not figure out the proper values, so
  they leave it to the user to figure it out. This might make sense,
  if deployment experience might help determine reasonable values.
  However, if the protocol designers simply can't decide, it is
  unreasonable to expect the users to have any better judgement.
  At any
  rate, if deployment experience does give enough information to set
  the values, then the parameters should no longer be settable, and
  should instead just be constants specified in the specification.

* there are reasonable tradeoffs, say between responsiveness and
  overhead. In this case, the parameter descriptions should explain
  the range, and reasons for choosing points in the range.

In general, it is a good idea to avoid parameters wherever possible,
because they make for intimidating documentation which must be written
and, more importantly, read, in order to use the protocol. It is
also desirable, whenever possible, for the computers to figure out
the values for the parameters rather than forcing the parameter to be
set by humans. Examples include link cost, which could be measured at
link startup time by measuring the round trip delay and bandwidth,
and network layer address.

It is important to design the protocol so that parameters set by
people can be modified in a running network, one node at a time.

In some protocols, parameters can be set incorrectly and the protocol
will not run properly. Unfortunately it isn't as simple as having a
legal range for the parameter, because one parameter might interact
with another, even a parameter in a different layer. In a distributed
system it's possible for two systems to independently have reasonable
parameter settings, but have the parameter settings incompatible. A
simple example of incompatible settings is in a neighbor aliveness
detection protocol, where one sends hellos every n seconds and the
other declares the neighbor dead if it does not hear a hello for k
seconds. If k is not greater than n, the protocol will not work very
well.

There are some tricks for causing parameters to be compatible in a
distributed system.
In some cases, it is reasonable for nodes to
operate with different parameter settings, just so long as all the
nodes know the parameter setting of other (relevant) nodes. The
"report" method has node N report the value of its parameter, in
protocol messages, to all the other nodes that need to hear it. IS-IS
uses the "report" method. If the parameter is one that neighbors need
to know, then it would be reported in a "Hello" message (a message
that does not get forwarded, and is therefore only seen by the
neighbors). If the parameter is one that all nodes (in an area) need
to know, then it would be reported in an LSP. This method allows each
node to have independent parameter settings and yet interoperate,
because for example, a node will adjust its Listen timer (when to
declare a neighbor dead) for neighbor N based on N's reported Hello
timer (how often it sends Hellos).

Another method is the "detect misconfiguration" method, in which
parameters are reported so that nodes can detect whether they are
misconfigured. An example where the "detect misconfiguration"
strategy makes sense is where routers on a LAN might report to each
other the (IP address, subnet mask) of the LAN.

An example where the "detect misconfiguration" method is not the best
choice is the OSPF protocol, which puts the Hello timer and other
parameters into Hello messages, and has neighbors refuse to talk if
the parameter settings aren't identical. This forces all nodes on a
LAN to have the same Hello timer, but there might be legitimate
reasons why the responsiveness/overhead tradeoff for one router might
be different than for another router, so that neighbors might
legitimately need different values for the Hello Timer.
Also, the
OSPF method makes it difficult to change parameters in a running
network because neighbors will refuse to talk to each other while the
network is being migrated from one value to another.

Another method is the "use my parameters" method. One example is the
bridge spanning tree algorithm, where the Root bridge reports, in its
spanning tree message, its values for parameters that should be used
by all the bridges. This way bridges can be configured one by one,
but a non-Root bridge will simply store the configured value in
nonvolatile storage to be used if that bridge becomes Root. The values
everyone uses for the parameters are the ones configured into the
bridge that is currently acting as Root. This is a reasonable
strategy provided that there is no reason to want nodes to be working
with different parameter values.

Another example of "use my parameters" is AppleTalk, where the "seed
router" informs the other routers of the proper LAN parameters, such
as network number range. However, it is different from the bridge
algorithm because if there is more than one seed router, they must be
configured with the same parameter values.

A dangerous version of the "use my parameters" method is one in which
all nodes store the parameters when receiving a report. This might
lead to problems because misconfiguring one node can cause all the
other nodes to be permanently misconfigured. In contrast, with the
bridge algorithm, although the Root bridge might get misconfigured
with undesirable parameters, even if those parameters cause the
network to be nonfunctional, simply disconnecting the Root bridge
will cause some other bridge to take over, and cause all bridges to
use that bridge's parameter settings. Or simply reconfiguring the one
Root bridge will clear the network.

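As an illustration of the "report" method described earlier in this
section, the sketch below derives a per-neighbor Listen timer from the
Hello interval each neighbor reports, so that neighbors with different
Hello timers can coexist. The class NeighborTable, its method names,
and the multiplier of 3 are assumptions chosen for the example, not
values taken from IS-IS.

```python
LISTEN_MULTIPLIER = 3  # hypothetical: tolerate two consecutive lost Hellos

class NeighborTable:
    """Tracks, per neighbor, the time after which it is declared dead."""

    def __init__(self):
        self.deadline = {}  # neighbor id -> time at which it is declared dead

    def on_hello(self, neighbor, reported_hello_interval, now):
        # The Listen timer is derived from the neighbor's own reported
        # Hello interval, so k > n holds without any manual coordination.
        listen = LISTEN_MULTIPLIER * reported_hello_interval
        self.deadline[neighbor] = now + listen

    def dead_neighbors(self, now):
        return sorted(n for n, t in self.deadline.items() if now > t)

table = NeighborTable()
table.on_hello("fast", reported_hello_interval=1, now=0)
table.on_hello("slow", reported_hello_interval=10, now=0)
print(table.dead_neighbors(now=5))   # only the fast neighbor has timed out
print(table.dead_neighbors(now=31))  # now both have
```

Because each deadline is computed from the value the neighbor itself
reported, changing one node's Hello timer in a running network only
requires reconfiguring that node.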
9 Making Multiprotocol Operation Possible

Unfortunately, there is not a single protocol or protocol suite in
the world. There will be computers that will want to be able to
receive packets in multiple "languages". Unfortunately, since the
protocol designers do not in general coordinate with each other to
make their protocols self-describing, it is necessary to figure out a
way to ensure that a computer can receive a message in your protocol
and not confuse it with another protocol the computer may also be
capable of handling.

There are several methods of doing this, and because of that it can
be very confusing. There is no single "right" way to do it, although
the world would be simpler if everyone did it the same way, but we
will attempt to explain the various approaches:

* Protocol type at layer (n-1). This is a field administered by the
  owner of the layer n-1 specification. Each layer n protocol that
  wishes to be carried in a layer (n-1) envelope is given a unique
  value. The Ethernet standard [XXX] has a protocol type field
  assigned.

* Socket, port, or SAP at layer (n-1). This consists of two fields at
  layer (n-1), one applying to the source and the other applying to
  the destination. This makes sense when these fields need to be
  applied dynamically. However, almost always when this approach is
  taken, there are some predefined "well-known" sockets. A process
  tends to "listen" on the well-known socket, and wait for a
  dynamically assigned socket from another machine to connect.
In practice, although the IEEE 802.2 header is defined as using
"SAP"s, in reality the field is used as a protocol type, because
either the SAP values are well-known (and therefore the Destination
and Source SAP values will be the same), or the special SAP known as
the "SNAP SAP" is used, which indicates that the true multiplexing is
done with a protocol type later in the header.

* protocol type at layer n: This consists of a field in the layer n
header that allows multiple different layer n protocols to
distinguish themselves from each other. This is usually done when
multiple protocols defined by a particular standards body share the
same layer (n-1) protocol type. One could argue that the "version
number" field in IP is actually a layer n protocol type, especially
since "version"=5 is clearly not intended as the next "version" of
IP.

So the multiplexing information might be one field or two (one for
source, one for destination), and the multiplexing information might
be dynamically assigned or "well-known".

Multiplexing based on dynamically assigned sockets does not work well
with n-party protocols, so for something like a LAN on which
multicast is possible, sockets would be the wrong choice. In
particular, IEEE made the wrong choice when it changed the Ethernet
protocol to have sockets (SAPs), especially with the destination and
source sockets being only 8 bits long. Furthermore, they defined 2 of
the bits, so there were only 64 possible values to assign to
"well-known" sockets, and 64 possible values to be assigned
dynamically, or by anyone other than IEEE. Because of this mistake,
the SNAP encoding was invented, whereby a single well-known socket
(the SNAP SAP) was assigned to indicate that the header was expanded
to include a true protocol type field.

Dynamically assigned values work best in a connection-oriented
environment.
If one believes that Ethernet should always be combined with LLC
type 2 (a connection-oriented, reliable protocol), then it might be
reasonable to multiplex based on sockets. Indeed, it is similar to
combining TCP or UDP with Ethernet, and including the TCP/UDP port
numbers in the combined protocol. However, if reliability is
considered as belonging in a different layer (if needed at all), then
SAPs were a poor choice.

If protocol types had been used instead of SAPs in IEEE for
multiplexing, then all the functionality of LLC type 2 (or any other
connection-oriented protocol) could have been easily accomplished by
assigning LLC type 2 a protocol type, and having LLC type 2 define
socket fields within its own header. It is not as easy to accommodate
connectionless protocols on top of sockets unless you "cheat" by
assigning well-known socket values, and basically treating the socket
as a protocol type. In the IEEE case this was especially inconvenient
because there were not enough socket values to assign a well-known
value to every connectionless protocol. The SNAP kludge saved the
day, though, by allowing all connectionless protocols to share a
single SAP.

10 Running over Layer 3 vs Layer 2

Sometimes protocols that only work neighbor to neighbor are
encapsulated in a layer 3 header. Many of the routing protocols for
IP are examples. Since such messages are not intended to ever be
forwarded by IP, there is no reason to have an IP header. The IP
header makes the messages longer, and care must be taken to ensure
that packets don't actually get routed, because that could confuse
distant routers into thinking they are neighbors.

The alternative is to acquire a layer 2 protocol type.

Sometimes there are implementation reasons to run a
neighbor-to-neighbor protocol such as a routing algorithm over
layer 3.
For instance, there might be an API for running over layer 3, so that
the application can be built as a user process, whereas there might
not be an API for running over layer 2, and therefore running over
layer 2 would require modifications to the kernel. Or it might be
bureaucratically difficult to obtain a layer 2 protocol type.

11 Robustness

One type of robustness is "simple robustness", where the protocol
adapts to node and link fail-stop failures.

Another type is "self-stabilization", where although operation might
have become disrupted due to extraordinary events like a
malfunctioning node injecting incorrect messages, once the
malfunctioning node is disconnected from the network, the network
should return to normal operation. The ARPANET link state
distribution protocol was not self-stabilizing, and after a sick
router injected a few bad LSPs, the network would have been down
forever without hours of difficult manual intervention, even though
the sick router had failed completely hours before and only
"correctly functioning" routers were participating in the protocol.

Another type is "Byzantine robustness", where the network can
continue to work properly even in the face of malfunctioning nodes,
whether the malfunctions be due to hardware problems or even malice.

As society gets more dependent on networks, it is desirable to
attempt to achieve Byzantine robustness in any distributed algorithm
such as clock synchronization, directory system synchronization, or
routing. This is difficult; however, it is important if the protocol
is to be used in a hostile environment (such as where the nodes
cooperating in the protocol are remotely manageable from across the
Internet, or where a disgruntled employee might be able to physically
access one of the nodes).

Some interesting points to consider for making a system robust:

* Every line of code should be exercised frequently. If there is code
that only gets invoked when the nuclear power plant is about to
explode, it is possible that the code will no longer work when it is
actually needed. This could be due to modifications that have been
made to the system since the special case code was last checked, or
seemingly unrelated events such as increasing link bandwidth.

* Sometimes it is better to crash rather than gradually degrade in
the presence of problems, so that the problems get fixed or at least
diagnosed. For example, it might be preferable to bring down a link
that has a high error rate.

* It is sometimes possible to partition the network with containment
points, so that a problem on one side will not spread to the other.
An example is attaching two LANs with a router vs a bridge. A
broadcast storm (using data link multicast) will "spread" to both
sides of a bridge, whereas it will not spread through a router.

* Connectivity can be weird. For instance, a link might be one-way,
either because that is the way the technology works or because the
hardware is broken (e.g., one side has a broken transmitter, or the
other has a broken receiver). Or a link might work except be
sensitive to certain bit patterns. Or it might look to your protocol
like a node is a neighbor when in fact there are bridges in between,
and somewhere on the bridged path is a link with a smaller MTU size.
Therefore it could look like you are neighbors, but in fact packets
beyond a certain size will not succeed. It is a good idea to have
your protocol check that the link is indeed functioning properly
(e.g., pad hellos to maximum length to determine if large packets
actually get through, test that connectivity is 2-way, etc.).

* Certain checksums detect certain error conditions better than
others. For example, if bytes are getting swapped, the Fletcher
checksum will catch the problem whereas the IPv4 checksum will not.

12 Determinism vs Stability

The Designated Router election protocols in IS-IS and OSPF differ in
an interesting way. In IS-IS the protocol is "deterministic",
considered by some to be a desirable property. "Determinism" means
that the behavior at this moment does not depend on past events. So
the protocol was designed so that given a particular set of routers
that are up, the same one would always be DR. In contrast, OSPF went
for "stability", to cause minimal disruption to the network if
routers go up or down. In OSPF, once a node is elected DR it will
remain DR unless it crashes, whereas in IS-IS a router with a
"better" configured priority will usurp the role when it comes up.

A good compromise was made for the NLSP protocol (basically IS-IS for
IPX). Nodes change their priority by some constant (say 20) after
being DR for some time (say a minute). Then by configuring all the
routers with the same priority the protocol acts like OSPF. By
configuring all the routers with priorities more than 20 apart, it
acts like IS-IS. To allow OSPF-like behavior among a particular
subset of the routers (e.g., higher capacity routers), set them all
with a priority 20 greater than any of the other routers. That way,
if any of the high priority set is up, a high priority router will
become DR, but no other router will usurp the role.

Perhaps a simpler way to think of it is that each router could be
configured with two priorities, one initially and one after being DR
for a time.

13 Performance for Correctness

Sometimes in order to be "correct" an implementation must meet
certain performance constraints. An example is the bridge spanning
tree algorithm.
Loops in a bridged network can be disastrous, since packets can
proliferate exponentially while they are looping. The spanning tree
algorithm depends on receipt of spanning tree messages in order to
keep a link from forwarding. If temporary congestion caused a bridge
to throw away packets before processing them, then the bridge might
be throwing away spanning tree messages, causing links that should be
in hot-standby to forward traffic, causing loops and exponentially
more congestion. It is very possible that a bridged topology might
not recover from such an event. Therefore it is highly desirable, if
not something worth mandating, that bridges operate at wire speed.

A lot of denial of service attacks are possible (e.g., TCP SYN
attack) because nodes are not capable of processing every received
packet at wire speeds.

14 ASN.1

The concept of ASN.1 is appealing. You don't have to think of how the
actual data would be represented on each machine. Bit/byte order and
word size do not have to be considered by the protocol designer. Many
protocols therefore define their packet formats using ASN.1. However,
there are certain "gotchas" that should be understood to decide
whether ASN.1 is a good choice:

* ASN.1 has a lot of overhead. It adds bytes of overhead in databases
and bytes on the wire, and increases the complexity of the code.
Although an expert in ASN.1 can define structures so that they will
generate reasonably efficient data structures, a nonexpert can easily
create wildly inefficient structures. For example, the way an address
was defined in ASN.1 in Kerberos version 5, an IPv4 address would be
encoded (in databases and on the wire) in 11 bytes, whereas an ASN.1
expert could have defined it differently, to use 6 bytes.
Some might argue that a naive C programmer can generate inefficient
code, but perhaps inefficient C code is less important because it
only affects the inside of a machine, and can later be improved,
whereas an inefficient data structure results in bits on the wire.

* TLV encoding makes optional fields easy and should make forward
compatibility easy. However, ASN.1 1984 was not implemented to make
it easy to add optional fields. Although it translated into TLV
encoding, the parser would reject a data structure with added fields.
Although the 1988 version of ASN.1 fixed this, most protocols
continue to use 1984 ASN.1 because of the availability of 1984 ASN.1
compilers.

15 Security Pitfalls

Although a complete coverage of security pitfalls is beyond the scope
of a short paper, it is probably useful to note a few.

* Bad random number generators for seeds for keys. Though this is
usually an implementation problem rather than a protocol problem, it
is a sufficiently common mistake that it is worth mentioning.

* Encryption alone does not necessarily provide data integrity. For
example, consider an encryption algorithm that precomputes a
pseudorandom bit string and XORs it with the data. If the data is
predictable, then the real data can be XOR'd out and replaced with
new data, even though the ciphertext cannot be "decrypted".

* Reflection attacks, especially with multiple servers. If the same
secret is used with multiple servers, a common mistake in some (bad)
protocols allows a message sent to one to be replayed at another.

* Backward compatibility with weak or broken crypto algorithms.
Sometimes for compatibility with exportable versions, or old
versions, a negotiation is done in which one side can request weaker
security.
If this negotiation is not itself integrity protected, an intruder
can fool two sides capable of talking good security into speaking
weaker security by injecting a message into the negotiation
requesting the weaker security.

* IP addresses are spoofable. Sometimes the assumption is that only
the client needs to authenticate to the server. However, if an
intruder spoofs a server, it can cause the client machine to do
things like send the user's password in the clear.

* Sometimes protocols can trick something into decrypting or signing
something. For example, if the method of authentication is to accept
any arbitrary challenge and sign it with your private key, then the
"challenge" might actually be a promise to pay someone a million
dollars. The PKCS standards are designed to avoid this sort of
pitfall.

16 Author's Address

Radia Perlman
Sun Microsystems, Inc.
2 Elizabeth Drive
Chelmsford, MA 01824
Tel: +1.978.442.3252
Email: radia.perlman@sun.com