'Netfilter' ** Downref: Normative reference to an Historic RFC: RFC 1157 ** Downref: Normative reference to an Informational RFC: RFC 1633 ** Downref: Normative reference to an Informational RFC: RFC 2475 ** Downref: Normative reference to an Experimental RFC: RFC 2844 ** Obsolete normative reference: RFC 3036 (Obsoleted by RFC 5036) ** Downref: Normative reference to an Informational RFC: RFC 3358 ** Downref: Normative reference to an Informational RFC: RFC 3549 -- Possible downref: Non-RFC (?) normative reference: ref. 'Stevens' -- Possible downref: Non-RFC (?) normative reference: ref. 'TCP-SYN-COOKIES' -- Possible downref: Non-RFC (?) normative reference: ref. 'XTP' Summary: 11 errors (**), 0 flaws (~~), 14 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 ForCES Working Group J. Hadi Salim 3 Internet-Draft Znyx Networks 4 Expires: April 25, 2004 R. Haas 5 IBM Research 6 S. Blake 7 Ericsson 8 October 26, 2003 10 Netlink2 as ForCES Protocol 11 draft-jhsrha-forces-netlink2-02.txt 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that other 20 groups may also distribute working documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at http:// 28 www.ietf.org/ietf/1id-abstracts.txt. 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 

   This Internet-Draft will expire on April 25, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This document describes Netlink2, an extension of Linux Netlink
   (RFC 3549).  This document is intended as a proposal for the ForCES
   IETF working group protocol.

   ForCES attempts to define a clear separation between the two
   entities of a Network Element (NE), the Control Element (CE) and the
   Forwarding Element (FE), so that they can evolve separately rather
   than as the current monolithic whole.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Table of Contents

   1.     Introduction
   2.     Definitions
   3.     Netlink2 Overview
   4.     Summary of Netlink2 Modifications to Netlink
   4.1    Header Modifications
   4.2    Addressing and Transport Extensions
   5.     Netlink2 Message Format
   5.1    Netlink2 Message Header
   5.2    Type Length Value
   5.3    Encapsulated TLVs
   5.4    Netlink2-extension TLVs
   6.     Addressing and Transport Extensions
   6.1    Transport Methods
   6.1.1  Why Multicast?
   6.1.2  Why IP?
   6.1.3  Why UDP/TCP/SCTP/DCCP?
   6.2    The Netlink2 wire and bundle
   6.2.1  What wires go in a bundle?
   6.3    Redefining the Netlink PID Semantics
   6.4    Local Scope Addressing and Encapsulation
   6.5    Global Scope Addressing and Encapsulation
   7.     Protocol Architecture
   7.1    Protocol Phases
   7.1.1  The Pre-Association Phase
   7.1.2  The Association Phase
   7.1.3  Service Termination
   7.2    Protocol Logical Model
   7.3    Service Addressing
   7.4    Service Templates
   7.5    Mechanisms for Creating Protocols
   7.5.1  Building Reliable Protocols
   7.5.2  Building Availability
   7.5.3  The ACK Netlink2 Message
   7.5.4  Batching
   7.5.5  Atomicity and Ordering of Transactions
   8.     Putting together the base protocol for WG charter
   8.1    Netlink2-Extension TLVs
   8.1.1  Authentication
   8.1.2  Checksum
   8.1.3  Message Priority
   8.1.4  SYN COOKIE
   8.1.5  Name ID
   8.2    LFB and FE Attributes and discovery
   8.3    NE creation
   8.3.1  FE State transitions
   8.3.2  CE view of FE State transitions
   8.3.3  SYN Message Format
   8.3.4  FIN Message Format
   8.3.5  NOOP Message Format
   8.4    LFB and FE Service Templates
   8.4.1  Physical Port and Address Functions
   8.4.2  IPv4 and IPv6 L3 Forwarding Functions
   8.4.3  Filtering Functions
   8.4.4  QoS Functions
   8.4.5  IPSEC Functions
   8.4.6  Packet redirection Functions
   8.4.7  Packet Mirroring Functions
   8.4.8  Packet Sampling Functions
   8.5    Security Considerations
   8.5.1  Denial of Service (DoS) attacks
   8.5.2  Authentication and Encryption
          References
          Authors' Addresses
   A.     Sample Service Hierarchy
   B.     Sample Protocol for the foo IP Service
   B.1    Interacting with Other IP Services
   C.     Examples
          Intellectual Property and Copyright Statements

1. Introduction

   The concept of IP control and forwarding separation was first
   introduced in the early 1980s by the BSD 4.4 routing sockets
   [Stevens].
   The focus at that time was to provide a simple IP(v4) forwarding
   service and allow the control plane, either via a command line
   configuration tool or a dynamic route daemon, to control the
   forwarding tables of that IPv4 forwarding service.

   The IP world has evolved considerably since then.  Linux Netlink
   [RFC3549], when observed from a service provisioning and management
   point of view, takes routing sockets one step further by breaking the
   narrow focus on IPv4 forwarding.  Since the Linux 2.1 kernel, Netlink
   has provided the IP service abstraction for several services other
   than classical RFC 1812 IPv4 forwarding.

   Netlink was designed with the goal of solving forwarding and control
   separation, so many of the main issues have been thought through and
   resolved over the years.  In other words, Netlink is a proven
   protocol for separating forwarding and control.  Netlink is also
   network-ready because it uses packet formatting techniques and
   concepts (e.g., multicast addressing).  This, together with the
   availability of publicly running, tested, and widely deployed code,
   is the major motivation for basing Netlink2 on Netlink.

   Netlink2 extends Linux Netlink to meet the requirements of the ForCES
   working group charter for a protocol.  Netlink is extended with a
   distributed addressing and transport scheme, and missing mechanisms
   are added so that Netlink2 meets the ForCES protocol requirements
   [ForCES_REQ].

   Netlink2 operates in a mode where the NE, its topology, and its LFB
   model MAY already have been discovered, or are discovered within the
   Netlink2 protocol.  Netlink2 can operate over a variety of link,
   network, and transport media.  These include, but are not limited
   to:

   o  L2, such as Ethernet, ATM, FR, etc.

   o  Bus and I/O interfaces, such as PCI, HT, PCI-express, etc.

   o  L3, such as IPv4, IPv6, IPX, etc.

   o  L4 and above, such as TCP, UDP, SCTP, and DCCP.

   In cases where required mechanisms are missing from the underlying
   media, Netlink2 extensions compensate for them (refer to Section
   8.1).

2. Definitions

   We use the definitions provided in [ForCES_REQ], as well as the
   following:

   Logical Functional Block (LFB): same as Forwarding Engine Component
   as defined in [RFC3549].  This is a forwarding datapath component in
   the FE, driven by the ForCES protocol in order to achieve a certain
   service.

   Control Element Component (CPC): same as Control Plane Component as
   defined in [RFC3549].  This is a component in the CE that drives
   LFB(s) in order to achieve a certain service.

3. Netlink2 Overview

   A datapath packet processing service accomplished by an FE is
   represented as a logical functional block (LFB) in the FE.  CE
   components (CPCs) in the CE interact with LFBs over Netlink2 wires
   and bundles (described in Section 6.2) to configure and manage a
   certain service.  The interactions between LFBs and CPCs are specific
   to each service and are defined using templates as presented in
   [RFC3549].

   Netlink2 messages are used to communicate between the FE and the CPC
   for configuration of LFBs, for LFB events sent to the CPCs, and for
   statistics or configuration querying and gathering (typically by a
   CPC).  Other activities include the transfer of control packets
   between the FE and the CPC.

   Netlink2 messages travel between the CPC and LFB over Netlink2 wires,
   which are part of Netlink2 bundles.  Netlink2 wires are abstractions
   similar to GSMP links [RFC3292], albeit without the limitation to ATM
   VP:VC, Ethernet link, or TCP connection only.

   For instance, the IPv4 Forwarding service (called NETLINK_ROUTE)
   defines a message template for handling IP routes and the message
   types to insert, remove, or query a route.
   The routing CPC(s) and the IPv4 Forwarding LFB(s) interact using
   these message templates and message types over the Netlink2 bundle to
   execute the IPv4 Forwarding service.

   The message types in Netlink2 messages allow the FE to demultiplex
   messages to the appropriate LFB.

   Messages of a given service destined to an LFB can travel on
   different Netlink2 wires within the same bundle.

   Netlink2 by itself constitutes a base ForCES protocol with a set of
   mechanisms that can be used depending on service requirements.  For
   example, for certain messages between the FE and CE, reliability can
   be enforced at the transaction level by setting the appropriate flags
   in the Netlink2 message.  By default, however, Netlink2 transactions
   are not acknowledged.

4. Summary of Netlink2 Modifications to Netlink

   To conform to the ForCES requirements [ForCES_REQ], the Netlink
   protocol [RFC3549] is extended in the following respects:

   1.  Base header modifications, and feature extensibility by means of
       optional header TLVs, to accommodate current generic ForCES
       requirements and to make it possible to add more in the future.
       This facilitates adding features such as authentication and
       checksumming when required.

   2.  IP and transport encapsulations to carry Netlink messages.

   With these complementary changes to the existing Netlink
   functionality, Netlink2 fulfills the requirements to become the
   ForCES protocol.

4.1 Header Modifications

   1.  PID field redefinition and addition.

       In Netlink, PID 0 referred to the equivalent of the FE (kernel).
       The equivalent of the CE (user process) was referred to by its OS
       process id.

       In Netlink2, the PID has additional semantics which give it group
       identity, unicast capability, etc. (discussed in Section 6.3).

       A PID of the unicastPID type is assigned to each FE and CE in the
       pre-association phase.
       In this way the CE uniquely identifies the FE and avoids any
       collision.  The name PID is retained for historical reasons.

       *  Destination PID: the PID field is redefined as the Destination
          PID field.  This field identifies the parties on the wire that
          must process the message.

       *  Source PID: this field is introduced in the header to identify
          the source of the message.

       Different types of PIDs are discussed in Section 6.3.

   2.  The Length field has been reduced to 16 bits, with length 0
       reserved.  The remaining 16 bits of the old 32-bit Length field
       are now split between a new Version field and a new Extended
       Flags field.

   3.  A Version field is introduced in the Netlink2 header.  This 8-bit
       field consists of a 4-bit major number and a 4-bit minor number,
       in the form major:minor.  For Netlink2, its value is 0x20.

   4.  A new Extended Flags field takes over the remaining 8 of the 16
       bits taken from the original 32-bit Netlink Length field.
       Setting different bits enables additional features, such as
       signaling the presence of extension TLVs.

   5.  Netlink2-extension TLVs follow directly after the Netlink2 base
       header.  They are optional, and their purpose is to extend the
       Netlink2 header.  A typical use of Netlink2-specific TLVs is to
       compensate for capabilities lacking in an underlying transport.
       For example, in an IP network not deployed with IPSEC, the
       Netlink2-specific authentication TLV could be used to emulate the
       features provided by IPSEC-AH.

   6.  There can be more than one IP service configuration template
       within a Netlink2 message (as opposed to a single service
       template per Netlink message).  Implementation experience has
       shown that embedding multiple service templates improves the
       performance of FE configuration.

   Other than these changes, all mechanisms provided by Netlink are
The reader is 300 encouraged to refer to [RFC3549] as a companion to this one. 302 4.2 Addressing and Transport Extensions 304 1. Support for UDP/TCP/SCTP/DCCP transport over unicast/multicast IP 305 (Section 6.1). 307 2. Support for bundles (Section 6.2). 309 3. Message recipient scoping using the Destination PID (Section 310 6.3). 312 4. Support for both local scope and global scope addressing (Section 313 6.4 and Section 6.5). 315 5. Netlink2 Message Format 317 There are three levels to a Netlink2 message: The general Netlink2 318 message header which is mandatory, the Netlink2-extension TLV and 319 service Template(s) which are optional. 321 0 1 2 3 322 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 323 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 324 | | 325 | Netlink2 message header | 326 | | 327 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 328 | | 329 | Netlink2-extension TLV (optional) | 330 | | 331 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 332 | | 333 | Service Template(s) (optional) | 334 | | 335 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 337 Implementation studies [Goutaudier] have shown the above data layout 338 to provide easier parsing while allowing for extensibility (via the 339 optional Netlink2-extension TLV) and scalability (allowing for 340 multiple Service templates). 342 The Netlink2 message header is generic for all services and contains 343 the command that describes the rest of the message. 345 The optional Netlink2-extension TLV acts to extend any general 346 missing functionality from the Netlink2 message header. Typically, 347 this would be to allow for compensating for missing underlying 348 transport functionality. 350 The Service template is specific to a service. As mentioned earlier 351 there could be more than one template per Netlink2 message. 
Each 352 Service template carries configuration parameters or query requests 353 (CPC->LFB direction) or query responses (LFB->CPC direction). In the 354 case of multiple Service templates, then all the templates MUST be 355 used to execute the same command as defined in the Netlink2 message 356 header. In some special cases the Service template is not used. For 357 example in the case of a Netlink2 SYN, FIN or NOOP command. 359 5.1 Netlink2 Message Header 360 Each Netlink2 message contains a byte stream with a Netlink2 header 361 followed by its associated payload. 363 A single PDU may contain more than one Netlink2 message. This is 364 referred to as batching. Netlink batching is reused in Netlink2 and 365 allows for messages with different commands (such as adding routes 366 and deleting a QoS policy) to be carried in the same batch PDU. 368 A Netlink2 message may be split across multiple PDUs if it does not 369 fit into the PDU. This is referred to as a multipart Netlink2 370 message and is also inherited from Netlink. 372 For multipart messages, the first and all following headers have the 373 NLM_F_MULTI Netlink header flag set, except for the last header, 374 which has the Netlink header type NLMSG_DONE. 376 The Netlink2 message header is shown below. 

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |    Flags_E    |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source PID                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Destination PID                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields in the header are:

   Version: 8 bits

      The Version field is split into major:minor (4:4 bits)
      sub-fields.  The value for Netlink2 is 0x20.

   Flags_E: 8 bits

      These are extended flags:

      NLM_F_PRIO: Message priority: 1 for high and 0 for low.
      Additional QoS levels are set in the QoS TLV.

      NLM_F_ASTR: Sets the ACK strategy: 1 for partial ACKs and 0 for
      full ACKs.

      NLM_F_MS: Multiple Service templates are present when this flag
      is set to 1.

      NLM_F_EXT: If this flag is set, the optional Netlink2-extension
      TLV is present.

   Length: 16 bits

      The length of the Netlink2 message in bytes, including the header.

   Type: 16 bits

      This field describes the message content.  It can be one of the
      standard message types:

      NLMSG_NOOP: The message is not executed on the LFB.

      NLMSG_ERROR: The message signals an error and the payload
      contains an nlmsgerr structure.  This can be viewed as a NACK and
      is typically sent from the LFB to the CPC.

      NLMSG_DONE: The message terminates a multipart message.

      NLMSG_SYN: Sent on the first message.  Interpreted as a boot
      message of the sender.

      NLMSG_FIN: Sent on the last message.  Interpreted as a shutdown
      message of the sender.

      Typically, services specify additional message types centered
      around the transactional operations of adding, deleting, or
      querying configuration.  For example, the NETLINK_ROUTE service
      specifies several types for manipulating IPv4 or IPv6 routes, such
      as RTM_NEWROUTE, RTM_DELROUTE, etc.

   Flags: 16 bits

      The standard flag bits used in Netlink are:

      NLM_F_REQUEST: Must be set on all request messages (typically
      from CE to FE).

      NLM_F_MULTI: Indicates that the message is part of a multipart
      message terminated by NLMSG_DONE.

      NLM_F_ACK: Request for an acknowledgment on success.  The typical
      direction of the request is from CPC to LFB.

      NLM_F_ECHO: Echo this request.  The typical direction of the
      request is from CPC to LFB.

      Additional flag bits for GET requests on configuration
      information in the LFB:

      NLM_F_ROOT: Return the complete table instead of a single
      entry.

      NLM_F_MATCH: Return all entries matching the criteria passed in
      the message content.

      NLM_F_ATOMIC: This is an atomic operation, or part of an atomic
      operation (such as a two-phase commit).

      Convenience macros for flag bits:

      NLM_F_DUMP: NLM_F_ROOT OR-ed with NLM_F_MATCH.

      Additional flag bits for NEW requests:

      NLM_F_REPLACE: Replace an existing matching configuration object
      with this request.

      NLM_F_EXCL: Do not replace the configuration object if it
      already exists.

      NLM_F_CREATE: Create the configuration object if it does not
      already exist.

      NLM_F_APPEND: Add to the end of the object list.

      For readers familiar with the BSD use of such operations in route
      sockets, the equivalent translations are:

      *  The BSD ADD operation equates to NLM_F_CREATE OR-ed with
         NLM_F_EXCL.

      *  The BSD CHANGE operation equates to NLM_F_REPLACE.

      *  The BSD Check operation equates to NLM_F_EXCL.

      *  The BSD APPEND equivalent is actually mapped to NLM_F_CREATE.

   Sequence Number: 32 bits

      The sequence number of the message.

   Source PID: 32 bits

      The PID of the sender of the message (unicast or logical PID).

   Destination PID: 32 bits

      The PID of the destination of the message (unicast, logical, or
      broadcast PID).

5.2 Type Length Value

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           TLV Type            |          TLV Length           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Value (TLV Length - 4 octets)                 |
   ~                                                               ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   TLV Type:

      The TLV Type field is two octets and indicates the type of data
      encapsulated within the TLV.

   TLV Length:

      The TLV Length field is two octets and indicates the length of
      this TLV, including the TLV Type, TLV Length, and the TLV data.

   TLV Value:

      The TLV Value field carries the data.  For extensibility, the TLV
      Value may itself be a TLV; this is the case with the
      Netlink2-extension TLV.  The Value encapsulated within a TLV
      depends on the attribute being configured, is opaque to Netlink2,
      and is therefore not restricted to any particular type (examples
      include ASCII strings such as XML, or OIDs).

   TLVs must be 32-bit aligned.

5.3 Encapsulated TLVs

   TLV values can be other TLVs.  This gives the flexibility of being
   able to add new attributes when needed.  This is important for a
   protocol such as ForCES, for which attributes are expected to vary
   over a wide range of configurable blocks (CEs, FEs, LFBs, etc.).

   Note that Encapsulated TLVs can be viewed as abstractions that
   represent dynamic lists of attributes.

5.4 Netlink2-extension TLVs

   The Netlink2-Extension and Service TLVs are Encapsulated TLVs.  They
   contain their respective TLVs as appropriate for the message being
   sent.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Outer TLV Type         |       Outer TLV Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Inner TLV1 Type        |       Inner TLV1 Length       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                                                               ~
   ~                            VALUE1                             ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                                                               ~
   ~                                                               ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Inner TLVn Type        |       Inner TLVn Length       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                                                               ~
   ~                            VALUEn                             ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer TLV Type:

      This is set to NL2_OPTIONS(0) to indicate that the TLV is the
      Netlink2-Extension TLV.  The remaining type values are reserved
      for future use.

   Outer TLV Length:

      The Outer TLV Length is the length of everything within the TLV,
      including the Outer TLV Type field, the Outer TLV Length field,
      and all the encapsulated TLVs, which are treated as the Outer TLV
      Value.

   Outer TLV Value:

      The Outer TLV Value is all the inner TLVs.  The figure above shows
      an outer TLV with n inner TLVs.

   Inner TLV Type, Length, Value:

      These are all normal TLVs.  No assumption is made about their
      data contents.

6. Addressing and Transport Extensions

   We extend Netlink to make it distributed.  The focus is on giving
   Netlink2 a strong local scope view of the world while fitting well
   into a global scope as the hop distance between the FE and CE
   increases.

   If the network interconnecting the FE(s) and CE(s) is completely
   hidden from the outside (black-box view), for instance an internal
   Ethernet segment or a switching fabric in which CE(s) and FE(s) are
   connected within physical proximity, then communication between FE
   and CE is assumed to be of local scope.  On the other hand, if
   communication between FE and CE crosses several network hops, then
   the scope is considered global.

6.1 Transport Methods

   The ideal environment for Netlink2 is considered to be a
   multicast-capable medium with IP above it and with UDP/TCP/SCTP/DCCP
   running over IP.

   Netlink2 is also capable of running directly over L2 (Ethernet, for
   example).

   In a non-IP, non-multicast-capable environment, extra processing and
   messaging by the ForCES layer would be needed to compensate for
   services that IP already offers (e.g., security, quality of service,
   and fragmentation, if the underlying transport does not provide
   them).

6.1.1 Why Multicast?

   Multicast is considered important to facilitate one-to-many/some
   communication.  For example, a single command from a CE can be
   multicast to multiple FEs, which eases the scalability requirements
   mentioned in [ForCES_REQ].  This is discussed in later sections.

   When running Netlink2 over non-multicast-capable media, it is
   expected that mechanisms similar to those used in OSPF NBMA [RFC2328]
   networks will be put in place.

6.1.2 Why IP?

   IP runs on virtually every link layer.  Leveraging this fact alone
   helps deploy the protocol more widely and quickly.

   IP also provides numerous services, such as fragmentation and
   reassembly, prioritization, and security, which are inherent
   requirements for the ForCES protocol.  This means that running over
   an alternative to IP successfully requires that similar services be
   provided by whatever lies underneath in order to meet the
   requirements.

   Netlink2-specific optional TLVs can be used to compensate for lacking
   functionality when running on a network transport other than IP or
   directly on the link layer.

   Netlink already allows the definition of multipart messages, with IP
   segmenting and reassembling them when the path MTU is exceeded.  When
   running on top of non-IP media, a Netlink2 message can be limited so
   that it does not exceed the MTU; the multipart message facility can
   then be used to provide framing for segmentation and reassembly.

   The Netlink2-specific Authentication TLV can be used to carry
   authentication signatures over a transport that does not have this
   capability.

   The Netlink2-specific Checksum TLV can be used to carry checksums
   over a medium that does not have this capability.

   The Netlink2-specific Message Priority TLV can be used to carry
   prioritization if the transport cannot express priority in its own
   headers.

6.1.3 Why UDP/TCP/SCTP/DCCP?

   On a local scope, multicast UDP over IP is assumed to be the
   preferred mode of operation.

   On a global scope, TCP or SCTP is expected to be used for enhanced
   reliability and Internet congestion friendliness.

   All of these protocols provide 16-bit ports, which add a further
   level of address demultiplexing.  All four protocols also provide a
   checksum that enhances the integrity of the Netlink2 message.  For
   UDP over IPv4 the checksum is optional, which fits the model that
   the local scope is less error-prone than the global scope, so the
   integrity check can be turned on only when needed.

6.2 The Netlink2 wire and bundle

   A Netlink2 wire displays the same behavior as a Netlink wire.  It
   interconnects FEs and CEs in order to support the services they
   jointly offer.

   The only conceptual difference between a Netlink2 wire and a Netlink
   wire is that whereas the Netlink wire is localized, the Netlink2 wire
   is distributed.
We also introduce the concept of a Netlink2 bundle. A Netlink2 bundle interconnects a set of FE(s) and/or CE(s) by means of one or more Netlink2 wires. Note that a Netlink2 bundle does not necessarily imply a full-mesh interconnection (see the examples in Section 6.2.1).

Parties (FEs and CEs) on a Netlink2 bundle share common configuration, provisioning, and event-notification end goals.

A Netlink2 wire MAY be constructed using a multicast connection, a unicast connection, or several multicast and unicast connections. A wire MUST belong to only one bundle. A bundle may consist of a single wire (unicast or multicast). In most cases we expect there to be only one multicast address per bundle, although scalability issues could require additional unicast connections.

When a multicast IP address is used, a Netlink2 wire MUST run over UDP; the UDP port uniquely identifies the wire. There MAY be multiple wires using the same multicast address as long as they run over different UDP ports.

When a unicast IP address is used, how to connect to an endpoint (CE or FE) is subject to agreement between the CE and the FE. The connection could run directly over IP (Note: this needs an IP protocol number) or via transport-layer ports (TCP/UDP/SCTP/DCCP).

For both unicast and multicast wires, the necessary parameters (such as IP addresses and port numbers) can be discovered with the involvement of the FE and CE Managers.

6.2.1 What wires go in a bundle?

Netlink2 is flexible enough to allow a bundle of purely unicast wires, purely multicast wires, or a hybrid of both. The decision about what goes into a bundle can be made in the pre-association phase.

A good analogy is to think of a multicast wire as a broadcast link (as is done in Netlink) in which the CE(s) and FE(s) are parties attached to that broadcast link.
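The wire and bundle rules of Section 6.2 (a wire belongs to only one bundle; a multicast wire runs over UDP and is identified by its UDP port, so several wires may share one multicast address) lend themselves to a mechanical check when the pre-association configuration is loaded. The following Python sketch is illustrative only: the class and method names (`Wire`, `Bundle`, `BundleRegistry.add_wire`) are hypothetical and not defined by this document, and the parameters of unicast wires are left to CE/FE agreement as stated above.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Wire:
    """Hypothetical description of one Netlink2 wire."""
    address: str                # IP unicast or multicast address
    transport: str              # "udp", "tcp", "sctp", "dccp" or "ip"
    port: Optional[int] = None  # None when running directly over IP

    @property
    def is_multicast(self) -> bool:
        return ip_address(self.address).is_multicast


@dataclass
class Bundle:
    name: str
    wires: List[Wire] = field(default_factory=list)


class BundleRegistry:
    """Enforces the wire/bundle rules of Section 6.2 (sketch only)."""

    def __init__(self) -> None:
        self.bundles: Dict[str, Bundle] = {}
        self._owner: Dict[Wire, str] = {}  # wire -> name of owning bundle

    def add_wire(self, bundle_name: str, wire: Wire) -> None:
        if wire.is_multicast:
            # A multicast wire MUST run over UDP; the UDP port identifies it.
            if wire.transport != "udp" or wire.port is None:
                raise ValueError("multicast wire must be UDP with a port")
        owner = self._owner.get(wire)
        if owner is not None and owner != bundle_name:
            # A wire MUST belong to only one bundle.
            raise ValueError(f"wire already belongs to bundle {owner!r}")
        bundle = self.bundles.setdefault(bundle_name, Bundle(bundle_name))
        if owner is None:
            bundle.wires.append(wire)
            self._owner[wire] = bundle_name
```

With this sketch, two wires on 239.1.1.1 with UDP ports 5000 and 5001 can both be added to one bundle, while registering the port-5000 wire under a second bundle, or declaring a multicast wire over TCP, raises `ValueError`.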
Depending on the number of FEs and CEs in an NE, a single multicast wire in the bundle may be sufficient. Multicast allows one-to-some messaging: a single message sent by an originator is seen by all parties on the wire. This simplifies synchronization in an HA environment as well as the implementation of the protocol.

The fact that multicast messages are seen by all parties could cause scalability issues as the number of nodes grows. Parties need to filter out messages not destined to them, which consumes compute resources, or table resources if filtering is done in hardware. The extra messages also waste bandwidth toward FE(s) and CE(s) that are not interested in them.

Unicast wires could be used to create point-to-point connections between the parties; when every party is connected to every other party, this becomes a full mesh.

A full unicast mesh topology removes the need to filter unnecessary messages but introduces scalability concerns, because the number of connections required grows quadratically with the number of parties (FEs and CEs) present: n parties need n(n-1)/2 connections. This requires considerably more computation and state at each party. A pure mesh topology also complicates HA, because each party must maintain more state (for instance, the IP addresses of the active CEs and FEs and of their backups) and must therefore perform extra processing to achieve failover. Failover becomes transparent if multicast is used among all parties.

Netlink2 allows a bundle to have a hybrid of unicast and multicast connections. Note that this model is used by other protocols, such as OSPF over broadcast links, where Hello packets are multicast but responses to LSA updates are unicast.

We present some examples of Netlink2 bundles:

1. A trivial case is a Netlink2 bundle consisting of a single unicast wire between the CE and the FE it interconnects.

2. Multiple FEs and a CE could be interconnected with a Netlink2 bundle using a single multicast connection.

3. In the same setup as example 2, the unicast address of the CE could additionally be used, for instance, to deliver acknowledgments or notifications from the FEs to the CE without them being seen by all other FEs. The unicast addresses of the FEs could likewise be used to deliver certain messages only to a specific FE, such as retransmitting a message in a two-phase commit only to an FE that did not respond.

4. Multiple FEs and CEs could use a wire with two multicast connections, one for all FEs and the other for all CEs, so that messages relevant only to FEs are not seen by CEs and vice versa.

6.3 Redefining the Netlink PID Semantics

We keep the name PID for historical reasons and introduce a Destination PID and a Source PID, as mentioned earlier.

For every message received by each party on the wire, the Destination PID field indicates the recipient of the message. The addressed party could be either an FE or a CE, or more specifically an LFB or a CPC, respectively.

In addition to the Netlink2 wires (unicast or multicast) that define where a particular message is delivered, the PID types provide further control, namely they define which entity actually has to process the message. So if the bundle uses only a single multicast wire, messages will be heard by all parties on the wire, but only those with a matching PID will actually process them. We introduce special-purpose PIDs addressed to specific listeners on the wire.

The following types of PIDs are defined and can be used in Netlink2 messages.
The actual values for the PID of an FE or CE must be the same across all wires of the same bundle and must be established during the pre-association phase.

Default values are given below where they apply. PIDs must be unique within a Netlink2 wire; they may also be unique within the NE. Each PID is subdivided into two 16-bit subfields, named wire and party, written in the form wire:party.

1. unicastPID: allows one to uniquely address an FE or CE. Each FE/CE must have such a unicast PID. Only the FE or CE assigned to this PID must process an incoming message with such a Destination PID; other parties MAY silently discard the message. The wire subfield is a unique identifier of the FE or CE. The party subfield acts as a port number: it can, for instance, be used to further demultiplex a message to the appropriate process in a CE (CPC) or the appropriate LFB in an FE.

   Default value: none.

2. logicalPID: in addition to its unicastPID, an FE/CE MAY have zero or more logical PIDs assigned to it. A logicalPID can be used for active-backup pairs of FEs: for instance, the active and the backup FE have the same logical PID, or at least the same wire subfield. The wire subfield identifies the group of FEs and/or CEs participating in the group. Pre-association configuration ensures that the same party identifier is not assigned twice to different CPCs or LFBs on the same wire.

   Default value: none.

3. broadcastPID: all parties on all wires must process an incoming message with such a Destination PID. An example of a message that might be broadcast is a notification that a CE is being brought down for maintenance.

   Default value: 0xffffffff

4. FEbroadcastPID: all FEs on all wires must process an incoming message with such a Destination PID. A typical example is a route update from the CE to all FEs. Other parties (CEs) can silently discard the message.

   Default value: 0xffffefff

5. CEbroadcastPID: all CEs on all wires must process an incoming message with such a Destination PID. Other parties (FEs) can silently discard the message.

   Default value: 0xffffdfff

The Destination PID of a Netlink2 message must be one of the PID types defined above. The Source PID of a Netlink2 message must be of the unicastPID or logicalPID type. In addition, if the NLM_F_ACK flag is set, then every party processing the message MUST reply with an acknowledgment after processing it, unless the NLM_F_ASTR flag is used to prevent ACK implosion (see Section 7.5.1).

Pre-configured translation tables can be used to map a given PID to the underlying wire in a bundle, i.e., to an IP unicast or multicast address.

6.4 Local Scope Addressing and Encapsulation

At a local scope, the preferred addressing for a wire is a UDP port on top of a multicast IP address.

Multiple wires can run on one multicast address, with the UDP port providing a further level of demultiplexing.

The wire addressing parameters MAY be discovered during the pre-association phase.

6.5 Global Scope Addressing and Encapsulation

When addressing a non-local scope, the Netlink2 message is encapsulated in a transport header and carried to the remote end, where it is decapsulated and processed as if it had originated in the local scope of that remote end. Global scope addressing could use any configured transport protocol (SCTP, UDP, TCP, or DCCP), as agreed upon in the pre-association phase.

This can be viewed as an extension of the local scope wires.

7. Protocol Architecture

7.1 Protocol Phases

ForCES in relation to NEs involves three phases: the pre-association phase; the association phase, in which the ForCES protocol operates; and a termination phase, in which a party in the relationship leaves a bundle.

7.1.1 The Pre-Association Phase

In a simple setup, this phase is static.
All the parameters for the association phase are well known (for example, the multicast groups for each Netlink2 wire in a bundle).

Vendors may use their own proprietary service discovery protocol. As a minimum, we assume a static configuration. Although ForCES mandates a minimal set of capability discovery, Netlink2 can also operate in a mode where such capability discovery is done in the pre-association phase. In that case, the FE Manager and the CE Manager agree on all the parameters and exchange topology and other information in the pre-association phase.

On completion of the service discovery phase, the FEM will have established contact with the appropriate CEM component. Initialization and authentication are complete at this point. Both the FE and the CE know how to connect to each other for configuration, accounting, identification, and authentication purposes. Both sides also know all necessary protocol parameters, such as timers. All capabilities may also have been discovered at this point.

7.1.2 The Association Phase

In this phase, the FE and CE components cooperate to deliver the IP service. The CE component might be registered (in the pre-association phase) to receive FE-specific services (such as link events). Essentially, in this phase, the service is provisioned and executing. The FE component might continuously receive updates from the control plane component on how to operate the service (for example, IPv4 forwarding route additions or deletions).

The association phase is where Netlink2 operates as the ForCES protocol.

On startup, the FE connects to the bundle(s) to which the CE is connected, using the procedure defined in Section 8.3.1. The controlling CE will either admit the FE into the NE or reject it.

Once granted access into the NE, the FE is continuously updated or queried.
The FE may also send asynchronous event notifications to the CE. This continues until a termination is initiated by either the CE or the FE.

7.1.3 Service Termination

Service termination could be initiated by either component of the service abstraction.

The FE or CE initiating the termination issues a FIN command (see Section 8.3).

7.2 Protocol Logical Model

In the diagram below we show a simple LFB-CPC logical relationship, using the IPv4 Forwarding LFB as an example.

     CE  +----------------------------------------------+
         |    +---------+                +---------+    |
         |    |  CPC-1  |                |  CPC-2  |    |
         |    |  ospfd  |                |  COPS   |    |
         |    |         |                |  PEP    |    |
         |    +----+----+                +----+----+    |
         +---------|--------------------------|---------+
                   |                          |
     ********************************************************
     *                   NETLINK2 BUNDLE                    *
     ********************************************************
                 |             |              |
     FE  +-------|-------------|--------------|---------+
         |       |             |              |         |
         |  +----+----+   +----+----+    +----+----+    |
         |  | ingress |   |  IPv4   |    | Egress  |    |
         |  | police  |   | Forward |    |   QoS   |    |
         |  |         |   |         |    |  Sched  |    |
         |  +---------+   +---------+    +---------+    |
         |             IPv4 Forwarding LFBs             |
         +----------------------------------------------+

Netlink2 logically models LFBs and CPCs as service blocks interconnected via a Netlink2 bundle.

Acknowledgements and responses to messages do not have to be sent on the same wire on which the triggering messages arrived, but they MUST be sent on the same bundle to the originating PID. For instance, a wire interconnecting a CE with multiple FEs using a multicast address could be used to send route updates from the CE, while independent unicast wires from each FE to the CE could be used to send back route events or acknowledgments. Note that sequencing is done per wire and Source PID, and ACKs can travel back on any wire of a bundle.
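Because sequencing is done per wire and Source PID, a receiver keeps one sequence counter for each (wire, Source PID) pair rather than one per peer. The Python sketch below illustrates such bookkeeping under two assumptions not stated in this section: sequence numbers are 32-bit values as in Netlink [RFC3549], and they wrap around modulo 2^32 (compared with serial-number arithmetic). The class name and the returned labels are hypothetical.

```python
class SequenceTracker:
    """Per-(wire, Source PID) sequence tracking (illustrative sketch)."""

    MOD = 1 << 32          # assumed 32-bit sequence space
    HALF = 1 << 31

    def __init__(self) -> None:
        self._last = {}    # (wire_id, source_pid) -> last accepted seq

    def check(self, wire_id: int, source_pid: int, seq: int) -> str:
        key = (wire_id, source_pid)
        last = self._last.get(key)
        if last is None:
            self._last[key] = seq        # first message on this pair
            return "in-order"
        delta = (seq - last) % self.MOD  # serial-number distance
        if delta == 0:
            return "duplicate"
        if delta >= self.HALF:
            return "stale"               # older than the last accepted one
        self._last[key] = seq
        return "in-order" if delta == 1 else "gap"
```

Since ACKs may return on any wire of the bundle, a sender would match an ACK against the sequence number carried in the echoed original header (Section 7.5.3) rather than against the wire on which the ACK arrived.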
A Netlink2 wire can be shared or be specific to a service. A bundle can contain multiple Netlink2 wires carrying messages of the same service. Additional Netlink2 wires can be used to reduce the messaging seen by a party (for example, to avoid extra processing) or to restrict it for partitioning or security reasons. One possible partitioning is one Netlink2 bundle per service. In the example above, the IPv4 Forwarding LFB would be considered a service.

Assuming capabilities have been discovered during the pre-association phase (between the FEM and the CEM), blocks (CPCs or LFBs, as illustrated above) connect to the agreed wires on the Netlink2 bundle and listen for specific messages. CPCs may connect to multiple Netlink2 wires if this helps them control the service better. All blocks (CPCs and LFBs) send messages onto the Netlink2 wires.

LFBs or CPCs join Netlink2 wires and listen to messages of interest for processing or monitoring purposes.

All messages addressed to the LFB (for example, the IPv4 Forwarding LFB illustrated above) carry the FE PID agreed upon by the CE and the FE during the pre-association phase.

LFBs (as well as CPCs) also process messages with the broadcast PIDs. They may also process messages destined to other LFBs (or CPCs) for availability synchronization purposes.

A further demultiplexing point is the command type in the Netlink2 message. Each LFB (e.g., the ingress police LFB above) knows how to respond to a specific command set, as defined by the Netlink2 message type.

7.3 Service Addressing

A CPC and an LFB connect to a service by both connecting to a defined Netlink2 bundle, which is determined in the pre-association phase.

A service would typically be constrained to a specific Netlink2 bundle.
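The PID rules of Section 6.3, together with the processing rules of Section 7.2, determine whether a block that hears a message on a shared multicast wire must process it. The Python sketch below is one possible reading, under stated assumptions: the wire subfield occupies the high-order 16 bits of a 32-bit PID (consistent with the default broadcast values), the wire subfield of a unicast or logical PID identifies the FE/CE or group, and the party subfield is then used as a port to select the CPC or LFB. All names (`Party`, `must_process`, `make_pid`, `split_pid`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set

BROADCAST_PID = 0xFFFFFFFF      # default broadcastPID
FE_BROADCAST_PID = 0xFFFFEFFF   # default FEbroadcastPID
CE_BROADCAST_PID = 0xFFFFDFFF   # default CEbroadcastPID


class Role(Enum):
    FE = "FE"
    CE = "CE"


def make_pid(wire: int, party: int) -> int:
    """Build a wire:party PID (wire assumed to be the high 16 bits)."""
    return ((wire & 0xFFFF) << 16) | (party & 0xFFFF)


def split_pid(pid: int) -> tuple:
    return (pid >> 16) & 0xFFFF, pid & 0xFFFF


@dataclass
class Party:
    role: Role
    unicast_pid: int
    logical_pids: Set[int] = field(default_factory=set)

    def must_process(self, dest_pid: int) -> bool:
        # Broadcast PIDs are checked first: they share wire subfield 0xffff.
        if dest_pid == BROADCAST_PID:
            return True
        if dest_pid == FE_BROADCAST_PID:
            return self.role is Role.FE
        if dest_pid == CE_BROADCAST_PID:
            return self.role is Role.CE
        # Otherwise match on the wire subfield; the party subfield then acts
        # as a port that selects the CPC or LFB within this FE/CE.
        my_wires = {split_pid(self.unicast_pid)[0]}
        my_wires |= {split_pid(p)[0] for p in self.logical_pids}
        return split_pid(dest_pid)[0] in my_wires
```

A party for which `must_process` returns False MAY silently discard the message; a passive backup (Section 7.5.2) would instead deliberately keep listening.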
Once connected to a service, the CPC may, at any point during the lifetime of the connection, issue service-specific commands to the LFB, mostly for configuration purposes or for statistics collection. The LFB may also send event announcements to the CPC or respond to queries issued by the CPC.

7.4 Service Templates

LFBs emit events and are configured and queried using service templates.

Refer to the Netlink document [RFC3549] as well as Section 8.4 for the different templates used for the different LFBs that fit within the current scope of the ForCES charter.

7.5 Mechanisms for Creating Protocols

Netlink2 provides mechanisms for creating reliable or unreliable protocols. In addition, mechanisms that facilitate availability are embedded in Netlink2.

7.5.1 Building Reliable Protocols

By default, the Netlink2 header flags NLM_F_PRIO and NLM_F_ACK are not set, so Netlink2 messages are sent with lower priority and do not require acknowledgements.

One could create a reliable protocol between an LFB and a CPC by combining sequence numbers, ACKs, and retransmit timers. Both sequence numbers and ACKs are provided by Netlink2; timers are provided by the operating system or hardware.

Prioritization is orthogonal to reliability. When a node runs out of resources, a message sent with a higher priority gets preferential treatment. For instance, if an FE has only enough memory to allocate one response and must choose which of two CE messages to respond to, it uses that memory for the request that was sent with the higher priority. This also applies to other resources such as computing cycles and bandwidth.
In other words, NLM_F_PRIO is more than the classical bandwidth prioritization of packets on a link.

Another orthogonal mechanism provided by Netlink2 is the ACK strategy, which is selected by the NLM_F_ASTR flag.

We define two types of acknowledgement strategies:

1. partial ACKs (using multicast ACK slotting and damping techniques [XTP]): receivers multicast an ACK after a random time if they have not yet seen an ACK sent by another receiver. This limits the number of ACKs returned to the source of the message and improves performance. For messages that a CE sends to a group of FEs, partial ACKs imply that an ACK from any one of the FEs is sufficient to deem the message delivered.

2. full ACKs: each receiver sends an ACK back to the source. This allows the source to immediately detect problems with receivers. In two-phase commits it is important that all FEs respond, so the full ACK strategy should be used.

7.5.2 Building Availability

A protocol component or an application could passively listen to Netlink2 commands and events on one or several Netlink2 wires. This provides a very simple way of building complex applications that are aware, for HA purposes, of all the service components that affect them.

To ensure transparent CE or FE redundancy for certain services, it is sufficient to ensure that the backup CPC/LFB is always attached to the same wires as the active CPC/LFB, so that the backup CPC/LFB receives all messages destined to the active CPC/LFB (whatever PID they are sent to) as well as all messages originating from the active CPC/LFB.

One could create a heartbeat protocol between the LFB and the CPC by using the ECHO flags and the NLMSG_NOOP message (Section 8.3.5).
The heartbeat, in addition to listening to FE or CE events, could be used to facilitate takeover.

This topic is beyond the scope of ForCES and will not be discussed further here. Note, however, that Netlink2 provides the mechanisms needed to enable this when required.

7.5.3 The ACK Netlink2 Message

This message is used to denote both an ACK and a NACK. Typically it is sent from an LFB to a CPC, in response to a message requesting an ACK. However, a CPC should also be able to send ACKs back to an LFB when requested. The semantics for this are IP-service specific.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Netlink2 message header                    |
   |                      type = NLMSG_ERROR                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          error code                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  OLD Netlink2 message header                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Error code: integer (typically 32 bits)

An error code of zero indicates that the message is an ACK response. An ACK response message contains the original Netlink2 message header, which can be used for comparison (for example, against the sent sequence numbers).

A message with a non-zero error code is equivalent to a negative ACK (NACK). In that case, the Netlink2 data of the original request is returned, appended to the original Netlink2 message header.

7.5.4 Batching

As mentioned earlier (and repeated here for clarity), standard Netlink multi-message batching looks as follows:

   NLMSG:NLMSG:NLMSG....

where NLMSG is a Netlink2 header and its associated payload.

This has the advantage of allowing multiple commands (for example, adds and deletes) to be intermixed, generally in a request from the CE to the FE.
It is also useful for batching multiple events from the FE to the CE.

Additionally, studies from [Goutaudier] have motivated batching of service templates within a single Netlink2 message. Recall that a Netlink2 message looks like:

   NLMSGHDR:OET:ST

where NLMSGHDR is a Netlink2 header, OET is the optional extension TLVs, and ST is the service template.

The template extension now looks like:

   NLMSGHDR:OET:ST:ST:ST.....

In other words, multiple service templates can fit within the same message. This batching scheme has a caveat: since only one ACK may be sent for the whole batch, it is difficult to know which service configuration failed. On a close-proximity, low-error-rate link, batching in this mode should allow high configuration throughput while reducing the number of ACKs sent back.

7.5.5 Atomicity and Ordering of Transactions

In a two-phase commit, messages are bound into a relationship. The first header and all following headers have the NLM_F_MULTI Netlink2 header flag set, except for the last header, which has the Netlink2 header type NLMSG_DONE. Typically, in Netlink, the NLMSG_DONE shows up in a separate PDU to define a commit.

Atomicity of a transaction, including that of a batch, is achieved by using the NLM_F_ATOMIC flag. Use of NLM_F_ATOMIC is expensive because it may necessitate locking access to tables (depending on the implementation).

8. Putting together the base protocol for WG charter

The design approach taken for the Netlink2 protocol is to avoid over-featuring the protocol and to focus on the requirements under the current WG charter. Although Netlink2 could be used for CE-CE or FE-FE communication, this is not discussed in this document, to avoid complexity.
Additionally, although Netlink2 provides the minimal required attribute discovery, it will work with existing proprietary or open protocols for discovering such attributes.

8.1 Netlink2-Extension TLVs

Netlink2-Extension TLVs are mostly used to compensate for mechanisms needed by Netlink2 that the underlying transport lacks.

8.1.1 Authentication

   [TBD]

8.1.2 Checksum

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      TLV Type = NL2_CSUM      |        TLV Length = 2         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Checksum (16 bits)       |  Alignment Padding (16 bits)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This TLV is optional. To compute the correct checksum, an implementation MUST add the optional checksum TLV to the Netlink2 message with an initial checksum value of 0 and compute the checksum over the resulting Netlink2 message. Refer to [RFC3358] for details on the Checksum TLV.

8.1.3 Message Priority

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TLV Type = NL2_MPRIO      |        TLV Length = 2         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Priority (16 bits)       |  Alignment Padding (16 bits)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This TLV is optional. It is used when the network does not support prioritization, to indicate the message priority to the remote end.

8.1.4 SYN COOKIE

   TBF

   TLV_TYPE = NL2_COOKIE.
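The checksum procedure of Section 8.1.2 (insert the TLV with a zero checksum, compute the checksum over the whole message, then fill in the result) can be sketched as follows. Several details are assumptions made for illustration only: the numeric value of NL2_CSUM is hypothetical, fields are in network byte order, the TLV immediately follows a header of known length (matching the NLMSGHDR:OET:ST layout), and the 16-bit Internet checksum of RFC 1071 stands in for the algorithm that [RFC3358] actually specifies. The caller is also assumed to have already accounted for the 8 extra bytes in the header's length field.

```python
import struct

NL2_CSUM = 0x0002                    # hypothetical TLV type code
TLV_FMT = "!HHHH"                    # type, length, checksum, padding
TLV_SIZE = struct.calcsize(TLV_FMT)  # 8 bytes


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum, used here only as a stand-in."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def add_checksum_tlv(header: bytes, body: bytes) -> bytes:
    """Insert the checksum TLV after the header and fill in the checksum."""
    zeroed = header + struct.pack(TLV_FMT, NL2_CSUM, 2, 0, 0) + body
    csum = internet_checksum(zeroed)
    return header + struct.pack(TLV_FMT, NL2_CSUM, 2, csum, 0) + body


def verify_checksum_tlv(message: bytes, header_len: int) -> bool:
    """Recompute the checksum with the field zeroed and compare."""
    tlv = message[header_len:header_len + TLV_SIZE]
    if len(tlv) != TLV_SIZE:
        return False
    tlv_type, length, csum, _pad = struct.unpack(TLV_FMT, tlv)
    if tlv_type != NL2_CSUM or length != 2:
        return False
    zeroed = (message[:header_len] + struct.pack(TLV_FMT, NL2_CSUM, 2, 0, 0)
              + message[header_len + TLV_SIZE:])
    return internet_checksum(zeroed) == csum
```

Verification mirrors generation: the receiver zeroes the checksum field, recomputes over the same bytes, and compares, so any single corrupted byte in the header or body changes the result.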
8.1.5 Name ID

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Type = NL2_NAMEID       |     TLV Length = variable     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         size of name                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This TLV is optional. It carries the name by which a CE or FE wishes to be known and is typically exchanged in SYN messages.

8.2 LFB and FE Attributes and discovery

In the association phase, the CE queries the FE to determine its capabilities. These may include the FE-FE topology, the initial LFB topology for the FE, and constraints on how the LFB topology can be modified (if at all). A schema for representing FE and LFB attributes and capabilities is being defined in [ForCES_Model]. Appropriate Netlink2 TLVs will be defined to convey the identified parameters as the model work progresses.

8.3 NE creation

The FE and CE Managers communicate to decide the communication parameters and rules to be used between the CE and the FE.

Using the agreed-upon parameters, the FE attempts to join the NE. The CE may reject the FE or allow it to join. The FE then informs the FEM of the decision. Note that we do not discuss the FE-FEM or CE-CEM interfaces in this document, as they are beyond the scope of ForCES.

8.3.1 FE State transitions

   From       Event                                  To
   --------   ------------------------------------   --------
   INIT       send SYN                               SYN_SENT
   SYN_SENT   SYN retransmission                     SYN_SENT
   SYN_SENT   recvd SYN|ACK                          EST
   SYN_SENT   recvd NACK or max retransmit           INIT
   EST        send FIN                               FIN_SENT
   EST        recvd FIN|ACK or recvd SYN broadcast   INIT
   FIN_SENT   FIN retransmission                     FIN_SENT
   FIN_SENT   recvd FIN|ACK                          INIT

INIT state:

When the FE is started (by the FE manager or otherwise), it goes into the INIT state. At this point the FE has been informed by the FE Manager of the following (based on the current implementation):

   o  the bundle to join,

   o  its PID,

   o  the PID of the CE,

   o  the number of retries for the SYN transmission and the SYN timer value,

   o  the number of retries for the FIN transmission and the FIN timer value.

The FE Manager would also instruct the FE to be either active or passive. Although this is beyond the ForCES charter, the active/passive setup is described here to illustrate one way to achieve redundancy. Netlink2 does not mandate how redundancy is achieved. Netlink2 makes FE redundancy the responsibility of the FE plane; as such, Netlink2 is designed so that the CE has no knowledge of FE redundancy. This greatly simplifies the protocol.

After internal initialization, the FE sends a SYN message with the ACK flag set. The message contains a Netlink2-Extension TLV of type NL2_NAMEID carrying the name by which the FE wishes to be known. The FE then enters the SYN_SENT state.
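The FE state transitions described in this section can be sketched as an event-driven state machine. This Python sketch is illustrative only: timers are not modeled (the caller invokes `timer_expired()` when the SYN or FIN timer fires), transmitted messages and FE manager notifications are simply recorded in lists, and all class, method, and event names are hypothetical.

```python
from enum import Enum, auto


class FEState(Enum):
    INIT = auto()
    SYN_SENT = auto()
    EST = auto()
    FIN_SENT = auto()


class FEStateMachine:
    def __init__(self, max_syn_retries: int, max_fin_retries: int) -> None:
        self.state = FEState.INIT
        self.max_retries = {FEState.SYN_SENT: max_syn_retries,
                            FEState.FIN_SENT: max_fin_retries}
        self.retries = 0
        self.sent = []      # messages handed to the transport
        self.notices = []   # notifications to the FE manager

    def _send_and_enter(self, msg: str, state: FEState) -> None:
        self.sent.append(msg)
        self.retries = 0
        self.state = state

    def start(self) -> None:
        """INIT -> SYN_SENT: send SYN (ACK flag set, NL2_NAMEID TLV)."""
        if self.state is FEState.INIT:
            self._send_and_enter("SYN", FEState.SYN_SENT)

    def leave(self) -> None:
        """The FE manager asks the FE to leave: EST -> FIN_SENT."""
        if self.state is FEState.EST:
            self._send_and_enter("FIN", FEState.FIN_SENT)

    def timer_expired(self) -> None:
        if self.state not in self.max_retries:
            return
        msg = "SYN" if self.state is FEState.SYN_SENT else "FIN"
        if self.retries < self.max_retries[self.state]:
            self.retries += 1
            self.sent.append(msg)           # retransmit, restart timer
        else:
            self.notices.append(f"{msg} retries exhausted")
            self.state = FEState.INIT

    def on_message(self, kind: str) -> None:
        if self.state is FEState.SYN_SENT and kind == "SYN|ACK":
            self.notices.append("joined NE")
            self.state = FEState.EST
        elif self.state is FEState.SYN_SENT and kind == "NACK":
            self.notices.append("rejected by CE")
            self.state = FEState.INIT
        elif self.state is FEState.EST and kind == "FIN":
            self.notices.append("reset by CE")
            self.sent.append("FIN")
            self.state = FEState.INIT
        elif self.state is FEState.EST and kind == "BCAST_SYN":
            self.notices.append("CE restarted")
            self.state = FEState.INIT
        elif self.state is FEState.FIN_SENT and kind == "FIN|ACK":
            self.notices.append("left NE")
            self.state = FEState.INIT
```

With a maximum of two SYN retries, the FE sends three SYNs in total before giving up and returning to INIT, matching the "retries not yet reached the maximum" wording of the SYN_SENT state.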
An FE could passively monitor one or more FEs, synchronizing itself with their state and their communication data with the CE. The end goal of a passive FE is to act as a backup for the FE whose activities it is monitoring. The monitoring is trivial to achieve if multicast is used. The synchronization may also happen via an FE-FE protocol or via the FE Manager. A passive FE may be called on by the FE manager to take over the functionality of the FE it is monitoring.

SYN_SENT state:

The FE starts the SYN timer and waits for a response from the CE. Two events could happen:

1. The timer expires. If the number of retries has not reached the maximum allowed value, then the SYN is retransmitted and the timer restarted. If the maximum number of retries has been reached with the last SYN transmission, then the FE notifies the FE manager and goes into the INIT state.

2. A packet is received from the CE:

   * A NACK to the sent SYN. Action: cancel the timer, inform the FE manager of the rejection reasons, and go into the INIT state.

   * An ACK to the sent SYN. Action: update the FE manager and go into the EST state.

EST state:

This is the established state, where normal ForCES communication starts.

Several events may force the FE to transition out of the EST state:

1. The FE manager requests it. In this case the FE issues a FIN with an ACK request to the CE and transitions to the FIN_SENT state.

2. The CE asks it to leave. This is considered a reset of the FE. The FE receives a FIN from the CE telling it to leave. The FE immediately informs the FE manager, sends a FIN, and goes into the INIT state.

3. The CE restarts and sends a broadcast SYN. This may be caused either by the CE manager restarting the CE to clear its state or by the CE dying and being restarted.
   Control of CE restarts and of the CE's association with the CE manager is out of scope for ForCES. Upon receiving the broadcast SYN, the FE assumes that the CE has no knowledge of any state the FE is in and transitions into the INIT state after informing the FE manager.

Additionally, optional heartbeats from the CE to the FE are not discussed here. If the FE does not see heartbeats from the CE within a timeout period, it transitions to the INIT state.

FIN_SENT state:

Two events could happen:

1. The timer expires. If the number of retries has not reached the maximum allowed value, then the FIN is retransmitted and the timer restarted. If the maximum number of retries has been reached with the last FIN transmission, then the FE notifies the FE manager and goes into the INIT state.

2. A valid FIN|ACK packet is received from the CE. Action: cancel the timer, inform the FE manager, and go into the INIT state.

8.3.2 CE view of FE State transitions

The CE keeps this state information per FE.

   From       Event                        To
   --------   --------------------------   --------
   INIT       wait for FE                  INIT
   INIT       recvd SYN                    SYN_RCVD
   SYN_RCVD   setup complete               EST
   EST        send FIN                     FIN_SENT
   EST        recvd FIN|ACK or recvd SYN   INIT
   FIN_SENT   FIN retransmission           FIN_SENT
   FIN_SENT   recvd FIN|ACK                INIT

INIT state:

When the CE Manager informs the CE of an FE, basic state information is created for the FE, and the FE is placed in the INIT state.
   At this point the CE has been informed by the CE manager of the
   following:

   o  the bundle the FE will join,

   o  the PID that the FE is going to use to refer to the CE,

   o  the unicast PID of the FE,

   o  the number of retries for the SYN transmission and the SYN timer
      value,

   o  the number of retries for the FIN transmission and the FIN timer
      value,

   o  the expected timeout before the FE joins and the number of such
      timeouts to wait for the FE,

   o  whether the FE is interested in restart information if available
      (refer to the FIN_SENT state).

   The CE starts a timer and waits for the FE to join.  Two things could
   happen:

   1. The timer expires.  If the number of retries for waiting for the
      FE to join has not reached the maximum allowed value, then the
      timer is restarted.  If the maximum number of retries is reached,
      then the CE deletes the FE's state information and informs the CE
      manager.

   2. A valid SYN packet is received from the FE.  The CE transitions
      into the SYN_RCVD state.

   SYN_RCVD state:

   In this state the CE does any processing necessary to prepare for the
   FE to be admitted into the NE.  The CE issues a SYN|ACK and moves
   into the EST state.

   EST state:

   This is the established state where normal ForCES communication
   starts.  Several events may force the CE to transition out of the EST
   state:

   1. The CE manager requests it to.  In this case the CE issues a FIN
      with an ACK request to the FE and transitions to the FIN_SENT
      state.

   2. The FE leaves.  This is considered a reset of the FE.  The FE
      sends a FIN to the CE to inform it that it is leaving.  The CE
      immediately sends a FIN|ACK and notifies the CE manager.  The
      transition is made to the INIT state.

   Not discussed here is the use of heartbeats or other events (e.g.,
   link down) to transition to the INIT state upon discovering that the
   FE is dead.
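   The per-FE transitions of the CE view (the INIT, SYN_RCVD and EST
   states described above, plus the FIN_SENT behaviour shown in the
   figure) can be summarised as a small event-driven state machine.  The
   C sketch below is illustrative only: every identifier in it
   (ce_fe_step, the CE_FE_* states, the EV_* events and the retry
   fields) is hypothetical and not defined by Netlink2, message
   transmission and CE manager notifications are left as comments, and
   it assumes each retry budget counts the timer expiries tolerated
   before giving up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: Netlink2 does not define these identifiers. */
enum ce_fe_state {
    CE_FE_INIT,
    CE_FE_SYN_RCVD,
    CE_FE_EST,
    CE_FE_FIN_SENT,
    CE_FE_DELETED      /* join-wait retries exhausted; FE state removed */
};

enum ce_fe_event {
    EV_TIMER_EXPIRED,  /* join-wait timer (INIT) or FIN timer (FIN_SENT) */
    EV_SYN,            /* valid SYN received from the FE */
    EV_SETUP_DONE,     /* admission processing finished, SYN|ACK sent */
    EV_FIN,            /* FE sent a FIN: it is leaving */
    EV_FIN_ACK,        /* FE acknowledged the CE's FIN */
    EV_MGR_LEAVE       /* CE manager asked the CE to drop the FE */
};

struct ce_fe_ctx {
    enum ce_fe_state state;
    unsigned retries;          /* timer expiries consumed in this state */
    unsigned max_join_waits;   /* supplied by the CE manager */
    unsigned max_fin_retries;  /* supplied by the CE manager */
};

static void ce_fe_enter(struct ce_fe_ctx *c, enum ce_fe_state s)
{
    c->state = s;
    c->retries = 0;            /* every state starts with a fresh count */
}

/* Apply one event to one FE's context and return the resulting state. */
enum ce_fe_state ce_fe_step(struct ce_fe_ctx *c, enum ce_fe_event ev)
{
    assert(c != NULL);
    switch (c->state) {
    case CE_FE_INIT:
        if (ev == EV_SYN) {
            ce_fe_enter(c, CE_FE_SYN_RCVD);
        } else if (ev == EV_TIMER_EXPIRED) {
            if (c->retries < c->max_join_waits)
                c->retries++;                  /* restart join-wait timer */
            else
                ce_fe_enter(c, CE_FE_DELETED); /* delete state, tell manager */
        }
        break;
    case CE_FE_SYN_RCVD:
        if (ev == EV_SETUP_DONE)
            ce_fe_enter(c, CE_FE_EST);
        break;
    case CE_FE_EST:
        if (ev == EV_MGR_LEAVE)
            ce_fe_enter(c, CE_FE_FIN_SENT);    /* send FIN with ACK request */
        else if (ev == EV_FIN)
            ce_fe_enter(c, CE_FE_INIT);        /* send FIN|ACK, tell manager */
        break;
    case CE_FE_FIN_SENT:
        if (ev == EV_FIN_ACK) {
            ce_fe_enter(c, CE_FE_INIT);        /* cancel timer, tell manager */
        } else if (ev == EV_TIMER_EXPIRED) {
            if (c->retries < c->max_fin_retries)
                c->retries++;                  /* retransmit FIN, restart timer */
            else
                ce_fe_enter(c, CE_FE_INIT);    /* give up, tell manager */
        }
        break;
    case CE_FE_DELETED:
        break;                                 /* terminal: ignore events */
    }
    return c->state;
}
```

   For example, with both budgets set to 1, an FE that completes setup,
   is asked to leave and never answers the FIN returns to INIT after two
   FIN timer expiries, and its state is deleted after two further
   join-wait expiries.  Keeping one context per FE matches the
   per-FE nature of the CE-side state described in this section.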
   FIN_SENT state:

   The CE starts the FIN timer and waits for a response from the FE.

   Two events could happen:

   1. The timer expires.  If the number of retries has not reached the
      maximum allowed value, then the FIN is retransmitted and the timer
      restarted.  If the maximum number of retries has been reached with
      the last FIN transmission, then the CE notifies the CE manager and
      goes into the INIT state.

   2. A valid FIN|ACK packet is received from the FE:

      * cancel the timer and inform the CE manager;

      * transition to the INIT state.

   For transitions into the INIT state, observe that if the FE comes
   back and joins before the FE expiry time, its LFB state(s) will still
   be intact and may be resent to it (the restart policy is agreed on at
   pre-association time).  On the other hand, the state is garbage
   collected if no SYNs from the FE are seen within that period, or if
   new SYNs are seen but the FEM-CEM interface indicates no interest in
   the restart data.

8.3.3 SYN Message Format

   A SYN message contains a base Netlink2 header (refer to Section 5.1)
   with the appropriate flags, followed by the Extension TLV Name ID
   (refer to Section 8.1.5).  The Name ID carries the name by which the
   FE wishes to be referred to.

8.3.4 FIN Message Format

   A FIN message contains a base Netlink2 header with the appropriate
   flags set (refer to Section 5.1).

8.3.5 NOOP Message Format

   A NOOP message contains a base Netlink2 header with the appropriate
   flags set (refer to Section 5.1).  The NOOP carries no execution
   message, and therefore no operations on LFBs are carried out as a
   result of receiving it.  The flags of the message are still relevant.

   A standard use of the NOOP message is for heartbeats.  A CE may send
   keepalive messages to LFBs using the NLMSG_NOOP command.
   When requesting replies, the CE sets the NLM_F_ECHO flag so that the
   message is sent back to it as is (essentially a loopback of the exact
   same message, without the ECHO flag).

8.4 LFB and FE Service Templates

   In this section we describe the Service Templates used to configure
   FEs and LFBs, as well as for asynchronous event notification, as
   required by the ForCES WG charter.

   Some of these message templates are already described in the Netlink
   document [RFC3549] but are repeated here for clarity.

   A feature of Netlink2 is that the same message template is used for
   configuration, queries and events.  In the CE->FE direction,
   configuration commands embedding the Service Templates described in
   this section are used to configure the FE (to add or delete a policy,
   for example).  In the FE->CE direction, the templates are used to
   return query responses or to send events to the CE (on a per-LFB
   basis).

   As noted earlier, a single Netlink2 message may carry multiple
   service templates if the NLM_F_MS flag is set.  This is not
   restricted to configuration (CE->FE); it also extends to responses
   and events (FE->CE).

8.4.1 Physical Port and Address Functions

   TBF

8.4.1.1 Interface Service Template

   This is very close to what the Port LFB is defined to be in the Model
   draft [ForCES_Model].  Its expressive semantics are sufficient to
   define a physical port (regardless of the underlying physical links),
   a virtual interface, etc.
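   As a rough illustration of the interface template header laid out in
   the figure that follows, the C sketch below serializes its four
   32-bit words.  The fields mirror those of Linux's struct ifinfomsg,
   but this section does not state a wire byte order, so the sketch
   assumes network (big-endian) order; the names IF_TMPL_LEN,
   if_tmpl_pack, put_be16 and put_be32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_UNSPEC */

#define IF_TMPL_LEN 16u   /* hypothetical name: four 32-bit words */

/* Write v most-significant byte first (assumed network byte order). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Serialize the interface template header into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t if_tmpl_pack(uint8_t *buf, size_t len, uint16_t dev_type,
                    uint32_t ifindex, uint32_t dev_flags)
{
    assert(buf != NULL);
    if (len < IF_TMPL_LEN)
        return 0;
    buf[0] = (uint8_t)AF_UNSPEC;       /* Family: always AF_UNSPEC */
    buf[1] = 0;                        /* Reserved */
    put_be16(buf + 2, dev_type);       /* Device Type */
    put_be32(buf + 4, ifindex);        /* Interface Index */
    put_be32(buf + 8, dev_flags);      /* Device Flags (IFF_* bits) */
    put_be32(buf + 12, 0xFFFFFFFFu);   /* Change Mask: must be all ones */
    return IF_TMPL_LEN;
}
```

   Writing the bytes explicitly with shifts, rather than copying a C
   struct onto the wire, keeps the result independent of host
   endianness and compiler padding.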
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |   Reserved    |          Device Type          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Device Flags                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Change Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Family: 8 bits
      This is always set to AF_UNSPEC.

   Device Type: 16 bits
      This defines the type of the link.  The link could be Ethernet,
      PCI, a tunnel, etc.

   Interface Index: 32 bits
      Uniquely identifies the interface.

   Device Flags: 32 bits
      IFF_UP            Interface is administratively up.
      IFF_BROADCAST     Valid broadcast address set.
      IFF_DEBUG         Internal debugging flag.
      IFF_LOOPBACK      Interface is a loopback interface.
      IFF_POINTOPOINT   Interface is a point-to-point link.
      IFF_RUNNING       Interface is operationally up.
      IFF_NOARP         No ARP protocol needed for this interface.
      IFF_PROMISC       Interface is in promiscuous mode.
      IFF_NOTRAILERS    Avoid use of trailers.
      IFF_ALLMULTI      Receive all multicast packets.
      IFF_MASTER        Master of a load balancing bundle.
      IFF_SLAVE         Slave of a load balancing bundle.
      IFF_MULTICAST     Supports multicast.
      IFF_PORTSEL       Is able to select media type via ifmap.
      IFF_AUTOMEDIA     Auto media selection active.
      IFF_DYNAMIC       Interface was dynamically created.

   Change Mask: 32 bits
      Reserved for future use.  Must be set to 0xFFFFFFFF.

   Applicable attributes:

      IFLA_UNSPEC       Unspecified.
      IFLA_ADDRESS      Hardware address: interface L2 address.
      IFLA_BROADCAST    Hardware address: L2 broadcast address.
      IFLA_IFNAME       ASCII string device name.
      IFLA_MTU          MTU of the device.
      IFLA_LINK         ifindex of the link to which this device is
                        bound.
      IFLA_QDISC        ASCII string defining the egress root queuing
                        discipline.
      IFLA_STATS        Interface statistics.

   Netlink message types specific to this service:
   RTM_NEWLINK, RTM_DELLINK, and RTM_GETLINK.

8.4.1.2 Address Service Template

   The expressive semantics of this template are sufficient to define
   addressing for a port LFB (physical or virtual interfaces), including
   secondary addresses.  Although the template could also be used to
   configure IPX etc., we focus only on IPv4 and IPv6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |    Length     |     Flags     |     Scope     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Family: 8 bits
      Address family: AF_INET for IPv4 and AF_INET6 for IPv6.

   Length: 8 bits
      The length of the address mask.

   Flags: 8 bits
      IFA_F_SECONDARY   For a secondary address (alias interface).
      IFA_F_PERMANENT   For a permanent address set by the user.  When
                        this is not set, the address was dynamically
                        created (e.g., by stateless autoconfiguration).
      IFA_F_DEPRECATED  Defines a deprecated (IPv6) address.
      IFA_F_TENTATIVE   Defines a tentative (IPv6) address (duplicate
                        address detection is still in progress).

   Scope: 8 bits
      The address scope in which the address stays valid.
      SCOPE_UNIVERSE    Global scope.
      SCOPE_SITE        (IPv6 only) Only valid within this site.
      SCOPE_LINK        Valid only on this device.
      SCOPE_HOST        Valid only on this host.

   Applicable attributes:

      IFA_UNSPEC        Unspecified.
      IFA_ADDRESS       Raw protocol address of the interface.
      IFA_LOCAL         Raw protocol local address.
      IFA_LABEL         ASCII string name of the interface.
      IFA_BROADCAST     Raw protocol broadcast address.
      IFA_ANYCAST       Raw protocol anycast address.
      IFA_CACHEINFO     Cache address information.

   Netlink messages specific to this service: RTM_NEWADDR, RTM_DELADDR,
   and RTM_GETADDR.

8.4.2 IPv4 and IPv6 L3 Forwarding Functions

   In this section we describe two LFB templates necessary for IPv4 and
   IPv6 L3 forwarding control.

8.4.2.1 IPv4 and IPv6 Forwarding LFB Template

   The expressive semantics of this template are sufficient to describe
   any IPv4 or IPv6 route configuration, including the ability to
   express route entries for virtual routers within a physical router.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |  Src length   |  Dest length  |      TOS      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Table ID    |   Protocol    |     Scope     |     Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Flags                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Family: 8 bits
      Address family: AF_INET for IPv4 and AF_INET6 for IPv6.

   Src length: 8 bits
      Prefix length of the source IP address.

   Dest length: 8 bits
      Prefix length of the destination IP address.

   TOS: 8 bits
      The 8-bit TOS (should be deprecated to make room for DSCP).

   Table ID: 8 bits
      Table identifier.  Up to 255 route tables are supported.
      RT_TABLE_UNSPEC   An unspecified routing table.
      RT_TABLE_DEFAULT  The default table.
      RT_TABLE_MAIN     The main table.
      RT_TABLE_LOCAL    The local table.

      The user may assign arbitrary values between RT_TABLE_UNSPEC(0)
      and RT_TABLE_DEFAULT(253).

   Protocol: 8 bits
      Identifies what/who added the route.

      Protocol          Route origin
      ..............................................
      RTPROT_UNSPEC     Unknown.
      RTPROT_REDIRECT   By an ICMP redirect.
      RTPROT_KERNEL     By the kernel.
      RTPROT_BOOT       During bootup.
      RTPROT_STATIC     By the administrator.

      Values larger than RTPROT_STATIC(4) are not interpreted by the
      kernel; they are just for user information.  They may be used to
      tag the source of routing information or to distinguish between
      multiple routing daemons.

   Scope: 8 bits
      Route scope (valid distance to destination).
      RT_SCOPE_UNIVERSE   Global route.
      RT_SCOPE_SITE       Interior route in the local autonomous system.
      RT_SCOPE_LINK       Route on this link.
      RT_SCOPE_HOST       Route on the local host.
      RT_SCOPE_NOWHERE    Destination does not exist.

      The values between RT_SCOPE_UNIVERSE(0) and RT_SCOPE_SITE(200)
      are available to the user.

   Type: 8 bits
      The type of route.

      Route type        Description
      ----------------------------------------------------
      RTN_UNSPEC        Unknown route.
      RTN_UNICAST       A gateway or direct route.
      RTN_LOCAL         A local interface route.
      RTN_BROADCAST     A local broadcast route (sent as a broadcast).
      RTN_ANYCAST       An anycast route.
      RTN_MULTICAST     A multicast route.
      RTN_BLACKHOLE     A silent packet-dropping route.
      RTN_UNREACHABLE   An unreachable destination.  Packets are dropped
                        and host unreachable ICMPs are sent to the
                        originator.
      RTN_PROHIBIT      A packet rejection route.  Packets are dropped
                        and communication prohibited ICMPs are sent to
                        the originator.
      RTN_THROW         When used with policy routing, continue the
                        routing lookup in another table.  Under normal
                        routing, packets are dropped and net unreachable
                        ICMPs are sent to the originator.
      RTN_NAT           A network address translation rule.
      RTN_XRESOLVE      Refer to an external resolver (not implemented).

   Flags: 32 bits
      Further qualify the route.
      RTM_F_NOTIFY      If the route changes, notify the user.
      RTM_F_CLONED      Route is cloned from another route.
      RTM_F_EQUALIZE    Allow randomization of the next-hop path in
                        multipath routing (currently not implemented).

   Attributes applicable to this service:

      Attribute         Description
      ---------------------------------------------------
      RTA_UNSPEC        Ignored.
      RTA_DST           Protocol address for the route destination
                        address.
      RTA_SRC           Protocol address for the route source address.
      RTA_IIF           Input interface index.
      RTA_OIF           Output interface index.
      RTA_GATEWAY       Protocol address for the gateway of the route.
      RTA_PRIORITY      Priority of the route.
      RTA_PREFSRC       Preferred source address in cases where more
                        than one source address could be used.
      RTA_METRICS       Route metrics attributed to the route and
                        associated protocols (e.g., RTT, initial TCP
                        window, etc.).
      RTA_MULTIPATH     Multipath route next hop's attributes.
      RTA_PROTOINFO     Firewall-based policy routing attribute.
      RTA_FLOW          Route realm.
      RTA_CACHEINFO     Cached route information.

   Additional Netlink message types applicable to this service:
   RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE.

8.4.2.2 Neighbor Discovery LFB Template

   The expressive semantics of this template are sufficient to describe
   both IPv4 neighbor resolution via ARP and IPv6 neighbor discovery
   (RFC 2461).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |   Reserved1   |           Reserved2           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             State             |     Flags     |     Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Family: 8 bits
      Address family: AF_INET for IPv4 and AF_INET6 for IPv6.

   Interface Index: 32 bits
      The unique interface index.
   State: 16 bits
      A bitmask of the following states:
      NUD_INCOMPLETE    Still attempting to resolve.
      NUD_REACHABLE     A confirmed working cache entry.
      NUD_STALE         An expired cache entry.
      NUD_DELAY         Neighbor no longer reachable.  Traffic sent,
                        waiting for confirmation.
      NUD_PROBE         A cache entry that is currently being
                        re-solicited.
      NUD_FAILED        An invalid cache entry.
      NUD_NOARP         A device that does not do neighbor discovery
                        (ARP).
      NUD_PERMANENT     A static entry.

   Flags: 8 bits
      NTF_PROXY         A proxy ARP entry.
      NTF_ROUTER        An IPv6 router.

   Attributes applicable to this service:

      NDA_UNSPEC        Unknown type.
      NDA_DST           A neighbor cache network layer destination
                        address.
      NDA_LLADDR        A neighbor cache link layer address.
      NDA_CACHEINFO     Cache statistics.

   Additional Netlink message types applicable to this service:
   RTM_NEWNEIGH, RTM_DELNEIGH, and RTM_GETNEIGH.

8.4.3 Filtering Functions

   TBF

8.4.4 QoS Functions

   TBF

8.4.5 IPSEC Functions

   TBF

8.4.6 Packet redirection Functions

   TBF

8.4.7 Packet Mirroring Functions

   TBF

8.4.8 Packet Sampling Functions

   TBF

8.5 Security Considerations

   CEs may communicate vital and possibly confidential information to
   FEs via the ForCES protocol.  For example, such information can be
   filtering rules or secret encryption keys.  In addition, the ForCES
   protocol should not open new possibilities for Denial of Service
   attacks.  A single-box environment has an interconnect between CEs
   and FEs that can be physically secured.  ForCES messages arriving on
   physical ports that are not part of the interconnect are dropped.  In
   such an environment, protection is required only against
   data-packet-based DoS attacks.  A multi-hop environment places more
   requirements in terms of security.  Protection against Netlink2 SYN
   flood attacks becomes necessary.
   In addition, some or all of the ForCES messages may have to be
   authenticated or encrypted.

8.5.1 Denial of Service (DoS) attacks

   Preventing DoS attacks resulting from data packets redirected by the
   FE to the CE can be achieved by shaping according to configurable
   parameters such as a maximum rate.

   A data-packet DoS-resistant FE MUST therefore support the LFBs
   necessary to place policers that shape the traffic redirected to the
   CE by an FE.

   Preventing DoS attacks at the ForCES protocol level (such as a
   Netlink2 SYN flood) may be necessary if the underlying transport
   protocol is not resistant to such attacks.  This can be the case if
   UDP is used, for instance.  In the case of TCP and SCTP, cookie-based
   mechanisms already exist to prevent SYN flood DoS attacks (refer to
   the respective RFCs and [TCP-SYN-COOKIES]).

   A SYN-flood DoS-resistant FE or CE MUST therefore support a
   Netlink2-Extension Cookie TLV (TLV_TYPE = NL2_COOKIE).  This Cookie
   TLV is placed in the ACK message that acknowledges a SYN message.
   This Cookie TLV MUST be returned as is in the SYNACK message.  (Note:
   the content and length of the Cookie TLV remain to be standardized,
   if necessary.)

8.5.2 Authentication and Encryption

   To perform authentication, the necessary information may be
   configured statically, such as shared secrets or public and private
   keys.  On the other hand, in a dynamic environment, public keys may
   have to be distributed using certificates.  Such certificates must
   contain names that are uniquely and permanently assigned to CEs and
   FEs.  Addresses used for routing ForCES messages may change and are
   not suitable for that purpose.  ForCES qualified names (Note: these
   need to be defined in a draft of their own) MUST be used similarly to
   iSCSI qualified names [iSCSI-NAMING].

References

   [Diffserv]
              "Linux Diffserv".
   [ForCES_Model]
              Yang, L., Halpern, J., Gopal, R., DeKok, A., Haraszti, Z.
              and S. Blake, "ForCES Forwarding Element Model", October
              2003.

   [ForCES_REQ]
              Khosravi, H. and T. Anderson, "Requirements for Separation
              of IP Control and Forwarding", October 2003.

   [Goutaudier]
              Goutaudier, G., "Enhancements and Prototype Implementation
              of the ForCES Netlink2 Protocol", IBM Research Report
              RZ3482, September 2003.

   [Netfilter]
              "Linux Netfilter".

   [RFC1157]  Case, J., Fedor, M., Schoffstall, M. and C. Davin, "Simple
              Network Management Protocol (SNMP)", RFC 1157, May 1990.

   [RFC1633]  Braden, R., Clark, D. and S. Shenker, "Integrated Services
              in the Internet Architecture: an Overview", RFC 1633, June
              1994.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", RFC 2328, April 1998.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Weiss, W.
              and Z. Wang, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2748]  Boyle, J., Cohen, R., Durham, D., Herzog, S., Rajan, R.
              and A. Sastry, "The COPS (Common Open Policy Service)
              Protocol", RFC 2748, January 2000.

   [RFC2844]  Przygienda, T., Droz, P. and R. Haas, "OSPF over ATM and
              Proxy-PAR", RFC 2844, May 2000.

   [RFC3036]  Andersson, L., Doolan, P., Feldman, N., Fredette, A. and
              B. Thomas, "LDP Specification", RFC 3036, January 2001.

   [RFC3292]  Doria, A., "General Switch Management Protocol (GSMP) V3",
              RFC 3292, June 2002.

   [RFC3358]  Przygienda, T., "Optional Checksums in Intermediate System
              to Intermediate System (ISIS)", RFC 3358, August 2002.

   [RFC3549]  Hadi Salim, J., Khosravi, H., Kleen, A. and A. Kuznetsov,
              "Linux Netlink as an IP Services Protocol", RFC 3549, July
              2003.

   [Stevens]  Wright, G. and W. Stevens, "TCP/IP Illustrated Volume 2,
              Chapter 20", June 1995.

   [TCP-SYN-COOKIES]
              Dan, D., "SYN cookies", 1997.

   [XTP]      "Xpress Transport Protocol Specification, XTP Revision
              4.0", March 1995.

   [iSCSI-NAMING]
              "iSCSI Naming and Discovery",
              draft-ietf-ips-iscsi-name-disc-10.txt, June 2003.

Authors' Addresses

   Jamal Hadi Salim
   Znyx Networks
   195 Stafford Rd. West
   Ottawa, Ontario
   Canada

   EMail: hadi@znyx.com


   Robert Haas
   IBM Research
   Zurich Research Laboratory
   Saeumerstrasse 4
   CH-8803 Rueschlikon
   Switzerland

   EMail: rha@zurich.ibm.com


   Steven Blake
   Ericsson
   920 Main Campus Drive, Suite 500
   Raleigh, NC 27606
   USA

   EMail: steven.blake@ericsson.com

Appendix A. Sample Service Hierarchy

   In the diagram below we show a simple IP service, foo, and the
   interaction it has between CP and FE components for the service
   (labels 1-3).

   The diagram is also used to demonstrate CP<->FE addressing.  In this
   section we illustrate only the addressing semantics.  In Appendix B,
   the diagram is referenced again to define the protocol interaction
   between service foo's CPC and LFB (labels 4-10).

   CP
   +--------------------------------------------------------+
   |  +-------+         +-----------------------------+     |
   |  |  CLI  |         |  CP protocol component for  |     |
   |  +-------+         |       IP service foo        |     |
   |      |             +-----------------------------+     |
   |      |                |         ^         ^           |
   |      |       1,4,6,8,9 |         | 2,5,10  | 3,7       |
   +------|-----------------|---------|---------|-----------+
          |                 |         |         |
   *******|*****************|*********|*********|************
   ********************* Netlink2 layer *********************
   *******|*****************|*********|*********|************
   FE     |                 |         |         |
   +------|-----------------|---------|---------+-----------+
   |      |                 v         |                     |
   |      |           +-----------------------+             |
   |      +---------->|  FE component/module  |             |
---+------------------>|  for IP service foo   |-------------+-->
   |                  +-----------------------+             |
   +--------------------------------------------------------+

   The control plane protocol for IP service foo does the following to
   connect to its FE counterpart.  The steps below are also numbered in
   the diagram above.

   1. Connect to IP service foo through a socket connect.  A typical
      connection would be via a call to: socket(AF_NETLINK, SOCK_RAW,
      NETLINK_FOO)

   2. Bind to listen to specific async events for service foo.

   3. Bind to listen to specific async FE events.

   Note that a wrapper socket can be created on top of the real sockets:
   depending on the destination PID given, it chooses the most
   appropriate socket to send the packet onto (if there are two
   multicast groups, one for all FEs and one for all FEs and CEs, a
   packet from the CE to the FEs will use the first multicast group).
   The wrapper socket basically maps a message to the most appropriate
   wire in the bundle.

Appendix B. Sample Protocol for the foo IP Service

   Our proverbial IP service "foo" is used again to demonstrate how one
   can deploy a simple IP service control using Netlink2.

   These steps are continued from Appendix A (hence the numbering).

   4. Query for the current config of the FE component.

   5. Receive the response to 4) via the channel set up in 3).

   6. Query for the current state of IP service foo.

   7. Receive the response to 6) via the channel set up in 2).

   8. Register the protocol-specific packets you would like the FE to
      forward to you.

   9. Send specific service foo commands and receive responses for them
      if needed.

B.1 Interacting with Other IP Services

   The diagram in Appendix A shows another control component
   configuring the same service.  In this case, it is a proprietary
   Command Line Interface.
   The CLI may or may not be using the Netlink protocol to communicate
   with the foo component.  If the CLI issues commands that affect the
   policy of the LFB for service "foo", then the "foo" CPC is notified.
   It could then make algorithmic decisions based on this input.  For
   example, if an FE allowed a service to delete policies installed by a
   different service, and a policy that foo installed was deleted by
   service bar, there might be a need to propagate this to all the peers
   of service "foo".

Appendix C. Examples

   In this example we show a simple configuration Netlink2 message sent
   from a TC CPC to an egress TC FIFO queue.  This queuing algorithm is
   based on packet counting and drops packets when the queue length
   exceeds the configured limit (100 packets in the example policy
   below).  We assume the queue is in a hierarchical setup with a parent
   of 100:0 and a classid of 100:1, and that it is to be installed on
   the device with ifindex 4.
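   The paragraph above gives the parent and the class in tc notation
   (100:0 and 100:1), while the message that follows carries them as
   0x1000000 and 0x1000001.  The small C sketch below makes the
   conversion explicit.  It assumes tc's convention that both halves of
   a handle are hexadecimal and that the major number occupies the upper
   16 bits; the function name tc_handle_parse is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a tc-style "major:minor" string (both parts
 * hexadecimal) into a 32-bit handle with the major number in the upper
 * 16 bits and the minor number in the lower 16 bits.
 * Returns 0 on success and -1 on malformed input. */
int tc_handle_parse(const char *s, uint32_t *out)
{
    char *end;
    unsigned long maj, min = 0;

    assert(s != NULL && out != NULL);
    maj = strtoul(s, &end, 16);
    if (end == s || *end != ':' || maj > 0xFFFFul)
        return -1;
    s = end + 1;
    if (*s != '\0') {                  /* "100:" means minor 0 */
        min = strtoul(s, &end, 16);
        if (end == s || *end != '\0' || min > 0xFFFFul)
            return -1;
    }
    *out = ((uint32_t)maj << 16) | (uint32_t)min;
    return 0;
}
```

   Because "100" is read as hexadecimal 0x100, shifting it into the
   upper 16 bits yields 0x1000000 for the parent, and adding minor 1
   yields 0x1000001 for the class, matching the handle values in the
   message.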
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |    Flags_E    |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type (RTM_NEWQDISC)      |      Flags (NLM_F_EXCL |      |
   |                               | NLM_F_CREATE | NLM_F_REQUEST) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Sequence Number (arbitrary number)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source PID                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Destination PID                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type == NL2_SERVICE      |         Outer Length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Type == NL2_QDISC       |         Inner Length          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Family(AF_INET)|   Reserved1   |           Reserved2           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Interface Index (4)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Qdisc handle (0x1000001)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Parent Qdisc (0x1000000)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         TCM Info (0)                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Type (TCA_KIND)        |           Length(4)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Value ("pfifo")                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type (TCA_OPTIONS)       |           Length(4)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Value (limit=100)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope
   of any intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.