ForCES Working Group                                    Jamal Hadi Salim
Internet Draft                                             Znyx Networks
                                                        Hormuzd Khosravi
                                                                   Intel
                                                              Andi Kleen
                                                                    Suse
                                                        Alexey Kuznetsov
                                                              INR/Swsoft
                                                          September 2001

                   Netlink as an IP services protocol
                    draft-ietf-forces-netlink-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC-2119].

1. Abstract

   This document describes Linux Netlink, which is used in Linux both
   as an intra-kernel messaging system and as a messaging system
   between kernel and user space.  This document is intended as
   informational, in the context of prior art, for the ForCES IETF
   working group.  Its focus is to describe netlink as a protocol
   between a Forwarding Engine Component (FEC) and a Control Plane
   Component (CPC) that together define an IP service.

jhs_hk_ak_ak                draft-forces-netlink-00.txt

   The document ignores the ability of netlink to serve as an
   intra-kernel messaging system or as an inter-process communication
   (IPC) scheme, as well as its use in configuring non-network
   services and network but non-IP services (such as DECnet).

2. Introduction

   The concept of IP service control-forwarding separation was first
   introduced in the early 1980s by the BSD 4.4 routing sockets
   [stevens].  The focus at that time was a simple IPv4 forwarding
   service and how the CPC, either via a command line configuration
   tool or a dynamic route daemon, could control the forwarding tables
   for that IPv4 forwarding service.

   The IP world has evolved considerably since those days.  Linux
   netlink, when observed from a service provisioning point of view,
   takes routing sockets one step further by breaking the barrier of
   focus around IPv4 forwarding.  Since the 2.1 kernel, netlink has
   been providing the IP service abstraction to a few services other
   than classical IPv4 forwarding.

   We first give some concept definitions and then describe how
   netlink fits in.

2.1. Some definitions

   A Control Plane (CP) is an execution environment that may have
   several components, which we refer to as CPCs.  Each CPC provides
   control for a different IP service being executed by an FE
   component.  This means that there might be several CPCs on a
   physical CP if it is controlling several IP services.  In essence,
   the cohesion between a CP component and an FE component is the
   service abstraction.

   In the diagram below we show a simple FE<->CP setup to provide an
   example of the classical IPv4 service with an extension to do some
   basic QoS egress scheduling, and how it fits in this described
   model.

                                  Control Plane (CP)
                        .-------------------------------------
                        |  /^^^^^\        /^^^^^\             |
                        | |       |      | COPS  |-\          |
                        | | ospfd |      |  PEP  |  |         |
                        | |       /       \_____/   |         |
                     /-----\_____/           |      |         |
                     |  |        |           |      |         |
                     |  |____________________|______|_________|
                     |           |           |      |
              ******************************************
 Forwarding   ************* Netlink  layer *************
 Engine (FE)  ******************************************
       .-------------|-----------|-----------|------|----------
       |  IPv4 forwarding        |                 /          |
       |  FE Service            /                 /           |
       |  Component            /                 /            |
       |       ---------------/-----------------/-------      |
       |       |             |                 /       |      |
packet |       |     --------|---      -------|----    |      | packet
in     |       |     |   IPv4   |      |  Egress  |    |      | out
-->--->|------>|---->|Forwarding|----->|   QoS    |--->| ---->|---->
       |       |     |          |      | Scheduler|    |      |
       |       |     ------------      ------------    |      |
       |       |                                       |      |
       |       -----------------------------------------      |
       |                                                      |
       --------------------------------------------------------

2.1.1. Control Plane Components (CPCs)

   Control plane components encompass signalling protocols ranging
   from dynamic routing protocols such as OSPF [RFC2328] to tag
   distribution protocols such as CR-LDP [RFC3036].  Classical
   management protocols and activities also fall under this category.
   These include SNMP [RFC1157], COPS [RFC2748] and proprietary
   CLI/GUI configuration mechanisms.

   The purpose of the control plane is to provide an execution
   environment for the above mentioned activities, with the ultimate
   goal being to configure and manage the second Network Element (NE)
   component: the FE.  The result of the configuration defines the way
   packets traversing the FE are treated.

   The CP components are traditionally run in software since they tend
   to be very rich in syntax and are moving targets requiring ease of
   modification.

   In the above diagram, ospfd and COPS are distinct CPCs.

2.1.2. Forwarding Engine Components

   The FE is the entity of the NE that incoming packets (from the
   network into the NE) first encounter.

   The FE's service-specific component massages the packet to provide
   it with a treatment to achieve an IP service as defined by the
   control plane components for that IP service.  Different services
   will utilize different FECs.  Service modules may be chained to
   achieve a more complex service (as shown in the diagram).  When
   built for providing a specific service, the FE service component
   will adhere to a Forwarding Model.

   In the above diagram, the IPv4 FE component includes both the IPv4
   Forwarding service module and the Egress Scheduling service module.
   Another service might add a policy forwarder between the IPv4
   forwarder and the QoS egress scheduler.  A simpler classical
   service would have consisted of only the IPv4 forwarder.

2.1.3. IP Services

   An IP Service is the treatment of an IP packet within the NE.  This
   treatment is provided by a combination of both the CPC and the FEC.

   The time span of the service is from the moment when the packet
   arrives at the NE to the moment it departs.  In essence, an IP
   service in this context is a Per-Hop Behavior.  A service
   control/signaling protocol/management application (CP components
   running on NEs defining the end-to-end path) unifies the end-to-end
   view of the IP service.  As noted above, these CP components then
   define the behavior of the FE (and therefore the NE) to a described
   packet.

   A simple example of an IP service is classical IPv4 forwarding.  In
   this case, control components such as routing protocols (OSPF, RIP
   etc) and proprietary CLI/GUI configurations modify the FE's
   forwarding tables in order to offer the simple service of
   forwarding packets to the next hop.
   Traditionally, NEs offering this simple service are known as
   routers.

   Over the years it has become important to add additional services
   to the routers to meet emerging requirements.  More complex
   services extending classical forwarding were added and
   standardized.  These newer services might go beyond the layer 3
   contents of the packet header.  However, the name "router",
   although a misnomer, is still used to describe these NEs.  Services
   (which may look beyond the classical L3 headers) here include
   firewalling, QoS in Diffserv and RSVP, NATs, policy based routing
   etc.  Newer control protocols or management activities are
   introduced with these new services.

   One extreme definition of an IP service is something a service
   provider would be able to charge for.

3. Netlink Architecture

   Control of IP service components is defined by using templates.

   The FEC and CPC participate to deliver the IP service by
   communicating using these templates.  The FEC might continuously
   get updates from the control plane component on how to operate the
   service (for example, route additions or deletions for IPv4
   forwarding).

   The interaction between the FEC and the CPC, in the netlink
   context, would define a protocol.  Netlink provides the mechanism
   for the CPC (residing in user space) and the FEC (residing in
   kernel space) to define their own protocol.  The FEC and CPC, using
   netlink mechanisms, may choose to define a reliable protocol
   between each other, for example.  By default netlink provides
   unreliable communication.

   Note that the FEC and CPC can both live in the same memory
   protection domain and use the connect() system call to create a
   path to the peer and talk to each other.  We will not discuss this
   further other than to say it is available as a mechanism.
   Throughout this document we will use FEC interchangeably with
   kernel space and CPC interchangeably with user space.

   Note: Netlink allows participation in IP services by both service
   components.

3.1. The message format

   There are three levels to a netlink message: the general netlink
   message header, the IP service specific template, and the IP
   service specific data.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                   Netlink message header                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    IP Service Template                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               IP Service specific data in TLVs                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.2. Wire Model

   [In here we describe the pseudo-wire model that netlink uses inside
   the kernel]

3.3. Protocol Model

   This section expands on how netlink provides the mechanism for
   service-oriented FEC and CPC interaction.

3.3.1. Service Addressing

   Access is provided by first connecting to the service on the FE.
   This is done by making a socket() system call to the PF_NETLINK
   domain.  Each FEC is identified by a protocol number.  One may open
   either SOCK_RAW or SOCK_DGRAM type sockets, although netlink
   doesn't distinguish between the two.  The socket connection
   provides the basis for the FE<->CP addressing.

   Connecting to a service is followed (at any point during the life
   of the connection) by either issuing a service-specific command,
   mostly for configuration purposes (from the CPC to the FEC), or
   subscribing/unsubscribing to the service's events.

3.3.1.1. Sample Service Hierarchy

   In the diagram below we show a simple IP service, foo, and the
   interaction it has between CP and FE components for the service.

                            CP
[--------------------------------------------------------.
|  .-----.                                               |
|  |      \                 . --------.                  |
|  |  CLI  |               /           \                 |
|  |       |              |  CP protocol\                |
|  |      /->> --.        |  component   | <-.           |
|   \____/       |        |     For      |   |           |
|                |        |  IP service  |   ^           |
|                Y        |     foo      |   |           |
|                |         \____________/    ^           |
|                Y 1,4,6,8,9   /  ^ 2,5,10   |  3,7      |
 --------------- Y------------/---|----------|-----------
                 |           ^    |          ^
               **|***********|****|**********|**********
               ************* Netlink  layer ************
               **|***********|****|**********|**********
        FE       |           |    ^          ^
       .-------- Y-----------Y----|--------- |----.
       |                     |            /       |
       |                     Y           /        |
       |           . --------^-------.  /         |
       |          |FE component/module|/          |
       |          |  for IP Service   |           |
--->---|------>---|        foo        |----->-----|------>--
       |           -------------------            |
       |                                          |
       |                                          |
        ------------------------------------------

   The control plane protocol for IP service foo does the following to
   connect to its FE counterpart.  The steps below are also numbered
   above in the diagram.

   1) Connect to IP service foo through a socket call.  A typical
      connection would be via a call to:
      socket(AF_NETLINK, SOCK_RAW, NETLINK_FOO)

   2) Bind to listen to specific async events for service foo

   3) Bind to listen to specific async FE events

3.3.2. Netlink message header

   Netlink messages consist of a byte stream with one or multiple
   Netlink headers and associated payload.  (For multipart messages,
   the first and all following headers have the NLM_F_MULTI netlink
   header flag set, except for the last header, which has the netlink
   header type NLMSG_DONE.)

   The netlink message header is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Process PID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields in the header are:

   Length: 32 bits
      The length of the message in bytes, including the header.

   Type: 16 bits
      This field describes the message content.  It can be one of the
      standard message types:

      NLMSG_NOOP   The message is ignored in the current
                   implementation.
      NLMSG_ERROR  The message signals an error and the payload
                   contains a nlmsgerr structure.  This can be looked
                   at as a NACK and typically it is from FEC to CPC.
      NLMSG_DONE   The message terminates a multipart message.

      Individual IP services specify more message types.  For example,
      the NETLINK_ROUTE service specifies several types such as
      RTM_NEWLINK, RTM_DELLINK, RTM_GETLINK, RTM_NEWADDR, RTM_DELADDR,
      RTM_NEWROUTE, RTM_DELROUTE, etc.

   Flags: 16 bits
      The standard flag bits used in netlink are:

      NLM_F_REQUEST  Must be set on all request messages (typically
                     from user space to kernel space).
      NLM_F_MULTI    Indicates the message is part of a multipart
                     message terminated by NLMSG_DONE.
      NLM_F_ACK      Request for an acknowledgment on success.
                     Typical direction of request is from user space
                     to kernel space.
      NLM_F_ECHO     Echo this request.  Typical direction of request
                     is from user space to kernel space.

      Additional flag bits for GET requests on config information in
      the FEC:

      NLM_F_ROOT     Return the complete table instead of a single
                     entry.
      NLM_F_MATCH    Return all entries matching the criteria passed
                     in the message content.
      NLM_F_ATOMIC   Return an atomic snapshot of the table being
                     referenced.
      NLM_F_DUMP     Return all that matches in the table.  This is a
                     shortcut for setting both the NLM_F_ROOT and
                     NLM_F_MATCH flags.

      Additional flag bits for NEW requests:

      NLM_F_REPLACE  Replace the existing matching config object with
                     this request.
      NLM_F_EXCL     Don't replace the config object if it already
                     exists.
      NLM_F_CREATE   Create the config object if it doesn't already
                     exist.
      NLM_F_APPEND   Add to the end of the object list.

      For those familiar with BSDish use of such operations in route
      sockets, the equivalent translations are:

      BSD ADD operation equates to NLM_F_CREATE or-ed with NLM_F_EXCL

      BSD CHANGE operation equates to NLM_F_REPLACE

      BSD Check operation equates to NLM_F_EXCL

      BSD APPEND equivalent is actually mapped to NLM_F_CREATE

   Sequence Number: 32 bits
      The sequence number of the message.

   Process PID: 32 bits
      The PID of the process sending the message.  The PID is used by
      the kernel to multiplex to the correct sockets.  A PID of zero is
      used when sending messages to user space from the kernel.

3.3.2.1. Mechanisms for creating protocols

   One could create a reliable protocol between an FEC and a CPC by
   using the combination of sequence numbers, ACKs and retransmit
   timers.  Both sequence numbers and ACKs are provided by netlink.
   Timers are provided by Linux.

   One could create a heartbeat protocol between the FEC and CPC by
   using the ECHO flags.

3.3.2.2. The ACK netlink message

   This message is actually used to denote both an ACK and a NACK.
   Typically the direction is from kernel to user space (in response
   to an ACK request message that is sent).  However, user space
   should be able to send ACKs back to kernel space when requested.
   This is IP service specific.
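
   As a rough illustration of the header fields and the ACK exchange
   discussed in this section, the following Python sketch packs a
   request that sets NLM_F_REQUEST and NLM_F_ACK and then decodes an
   NLMSG_ERROR reply.  It is a sketch under stated assumptions rather
   than a definitive implementation: the message type FOO_CMD and the
   PID are hypothetical, and the reply is built by hand instead of
   being read from a real netlink socket.  The numeric values of
   NLMSG_ERROR, NLM_F_REQUEST and NLM_F_ACK are those found in the
   Linux <linux/netlink.h> header.

```python
import struct

# Values as defined in <linux/netlink.h>.
NLMSG_ERROR = 2
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04

# Hypothetical message type for the illustrative service "foo".
FOO_CMD = 0x10

# Netlink header in host byte order:
# Length (32 bits), Type (16), Flags (16), Sequence Number (32), PID (32).
NLMSGHDR = struct.Struct("=IHHII")


def build_request(msg_type, seq, pid, payload=b""):
    """Pack a request that asks the receiver for an acknowledgment."""
    length = NLMSGHDR.size + len(payload)  # Length includes the header
    flags = NLM_F_REQUEST | NLM_F_ACK
    return NLMSGHDR.pack(length, msg_type, flags, seq, pid) + payload


def parse_ack(reply):
    """Return (error_code, original_sequence_number) from an ACK/NACK."""
    _length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(reply, 0)
    if msg_type != NLMSG_ERROR:
        raise ValueError("not an ACK/NACK message")
    # Payload: signed 32-bit error code, then the header of the request.
    (error,) = struct.unpack_from("=i", reply, NLMSGHDR.size)
    original = NLMSGHDR.unpack_from(reply, NLMSGHDR.size + 4)
    return error, original[3]


# Simulated exchange: an error code of zero means the request was ACKed.
request = build_request(FOO_CMD, seq=7, pid=1234)
reply = NLMSGHDR.pack(2 * NLMSGHDR.size + 4, NLMSG_ERROR, 0, 7, 1234)
reply += struct.pack("=i", 0) + request[:NLMSGHDR.size]
error, acked_seq = parse_ack(reply)
```

   A CPC building a reliable protocol on top of this could keep each
   request until a reply carrying its sequence number arrives, and
   retransmit when a timer expires first.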

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Netlink message header                     |
   |                      type = NLMSG_ERROR                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          error code                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  OLD Netlink message header                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Error code: integer (typically 32 bits)

   An error code of zero indicates that the message is an ACK
   response.  An ACK response message contains the original netlink
   message header, which can be used to compare against (sent sequence
   numbers etc).

   A non-zero error code is equivalent to a Negative ACK (NACK).  In
   such a situation, the netlink data that was sent down to the kernel
   is returned appended to the original netlink message header.  An
   error code printable via perror() is also set (not in the message
   header, but in the executing environment's state variable).

3.3.3. FE services' templates

   These are services that are offered by the system for general use
   by other services.  They include the ability to configure and
   listen to changes in resource management.  IP address management,
   link events etc fit here.  We separate them into this section for
   logical purposes, despite the fact that they are accessed via the
   NETLINK_ROUTE FEC.  They exist within NETLINK_ROUTE because of
   historical cruft: the rather narrowly focussed BSD 4.4 Route
   Sockets implemented them as part of the IPv4 forwarding sockets.

3.3.3.1. Network Interface Service Module

   This service provides the ability to create, remove or get
   information about a specific network interface.
   The Interface service message template is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |    Padding    |          Device Type          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Device Flags                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Change Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Descriptions of the headers to be added.

3.3.3.2. IP Address Service module

   This service provides the ability to add, remove or receive
   information about an IP address associated with an interface.  The
   Address provisioning service message template is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |    Length     |     Flags     |     Scope     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Descriptions of the headers to be added.

4. Sample Protocol for the foo IP service

   Our proverbial IP service "foo" is used again to demonstrate how
   one can deploy a simple IP service control using netlink.

   These steps are continued from the "Sample Service Hierarchy"
   section.

   4) query for current config of FE component

   5) receive response to 4) via channel on 3)

   6) query for current state of IP service foo

   7) receive response to 6) via channel on 2)

   9) register the protocol specific packets you would like the FE to
      forward to you

   10) send specific service foo commands and receive responses for
       them if needed

4.1. Interacting with other IP services

   The last diagram shows another control component configuring the
   same service.  In this case, it is a proprietary Command Line
   Interface.  The CLI may or may not be using the netlink protocol to
   communicate with the foo component.  If the CLI issues commands
   that affect the policy of the FEC for service "foo", then the "foo"
   CPC is notified.  It could then make algorithmic decisions based on
   this input (for example, if a policy that foo installed was
   deleted, there might be a need to propagate this to all the peers
   of service "foo").

5. Currently Defined netlink IP services

   Although there are many other IP services defined which are using
   netlink, we will only mention those integrated into the kernel
   today (kernel version 2.4.6).  These are:

      NETLINK_ROUTE, NETLINK_FIREWALL, NETLINK_ARPD, NETLINK_ROUTE6,
      NETLINK_IP6_FW, NETLINK_TAPBASE, NETLINK_SKIP, NETLINK_USERSOCK.

5.1. IP Service NETLINK_ROUTE

   This service allows CPCs to modify the IPv4 routing table in the
   Forwarding Engine.  It can also be used by CPCs to receive routing
   updates.

5.1.1. Network Route Service Module

   This service provides the ability to create, remove or receive
   information about a network route.  The service message template is
   shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |  Src length   |  Dest length  |      TOS      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Table ID    |   Protocol    |     Scope     |     Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Flags                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Descriptions of the headers to be added.

5.1.2. Neighbour Setup Service Module

   This service provides the ability to add, remove or receive
   information about a neighbour table entry (e.g. an ARP entry).  The
   service message template is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |    Padding    |            Padding            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             State             |     Flags     |     Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.1.3. Traffic Control Service

   This service provides the ability to add, remove or get a queueing
   discipline.  The service message template is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Family     |    Padding    |            Padding            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Interface Index                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Qdisc handle                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Parent Qdisc                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           TCM Info                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.2. IP Service NETLINK_FIREWALL

   This service allows CPCs to receive packets sent by the IPv4
   firewall module in the FE.

5.3. IP Service NETLINK_ARPD

   This service is used by CPCs for managing the ARP table in the FE.

5.4. IP Service NETLINK_ROUTE6

   This service allows CPCs to modify the IPv6 routing table in the
   FE.  It can also be used by CPCs to receive routing updates.

5.5. IP Service NETLINK_IP6_FW

   This service allows CPCs to receive packets that failed the IPv6
   firewall checks by that module in the FE.

5.6. IP Service NETLINK_TAPBASE

   This service allows CPCs to simulate an ethernet driver belonging
   to the FE.

   These are the instances of the ethertap device.  Ethertap is a
   pseudo network tunnel device that allows an ethernet driver to be
   simulated from user space.

5.7. IP Service NETLINK_SKIP

   This service is reserved for ENskip (?).

5.8. IP Service NETLINK_USERSOCK

   This service is reserved for future Control Plane to FE protocols.

6. Security Considerations

   Netlink lives in a trusted environment of a single host separated
   by kernel and user space.  Linux capabilities ensure that only
   someone with the CAP_NET_ADMIN capability (typically the root user)
   is allowed to open sockets.

7. References

   [RFC1633]  R. Braden, D. Clark, and S. Shenker, "Integrated
              Services in the Internet Architecture: an Overview",
              RFC 1633, ISI, MIT, and PARC, June 1994.

   [RFC1812]  F. Baker, "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2475]  M. Carlson, W. Weiss, S. Blake, Z. Wang, D. Black, and
              E. Davies, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2748]  J. Boyle, R. Cohen, D. Durham, S. Herzog, R. Rajan, and
              A. Sastry, "The COPS (Common Open Policy Service)
              Protocol", RFC 2748, January 2000.

   [RFC2328]  J. Moy, "OSPF Version 2", RFC 2328, April 1998.

   [RFC1157]  J.D. Case, M. Fedor, M.L. Schoffstall, and C. Davin,
              "Simple Network Management Protocol (SNMP)", RFC 1157,
              May 1990.

   [RFC3036]  L. Andersson, P. Doolan, N. Feldman, A. Fredette, and
              B. Thomas, "LDP Specification", RFC 3036, January 2001.

   [stevens]  G.R. Wright and W. Richard Stevens, "TCP/IP Illustrated
              Volume 2, Chapter 20", June 1995.

8. Acknowledgements

   1) Andi Kleen for man pages on netlink and rtnetlink.

   2) Alexey Kuznetsov is credited for extending netlink to the IP
      service delivery model.  The original netlink character device
      was written by Alan Cox.

9. Authors' Addresses

   Jamal Hadi Salim
   Znyx Networks
   Ottawa, Ontario
   Canada
   hadi@znyx.com

   Hormuzd M Khosravi
   Intel
   2111 N.E. 25th Avenue JF3-206
   Hillsboro OR 97124-5961
   1 503 264 0334
   hormuzd.m.khosravi@intel.com