Network Working Group                                        Ahmed Helmy
Internet Draft
Expire in six months
draft-helmy-pim-sm-implem-00.txt                            Jan 19, 1997


      Protocol Independent Multicast-Sparse Mode (PIM-SM): Implementation
                                 Document

Status of This Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.
   (Note that other groups may also distribute working documents as
   Internet Drafts.)

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   ``working draft'' or ``work in progress.''

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

Abstract

   This document describes the details of the PIM-SM [1,2,3] version 2
   implementation for UNIX platforms, namely SunOS and SGI-IRIX.  A
   generic, protocol-independent kernel model is adopted; however, some
   supporting functions are added to the kernel to pass data packets up
   to user level for Register encapsulation, and to decapsulate PIM
   Registers.

   Further, the basic model for the user-level PIM daemon (pimd)
   implementation is described.

   Implementation details and code are included in supplementary
   appendices.

1 Introduction

   In order to support multicast routing protocols in a UNIX
   environment, the kernel and the daemon have to interact and
   cooperate in performing the processing and forwarding functions.

   The kernel basically handles forwarding of data packets according
   to a multicast forwarding cache (MFC) stored in the kernel.  The
   protocol-specific daemon, on the other hand, is responsible for
   processing the control messages from other routers and from the
   kernel, and for maintaining the multicast forwarding cache from user
   level through traps and system calls.  The daemon takes care of all
   timers for the multicast routing table (MRT) entries according to
   the specification of the multicast protocol, PIM-SMv2 [3].  The
   details of the implementation are presented as follows.
   First, an overview of the system (kernel and daemon) is given, with
   emphasis on the basic functions of each.  Then, a structural model
   is presented for the daemon, outlining the basic building blocks for
   the multicast routing table and the virtual interface table at user
   space.  An illustrative functional description is then given for the
   daemon-kernel interface and for the kernel.  Finally, supplementary
   appendices provide more detailed information about the
   implementation specifics.[*]

   _________________________
   [*] The models discussed herein are merely illustrative, and are by
       no means exhaustive or authoritative.

2 System Overview

   The PIM daemon processes all the PIM control messages and sets up an
   appropriate kernel environment to deliver multicast data packets.
   The kernel has to support multicast packet forwarding (see
   figure 1).

      [Figures are present only in the postscript version]
      Fig. 1 System overview

2.1 The kernel level

   When the kernel receives an IP packet, it passes it through the IP
   handling routine [ip_intr()].  ip_intr() dispatches the packet to
   the appropriate handling machinery, based on the destination address
   and the IP protocol number.  Here, we are only concerned with the
   following cases:

   * If the packet is a multicast packet, it passes through the
     multicast forwarding machinery in the kernel [ip_mforward()].  If
     there is a matching entry (source and group addresses) with the
     right incoming interface (iif), we get a `cache hit', and the
     packet is forwarded to the corresponding outgoing interfaces
     (oifs) through the fast forwarding path.  Otherwise, if no entry
     matches the source and group, we get a `cache miss' (or, if an
     entry matches but the iif does not, a `wrong incoming
     interface'), and an internal control message is passed up
     accordingly to the daemon for processing.

   * If the IP packet is a PIM packet (i.e.
     has protocol number IPPROTO_PIM), it passes through the PIM
     machinery in the kernel [pim_input()], and in turn is passed up to
     the socket queue using the raw_input() call.

   * If the IP packet is an IGMP packet (i.e. has protocol number
     IPPROTO_IGMP), it passes through the IGMP machinery in the kernel
     [igmp_input()], and in turn is passed up to the socket queue using
     the raw_input() call.

2.2 The user level (daemon)

   All PIM, IGMP and internal control (e.g. cache miss and wrong
   incoming interface) messages are passed to the PIM daemon; the
   daemon has the complete information needed to create the multicast
   routing table (MRT).  It also updates the multicast forwarding cache
   (MFC) inside the kernel by using the setsockopt() system call, to
   facilitate multicast packet forwarding (see figure 2).

      [Figures are present only in the postscript version]
      Fig. 2 IP-multicast implementation model

   The PIM daemon listens to the PIM and IGMP sockets and receives both
   PIM and IGMP messages.  The messages are processed by the
   corresponding machinery in the daemon [accept_pim() and
   accept_igmp(), respectively], and dispatched to the right component
   of the protocol.

   Modifications and updates are made to the multicast routing table
   (MRT) according to the processing, and appropriate changes are
   reflected in the kernel entries using the setsockopt() system call.

   Other messages may be triggered by the state kept in the daemon, to
   be processed by upstream/downstream routers.

3 User-level Implementation (The Multicast Routing Daemon)

   The basic functional flow model for the PIM daemon is described in
   this section (see figure 3).
   The user-level implementation is broken up into modules based on
   functionality, and includes modules to handle the multicast routing
   table (MRT) [in mrt.c], the virtual interface (vif) table [in
   vif.c], PIM and IGMP messages [in pim.c and igmp.c],
   protocol-specific routing functions [in route.c], timers and
   housekeeping [in timer.c], the kernel-level interface [in kern.c],
   etc.

      [Figures are present only in the postscript version]
      Fig. 3 Basic functional flow of the PIM daemon

   Following is an explanation of these modules, in addition to the
   data structures.

3.1 Data Structures

   There are two basic data structures, the multicast routing entry
   (mrtentry) and the virtual interface entry (vifentry).  These
   structures are created and modified in the daemon in response to
   received PIM control messages.  The definitions of the data
   structures are given in the `pim.h' file.

3.1.1 Multicast Routing Table

   The multicast routing entry is shown below:

      struct mrtentry {
          struct srcentry *source;   /* source */
          struct grpentry *group;    /* group */
          vifbitmap_t outgoing;      /* outgoing vifs to downstream */
          vifbitmap_t vifmaskoff;    /* deleted vifs */
          struct mrtentry *srcnext;  /* next entry of same source */
          struct mrtentry *srcprev;  /* prev entry of same source */
          struct mrtentry *grpnext;  /* next entry of same group */
          struct nbrentry *upstream; /* upstream router; needed because
                                      * in an RPbit entry the upstream
                                      * router differs from the
                                      * source's upstream router.
                                      */
          u_long pktrate_prev;       /* packet count of prev check */
          u_long idlecnt_prev;       /* pkt cnt to check idle states */
          u_int data_rate_timer;     /* keep track of the data rate */
          u_int reg_rate_timer;      /* keep track of Register rate at
                                      * RP
                                      */
          u_int reg_cnt;             /* keep track of the Register
                                      * count at RP
                                      */
          u_char *timers;            /* vif timer list */
          u_short timer;             /* entry timer */
          u_short join_prune_timer;  /* periodic join/prune timer */
          u_short flags;             /* flags */
          u_int assert_rpf_timer;    /* Assert timer */
          u_short registerBitTimer;  /* Register-Suppression timer */
      };

      struct srcentry {
          u_long source;             /* subnet source of multicasts */
          vifi_t incoming;           /* incoming vif */
          struct nbrentry *upstream; /* upstream router */
          u_short timer;             /* timer for recompute incoming */
          u_long metric;             /* Unicast Routing Metric for source */
          u_long preference;         /* The metric preference value */
          struct mrtentry *mrtlink;  /* link to routing entries */
          struct srcentry *next;     /* link to next entry */
      };

      struct grpentry {
          u_long group;              /* subnet group of multicasts */
          vifbitmap_t leaves;        /* outgoing vif to host */
          u_char *timers;            /* vif timer list */
          struct mrtentry *mrtlink;  /* link to routing entries */
          struct grpentry *next;     /* link to next entry */
          struct mrtentry *rp_entry; /* Pointer to the (*,G) entry */
          struct rplist *active_rp;  /* Pointer to the active RP */
      };

   The multicast routing table is the collection of all routing
   entries, which are organized in a linked list in the daemon.

   The overall structure of the multicast routing table in the daemon
   is shown in figure 4.

      [Figures are present only in the postscript version]
      Fig. 4 The multicast routing table overall structure at user level

   One of the frequently used fields in the mrtentry is the `flags'
   field, whose value can be one or a combination of the following:

      #define MRTF_SPT           0x0001 /* shortest path tree bit */
      #define MRTF_WC            0x0002 /* wildcard bit */
      #define MRTF_RP            0x0004 /* RP bit */
      #define MRTF_CRT           0x0008 /* newly created */
      #define MRTF_IIF_REGISTER  0x0020 /* iif = reg_vif */
      #define MRTF_REGISTER      0x0080 /* oif includes reg_vif */
      #define MRTF_KERNEL_CACHE  0x0200 /* kernel cache mirror */
      #define MRTF_NULL_OIF      0x0400 /* null oif cache */
      #define MRTF_REG_SUPP      0x0800 /* register suppress */
      #define MRTF_ASSERTED      0x1000 /* RPF is an assert winner */
      #define MRTF_SG            0x2000 /* pure (S,G) entry */

3.1.2 Virtual Interface List

   The virtual interface data structure is shown below:

      struct vifentry {
          u_short flags;             /* VIFF_ flags */
          u_char threshold;          /* min ttl required */
          u_long local;              /* local interface address */
          u_long remote;             /* remote address */
          u_long subnet;             /* subnet number */
          u_long subnetmask;         /* subnet mask */
          u_long broadcast;          /* subnet broadcast addr */
          char name[IFNAMSIZ];       /* interface name */
          u_char timer;              /* timer for sending queries */
          u_char gq_timer;           /* Group Query timer, used by DR */
          struct nbrentry *neighbors; /* list of neighboring routers */
          u_int rate_limit;          /* max rate */
      };

   The virtual interface table is the collection of all virtual
   interface entries.  They are organized as an array
   (viflist[MAXVIFS]; MAXVIFS is currently set to 32).

   In addition to defining `mrtentry', `vifentry' and other data
   structures, the `pim.h' file also defines all default timer values.

3.2 Initialization and Set up

   Most of the initialization calls and socket setup are handled
   through main() [in main.c].
   Basically, after the alarm and other signals are initialized and the
   PIM and IGMP socket handlers are set up, the function waits in a
   `select' call.  The timer interval is set to TIMER_INTERVAL
   [currently 5 seconds], after which the timer function is invoked if
   no packets were detected by `select'.  The timer function basically
   calls the other timing and housekeeping functions, age_vifs() and
   age_routes().  Another alarm is also scheduled for the following
   interval.

3.3 PIM and IGMP message handling

3.3.1 PIM

   All PIM control messages (in PIM-SMv2) have an IP protocol number
   of IPPROTO_PIM (assigned the value 103), and, unlike PIMv1
   messages, are not part of IGMP.

   Incoming PIM messages are received on the pim socket, and are
   dispatched by accept_pim() [in pim.c] according to their type.

   PIM types are:

      PIM_HELLO                       0
      PIM_REGISTER                    1
      PIM_REGISTER_STOP               2
      PIM_JOIN_PRUNE                  3
      PIM_BOOTSTRAP                   4
      PIM_ASSERT                      5
      PIM_GRAFT                       6
      PIM_GRAFT_ACK                   7
      PIM_CANDIDATE_RP_ADVERTISEMENT  8

   Outgoing PIM messages are sent using send_pim() and
   send_pim_unicast() [in pim.c].

3.3.2 IGMP

   IGMP messages are dispatched using machinery similar to that of
   PIM, except that IGMP messages are received on the igmp socket,
   dispatched by accept_igmp(), and sent using send_igmp() [in
   igmp.c].

3.4 MRT maintenance

   The functions handling MRT creation, access, query and update are
   found in the `mrt.c' file.

   Major functions include route lookups, such as find_route(),
   find_source(), and find_group().

   The hash function and the group-to-RP mapping are also performed by
   functions in `mrt.c'.

3.5 Protocol Specific actions (Routing)

   File `route.c' contains the protocol-specific functions, and
   processes the incoming/outgoing PIM messages.
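   As a minimal sketch of the type-based dispatch done by accept_pim()
   (section 3.3.1), the C fragment below reads the PIMv2 common header
   (4-bit version, 4-bit type, a reserved byte and a 16-bit checksum)
   and indexes a handler table with the type.  The handler signature,
   the stub_* handlers and accept_pim_sketch() are hypothetical
   stand-ins for the route.c functions; the IP header is assumed to be
   stripped already, and checksum verification is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* PIMv2 message types, numbered as in section 3.3.1. */
enum {
    PIM_HELLO, PIM_REGISTER, PIM_REGISTER_STOP, PIM_JOIN_PRUNE,
    PIM_BOOTSTRAP, PIM_ASSERT, PIM_GRAFT, PIM_GRAFT_ACK,
    PIM_CANDIDATE_RP_ADVERTISEMENT,
    PIM_NTYPES                        /* number of known types */
};

/* Hypothetical handler signature: sender address, message, length. */
typedef int (*pim_handler)(unsigned long src, const unsigned char *msg,
                           size_t len);

/* Stand-in handlers (hypothetical); each stub just returns the type it
 * handles so that the dispatch can be observed. */
#define PIM_STUB(name, type)                                          \
    static int name(unsigned long src, const unsigned char *msg,      \
                    size_t len)                                       \
    { (void)src; (void)msg; (void)len; return (type); }

PIM_STUB(stub_hello, PIM_HELLO)
PIM_STUB(stub_register, PIM_REGISTER)
PIM_STUB(stub_register_stop, PIM_REGISTER_STOP)
PIM_STUB(stub_join_prune, PIM_JOIN_PRUNE)
PIM_STUB(stub_bootstrap, PIM_BOOTSTRAP)
PIM_STUB(stub_assert, PIM_ASSERT)
PIM_STUB(stub_graft, PIM_GRAFT)
PIM_STUB(stub_graft_ack, PIM_GRAFT_ACK)
PIM_STUB(stub_cand_rp_adv, PIM_CANDIDATE_RP_ADVERTISEMENT)

/* Dispatch table indexed by the PIM type field (same order as enum). */
static const pim_handler pim_dispatch[PIM_NTYPES] = {
    stub_hello, stub_register, stub_register_stop, stub_join_prune,
    stub_bootstrap, stub_assert, stub_graft, stub_graft_ack,
    stub_cand_rp_adv
};

/* Returns the handler's result, or -1 for a malformed or unknown
 * message. */
int accept_pim_sketch(unsigned long src, const unsigned char *msg,
                      size_t len)
{
    unsigned int version, type;

    assert(msg != NULL);
    if (len < 4)                      /* PIMv2 common header: 4 bytes */
        return -1;
    version = msg[0] >> 4;            /* high nibble: PIM version */
    type = msg[0] & 0x0f;             /* low nibble: message type */
    if (version != 2 || type >= PIM_NTYPES)
        return -1;
    return pim_dispatch[type](src, msg, len);
}
```

   A table indexed by type keeps the dispatcher independent of the
   handlers, so supporting a new message type only means filling one
   more slot.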
   Functions processing incoming PIM messages include
   accept_join_prune(), accept_assert(), accept_register(),
   accept_register_stop(), etc., and other supporting functions.

   Functions triggering outgoing PIM messages include
   event_join_prune(), send_register(), trigger_assert(),
   send_C_RP_Adv(), etc., and supporting functions.

   In addition, route.c also handles the internal control messages
   through process_kernelCall(), which dispatches the internal control
   messages according to their type.

   Currently, there are three types of internal control messages:

      IGMPMSG_NOCACHE   1  /* indicating a cache miss */
      IGMPMSG_WRONGVIF  2  /* indicating wrong incoming interface */
      IGMPMSG_WHOLEPKT  3  /* indicating whole data packet; used
                            * for registering
                            */

   These messages are dispatched to process_cacheMiss(),
   process_wrongiif(), and process_wholepkt(), respectively.

3.6 Timing

   The clock tick for the timing system is set to TIMER_INTERVAL
   (currently 5 seconds).  This means that all periodic actions are
   carried out with a 5-second granularity.

   On every clock tick, the alarm interrupt calls the timer() function
   in main.c, which, in turn, calls age_vifs() [in vif.c] and
   age_routes() [in timer.c].  In this subsection, the functions in
   `timer.c' are explained.

   Basically, age_routes() browses through the routing lists/tables,
   advancing all timers pertinent to the routing tables.  Specifically,
   the group list is traversed, and the leaf membership information is
   aged by calling timeout_leaf().  Then a for loop browses through the
   multicast route entries, aging the outgoing interfaces
   [timeout_outgo()] and various related timers (e.g.
   RegisterBitTimer, Assert timer, etc.) for each entry, as well as
   checking the rate thresholds for the Registers (in the RP) and for
   data (in the DRs).
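   The countdown style of timer aging described above might be
   sketched as follows, assuming each timer holds its remaining time in
   seconds, a value of 0 means "not running", and every tick subtracts
   TIMER_INTERVAL.  struct entry_sketch, age_timer() and
   age_entry_sketch() are hypothetical simplifications, not pimd's
   mrtentry or timeout_outgo().

```c
#include <assert.h>
#include <stddef.h>

#define TIMER_INTERVAL 5   /* clock tick in seconds (section 3.6) */
#define MAXVIFS        32  /* size of viflist (section 3.1.2) */

/* Advance one countdown timer by one tick.  Returns 1 if the timer
 * expired on this tick, 0 otherwise (including when not running). */
int age_timer(unsigned short *timer)
{
    assert(timer != NULL);
    if (*timer == 0)
        return 0;                         /* not running */
    if (*timer <= TIMER_INTERVAL) {
        *timer = 0;                       /* expires during this tick */
        return 1;
    }
    *timer -= TIMER_INTERVAL;
    return 0;
}

/* Hypothetical, cut-down routing entry: an oif set plus timers. */
struct entry_sketch {
    unsigned long outgoing;               /* bit i set => vif i is an oif */
    unsigned short vif_timers[MAXVIFS];   /* per-oif timers */
    unsigned short timer;                 /* entry timer */
};

/* One clock tick for one entry: an expired per-vif timer removes that
 * vif from the oif set.  Returns 1 if the entry itself timed out, in
 * which case the caller would garbage-collect it. */
int age_entry_sketch(struct entry_sketch *e)
{
    int vifi;

    for (vifi = 0; vifi < MAXVIFS; vifi++)
        if (age_timer(&e->vif_timers[vifi]))
            e->outgoing &= ~(1UL << vifi);
    return age_timer(&e->timer);
}
```

   For example, an oif whose timer holds 5 seconds is pruned on the
   next tick, while one holding 12 seconds survives with 7 left.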
   In addition, garbage collection is performed on the timed-out
   entries in the group list, the source list and the MRT.  The
   multicast forwarding cache (MFC) in the kernel is also timed out,
   and deleted/updated accordingly.  Note that in this model the MFC is
   passive: all timing is done at user level, then communicated to the
   kernel (see section 4).  The periodic Join/Prune messaging is
   performed per interface, by calling periodic_vif_join_prune().  Then
   the RPSetTimer is aged, and periodic C-RP-Adv messages are sent if
   the router is a Candidate RP.

3.7 Virtual Interface List

   Functions in `vif.c' handle setting up the viflist array in both the
   user level and the kernel, through start_vifs().  Special care is
   given to the reg_vif, a dummy interface used for
   encapsulating/decapsulating Registers (see section 7).  The reg_vif
   is installed by the add_reg_vif() and k_add_vif(reg_vif_num) calls.

   Other per-interface tasks are carried out by vif.c, such as
   query_groups(), accept_group_report() and query_neighbors(), in
   addition to the periodic age_vifs() timing function.

3.8 Configuration

   pimd looks for the configuration file at ``/etc/pimd.conf''.
   Configuration parameters are parsed by `config.c'.  Current
   PIM-specific configurable parameters include the register and data
   rate thresholds, beyond which the RP and the DRs, respectively,
   switch to the SPT.  A candidate RP can be configured, with an
   optional interface address and C-RP-Adv period, and a candidate
   bootstrap router can be configured with an optional priority.
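   One way the register and data rate thresholds could be checked on
   each clock tick is sketched below.  It assumes that a threshold of
   `count N time T' means "switch to the SPT when more than N packets
   arrive within T seconds", measured as the difference of a cumulative
   counter, much as pktrate_prev keeps the packet count of the previous
   check.  struct rate_check and rate_check_tick() are hypothetical.

```c
#include <assert.h>

#define TIMER_INTERVAL 5   /* clock tick in seconds (section 3.6) */

/* Hypothetical per-entry rate-check state; count_threshold and
 * time_window mirror the `count' and `time' keywords of the
 * switch-*-threshold lines in pimd.conf (assumed semantics). */
struct rate_check {
    unsigned long count_threshold;   /* packets allowed per window */
    unsigned int time_window;        /* window length, seconds */
    unsigned int elapsed;            /* seconds since window start */
    unsigned long cnt_at_start;      /* counter value at window start */
};

/* Call once per clock tick with a cumulative packet counter (e.g.
 * Registers received at the RP, or the kernel's (S,G) packet count at
 * a DR).  Returns 1 when a window has just closed with more than
 * count_threshold packets, i.e. when the router should switch to the
 * SPT; returns 0 otherwise. */
int rate_check_tick(struct rate_check *rc, unsigned long pkt_cnt)
{
    unsigned long delta;

    assert(rc->time_window > 0);
    rc->elapsed += TIMER_INTERVAL;
    if (rc->elapsed < rc->time_window)
        return 0;                        /* window still open */
    delta = pkt_cnt - rc->cnt_at_start;  /* packets in this window */
    rc->elapsed = 0;                     /* start the next window */
    rc->cnt_at_start = pkt_cnt;
    return delta > rc->count_threshold;
}
```

   Because the window is advanced in TIMER_INTERVAL steps, a configured
   time that is not a multiple of 5 seconds is effectively rounded up
   to the next tick.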
   Following is an example `pimd.conf':

      # Command formats:
      #
      # phyint [disable] [threshold ]
      # candidate-rp [time ]
      # bootstrap-router [priority ]
      # switch-register-threshold [count time ]
      # switch-data-threshold [count time ]
      #
      candidate-rp time 60
      bootstrap-router priority 5
      switch-register-threshold count 10 time 5

3.9 Interfacing to Unicast Routing

   For proper implementation, PIM requires access to the unicast
   routing tables.  Given a specific destination address, PIM needs at
   least information about the next hop (or the reverse path forwarding
   `RPF' neighbor) and the interface (iif) used to reach that
   destination.  Other useful information includes the metric used to
   reach the destination and the unicast routing protocol type, if such
   information is attainable.  In this document, only the RPF and iif
   information are discussed.

   Two models have been employed to interface to unicast routing in
   pimd.

   The first model, which requires `routing socket' support, is
   implemented on the `IRIX' platform and other systems supporting
   routing sockets.  This model requires no kernel modifications to
   access the unicast routing tables.  All the functionality required
   is provided in `routesock.c'.  In this document, we adopt this
   model.

   The second model is implemented on `SunOS' and other platforms not
   supporting routing sockets, and requires some kernel modifications.
   In this model, an `ioctl' code is defined (e.g. `SIOCGETRPF'), and a
   supporting function [e.g. get_rpf()] is added to the multicast code
   in the kernel, to query the unicast forwarding information base
   through the rtalloc() call.

   Other models may also be adopted, depending on the operating system
   capabilities.

   In any case, the unicast routing information has to be updated
   periodically to adapt to routing changes and network failures.
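   Whichever model is used, the query PIM needs is a longest-prefix
   match that returns the RPF neighbor and the iif.  The sketch below
   performs that lookup over a small in-memory table instead of a
   routing socket or the SIOCGETRPF ioctl; struct ucast_route and
   find_rpf_sketch() are hypothetical, and addresses are assumed to be
   in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unicast route entry (host byte order). */
struct ucast_route {
    uint32_t dest;      /* destination prefix */
    uint32_t mask;      /* prefix mask, e.g. 0xffffff00 for a /24 */
    uint32_t nexthop;   /* RPF neighbor; 0 if directly connected */
    int vifi;           /* interface (iif) used to reach dest */
};

/* Longest-prefix-match lookup of `addr'.  On success fills *rpf_nbr
 * and *iif and returns 0; returns -1 when no route matches. */
int find_rpf_sketch(const struct ucast_route *tbl, size_t n,
                    uint32_t addr, uint32_t *rpf_nbr, int *iif)
{
    const struct ucast_route *best = NULL;
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if ((addr & tbl[i].mask) != tbl[i].dest)
            continue;                    /* prefix does not cover addr */
        /* For contiguous masks, a larger mask is a longer prefix. */
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];
    }
    if (best == NULL)
        return -1;
    *rpf_nbr = best->nexthop;
    *iif = best->vifi;
    return 0;
}
```

   A linear scan is enough to show the semantics; a real forwarding
   table would use a structure such as a radix tree, and the result
   would be refreshed periodically as noted above.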
4 User level - Kernel Interfaces

   Communication between the user level and the kernel is established
   in both directions.  Messages for kernel initialization and setup,
   and for adding, deleting and updating entries in the viftable or the
   MFC in the kernel, are triggered by the multicast daemon.  PIM,
   IGMP, and internal control messages, on the other hand, are passed
   from the kernel to user level.

4.1 User level to kernel messages

   Most user-level interfacing to the kernel is done through functions
   in `kern.c'.  The traps used are setsockopt(), getsockopt(), and
   ioctl().  Following is a brief description of each:

   * setsockopt(): used by the daemon to modify and update the kernel
     environment, including the forwarding cache, the viftable, etc.

     Options used with this call are:

        MRT_INIT     initialization
        MRT_DONE     termination
        MRT_ADD_VIF  add a virtual interface
        MRT_DEL_VIF  delete a virtual interface
        MRT_ADD_MFC  add an entry to the multicast forwarding cache
        MRT_DEL_MFC  delete an entry from the multicast forwarding cache
        MRT_PIM      set a pim flag in the kernel [to stub the pim code]

   * getsockopt(): used to get information from the kernel.

     Options used with this call are:

        MRT_VERSION  get the version of the multicast kernel
        MRT_PIM      get the pim flag

   * ioctl(): used by the daemon for two-way communication with the
     kernel.

     It is used to get interface information [in config.c and vif.c].
     `kern.c' uses ioctl() with option `SIOCGETSGCNT' to get the cache
     hit packet count for an (S,G) entry in the kernel.  Also, ioctl()
     may be used to get unicast routing information from the kernel
     using the option `SIOCGETRPF', if that model is used to obtain
     unicast routing information (see section 3.9).
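   Before an MRT_ADD_MFC call, the daemon needs to translate its
   outgoing vif bitmap into the per-vif TTL array that the kernel uses
   as the oif list (mfc_ttls in section 5.1).  The sketch below shows
   that translation with a hypothetical struct mfc_request and
   build_mfc_request(); the real structure passed to setsockopt() is
   kernel-specific, so the system call itself is omitted, and vif
   thresholds are assumed to be at least 1.

```c
#include <assert.h>
#include <string.h>

#define MAXVIFS 32

/* Hypothetical stand-in for the structure passed with MRT_ADD_MFC. */
struct mfc_request {
    unsigned long origin;           /* source address S */
    unsigned long mcastgrp;         /* group address G */
    unsigned short parent;          /* incoming vif (iif) */
    unsigned char ttls[MAXVIFS];    /* 0 = do not forward on this vif */
};

/* Build the kernel request for an (S,G) entry: every vif in `outgoing'
 * (bit i => vif i) gets that vif's TTL threshold; all other vifs, and
 * the iif itself, get 0, which leaves them out of the oif list. */
void build_mfc_request(struct mfc_request *req, unsigned long src,
                       unsigned long grp, unsigned short iif,
                       unsigned long outgoing,
                       const unsigned char thresholds[MAXVIFS])
{
    int vifi;

    assert(iif < MAXVIFS);
    memset(req, 0, sizeof(*req));   /* all ttls start at 0 */
    req->origin = src;
    req->mcastgrp = grp;
    req->parent = iif;
    for (vifi = 0; vifi < MAXVIFS; vifi++)
        if ((outgoing & (1UL << vifi)) && vifi != iif)
            req->ttls[vifi] = thresholds[vifi];
}
```

   The filled-in structure would then be handed to setsockopt() with
   MRT_ADD_MFC; leaving the iif out of the TTL array keeps the kernel
   from forwarding a packet back onto the interface it arrived on.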
4.2 Kernel to user level messages

   The kernel uses two calls to send PIM, IGMP and internal control
   messages to user level:

   * raw_input(): used by the kernel to send messages/packets to the
     raw sockets at user level.

     Used by both the pim machinery [pim_input()] and the igmp
     machinery [igmp_input()] in the kernel to pass the messages to the
     raw socket queue, and in turn to the pim and igmp sockets, to
     which the pim daemon listens.

   * socket_send(): used by the multicast forwarding machinery to send
     internal control messages to the daemon.

     Used by the multicast code in the kernel [in ip_mroute.c] to send
     internal, multicast-specific, control messages:

     1 ip_mforward() triggers an `IGMPMSG_NOCACHE' control message
       when getting a cache miss.

     2 ip_mdq() triggers an `IGMPMSG_WRONGVIF' control message when
       failing the RPF check (i.e. getting a wrong iif).

     3 register_send() relays `IGMPMSG_WHOLEPKT' messages containing
       the data packet, when called by ip_mdq() to forward packets to
       the `reg_vif'.

5 IP Multicast Kernel Support

   The kernel support for IP multicast is mostly provided through
   `ip_mroute.c' and `ip_mroute.h', which provide the structure for the
   multicast forwarding cache (MFC), the virtual interface table
   (viftable), and supporting functions.

5.1 The Multicast Forwarding Cache

   The Multicast Forwarding Cache (MFC) entry is defined in
   `ip_mroute.h', and consists basically of the source address, the
   group address, an incoming interface (iif), and an outgoing
   interface list (oiflist).
   Following is the complete definition:

      struct mfc {
          struct in_addr mfc_origin;   /* ip origin of mcasts */
          struct in_addr mfc_mcastgrp; /* multicast group associated */
          vifi_t mfc_parent;           /* incoming vif */
          u_char mfc_ttls[MAXVIFS];    /* forwarding ttls on vifs */
          u_int mfc_pkt_cnt;           /* pkt count for src-grp */
          u_int mfc_byte_cnt;          /* byte count for src-grp */
          u_int mfc_wrong_if;          /* wrong if for src-grp */
          int mfc_expire;              /* time to clean entry up */
      };

   The multicast forwarding cache table (mfctable) is a hash table of
   mfc entries, defined as:

      struct mbuf *mfctable[MFCTBLSIZ];

   where MFCTBLSIZ is 256.

   In case of hash collisions, a collision chain is constructed.

5.2 The Virtual Interface Table

   The viftable is an array of `vif' structures, defined as follows:

      struct vif {
          u_char v_flags;            /* VIFF_ flags defined above */
          u_char v_threshold;        /* min ttl required to forward on vif */
          u_int v_rate_limit;        /* max rate */
          struct tbf *v_tbf;         /* token bucket structure at intf. */
          struct in_addr v_lcl_addr; /* local interface address */
          struct in_addr v_rmt_addr; /* remote address (tunnels only) */
          struct ifnet *v_ifp;       /* pointer to interface */
          u_int v_pkt_in;            /* # pkts in on interface */
          u_int v_pkt_out;           /* # pkts out on interface */
          u_int v_bytes_in;          /* # bytes in on interface */
          u_int v_bytes_out;         /* # bytes out on interface */
          struct route v_route;      /* Cached route if this is a tunnel */
      #ifdef RSVP_ISI
          u_int v_rsvp_on;           /* # RSVP listening on this vif */
          struct socket *v_rsvpd;    /* # RSVPD daemon */
      #endif /* RSVP_ISI */
      };

   One of the frequently used fields is the `v_flags' field, which may
   take one of the following values:

      VIFF_TUNNEL    0x1  /* vif represents a tunnel end-point */
      VIFF_SRCRT     0x2  /* tunnel uses IP src routing */
      VIFF_REGISTER  0x4  /* vif used for register encap/decap */

5.3 Kernel supporting functions

   The major standard IP multicast supporting functions are:

   * ip_mrouter_init()

     Initialize the `ip_mrouter' socket, and the MFC.

     Called by setsockopt() with option MRT_INIT.

   * ip_mrouter_done()

     Disable multicast routing.

     Called by setsockopt() with option MRT_DONE.

   * add_vif()

     Add a new virtual interface to the viftable.

     Called by setsockopt() with option MRT_ADD_VIF.

   * del_vif()

     Delete a virtual interface from the viftable.

     Called by setsockopt() with option MRT_DEL_VIF.

   * add_mfc()

     Add/update an mfc entry in the mfctable.

     Called by setsockopt() with the option MRT_ADD_MFC.

   * del_mfc()

     Delete an mfc entry from the mfctable.

     Called by setsockopt() with the option MRT_DEL_MFC.

   * ip_mforward()

     Receive an IP multicast packet from interface `ifp'.  If it
     matches a multicast forwarding cache entry, pass it to the next
     packet forwarding routine [ip_mdq()].
     Otherwise, if the packet does not match an entry, create an
     `idle' cache entry, enqueue the packet to it, and send the header
     in an internal control message to the daemon [using
     socket_send()], indicating a cache miss.

   * ip_mdq()

     The multicast packet forwarding routine.  An incoming interface
     check is performed: the iif in the entry is compared to the
     interface over which the packet was received.  If they match, the
     packet is forwarded on all vifs according to the ttl array
     included in the mfc [this basically constitutes the oif list].
     Tunnels and Registers are handled by this function, by forwarding
     to `dummy' vifs.  If the iif check does not pass, an internal
     control message (basically the packet header) is sent to the
     daemon [using socket_send()], including vif information, and
     indicating a wrong incoming interface.

   * expire_upcalls()

     Clean up cache entries if upcalls are not serviced.

     Called by the Slow Timeout mechanism, every half second.

   The following functions in the kernel provide support to PIM, and
   are part of `ip_mroute.c':

   * register_send()

     Send the whole packet in an internal control message, indicating
     a whole packet, for encapsulation at user level.

     Called by ip_mdq().

   * pim_input()

     The PIM receiving machinery in the kernel.  Checks the incoming
     PIM control messages and passes them to the daemon using
     raw_input().  If the PIM message is a Register message, it is
     processed: the packet is decapsulated and passed to
     register_mforward(), and the header of the Register message is
     passed up to the pim socket using raw_input().

     Called by ip_intr() based on IPPROTO_PIM.

   * register_mforward()

     Forward a packet resulting from register decapsulation.
        This is performed by looping back a copy of the packet
        using looutput(), such that the packet is enqueued on the
        `reg_vif' queue and fed back into the multicast forwarding
        machinery.

        Called by pim_input().

6 Appendix I

   The user level code, kernel patches, and change description are
   available at

      http://catarina.usc.edu/ahelmy/pimsm-implem/

   or through anonymous ftp from

      catarina.usc.edu:/pub/ahelmy/pimsm-implem/

7 Appendix II: Register Models

   The sender model in PIM-SM is based on the sender's DR
   registering to the active RP for the corresponding group. This
   process involves encapsulating data packets in PIM-REGISTER
   messages. Register encapsulation requires information about the
   RP, and is done in the user level daemon. Added functionality in
   the kernel is necessary to pull the data packet up to user level
   for encapsulation.

   Register decapsulation (at the RP), on the other hand, is
   performed in the kernel, because the decapsulated packet carries
   the original source in its IP header, and most operating systems
   do not allow a user level process to send a packet carrying a
   non-local source address (spoofing).

   The kernel is modified to have a pim_input() machinery to receive
   PIM packets. If the PIM type is REGISTER, the packet is
   decapsulated. The decapsulated packet is then looped back and
   treated as a normal multicast packet.

   The two models discussed above are further detailed in this
   section.

7.1 Register Encapsulation

   Upon receiving a multicast data packet from a directly connected
   source, a DR [initially having no (S,G) cache] looks up the entry
   in the kernel cache. When the look-up machinery gets a cache
   miss, the following actions take place (see figure 5):

   [Figures are present only in the postscript version]
   Fig. 5 At the DR: Creating (S,G) entries for local senders and

   1 an idle (S,G) cache entry is created in the kernel, with
     oif = null,

   2 the data packet is enqueued to this idle entry [a threshold
     of 4 packets queue length is currently enforced],

   3 an expiry timer is started for the idle queue, and

   4 an internal control packet is sent on the socket queue using
     socket_send(), containing the packet header, information about
     the incoming interface, and the cache miss code.

   [Note that the above procedures occur for the first packet only,
   when the cache is cold. Further packets will be either enqueued
   (if the cache is idle and the queue is not full), dropped (if the
   cache is idle and the queue is full), or forwarded (if the cache
   entry is active).]

   At user space, the igmp processing machinery receives this
   packet, identifies the internal control protocol, and passes the
   message to the function that handles kernel calls
   [process_kernelCall()].

   The cache miss code is checked; the router then checks:

   1 whether the sender of the packet is a directly connected
     source, and

   2 whether the router is the DR on the receiving interface.

   If the check does not pass, no action pertaining to Registers is
   taken [*]. If the daemon does not activate the idle kernel cache
   entry, the entry eventually times out, and the enqueued packets
   are dropped.

   If the check passes, the daemon creates an (S,G) entry with the
   REGISTER bit set in the `flags' field, the iif set to the
   interface on which the packet was received, and the reg_vif
   included in the oiflist, in addition to any other oifs copied
   from wild card entries according to the PIM spec. `reg_vif' is an
   added `dummy interface' for use in the register models. Further,
   the daemon installs this entry in the kernel cache, using
   setsockopt() with the `MRT_ADD_MFC' option.
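   The cold-cache handling and the daemon's reaction described above
   can be pictured with a small C model. This is a minimal,
   hypothetical sketch, not code from the implementation: the
   struct mfc_entry layout, the single-slot cache, the enum verdict
   values, the MFC_REGISTER flag value, and the functions
   kernel_input() and daemon_cache_miss() are assumptions made for
   illustration. The expiry timer (step 3) and the upcall messages
   themselves are not modeled. Only the 4-packet idle-queue
   threshold, the two daemon checks, the REGISTER flag, and the
   placement of reg_vif in the oif list come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names and values below are hypothetical, for this sketch only. */
#define MAXVIFS        8              /* size of the model vif table   */
#define REG_VIF        (MAXVIFS - 1)  /* index of the dummy reg_vif    */
#define IDLE_QLEN_MAX  4              /* idle-queue threshold (4 pkts) */
#define MFC_REGISTER   0x1            /* daemon's REGISTER flag bit    */

enum verdict { UPCALL_CACHE_MISS, ENQUEUED, DROPPED, FORWARDED,
               UPCALL_WRONG_VIF };

/* One (S,G) forwarding cache slot; a real MFC is a hash table. */
struct mfc_entry {
    bool     in_use;         /* slot holds an (S,G) entry             */
    bool     active;         /* false: idle entry from a cache miss   */
    uint32_t src, grp;       /* (S,G)                                 */
    int      iif;            /* expected incoming vif                 */
    uint8_t  ttls[MAXVIFS];  /* nonzero ttl: vif is in the oif list   */
    uint8_t  flags;          /* MFC_REGISTER when registering         */
    int      qlen;           /* packets held on the idle entry        */
};

/* Kernel side (in the spirit of ip_mforward()/ip_mdq()): one data
 * packet for (src,grp) arriving on vif `iif`. */
enum verdict kernel_input(struct mfc_entry *e, uint32_t src,
                          uint32_t grp, int iif)
{
    assert(iif >= 0 && iif < MAXVIFS);
    if (!e->in_use) {                 /* cold cache: steps 1, 2, 4    */
        memset(e, 0, sizeof *e);      /* idle entry, oif = null       */
        e->in_use = true;
        e->src = src;
        e->grp = grp;
        e->iif = iif;
        e->qlen = 1;                  /* enqueue the first packet     */
        return UPCALL_CACHE_MISS;     /* stands in for socket_send()  */
    }
    if (!e->active) {                 /* idle entry: queue or drop    */
        if (e->qlen < IDLE_QLEN_MAX) {
            e->qlen++;
            return ENQUEUED;
        }
        return DROPPED;
    }
    if (iif != e->iif)                /* incoming interface check     */
        return UPCALL_WRONG_VIF;
    return FORWARDED;
}

/* Daemon side: react to the cache-miss upcall for a local sender.
 * wc_ttls is NULL or holds oifs copied from wild card entries. */
bool daemon_cache_miss(struct mfc_entry *e, bool src_directly_connected,
                       bool is_dr_on_iif, const uint8_t *wc_ttls)
{
    if (!src_directly_connected || !is_dr_on_iif)
        return false;       /* no Register action; idle entry times out */
    if (wc_ttls != NULL)
        memcpy(e->ttls, wc_ttls, MAXVIFS);
    e->ttls[REG_VIF] = 1;             /* reg_vif joins the oif list   */
    e->flags |= MFC_REGISTER;
    e->active = true;                 /* stands in for MRT_ADD_MFC    */
    return true;
}
```

   In this model, the first packet of a cold (S,G) is queued and
   triggers a cache-miss upcall, the next three packets join it on
   the idle entry, and the fifth is dropped. After the daemon's
   install step, packets arriving on the recorded iif are forwarded,
   while packets arriving on any other vif produce a
   wrong-interface upcall.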
   This triggers the add_mfc() function in the kernel, which in turn
   calls the forwarding machinery [ip_mdq()]. The forwarding
   machinery iterates over the virtual interfaces, and if the vif is
   the reg_vif, the register_send() function is called. The latter
   is the function added for encapsulation support; it sends the
   enqueued packets as WHOLE_PKTs (in an internal control message)
   to the user level using socket_send().

   The message flows through the igmp and process_kernelCall
   machineries; the [(S,G) REGISTER] entry is then matched, and the
   packet is encapsulated and unicast to the active RP of the
   corresponding group.

   Subsequent packets match on (S,G) (with oif = reg_vif) in the
   kernel, and are sent to user space directly using
   register_send().

   _________________________
   [*] Other checks are performed according to the longest match
   rules in the PIM spec. Optionally, if no entry is matched, a
   kernel cache entry with oif = null may be installed, to avoid
   further cache misses on the same entry.

7.2 Register Decapsulation

   At the RP, the unicast Registers, being PIM messages, are
   received by the pim machinery in the kernel [pim_input()]. The
   PIM type is checked. If it is REGISTER, pim_input() checks the
   null register bit; if the bit is not set, the packet is passed to
   register_mforward(), which loops it back on the `reg_vif' queue
   using looutput(). In any case, the header of the Register
   (containing the outer IP header, the PIM message header, and the
   IP header of the inner encapsulated packet) is passed to
   raw_input(), and in turn to the pim socket, on which the PIM
   daemon listens (see figure 6).

   [Figures are present only in the postscript version]
   Fig. 6 At the RP, receiving Registers, decapsulating and
   forwarding

   At the PIM daemon, the message is processed by the pim machinery
   [accept_pim()].
   The REGISTER type directs the message to the accept_register()
   function. The Register message is parsed and processed according
   to the PIM-SM rules given in the spec.

   If the Register is to be forwarded, the daemon:

   1 creates an (S,G) entry with iif = reg_vif, and the oiflist
     copied from wild card entries [the data packets are to be
     forwarded down the unpruned shared tree, according to PIM-SM
     longest match rules], and

   2 installs the entry in the kernel cache, using
     setsockopt(MRT_ADD_MFC).

   At the same time, the decapsulated packet enqueued on the reg_vif
   queue is fed into ip_intr() [the IP interrupt service routine],
   and passed to ip_mforward() as a native multicast data packet. A
   cache lookup is performed on the decapsulated packet. If the
   cache hits and the iif matches (i.e. cache iif = reg_vif), the
   packet is forwarded according to the installed oiflist.
   Otherwise, a cache miss internal control message is sent to user
   level and processed accordingly.

   Note that a race condition may occur, where the decapsulated
   packet reaches ip_mforward() before the daemon installs the
   kernel cache entry. This case is handled in process_cacheMiss(),
   in conformance with the PIM spec, and the packet is forwarded
   accordingly.

8 Acknowledgments

   Special thanks to Deborah Estrin (USC/ISI), Van Jacobson (LBL),
   Bill Fenner, Stephen Deering (Xerox PARC), Dino Farinacci (Cisco
   Systems) and David Thaler (UMich), for providing comments and
   hints for the implementation. An earlier version of PIM version
   1.0 was written by Charley Liu and Puneet Sharma at USC.

   A. Helmy did much of this work as a summer intern at Silicon
   Graphics Inc.

   PIM was supported by grants from the National Science Foundation
   and Sun Microsystems.

References

   [1] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering,
       M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei.
       Protocol Independent Multicast-Sparse Mode (PIM-SM):
       Motivation and Architecture. Experimental RFC, Dec 1996.

   [2] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu,
       L. Wei, P. Sharma, and A. Helmy. Protocol Independent
       Multicast (PIM): Specification. Internet Draft, June 1995.

   [3] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering,
       M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei.
       Protocol Independent Multicast-Sparse Mode (PIM-SM):
       Protocol Specification. Experimental RFC, Dec 1996.

Address of Author:

   Ahmed Helmy
   Computer Science Dept/ISI
   University of Southern Calif.
   Los Angeles, CA 90089
   ahelmy@usc.edu