Network Working Group                                      J. Hadi Salim
Internet-Draft                                         Mojatatu Networks
Intended status: Standards Track                                K. Ogawa
Expires: April 2, 2010                                   NTT Corporation
                                                      September 29, 2009


      SCTP based TML (Transport Mapping Layer) for ForCES protocol
                      draft-ietf-forces-sctptml-06

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79. This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008. The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process. Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 2, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document defines the SCTP based TML (Transport Mapping Layer)
   for the ForCES protocol. It explains the rationale for choosing the
   SCTP (Stream Control Transmission Protocol) [RFC4960] and also
   describes how this TML addresses all the requirements described in
   [RFC3654] and the ForCES protocol [FE-PROTO] draft.

Table of Contents

   1.  Definitions
   2.  Introduction
   3.  Protocol Framework Overview
     3.1.  The PL
     3.2.  The TML
       3.2.1.  TML and PL Interfaces
       3.2.2.  TML Parameterization
   4.  SCTP TML overview
     4.1.  Rationale for using SCTP for TML
     4.2.  Meeting TML requirements
       4.2.1.  SCTP TML Channels
       4.2.2.  Satisfying TML Requirements
   5.  SCTP TML Channel Work
   6.  IANA Considerations
   7.  Security Considerations
     7.1.  IPsec Usage
       7.1.1.  SAD and SPD setup
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Suggested SCTP TML Channel Work Implementation
     A.1.  SCTP TML Channel Initialization
     A.2.  Channel work scheduling
       A.2.1.  FE Channel work scheduling
       A.2.2.  CE Channel work scheduling
     A.3.  SCTP TML Channel Termination
     A.4.  SCTP TML NE level channel scheduling
   Appendix B.  Suggested Service Interface
     B.1.  TML Boot-strapping
     B.2.  TML Shutdown
     B.3.  TML Sending and Receiving
   Authors' Addresses

1.  Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   The following definitions are taken from [RFC3654] and [RFC3746]:

   ForCES Protocol -- The protocol used at the Fp reference point in the
   ForCES Framework in [RFC3746].

   ForCES Protocol Layer (ForCES PL) -- A layer in the ForCES protocol
   architecture that defines the ForCES protocol architecture and the
   state transfer mechanisms as defined in [FE-PROTO].

   ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in
   the ForCES protocol architecture that specifically addresses the
   protocol message transportation issues, such as how the protocol
   messages are mapped to different transport media (like SCTP, IP, ATM,
   Ethernet, etc.), and how to achieve and implement reliability,
   security, etc.

2.  Introduction

   The ForCES (Forwarding and Control Element Separation) working group
   in the IETF defines the architecture and protocol for separation of
   Control Elements (CE) and Forwarding Elements (FE) in Network
   Elements (NE) such as routers. [RFC3654] and [RFC3746] respectively
   define the protocol requirements and the architectural framework for
   the communication between CE and FE. The ForCES protocol layer
   specification [FE-PROTO] describes the protocol semantics and
   workings. The ForCES protocol layer operates on top of an
   inter-connect hiding layer known as the TML. The relationship is
   illustrated in Figure 1.

   This document defines the SCTP based TML for the ForCES protocol
   layer. It also addresses all the requirements for the TML, including
   security, reliability, etc., as defined in [FE-PROTO].

3.  Protocol Framework Overview

   The reader is referred to the Framework document [RFC3746], and in
   particular sections 3 and 4, for an architectural overview and
   explanation of where and how the ForCES protocol fits in.

   There is some content overlap between the ForCES protocol draft
   [FE-PROTO] and this section (Section 3) in order to provide basic
   context to the reader of this document.

   The ForCES protocol layering consists of two pieces: the PL and the
   TML. This is depicted in Figure 1.

         +----------------------------------------------+
         |                    CE PL                     |
         +----------------------------------------------+
         |                    CE TML                    |
         +----------------------------------------------+
                                 ^
                                 |
                    ForCES PL    |messages
                                 |
                                 v
         +----------------------------------------------+
         |                    FE TML                    |
         +----------------------------------------------+
         |                    FE PL                     |
         +----------------------------------------------+

      Figure 1: Message exchange between CE and FE to establish an NE
                                association

   The PL is in charge of the ForCES protocol. Its semantics and
   message layout are defined in [FE-PROTO]. The TML is necessary to
   connect two ForCES end-points as shown in Figure 1.

   Both the PL and TML are standardized by the IETF. While only one PL
   is defined, different TMLs are expected to be standardized. The TML
   at each of the nodes (CE and FE) is expected to be of the same
   definition in order to inter-operate.

   When transmitting from a ForCES end-point, the PL delivers its
   messages to the TML. The TML then delivers the PL message to the
   destination TML(s).

   On reception of a message, the TML delivers the message to its
   destination PL level (as described in the ForCES header).

3.1.  The PL

   The PL is common to all implementations of ForCES and is standardized
   by the IETF [FE-PROTO]. The PL level is responsible for associating
   an FE or CE to an NE. It is also responsible for tearing down such
   associations.

   An FE may use the PL level to asynchronously send packets to the CE.
   The FE may redirect via the PL (from outside the NE) various control
   protocol packets (e.g., OSPF) to the CE. Additionally, the FE
   delivers various events that the CE has subscribed to via the PL
   [FE-MODEL].

   The CE and FE may interact synchronously via the PL.
   The CE issues status requests to the FE and receives responses via
   the PL. The CE also configures the associated FE's LFBs' components
   using the PL [FE-MODEL].

3.2.  The TML

   The TML level is responsible for transport of the PL level messages.
   [FE-PROTO] section 5 defines the requirements that need to be met by
   a TML specification. The SCTP TML specified in this document meets
   all the requirements specified in [FE-PROTO] section 5.
   Section 4.2.2 describes how the TML requirements are met.

3.2.1.  TML and PL Interfaces

   There are two interfaces to the PL and TML, both of which are out of
   scope for ForCES. The first one is the interface between the PL and
   TML and the other is the CE Manager (CEM)/FE Manager (FEM) [RFC3746]
   interface to both the PL and TML. Both interfaces are shown in
   Figure 2.

                    +----------------------------+
                    |  +----------------------+  |
                    |  |                      |  |
   +---------+      |  |       PL Layer       |  |
   |         |      |  +----------------------+  |
   |FEM/CEM  |<---->|             ^              |
   |         |      |             |              |
   +---------+      |             |TML API       |
                    |             |              |
                    |             V              |
                    |  +----------------------+  |
                    |  |                      |  |
                    |  |      TML Layer       |  |
                    |  |                      |  |
                    |  +----------------------+  |
                    +----------------------------+

                    Figure 2: The TML-PL interface

   Figure 2 also shows an interface referred to as CEM/FEM [RFC3746]
   which is responsible for bootstrapping and parameterization of the
   TML. In its most basic form the CEM/FEM interface takes the form of
   a simple static config file which is read on startup in the pre-
   association phase.

   Appendix B discusses the service interfaces in more detail.

3.2.2.  TML Parameterization

   It is expected that it should be possible to use a configuration
   reference point, such as the FEM or the CEM, to configure the TML.

   Some of the configured parameters may include:

   o  PL ID

   o  Connection Type and associated data.
      For example, if a TML uses IP/SCTP then parameters such as SCTP
      ports and IP addresses need to be configured.

   o  Number of transport connections

   o  Connection Capability, such as bandwidth, etc.

   o  Allowed/Supported Connection QoS policy (or Congestion Control
      Policy)

4.  SCTP TML overview

   SCTP [RFC4960] is an end-to-end transport protocol that is equivalent
   to TCP, UDP, or DCCP in many aspects. With a few exceptions, SCTP
   can do most of what UDP, TCP, or DCCP can achieve. SCTP can also do
   most of what a combination of the other transport protocols can
   achieve (e.g., TCP and DCCP or TCP and UDP).

   Like TCP, it provides ordered, reliable, connection-oriented, flow-
   controlled, congestion controlled data exchange. Unlike TCP, it does
   not provide byte streaming and instead provides message boundaries.

   Like UDP, it can provide unreliable, unordered data exchange. Unlike
   UDP, it does not provide multicast support.

   Like DCCP, it can provide unreliable, ordered, congestion controlled,
   connection-oriented data exchange.

   SCTP also provides other services that none of the 3 transport
   protocols mentioned above provide. These include:

   o  Multi-homing
      An SCTP connection can make use of multiple destination IP
      addresses to communicate with its peer.

   o  Runtime IP address binding
      With the SCTP Dynamic Address Reconfiguration ([RFC5061]) feature,
      a new IP address can be bound at runtime. This allows for
      migration of endpoints without restarting the association
      (valuable for high availability).

   o  A range of reliability shades with congestion control
      SCTP offers a range of services from full reliability to none, and
      from full ordering to none. With SCTP, on a per message basis,
      the application can specify a message's time-to-live. When the
      expressed time expires, the message can be "skipped".

   o  Built-in heartbeats
      SCTP has a built-in heartbeat mechanism that validates the
      reachability of peer addresses.

   o  Multi-streaming
      A known problem with TCP is head of line (HOL) blocking. If you
      have independent messages, TCP enforces ordering of such messages.
      Loss at the head of the messages delays delivery of subsequent
      packets. SCTP allows for defining up to 64K independent streams
      over the same socket connection, which are ordered independently.

   o  Message boundaries with reliability
      SCTP allows for easier message parsing (just like UDP but with
      reliability built in) because it establishes boundaries on a PL
      message basis. On a TCP stream, one would have to use techniques
      such as peeking into the message to figure out the boundaries.

   o  Improved SYN DOS protection
      Unlike TCP, which does a 3 way connection setup handshake, SCTP
      does a 4 way handshake. This improves protection against SYN-flood
      attacks because listening sockets do not set up state until a
      connection is validated.

   o  Simpler transport events
      An application (such as the TML) can subscribe to be notified of
      both local and remote transport events. Events that can be
      subscribed to include indication of association changes,
      addressing changes, remote errors, expiry of timed messages, etc.
      These events are off by default and require explicit subscription.

   o  Simplified replicasting
      Although SCTP does not allow for multicasting, it allows for a
      single message from an application to be sent to multiple peers.
      This reduces the messaging that typically crosses different memory
      domains within a host (for example, between the kernel and user
      space domains of an operating system).

4.1.  Rationale for using SCTP for TML

   SCTP has all the features required to provide a robust TML.
   As a transport that is all-encompassing, it negates the need for
   having multiple transport protocols in order to satisfy the TML
   requirements ([FE-PROTO] section 5). As a result it allows for
   simpler coding and therefore reduces many of the interoperability
   concerns.

   SCTP is also very mature and widely used, making it a good choice for
   ubiquitous deployment.

4.2.  Meeting TML requirements

                      PL
                      +----------------------+
                      |                      |
                      +-----------+----------+
                                  |   TML API
                      TML         |
                      +-----------+----------+
                      |           |          |
                      |    +------+------+   |
                      |    |  TML core   |   |
                      |    +-+----+----+-+   |
                      |      |    |    |     |
                      |    SCTP socket API   |
                      |      |    |    |     |
                      |      |    |    |     |
                      |    +-+----+----+-+   |
                      |    |    SCTP     |   |
                      |    +------+------+   |
                      |           |          |
                      |           |          |
                      |    +------+------+   |
                      |    |      IP     |   |
                      |    +-------------+   |
                      +----------------------+

                    Figure 3: The TML-SCTP interface

   Figure 3 details the interfacing between the PL and SCTP TML and the
   internals of the SCTP TML. The core of the TML interacts on its
   north-bound interface to the PL (utilizing the TML API). On the
   south-bound interface, the TML core interfaces to the SCTP layer
   utilizing the standard socket interface [SCTP-API]. There are three
   SCTP socket connections opened between any two PL endpoints (whether
   FE or CE).

4.2.1.  SCTP TML Channels

                      +--------------------+
                      |                    |
                      |      TML core      |
                      |                    |
                      +-+-------+--------+-+
                        |       |        |
                        |   Med prio,    |
                        |  Semi-reliable |
                        |    channel     |
                        |       |      Low prio,
                        |       |      Unreliable
                        |       |      channel
                        |       |        |
                        ^       ^        ^
                        |       |        |
                        Y       Y        Y
              High prio,|       |        |
              reliable  |       |        |
              channel   |       |        |
                        Y       Y        Y
                     +-+--------+--------+-+
                     |                     |
                     |        SCTP         |
                     |                     |
                     +---------------------+

                    Figure 4: The TML-SCTP channels

   Figure 4 details further the interfacing between the TML core and
   SCTP layers. There are 3 channels used to separate and prioritize
   the different types of ForCES traffic.
   Each channel constitutes a socket interface. It should be noted that
   all SCTP channels are congestion aware (and for that reason that
   detail is left out of the description of the 3 channels). SCTP ports
   6700, 6701, and 6702 are used for the higher, medium and lower
   priority channels respectively. SCTP Payload Protocol ID (PPID)
   values of 21, 22, and 23 are used for the higher, medium and lower
   priority channels respectively.

4.2.1.1.  Justifying Choice of 3 Sockets

   SCTP allows up to 64K streams to be sent over a single socket
   interface. The authors initially envisioned using a single socket
   for all three channels (mapping a channel to an SCTP stream). This
   would simplify programming of the TML and conserve SCTP ports.

   Further analysis revealed head of line blocking issues with this
   initial approach. Lower priority packets not needing reliable
   delivery could block higher priority packets (needing reliable
   delivery) under congestion for an indeterminate period of time
   (depending on how many lower priority packets are outstanding). For
   this reason, we elected to map each of the three channels to a
   different SCTP socket (instead of a different stream within a single
   socket).

4.2.1.2.  Higher Priority, Reliable channel

   The higher priority (HP) channel uses a standard SCTP reliable socket
   on port 6700. SCTP PPID 21 is used for all messages on the HP
   channel. The HP channel is used for CE solicited messages and their
   responses:

   1.  ForCES configuration messages flowing from CE to FE and responses
       from the FE to CE.

   2.  ForCES query messages flowing from CE to FE and responses from
       the FE to the CE.

   PL priorities 4-7 MUST be used for all PL messages using this
   channel.
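
   As an illustration only, the default channel parameters given above
   can be collected in a small lookup table. The following Python
   sketch is not part of the TML specification: the names Channel,
   CHANNELS and channel_for_ppid are hypothetical, and only the port
   and PPID values are taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """One SCTP TML channel (hypothetical helper type)."""
    name: str  # "HP", "MP" or "LP"
    port: int  # default SCTP port for the channel
    ppid: int  # SCTP Payload Protocol ID used on the channel


# Default values as stated in Section 4.2.1 of this document.
CHANNELS = (
    Channel("HP", 6700, 21),
    Channel("MP", 6701, 22),
    Channel("LP", 6702, 23),
)

_BY_PPID = {channel.ppid: channel for channel in CHANNELS}


def channel_for_ppid(ppid: int) -> Channel:
    """Return the channel that uses the given PPID, or raise ValueError."""
    try:
        return _BY_PPID[ppid]
    except KeyError:
        raise ValueError(f"PPID {ppid} is not an SCTP TML channel") from None
```

   A receiver could use such a lookup to check that the PPID carried by
   an incoming message matches the channel on which it arrived; whether
   and how to perform that check is an implementation choice.
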
   The following PL messages MUST use the HP channel for transport:

   o  Association Setup (default priority: 7)

   o  Association Setup Response (default priority: 7)

   o  Association Teardown (default priority: 7)

   o  Config (default priority: 4)

   o  Config Response (default priority: 4)

   o  Query (default priority: 4)

   o  Query Response (default priority: 4)

   Although an implementation may choose different values from the
   defined range (4-7), it is strongly recommended that default
   priorities be used. A response to a ForCES message MUST contain the
   same priority as the request. For example, a config sent by the CE
   with priority 5 MUST have a config-response from the FE with
   priority 5.

4.2.1.3.  Medium Priority, Semi-Reliable channel

   The medium priority (MP) channel uses SCTP-PR (SCTP with partial
   reliability) on port 6701. SCTP PPID 22 MUST be used for all
   messages on the MP channel. Time limits on how long a message is
   valid are set on each outgoing message. This channel is used for
   events from the FE to the CE that are obsoleted over time. Events
   that are cumulative in nature and are recoverable by the CE (by
   issuing a query to the FE) can tolerate loss and therefore should
   use this channel. For example, a generated event that carries the
   value of a monotonically incrementing counter is well suited to this
   channel.

   PL priority 3 MUST be used for PL messages on this channel. The
   following PL messages MUST use the MP channel for transport:

   o  Event Notification (default priority: 3)

4.2.1.4.  Lower Priority, Unreliable channel

   The lower priority (LP) channel uses SCTP port 6702. SCTP PPID 23 is
   used for all messages on the LP channel. The LP channel also MUST
   use SCTP-PR with lower timeout values than the MP channel.
   The reason an unreliable channel is used for redirect messages is to
   allow the control protocol at both the CE and its peer-endpoint to
   take charge of the end-to-end semantics of the said control
   protocol's operations. For example:

   1.  Some control protocols are reliable in nature; making this
       channel reliable would introduce an extra layer of reliability,
       which could be harmful. Any end-to-end retransmissions will
       therefore come from the remote peer.

   2.  Some control protocols may desire to have obsolescence of
       messages over retransmissions; making this channel reliable
       contradicts that desire.

   Given that ForCES PL level heartbeats are traffic sensitive, sending
   them over the LP channel also makes sense. If the other end is not
   processing other channels, it will eventually get heartbeats; and if
   it is busy processing other channels, heartbeats will be obsoleted
   locally over time (and it does not matter if they did not make it).

   PL priorities 1-2 MUST be used for PL messages on this channel. PL
   messages that MUST use the LP channel for transport are:

   o  Packet Redirect (default priority: 2)

   o  Heartbeats (default priority: 1)

4.2.1.5.  Scheduling of The 3 Channels

   Strict priority work-conserving scheduling is used by the TML core
   for both sending and receiving PL messages, as shown in Figure 5.

   This means that the HP messages are always processed first until
   there are no more left. The LP channel is processed only if the
   channels of higher priority than itself have no more messages left
   to process. This means that under congestion, a higher priority
   channel with sufficient messages to occupy the available bandwidth
   would starve the lower priority channel(s).

   The design intent of the SCTP TML is to tie prioritization as
   described in Section 4.2.1.1 and transport congestion control to
   provide implicit node congestion control.
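
   The strict priority, work-conserving behavior described above can be
   sketched as follows. This is a minimal model that assumes in-memory
   queues rather than SCTP sockets; the names channel_for_priority and
   StrictPriorityScheduler are hypothetical, and rejecting priority 0
   is an assumption made here because this document assigns only
   priorities 1-7 to channels.

```python
from collections import deque


def channel_for_priority(priority: int) -> str:
    """Map a PL priority to a channel name using the ranges in 4.2.1.2-4.2.1.4."""
    if 4 <= priority <= 7:
        return "HP"
    if priority == 3:
        return "MP"
    if 1 <= priority <= 2:
        return "LP"
    # Assumption: priorities outside 1-7 are not mapped by this sketch.
    raise ValueError(f"no channel defined for PL priority {priority}")


class StrictPriorityScheduler:
    """Serve HP before MP before LP, re-checking HP after every message."""

    ORDER = ("HP", "MP", "LP")

    def __init__(self):
        self.queues = {name: deque() for name in self.ORDER}

    def enqueue(self, priority: int, message):
        """Queue a message on the channel selected by its PL priority."""
        self.queues[channel_for_priority(priority)].append(message)

    def next_message(self):
        """Return (channel, message) for the next unit of work, or None."""
        for name in self.ORDER:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None  # no work on any channel
```

   Because next_message() always starts from the HP queue, a steady
   supply of HP work starves the MP and LP queues, which is the
   starvation effect noted above.
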
   The scheduling design is further detailed in Appendix A.2.

      SCTP channel              +----------+
     Work available             |   DONE   +---<--<--+
            |                   +---+------+         |
            Y                                        ^
            |         +-->--+         +-->---+       |
    +-->-->-+         |     |         |      |       |
    |       |         |     |         |      |       ^
    |       ^         ^     Y         ^      Y       |
    ^      / \        |     |         |      |       |
    |     /   \       |     ^         |      ^       ^
    |    / Is  \      |    / \        |     / \      |
    |   / there \     |   /Is \       |    /Is \     |
    ^  / HP work \    ^  /there\      ^   /there\    ^
    |  \    ?    /    | /MP work\     |  /LP work\   |
    |   \       /     | \   ?   /     |  \   ?   /   |
    |    \     /      |  \     /      |   \     /    ^
    |     \   /       ^   \   /       ^    \   /     |
    |      \ /        |    \ /        |     \ /      |
    ^       Y-->-->-->+     Y-->-->-->+      Y->->->-+
    |       |    NO         |    NO          |   NO
    |       |               |                |
    |       Y               Y                Y
    |       | YES           | YES            | YES
    ^       |               |                |
    |       Y               Y                Y
    |  +----+------+    +---|-------+   +----|------+
    |  |- process  |    |- process  |   |- process  |
    |  |  HP work  |    |  MP work  |   |  LP work  |
    |  +------+----+    +-----+-----+   +-----+-----+
    |         |               |               |
    ^         Y               Y               Y
    |         |               |               |
    |         Y               Y               Y
    +--<--<---+--<--<----<----+-----<---<-----+

             Figure 5: SCTP TML Strict Priority Scheduling

4.2.1.6.  SCTP TML Parameterization

   The following is a list of parameters needed for booting the TML. It
   is expected these parameters will be extracted via the FEM/CEM
   interface for each PL ID.

   1.  The IP address(es) or resolvable DNS hostname(s) of the CE/FE.

   2.  Whether to use IPsec or not. If IPsec is used, how to
       parameterize the different required ciphers, keys, etc., as
       described in Section 7.1.

   3.  The HP SCTP port, as discussed in Section 4.2.1.2. The default
       HP port value is 6700 (Section 6).

   4.  The MP SCTP port, as discussed in Section 4.2.1.3. The default
       MP port value is 6701 (Section 6).

   5.  The LP SCTP port, as discussed in Section 4.2.1.4. The default
       LP port value is 6702 (Section 6).

4.2.2.  Satisfying TML Requirements

   [FE-PROTO] section 5 lists requirements that a TML needs to meet.
   This section describes how the SCTP TML satisfies those requirements.

4.2.2.1.  Satisfying Reliability Requirement

   As mentioned earlier, SCTP offers a range of reliability shades.
   Therefore this requirement is met.

4.2.2.2.  Satisfying Congestion Control Requirement

   Congestion control is built into SCTP. Therefore, this requirement
   is met.

4.2.2.3.  Satisfying Timeliness and Prioritization Requirement

   By using 3 sockets in conjunction with the partial-reliability
   feature, both timeliness and prioritization can be achieved.

4.2.2.4.  Satisfying Addressing Requirement

   There are no extra headers required for SCTP to fulfil this
   requirement. SCTP can be told to replicast packets to multiple
   destinations. The TML implementation will need to translate PL level
   addresses to a variety of unicast IP addresses in order to emulate
   multicast and broadcast PL addresses.

4.2.2.5.  Satisfying HA Requirement

   Transport link resiliency is one of SCTP's strongest points. Failure
   detection and recovery are built in, as mentioned earlier.

   o  The SCTP multi-homing feature is used to provide path diversity.
      Should one of the peer IP addresses become unreachable, the
      other(s) are used without needing lower layer convergence
      (routing, for example) or even the TML becoming aware.

   o  SCTP heartbeats and data transmission thresholds are used on a
      per-peer-IP-address basis to detect reachability faults. The
      faults could be a result of an unreachable address or peer, which
      may be caused by a variety of reasons, like interface, network, or
      endpoint failures. The cause of the fault is noted.

   o  With the ADDIP feature, one can migrate IP addresses to other
      nodes at runtime. This is not unlike the VRRP [RFC3768] protocol
      use. This feature is used in addition to multi-homing in a
      planned migration of activity from one FE/CE to another.
      In such a case, part of the provisioning recipe at the CE for
      replacing an FE involves migrating activity of one FE to another.

4.2.2.6.  Satisfying Node Overload Prevention Requirement

   The architecture of this TML defines three separate channels, one per
   socket, to be used within any FE-CE setup. The scheduling design for
   processing the TML channels (Section 4.2.1.5) is strict priority. A
   fundamental goal of the strict prioritization is to ensure that
   more important work always gets node resources such as CPU and
   bandwidth over less important work.

   When a ForCES node CPU is overwhelmed because the incoming packet
   rate is higher than it can keep up with, the channel queues grow and
   transport congestion subsequently follows. By virtue of using SCTP,
   the congestion is propagated back to the source of the incoming
   packets and eventually alleviated.

   The HP channel work gets prioritized at the expense of the MP
   channel, which in turn gets prioritized over the LP channel. The
   preferential scheduling only kicks in when there is node overload,
   regardless of whether there is transport congestion. As a result of
   the preferential work treatment, the ForCES node achieves a robust
   steady processing capacity. Refer to Appendix A.2 for details on
   scheduling.

   As an example of how the overload prevention works, consider a
   scenario where an overwhelming number of redirected packets (from
   outside the NE) coming into the NE may overload the FE while it has
   outstanding config work from the CE. In such a case, the FE, while
   it is busy processing config requests from the CE, essentially
   ignores the redirect packets on the LP channel. If enough redirect
   packets accumulate, they are dropped either because the LP channel
   threshold is exceeded or because they are obsoleted.
   If, on the other hand, the FE has successfully processed the higher
   priority channels and their related work, then it can proceed and
   process the LP channel. So, as demonstrated in this case, the TML
   ties transport and node overload implicitly together.

4.2.2.7.  Satisfying Encapsulation Requirement

   The SCTP TML sets SCTP PPIDs to identify the channels used, as
   described in Section 4.2.1.

5.  SCTP TML Channel Work

   There are two levels of TML channel work within an NE when a ForCES
   node (CE or FE) is connected to multiple other ForCES nodes:

   1.  NE-level I/O work where a ForCES node (CE or FE) needs to choose
       which of the peer nodes to process.

   2.  Node-level I/O work where a ForCES node handles the three SCTP
       TML channels separately for each single ForCES endpoint.

   NE-level scheduling definition is left up to the implementation and
   is considered out of scope for this document. Appendix A.4 briefly
   discusses some constraints that an implementor needs to consider.

   This document provides suggestions on SCTP channel work
   implementation in Appendix A.

   The FE SHOULD establish channel connections to the CE in order of
   increasing priority, i.e., the LP socket first, followed by the MP
   socket, and ending with the HP socket. The CE, however, MUST NOT
   assume any ordering of socket connections from an FE.

6.  IANA Considerations

   This document requests that IANA reserve SCTP ports 6700, 6701, and
   6702. This document also requests SCTP PPIDs 21, 22, and 23.

7.  Security Considerations

   The SCTP TML provides the following security services to the PL
   level:

   o  A mechanism to authenticate ForCES CEs and FEs at transport level
      in order to prevent the participation of unauthorized CEs and
      unauthorized FEs in the control and data path processing of a
      ForCES NE.
734 o A mechanism to ensure message authentication of PL data and 735 headers transferred from the CE to FE (and vice-versa) in order to 736 prevent the injection of incorrect data into PL messages. 738 o A mechanism to ensure the confidentiality of PL data and headers 739 transferred from the CE to FE (and vice-versa), in order to 740 prevent disclosure of PL level information transported via the 741 TML. 743 Security choices provided by the TML are made by the operator and 744 take effect during the pre-association phase of the ForCES protocol. 745 An operator may choose to use all, some or none of the security 746 services provided by the TML in a CE-FE connection. 748 When operating under a secured environment, or for other operational 749 concerns (in some cases performance issues), the operator may turn off 750 all the security functions between CE and FE. 752 IP Security Protocol (IPsec) [RFC4301] is used to provide the needed 753 security mechanisms. 755 IPsec is an IP level security scheme transparent to the higher-layer 756 applications and therefore can provide security for any transport 757 layer protocol. This gives IPsec the advantage that it can be used 758 to secure everything between the CE and FE without expecting the TML 759 implementation to be aware of the details. 761 The IPsec architecture is designed to provide the message integrity and 762 message confidentiality outlined in the TML security requirements 763 ([FE-PROTO]). Mutual authentication and key exchange are 764 provided by Internet Key Exchange (IKE) [RFC4109]. 766 7.1. IPsec Usage 768 A ForCES FE or CE MUST support the following: 770 o Internet Key Exchange (IKE) [RFC4109] with certificates for 771 endpoint authentication. 773 o Transport Mode Encapsulating Security Payload (ESP) [RFC4303]. 775 o HMAC-SHA1-96 [RFC2404] for message integrity protection. 777 o AES-CBC with 128-bit keys [RFC3602] for message confidentiality. 779 o Replay protection [RFC4301].
781 It is expected to be possible for the CE or FE to be operationally 782 configured to negotiate other cipher suites and even use manual 783 keying. 785 7.1.1. SAD and SPD setup 787 To minimize the operational configuration, it is recommended that only 788 the IANA-issued SCTP protocol number (132) be used as a selector in 789 the Security Policy Database (SPD) for ForCES. In such a case only a 790 single SPD and SAD entry is needed. 792 It should be straightforward to extend such a policy to alternatively 793 use the 3 SCTP TML port numbers as SPD selectors. However, as noted above, 794 this choice requires an increased number of SPD entries. 796 In scenarios where multiple IP addresses are used within a single 797 association, and there is a desire to configure different policies 798 per IP address, it is recommended to follow [RFC3554]. 800 8. Acknowledgements 802 The authors would like to thank Joel Halpern, Michael Tuxen, Randy 803 Stewart, Evangelos Haleplidis and Chuanhuang Li for engaging us in 804 discussions that have made this draft better. 806 9. References 808 9.1. Normative References 810 [RFC2404] Madson, C. and R. Glenn, "The Use of HMAC-SHA-1-96 within 811 ESP and AH", RFC 2404, November 1998. 813 [RFC3554] Bellovin, S., Ioannidis, J., Keromytis, A., and R. 814 Stewart, "On the Use of Stream Control Transmission 815 Protocol (SCTP) with IPsec", RFC 3554, July 2003. 817 [RFC3602] Frankel, S., Glenn, R., and S. Kelly, "The AES-CBC Cipher 818 Algorithm and Its Use with IPsec", RFC 3602, 819 September 2003. 821 [RFC4109] Hoffman, P., "Algorithms for Internet Key Exchange version 822 1 (IKEv1)", RFC 4109, May 2005. 824 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 825 Internet Protocol", RFC 4301, December 2005. 827 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 828 RFC 4303, December 2005. 830 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", 831 RFC 4960, September 2007.
833 [RFC5061] Stewart, R., Xie, Q., Tuexen, M., Maruyama, S., and M. 834 Kozuka, "Stream Control Transmission Protocol (SCTP) 835 Dynamic Address Reconfiguration", RFC 5061, 836 September 2007. 838 9.2. Informative References 840 [FE-MODEL] 841 Halpern, J. and J. Hadi Salim, "ForCES Forwarding Element 842 Model", October 2008. 844 [FE-PROTO] 845 Doria (Ed.), A., Haas (Ed.), R., Hadi Salim (Ed.), J., 846 Khosravi (Ed.), H., M. Wang (Ed.), W., Dong, L., and R. 847 Gopal, "ForCES Protocol Specification", November 2008. 849 [RFC3654] Khosravi, H. and T. Anderson, "Requirements for Separation 850 of IP Control and Forwarding", RFC 3654, November 2003. 852 [RFC3746] Yang, L., Dantu, R., Anderson, T., and R. Gopal, 853 "Forwarding and Control Element Separation (ForCES) 854 Framework", RFC 3746, April 2004. 856 [RFC3768] Hinden, R., "Virtual Router Redundancy Protocol (VRRP)", 857 RFC 3768, April 2004. 859 [SCTP-API] 860 Stewart, R., Poon, K., Tuexen, M., Yasevich, V., and P. 861 Lei, "Sockets API Extensions for Stream Control 862 Transmission Protocol (SCTP)", Feb. 2009. 864 Appendix A. Suggested SCTP TML Channel Work Implementation 866 As mentioned in Section 5, there are two levels of TML channel work 867 within an NE when a ForCES node (CE or FE) is connected to multiple 868 other ForCES nodes: 870 1. NE-level I/O work where a ForCES node (CE or FE) needs to choose 871 which of the peer nodes to process. 873 2. Node-level I/O work where a ForCES node handles the three SCTP 874 TML channels separately for each single ForCES endpoint. 876 NE-level scheduling definition is left up to the implementation and 877 is considered out of scope for this document. Appendix A.4 briefly 878 discusses some constraints that an implementor needs to consider. 880 This appendix, in particular Appendix A.1, Appendix A.2, and 881 Appendix A.3, discusses details of node-level I/O work. 883 A.1.
SCTP TML Channel Initialization 885 As discussed in Section 5, the FE SHOULD establish 886 socket connections to the CE in order of increasing priority, 887 i.e., the LP socket first, followed by the MP socket, and ending with the HP 888 socket. The CE, however, MUST NOT assume that there is any particular ordering 889 of socket connections from any FE. Appendix B.1 has more details on 890 the expected initialization of SCTP channel work. 892 A.2. Channel work scheduling 894 This section provides high-level details of the scheduling view of 895 the SCTP TML core (Section 4.2.1). A practical scheduler 896 implementation takes care of many small details (such as timers, 897 work quanta, etc.) not described in this document. The implementor is 898 left to take care of those details. 900 The CE(s) and FE(s) are coupled together in the principles of the 901 scheduling scheme described here to tie together node overload with 902 transport congestion. The design intent is to provide the highest 903 possible robust work throughput for the NE under any network or 904 processing congestion. 906 A.2.1. FE Channel work scheduling 908 The FE scheduler needs to process I/O in the following priority order: 910 1. The HP channel I/O in the following priority order: 912 1. Transmitting back to the CE any outstanding result of 913 executed work via the HP channel transmit path. 915 2. Taking new incoming work from the CE which creates ForCES 916 work to be executed by the FE. 918 2. ForCES events which result in transmission of unsolicited ForCES 919 packets to the CE via the MP channel. 921 3. Incoming Redirect work in the form of control packets that come 922 from the CE via LP channel. After redirect processing, these 923 packets get sent out on external (to the NE) interface. 925 4. Incoming Redirect work in the form of control packets that come 926 from other NEs via external (to the NE) interfaces. After some 927 processing, such packets are sent to the CE.
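The FE priority order listed above can be illustrated with a minimal sketch. It is not part of this specification: the names FeWork, StrictPriorityScheduler, enqueue, next_work, and demo are hypothetical, work items are plain strings, and timers, work quanta, and LP queue thresholds are deliberately omitted. The only property it tries to show is that the scheduler rescans from the highest priority class before taking each work item.

```python
from collections import deque
from enum import IntEnum


class FeWork(IntEnum):
    """Hypothetical FE work classes; a lower value means higher priority."""
    HP_TX_RESULTS = 0           # item 1.1: results going back to the CE
    HP_RX_REQUESTS = 1          # item 1.2: new work arriving from the CE
    MP_EVENTS = 2               # item 2: unsolicited events to the CE
    LP_REDIRECT_FROM_CE = 3     # item 3: redirects heading out of the NE
    REDIRECT_FROM_EXTERNAL = 4  # item 4: redirects heading to the CE


class StrictPriorityScheduler:
    """Always serves the highest-priority class that has pending work."""

    def __init__(self):
        self._queues = {work_class: deque() for work_class in FeWork}

    def enqueue(self, work_class, item):
        self._queues[work_class].append(item)

    def next_work(self):
        """Return (work_class, item) for the next item, or None if idle."""
        # Rescanning from the top on every call is what makes this strict:
        # lower classes are served only while all higher ones are empty.
        for work_class in sorted(FeWork):
            queue = self._queues[work_class]
            if queue:
                return work_class, queue.popleft()
        return None


def demo():
    """Enqueue work out of priority order and return the service order."""
    sched = StrictPriorityScheduler()
    sched.enqueue(FeWork.LP_REDIRECT_FROM_CE, "redirect-1")
    sched.enqueue(FeWork.HP_RX_REQUESTS, "config-1")
    sched.enqueue(FeWork.MP_EVENTS, "event-1")
    sched.enqueue(FeWork.HP_TX_RESULTS, "result-1")
    order = []
    while (work := sched.next_work()) is not None:
        order.append(work[1])
    return order
```

In this sketch, demo() serves the HP result first, then the HP request, then the MP event, and the LP redirect last, regardless of arrival order. A real implementation would bound the time spent per class and drop LP work once its threshold is exceeded.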
929 It is worth emphasizing again that the SCTP TML 930 processes the channel work in strict priority. For example, as long 931 as there are messages to send to the CE on the HP channel, they will 932 be processed first until there are no more left before processing the 933 next priority work (which is to read new messages on the HP channel 934 incoming from the CE). 936 A.2.2. CE Channel work scheduling 938 The CE scheduler needs to deal with the following, in priority order: 940 1. The HP channel I/O in the following priority order: 942 1. Processing incoming responses to requests of work it made to the 943 FE(s). 945 2. Transmitting any outstanding HP work it needs the FE(s) 946 to complete. 948 2. Incoming ForCES events from the FE(s) via the MP channel. 950 3. Outgoing Redirect work in the form of control packets that get 951 sent from the CE via LP channel destined to external (to the NE) 952 interface on FE(s). 954 4. Incoming Redirect work in the form of control packets that come 955 from other NEs via external (to the NE) interfaces on the FE(s). 957 It is worth repeating that the SCTP TML processes 958 the channel work in strict priority. For example, if there are 959 messages incoming from an FE on the HP channel, they will be 960 processed first until there are no more left before processing the 961 next priority work, which is to transmit any outstanding HP channel 962 messages going to the FE. 964 A.3. SCTP TML Channel Termination 966 Appendix B.2 describes a controlled disassociation of the FE from the 967 NE. 969 It is also possible for connectivity to be lost between the FE and CE 970 on one or more sockets. In cases where SCTP multi-homing features 971 are used for path availability, the disconnection of a socket will 972 only occur if all paths are unreachable; otherwise, SCTP will ensure 973 reachability.
If connectivity is totally lost on even 974 one SCTP socket, the FE and CE SHOULD assume a 975 state equivalent to ForCES Association Teardown being issued and 976 follow the sequence described in Appendix B.2. 978 A CE could also disconnect sockets to an FE to indicate an "emergency 979 teardown". The "emergency teardown" may be necessary in cases when a 980 CE needs to disconnect an FE but knows that the FE is busy processing 981 a lot of outstanding commands (some of which the FE hasn't got around 982 to processing yet). By virtue of the CE closing the connections, the 983 FE will immediately be asynchronously notified and will not have to 984 process any outstanding commands from the CE. 986 A.4. SCTP TML NE level channel scheduling 988 In handling NE-level I/O work, an implementation needs to be 989 both fair and robust across peer ForCES nodes. 991 Fairness is desired so that each peer node makes progress across the 992 NE. For the sake of illustration, consider two FEs connected to a CE: 993 while one FE has a few HP messages that need to be processed by the 994 CE, the other may have an unbounded number of HP messages. The scheduling scheme may 995 decide to use a quota scheduling system to ensure that the second FE 996 does not hog the CE cycles. 998 Robustness is desired so that the NE does not succumb to a DoS attack 999 from hostile entities and always achieves a maximum stable workload 1000 processing level. For the sake of illustration, consider again two 1001 FEs connected to a CE. Consider FE1 as having a large number of HP 1002 and MP messages and FE2 having a large number of MP and LP messages. 1003 The scheduling scheme needs to ensure that while FE1 always gets its 1004 messages processed, FE2 messages are also 1005 processed at some point. A promotion- and preemption-based scheduling scheme could be used 1006 by the CE to resolve this issue. 1008 Appendix B.
Suggested Service Interface 1010 This section provides a high-level service interface between the FEM/CEM 1011 and TML, between the PL and TML, and between local and remote TMLs. The 1012 intent of this interface discussion is to provide general guidelines. 1013 The implementer is expected to work out the details and may even follow a 1014 different approach if needed. 1016 The theory of operation for the PL-TML service is as follows: 1018 1. The PL starts up and bootstraps the TML. The end result of a 1019 successful TML bootstrap is that the CE TML and the FE TML 1020 connect to each other at the transport level. 1022 2. Sending and reception of the PL level messages commences after a 1023 successful TML bootstrap. The PL uses send and receive PL-TML 1024 interfaces to communicate to its peers. The TML is agnostic to 1025 the nature of the messages being sent or received. The first 1026 message exchanges that happen are to establish ForCES 1027 association. Subsequent messages may be unsolicited events 1028 from the FE PL, control message redirects from/to the CE to/from the 1029 FE, or configuration from the CE to the FE and the corresponding responses 1030 flowing from the FE to the CE. 1032 3. The PL does a shutdown of the TML after terminating ForCES 1033 association. 1035 B.1. TML Boot-strapping 1037 Figure 6 illustrates a flow for the TML bootstrapped by the PL. 1039 When the PL starts up (possibly after some internal initialization), 1040 it boots up the TML. The TML first interacts with the FEM/CEM and 1041 acquires the necessary TML parameterization (Section 4.2.1.6). Next 1042 the TML uses the information it retrieved from the FEM/CEM interface 1043 to initialize itself. 1045 The TML on the FE proceeds to connect the 3 channels to the CE. The 1046 socket interface is used for each of the channels. The TML continues 1047 to retry the connections to the CE until all 3 channels are 1048 connected.
It is advisable that the number of connection retry 1049 attempts and the time between each retry are also configurable via the 1050 FEM. On failure to connect one or more channels, and after the 1051 configured retry threshold is exceeded, the TML will 1052 return an appropriate failure indicator to the PL. On success (as 1053 shown in Figure 6), a success indication is presented to the PL. 1055 FE PL FE TML FEM CEM CE TML CE PL 1056 | | | | | | 1057 | | | | | Bootup | 1058 | | | | |<-------------------| 1059 | Bootup | | | | | 1060 |----------->| | |get CEM info| | 1061 | |get FEM info | |<-----------| | 1062 | |------------>| ~ ~ | 1063 | ~ ~ |----------->| | 1064 | |<------------| | | 1065 | | |-initialize TML | 1066 | | |-create the 3 chans.| 1067 | | | to listen to FEs | 1068 | | | | 1069 | |-initialize TML |Bootup success | 1070 | |-create the 3 chans. locally |------------------->| 1071 | |-connect 3 chans. remotely | | 1072 | |------------------------------>| | 1073 | ~ ~ - FE TML connected ~ 1074 | ~ ~ - FE TML info init ~ 1075 | | channels connected | | 1076 | |<------------------------------| | 1077 | Bootup | | | 1078 | succeeded | | | 1079 |<-----------| | | 1080 | | | | 1082 Figure 6: SCTP TML Bootstrapping 1084 On the CE things are slightly different. After initializing from the 1085 CEM, the TML on the CE side proceeds to initialize the 3 channels to 1086 listen for remote connections from the FEs. The success or failure 1087 indication is passed on to the CE PL level (in the same manner as was 1088 done in the FE). 1090 Post boot-up, the CE TML waits for connections from the FEs. Upon a 1091 successful connection by an FE, the CE TML level keeps track of the 1092 transport level details of the FE. Note that at this stage only the 1093 transport level connection has been established; ForCES level 1094 association follows using send/receive PL-TML interfaces (refer to 1095 Appendix B.3 and Figure 8). 1097 B.2.
TML Shutdown 1099 Figure 7 shows an example of an FE shutting down the TML. It is 1100 assumed at this point that the ForCES Association Teardown has been 1101 issued by the CE. 1103 When the FE PL issues a shutdown to its TML for a specific PL ID, the 1104 TML releases all the channel connections to the CE. This is achieved 1105 by closing the sockets used to communicate to the CE. 1107 FE PL FE TML CE TML CE PL 1108 | | | | 1109 | Shutdown | | | 1110 |----------->| | | 1111 | |-disconnect 3 chans. | | 1112 | |------------------------>| | 1113 | | | | 1114 | | |-FE TML info cleanup| 1115 | | |-optionally tell PL | 1116 | | |------------------->| 1117 | |- clean up any state of | | 1118 | | channels disconnected | | 1119 | | | | 1120 | |<------------------------| | 1121 | Shutdown | | | 1122 | succeeded | | | 1123 |<-----------| | | 1124 | | | | 1126 Figure 7: FE Shutting down 1128 On the CE side, a TML level disconnection would result in possible 1129 cleanup of the FE state. Optionally, depending on the 1130 implementation, there may be need to inform the PL about the TML 1131 disconnection. 1133 B.3. TML Sending and Receiving 1135 The TML is agnostic to the nature of the PL message it delivers to 1136 the remote TML (which subsequently delivers the message to its PL). 1137 Figure 8 shows an example of a message exchange originated at the FE 1138 and sent to the CE (such as a ForCES association message) which 1139 illustrates all the necessary service interfaces for sending and 1140 receiving. 1142 When the FE PL sends a message to the TML, the TML is expected to 1143 pick one of HP/MP/LP channels and send out the ForCES message. 1145 FE PL FE TML CE TML CE PL 1146 | | | | 1147 |PL send | | | 1148 |----------->| | | 1149 | | | | 1150 | |-Format msg. | | 1151 | |-pick channel | | 1152 | |-TML Send | | 1153 | |------------->| | 1154 | | |-TML Receive on chan. 
| 1155 | | |-decapsulate | 1156 | | |- mux to PL/PL recv | 1157 | | |--------------------->| 1158 | | | ~ 1159 | | | ~ PL Process 1160 | | | ~ 1161 | | | PL send | 1162 | | |<---------------------| 1163 | | |-Format msg. for send | 1164 | | |-pick chan to send on | 1165 | | |-TML send | 1166 | |<-------------| | 1167 | |-TML Receive | | 1168 | |-decapsulate | | 1169 | |-mux to PL | | 1170 | PL Recv | | | 1171 |<---------- | | | 1172 | | | | 1174 Figure 8: Send and Recv Flow 1176 When the CE TML receives the ForCES message on the channel it was 1177 sent on, it demultiplexes the message to the CE PL. 1179 The CE PL, after some processing (in this example dealing with the 1180 FE's association), sends to the TML the response. And as in the case 1181 of FE PL, the CE TML picks the channel to send on before sending. 1183 The processing of the ForCES message upon arriving at the FE TML and 1184 delivery to the FE PL is similar to the CE side equivalent as shown 1185 above in Appendix B.3. 1187 Authors' Addresses 1189 Jamal Hadi Salim 1190 Mojatatu Networks 1191 Ottawa, Ontario 1192 Canada 1194 Email: hadi@mojatatu.com 1196 Kentaro Ogawa 1197 NTT Corporation 1198 3-9-11 Midori-cho 1199 Musashino-shi, Tokyo 180-8585 1200 Japan 1202 Email: ogawa.kentaro@lab.ntt.co.jp