Network                                                        A. Antony
Internet-Draft                                                   secunet
Intended status: Standards Track                              T. Brunner
Expires: 13 January 2022                                        codelabs
                                                             S. Klassert
                                                                 secunet
                                                              P. Wouters
                                                                   Aiven
                                                            12 July 2021


                 IKEv2 support for per-queue Child SAs
             draft-pwouters-ipsecme-multi-sa-performance-00

Abstract

This document defines four Notify Message Type Payloads for the Internet Key Exchange Protocol Version 2 (IKEv2) indicating support for the negotiation of multiple identical Child SAs to optimize performance.

The CPU_QUEUES notification indicates support for multiple queues or CPUs. The QOS_QUEUES notification indicates support for different Quality of Service (QoS) levels. The CPU_QUEUE_INFO and QOS_QUEUE_INFO notifications are used to confirm and optionally convey information about the specific queue, such as the QoS level.

Using multiple identical Child SAs has the benefit that each stream has its own Sequence Number Counter, ensuring that CPUs don't have to synchronize their crypto state or disable their packet replay protection.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 13 January 2022.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. Performance bottlenecks
   3. Negotiation of CPU specific Child SAs
   4. Negotiation of QoS specific Child SAs
   5. Implementation specifics
      5.1. per-CPU Child SAs
      5.2. per-QoS Child SAs
      5.3. Combining per-CPU and per-QoS level Child SAs
   6. Payload Format
      6.1. CPU_QUEUES Notify Message Payload
      6.2. QOS_QUEUES Notify Message Payload
      6.3. CPU_QUEUE_INFO Notify Message Payload
      6.4. QOS_QUEUE_INFO Notify Message Payload
   7. Operational Considerations
   8. Security Considerations
   9. Implementation Status
      9.1. Linux XFRM
      9.2. Libreswan
      9.3. strongSwan
      9.4. iproute2
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses

1. Introduction

IPsec implementations are currently limited to using one queue or CPU per Child SA. As a result, a machine with many queues or CPUs can use only one of them for any given Child SA, which severely limits the attainable throughput. An unencrypted link of 10 Gbps or more is commonly reduced to 2-5 Gbps when IPsec is used to encrypt the link using AES-GCM. Using the method specified in this document, aggregate throughput increased from 5 Gbps using 1 CPU to 40-60 Gbps using 25-30 CPUs.

Furthermore, IPsec implementations are currently limited to using the same Child SA for all Quality of Service (QoS) types, because the QoS type is not part of the Traffic Selector (TS) payload. As a result, IPsec cannot support active QoS prioritization without disabling anti-replay protection: prioritizing some packets over others within a single Child SA reorders them, and lower-priority packets can then fall outside the receiver's anti-replay window and be discarded.

While this could be (partially) mitigated by setting up multiple narrowed Child SAs, for example using Populate From Packet (PFP) as specified in [RFC4301], this IPsec feature is not widely implemented. Some route-based IPsec implementations might be able to achieve this by using specific rules to direct traffic into separate network interfaces, but such methods might not be available to policy-based IPsec implementations.

To make better use of multiple network queues and CPUs, it can be beneficial to negotiate and install multiple identical Child SAs.
While IKEv2 [RFC7296] already allows installing multiple identical Child SAs, it offers no method to negotiate the number of Child SAs or to indicate the purpose of the multiple Child SAs that are requested.

When two IKEv2 peers want to negotiate multiple Child SAs, it is useful to be able to convey how many Child SAs are needed to optimize traffic handling. This avoids triggering CREATE_CHILD_SA exchanges that will only be rejected by the peer.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Performance bottlenecks

Currently, most IPsec implementations are limited to using one CPU or network queue per Child SA. There are a number of practical reasons for this, but a key limitation is that sharing the crypto state, counters, and sequence numbers between multiple CPUs is not feasible without a significant performance penalty. There is a need to negotiate and establish multiple Child SAs with identical TSi/TSr on a per-queue or per-CPU basis.

3. Negotiation of CPU specific Child SAs

When negotiating CPU-specific Child SAs, the first Child SA negotiated, either in an IKE_AUTH exchange or in a CREATE_CHILD_SA exchange, is called the Fallback SA. This Child SA is similar to a regular Child SA in that it is not bound to a single resource (CPU or QoS queue). This Fallback Child SA (or its rekeyed successors) MUST remain active for the lifetime of the IPsec session to ensure that there is always a Child SA that can be selected to send traffic over, in case a per-resource Child SA is not available. Additional Child SAs are installed bound to a specific resource (CPU or QoS queue).
These Child SAs are responsible for the bulk of the traffic.

The CPU_QUEUES notification payload is sent in the IKE_AUTH or CREATE_CHILD_SA exchange to indicate that the negotiated Child SA is a Fallback SA.

The CPU_QUEUES notification value refers to the number of additional resource-specific Child SAs that may be installed for this particular TSi/TSr combination, excluding the Fallback Child SA. Both peers send their preferred minimum number of additional Child SAs to install, and both peers pick the maximum of the two numbers (within reason). That is, if the initiator prefers 16 and the responder prefers 48, then the number negotiated is 48. The responder may at any time reject additional Child SAs by returning TS_UNACCEPTABLE. It should not return NO_ADDITIONAL_SAS, as there might be other Child SAs with different Traffic Selectors that would still be allowed by the peer.

[Antony: Valery's feedback was not to use TS_UNACCEPTABLE, but instead to create a new notify or use TEMPORARY_FAILURE, since the situation may change if the request is retried. I have a preference to define a new NO_CPU_QUEUE_INFO_SA.]

Resource-specific Child SAs are negotiated as regular Child SAs using the CREATE_CHILD_SA exchange and are identified by a CPU_QUEUE_INFO notification. Upon installation, each Child SA is associated with an additional local selector, such as a CPU or queue. These additional Child SAs MUST be negotiated with Child SA properties identical to those negotiated for the Fallback SA. This includes cryptographic algorithms, Traffic Selectors, Mode (e.g., transport mode), compression usage, etc. However, each of these Child SAs has its own individual keying material, derived according to the regular IKEv2 process. The CPU_QUEUE_INFO notification can be empty or contain some identifying data that could be useful for debugging purposes.
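
As a non-normative illustration of the CPU_QUEUES handling described above, the following Python sketch computes the number of additional per-CPU Child SAs from the two advertised values and encodes the result using the CPU_QUEUES layout from Section 6.1. It assumes a hypothetical local ceiling, LOCAL_MAX_PER_CPU_SAS, as one possible reading of "within reason", and treats a received 0 as 1, as Section 6.1 requires. Because the Notify Message Type is still [TBD1], the sketch borrows the private-use value 40970 reported for the strongSwan proof of concept in Section 9.3 purely as an example. All function and constant names are illustrative and are not taken from any implementation.

```python
import struct

# Hypothetical ceiling standing in for "within reason"; this document does
# not define a concrete limit.
LOCAL_MAX_PER_CPU_SAS = 256

# CPU_QUEUES is still [TBD1]; 40970 is the private-use value reported for the
# strongSwan proof of concept, used here only as an example.
CPU_QUEUES_PRIVATE_USE = 40970


def received_cpu_queues(value: int) -> int:
    """Interpret a received CPU_QUEUES value: 0 MUST be treated as 1."""
    return value if value > 0 else 1


def negotiate_cpu_queues(initiator_pref: int, responder_pref: int,
                         local_max: int = LOCAL_MAX_PER_CPU_SAS) -> int:
    """Number of additional per-CPU Child SAs, excluding the Fallback SA.

    Takes the larger of the two advertised minimums and clamps it to a
    (hypothetical) local ceiling.
    """
    wanted = max(received_cpu_queues(initiator_pref),
                 received_cpu_queues(responder_pref))
    return min(wanted, local_max)


def encode_cpu_queues(count: int, next_payload: int = 0,
                      notify_type: int = CPU_QUEUES_PRIVATE_USE) -> bytes:
    """Encode a CPU_QUEUES Notify payload following the Section 6.1 layout.

    All fields are big endian: Next Payload, C bit + RESERVED (zero),
    Payload Length, Protocol ID (0), SPI Size (0), Notify Message Type,
    and the 4-octet count. No SPI follows because SPI Size is 0.
    """
    if count <= 0:
        raise ValueError("CPU_QUEUES value MUST be greater than 0")
    body = struct.pack("!BBHI", 0, 0, notify_type, count)
    length = 4 + len(body)  # generic payload header plus notify body
    return struct.pack("!BBH", next_payload, 0, length) + body


if __name__ == "__main__":
    n = negotiate_cpu_queues(16, 48)   # example from the text
    print(n)                           # 48
    print(encode_cpu_queues(n).hex())  # 0000000c0000a00a00000030
```

Clamping to a local ceiling is only one way to interpret "within reason"; a peer that has reached its limit can still reject further per-CPU Child SAs as described above.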

Additional Child SAs can be started on demand or all at once. Peers may also delete specific per-resource Child SAs if they deem the associated resource to be idle. The Fallback SA MUST NOT be deleted while any per-resource Child SAs are still present.

During the CREATE_CHILD_SA rekey of a resource-specific Child SA, the CPU_QUEUE_INFO notification MAY be included, but regardless of whether or not it is included, the rekeyed Child SA MUST be bound to the same resource(s) as the Child SA that is being rekeyed.

As with regular Child SA rekeying, the new Child SA may not be different from the rekeyed Child SA with respect to cryptographic algorithms and MUST cover the original Traffic Selector ranges.

If a CREATE_CHILD_SA request containing both a CPU_QUEUE_INFO and a CPU_QUEUES notification is received, the responder MUST ignore the CPU_QUEUE_INFO payload. If a CREATE_CHILD_SA response is received with both CPU_QUEUE_INFO and CPU_QUEUES notifications, the initiator MUST ignore the notification that it did not send in the request.

[Steffen: I tend to treat these cases as an error.]

[Tobias: That's currently how I implemented it (being lenient in what I accept). But we could also treat those cases as errors. The question would just be what we should return (NO_PROPOSAL_CHOSEN and keep the IKE SA and other Child SAs, or even INVALID_SYNTAX and kill the whole IKE_SA; and as initiator we either have to terminate the Child SA or the IKE_SA actively if we receive both notifies).]

The CPU_QUEUES notification, even when it is sent in the IKE_AUTH exchange, is not an attribute of the IKE peer. It is an attribute of the Child SA, similar to the USE_TRANSPORT_MODE notification. That is, an
That is, an 225 IKE peer can have multiple Child SAs covering different traffic 226 selectors and selectively decide whether or not to enable additional 227 per-resource Child SAs for each of these Child SAs covering different 228 Traffic Selectors. 230 4. Negotiation of QoS specific Child SAs 232 To install multiple Child SAs for different QoS levels, a similar 233 negotiation method is used. The QOS_QUEUES notification is sent with 234 the negotiation of the Fallback Child SA that is used for all QoS 235 levels not matched by more specific Child SAs. Additional Child SAs 236 are installed per QoS level by including the QOS_QUEUE_INFO 237 notification describing the specific QoS level that this additional 238 Child SA will cover. This allows both peers to install the Child SA 239 using the same QoS level. 241 [Steffen: Maybe mention IPv6 flow label too] 243 If a certain QoS level proposed by the peer is not acceptable to the 244 responder, TS_UNACCEPTABLE MUST be returned. 246 [Tobias: Would a more specific error notify make sense here?] 248 [Antony: We need specific error if is rejected QOS_QUEUE_INFO] 250 5. Implementation specifics 252 There are various considerations that an implementation can use to 253 determine the best way to install multiple Child SAs. Below are 254 examples of such strategies. 256 5.1. per-CPU Child SAs 258 A simple distribution could be to install one additional Child SA on 259 each CPU. The Fallback Child SA ensures that any CPU generating 260 traffic to be encrypted has an available (if not optimal) Child SA to 261 use. Any subsequent Child SAs with identical TSi/TSr Traffic 262 Selectors are installed in such a way to only be used by a single CPU 263 or network queue. 265 Performing per-CPU Child SA negotiations can result in both peers 266 initiating additional Child SAs at once. This is especially likely 267 if per-CPU Child SAs are triggered by individual SADB_ACQUIRE 268 [RFC2367] messages. 
Responders should install the additional Child SA on the CPU with the fewest additional Child SAs for this TSi/TSr pair, counting outstanding SADB_ACQUIREs as assigned additional Child SAs. Even so, when the peers only have one slot left to assign, both peers might still send a CREATE_CHILD_SA request at the same time. [Paul: Is there anything we can do at the protocol level to terminate one of these without race conditions?] [Antony: if CPU_QUEUE_INFO is a MUST, that info could be used for better one-to-one mapping, as well as to delete the extra SAs. Also, keep in mind the general case of an IKE window > 1.]

As an optimization, additional Child SAs that see little traffic MAY be deleted. The Fallback Child SA MUST NOT be deleted for being idle; it is expected to be idle whenever enough per-CPU Child SAs are installed. However, if one of those per-CPU Child SAs is deleted because it was idle, and that CPU subsequently starts to generate traffic again, that traffic has no per-CPU Child SA and will be encrypted using the Fallback Child SA. Meanwhile, the IKE daemon might be negotiating to bring up a new per-CPU Child SA.

When the number of queues or CPUs differs between the peers, the peer with fewer queues or CPUs MAY decide not to install a second outbound Child SA for the same resource, as it will never use it to send traffic. However, it MUST install all inbound Child SAs, as it has committed to receiving traffic on these negotiated Child SAs.

If per-CPU SADB_ACQUIRE messages are implemented (see Section 7), the Traffic Selector (TSi) entry containing the information of the trigger packet should still be included in the TS set. This information MAY be used by the peer to select the best target CPU on which to install the additional Child SA.
For example, if the trigger packet was for TCP destination port 25 (SMTP), the peer might be able to install the Child SA on the CPU that is also running the mail server process. Trigger packet Traffic Selectors are documented in [RFC7296], Section 2.9.

As per [RFC7296], rekeying a Child SA SHOULD use the same (or wider) Traffic Selectors to ensure that the new Child SA covers everything that the rekeyed Child SA covers. This includes Traffic Selectors negotiated via Configuration Payloads (CP) such as INTERNAL_IP4_ADDRESS, which may use the original wide TS set or the narrowed TS set.

5.2. per-QoS Child SAs

[Paul: is there anything we need to say here?]

[Steffen: If we want to say something about that case, maybe this:]

Most considerations from the per-CPU case apply to the per-QoS case as well. The main difference between these two cases is that the number of possible QoS types is always the same for both peers (e.g., 64 types for IPv4). Unlike the per-CPU case, handling different numbers of QoS types is not necessary.

[Paul: I was hoping we could negotiate things like "only 2 different levels needed", and not just "we want to install SAs for all theoretically possible levels".]

5.3. Combining per-CPU and per-QoS level Child SAs

It is unlikely, but not disallowed, to use both per-CPU and per-QoS level Child SAs. Any conflicts between these two performance-improving types of SAs would need to be handled by local policy. For some deployments, honouring QoS as closely as possible might be more important, while for others, CPU distribution might be more important. There is currently no operational experience with combining these two types of Child SAs.

[Tobias: What would this look like? Would you send both notifies on the same set of SAs (CPU/QOS_QUEUE on the fallback SA and INFO on the others)? (So each SA would be for a specific CPU AND QoS class.)
Or would you negotiate separate per-CPU and per-QoS SAs, all with the same TS? (e.g., if you already bound certain classes to certain CPUs anyway and use a QoS-specific SA for that, but still want to use multiple CPUs for the other traffic and negotiate per-CPU SAs without a QoS identifier for that)]

[Paul: I don't really know. Perhaps we should remove QoS until we have someone who actually wants to run this and can provide guidance for standardization?]

6. Payload Format

All multi-octet fields representing integers are laid out in big endian order (also known as "most significant byte first", or "network byte order").

6.1. CPU_QUEUES Notify Message Payload

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-----------------------------+-------------------------------+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +---------------+---------------+-------------------------------+
   !  Protocol ID  !   SPI Size    !      Notify Message Type      !
   +---------------+---------------+-------------------------------+
   !                  Minimum number of IPsec SAs                  !
   +-------------------------------+-------------------------------+

* Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.

* SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.

* Notify Message Type (2 octets) - set to [TBD1]

* Minimum number of per-CPU IPsec SAs (4 octets) - MUST be greater than 0. If 0 is received, it MUST be interpreted as 1.

Note: The Fallback Child SA, which is not bound to a single CPU, is not counted as part of this number.

6.2. QOS_QUEUES Notify Message Payload

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-----------------------------+-------------------------------+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +---------------+---------------+-------------------------------+
   !  Protocol ID  !   SPI Size    !      Notify Message Type      !
   +---------------+---------------+-------------------------------+
   !                  Minimum number of IPsec SAs                  !
   +-------------------------------+-------------------------------+

* Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.

* SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.

* Notify Message Type (2 octets) - set to [TBD2]

* Maximum number of QoS level IPsec SAs (4 octets) - MUST be greater than 0. If 0 is received, it MUST be interpreted as 1.

* [Steffen: Does it make sense to negotiate the max. number of QoS types? Unlike the per-CPU case, there is no tradeoff between the peers. Both peers always support the same number of QoS types (64 on IPv4).]

* [Tobias: I agree with Steffen. This doesn't seem necessary and might even be confusing, as reducing the number would not tell the peer which classes should actually be sent.]

* [Paul: I was hoping to send the desired number of different levels, not the theoretical maximum of used levels.]

Note: The Fallback Child SA, which is not bound to a single QoS level, is not counted as part of this number.

6.3. CPU_QUEUE_INFO Notify Message Payload

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-----------------------------+-------------------------------+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +---------------+---------------+-------------------------------+
   !  Protocol ID  !   SPI Size    !      Notify Message Type      !
   +---------------+---------------+-------------------------------+
   !                                                               !
   ~                   Optional queue identifier                   ~
   !                                                               !
   +-------------------------------+-------------------------------+

* Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.

* SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.

* Notify Message Type (2 octets) - set to [TBD3]

* Optional Payload Data. This value MAY be set to convey the local identity of the queue.
  The value SHOULD be a unique identifier, and the peer SHOULD only use it for debugging purposes.

6.4. QOS_QUEUE_INFO Notify Message Payload

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-----------------------------+-------------------------------+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +---------------+---------------+-------------------------------+
   !  Protocol ID  !   SPI Size    !      Notify Message Type      !
   +---------------+---------------+-------------------------------+
   !                                                               !
   ~                 Mandatory QoS level specifier                 ~
   !                                                               !
   +-------------------------------+-------------------------------+

* Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.

* SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.

* Notify Message Type (2 octets) - set to [TBD4]

* Mandatory Payload Data. This value MUST be set to identify the QoS level.

[Paul: Can we say 'one byte for each level of QoS included for this SA'?]

[Steffen: I don't understand that. Do we support more than one QoS type per SA? I think we need space to cover either a 6-bit IPv4 QoS type or a 20-bit IPv6 flow label.]

[Tobias: Hm, one problem here is that CHILD_SAs can have traffic selectors of both address families. So how could we negotiate that we need a QoS type AND a flow label? Would that require two notifies (QOS_4|6_QUEUE_INFO types) or could we have two fields in the notify that may be set to 0? Or should that just not be allowed? I don't even know if it makes sense and whether QoS classes and flow labels are combinable in that way (I guess a dual-stack VoIP client would classify traffic in a comparable way for each family). And I also wonder if there is a mechanism to apply a flow label to an outer IPv4 header's TOS field and vice-versa.
If multiple classes/labels should be supported per SA, we could also send multiple notifies (but I guess that would mean that on-path routers had to treat all these classes/labels the same way, which begs the question why different values would get assigned to the packets in the first place).]

7. Operational Considerations

Implementations supporting per-CPU SAs SHOULD extend their local SPD selector, and the mechanism of on-demand negotiation that is triggered by traffic, to include a CPU (or queue) identifier in their SADB_ACQUIRE message from the SPD to the IKE daemon. If the IKEv2 extension defined in this document is negotiated with the peer, a node that does not support receiving per-CPU SADB_ACQUIRE messages MAY initiate all its Child SAs immediately upon receiving the (only) SADB_ACQUIRE it will receive from the IPsec stack. Such implementations also need to be careful when receiving a Delete Notify request for a per-CPU Child SA, as they have no method to detect when they should bring up such a per-CPU Child SA again later. Moreover, bringing the deleted per-CPU Child SA up again immediately after receiving the Delete Notify might cause an infinite loop between the peers. Another issue with not bringing up all per-CPU Child SAs is that if the peer acts similarly, the two peers might end up with only the Fallback SA, without ever activating any per-CPU Child SAs. It is therefore RECOMMENDED to implement per-CPU SADB_ACQUIRE messages. [Antony: It would be nice to add manual/scripts for starting the connection and bringing up per-CPU SAs. It could be very simple: an external program decides to start a per-CPU SA.]

The minimum number of Child SAs negotiated should not be treated as the maximum number of allowed Child SAs. Peers SHOULD be lenient with this number to account for corner cases.
For example, during Child SA rekeying, there might be a large number of additional Child SAs created before the old Child SAs are torn down. Similarly, when using on-demand Child SAs, both ends could trigger multiple Child SA requests, as the initial packet causing the Child SA negotiation might have been transported to the peer via the Fallback SA, where its reply packet might also trigger an on-demand Child SA negotiation. A peer may want to allow up to double the negotiated minimum number of Child SAs, and rely on the idleness of Child SAs to gradually tear down any unused Child SAs until an optimal number of Child SAs is reached. Adding too many SAs may slow down per-packet SAD lookup.

Implementations might support dynamically moving per-CPU Child SAs from one CPU to another. If this method is supported, implementations must be careful to move both the inbound and outbound SAs. If the IPsec endpoint is a gateway, it can move the inbound SA and outbound SA independently of each other, as IPsec traffic through a gateway is likely to be asymmetric. If the IPsec endpoint is the same host responsible for generating the traffic, the inbound and outbound SAs SHOULD remain as a pair on the same CPU. If a host previously skipped installing an outbound SA because it would have been an unused duplicate outbound SA, it will have to create and add the previously skipped outbound SA to the SAD with the new CPU ID. The inbound SA may not have a CPU ID in the SAD. Adding the outbound SA to the SAD requires access to the key material, whereas updating the CPU selector on an existing outbound SA might not. To support this, the IKE software might have to hold on to the key material longer than it normally would, as it might otherwise actively destroy key material in memory that it no longer needs.

8. Security Considerations

[TO DO]

9. Implementation Status

[Note to RFC Editor: Please remove this section and the reference to [RFC7942] before publication.]

This section records the status of known implementations of the protocol defined by this specification at the time of posting of this Internet-Draft, and is based on a proposal described in [RFC7942]. The description of implementations in this section is intended to assist the IETF in its decision processes in progressing drafts to RFCs. Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors. This is not intended as, and must not be construed to be, a catalog of available implementations or their features. Readers are advised to note that other implementations may exist.

According to [RFC7942], "this will allow reviewers and working groups to assign due consideration to documents that have the benefit of running code, which may serve as evidence of valuable experimentation and feedback that have made the implemented protocols more mature. It is up to the individual working groups to use this information as they see fit".

Authors are requested to add a note to the RFC Editor at the top of this section, advising the Editor to remove the entire section before publication, as well as the reference to [RFC7942].

9.1. Linux XFRM

Organization: Linux kernel XFRM

Name: XFRM-PCPU-v1 https://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git/log/?h=xfrm-pcpu-v1

Description: An initial kernel IPsec implementation of the per-CPU method.

Level of maturity: Alpha

Coverage: Implements the Fallback Child SA and per-CPU Child SAs. It only supports the NETLINK API. The PFKEYv2 API is not supported.

Licensing: GPLv2

Implementation experience: The Linux XFRM implementation added two additional attributes to support per-CPU SAs. There is a new attribute, XFRMA_SA_PCPU (u32), for the SAD entry. This attribute should be present on the outgoing per-CPU Child SAs, with CPU IDs starting from 0. This attribute MUST NOT be present on the Fallback XFRM SA. It is used by the kernel only for outgoing traffic (clear to encrypted). The incoming SAs, both the Fallback and the per-CPU SAs, do not need the XFRMA_SA_PCPU attribute, because the XFRM stack cannot use the CPU ID on incoming SAs. The kernel internally sets the value to 0xFFFFFF for the incoming SA and the Fallback SA. However, one may add XFRMA_SA_PCPU to the incoming per-CPU SA to steer the ESP flow to a specific queue or CPU, e.g., using an ethtool ntuple configuration. The SPD entry has a new flag, XFRM_POLICY_CPU_ACQUIRE. It should be set only on the "out" policy. The flag should be disabled when the policy is a trap policy, without SPD entries. After a successful negotiation of CPU_QUEUES, while adding the Fallback Child SA, the SPD entry can be updated with the XFRM_POLICY_CPU_ACQUIRE flag. When XFRM_POLICY_CPU_ACQUIRE is set, the generated XFRM_MSG_ACQUIRE will include the XFRMA_SA_PCPU attribute.

Contact: Steffen Klassert steffen.klassert@secunet.com

9.2. Libreswan

Organization: The Libreswan Project

Name: pcpu-3 https://libreswan.org/wiki/XFRM_pCPU

Description: An initial IKE implementation of the per-CPU method.

Level of maturity: Alpha

Coverage: Implements the Fallback Child SA and per-CPU additional Child SAs

Licensing: GPLv2

Implementation experience: TBD

Contact: Libreswan Development: swan-dev@libreswan.org

9.3. strongSwan

Organization: The strongSwan Project

Name: strongSwan https://github.com/strongswan/strongswan/tree/per-cpu-sas-poc/

Description: An initial IKE implementation of the per-CPU method.

Level of maturity: Alpha

Coverage: Implements the Fallback Child SA and per-CPU additional Child SAs

Licensing: GPLv2

Implementation experience: strongSwan uses private-use values for the notifications CPU_QUEUES (40970) and QUEUE_INFO (40971).

Contact: Tobias Brunner tobias@strongswan.org

9.4. iproute2

Organization: The iproute2 Project

Name: iproute2 https://github.com/antonyantony/iproute2/tree/pcpu-v1

Description: Implemented the per-CPU attributes for the "ip xfrm" command.

Level of maturity: Alpha

Licensing: GPLv2

Implementation experience: TBD

Contact: Antony Antony antony.antony@secunet.com

10. IANA Considerations

This document defines four new IKEv2 Notify Message Type payloads for the IANA "IKEv2 Notify Message Types - Status Types" registry.

   Value   Notify Type Messages - Status Types   Reference
   ------  -----------------------------------   ---------------
   [TBD1]  CPU_QUEUES                            [this document]
   [TBD2]  QOS_QUEUES                            [this document]
   [TBD3]  CPU_QUEUE_INFO                        [this document]
   [TBD4]  QOS_QUEUE_INFO                        [this document]

                                 Figure 1

11. References

11.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2367] McDonald, D., Metz, C., and B. Phan, "PF_KEY Key Management API, Version 2", RFC 2367, DOI 10.17487/RFC2367, July 1998.

[RFC7296] Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. Kivinen, "Internet Key Exchange Protocol Version 2 (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October 2014.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

11.2. Informative References

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005.

[RFC6982] Sheffer, Y. and A. Farrel, "Improving Awareness of Running Code: The Implementation Status Section", RFC 6982, DOI 10.17487/RFC6982, July 2013.

[RFC7942] Sheffer, Y. and A. Farrel, "Improving Awareness of Running Code: The Implementation Status Section", BCP 205, RFC 7942, DOI 10.17487/RFC7942, July 2016.

Authors' Addresses

Antony Antony
secunet Security Networks AG
Email: antony.antony@secunet.com

Tobias Brunner
codelabs GmbH
Email: tobias@codelabs.ch

Steffen Klassert
secunet Security Networks AG
Email: steffen.klassert@secunet.com

Paul Wouters
Aiven
Email: paul.wouters@aiven.io