INTERNET-DRAFT                                       W. Richard Stevens
Expires: April 19, 1997                           Matt Thomas (Digital)
                                                       October 19, 1996


                     Advanced Sockets API for IPv6


Abstract

   Specifications are in progress for changes to the sockets API to
   support IP version 6 [1]. These changes are for TCP and UDP-based
   applications and will support most end-user applications in use
   today: Telnet and FTP clients and servers, HTTP clients and servers,
   and the like.

   But another class of applications exists that will also be run under
   IPv6. We call these "advanced" applications and today this includes
   programs such as Ping, Traceroute, routing daemons, multicast
   routing daemons, router discovery daemons, and the like. The API
   feature typically used by these programs that makes them "advanced"
   is a raw socket to access ICMPv4, IGMPv4, or IPv4, along with some
   knowledge of the packet header formats used by these protocols. To
   provide portability for applications that use raw sockets under
   IPv6, some standardization is needed for the advanced API features.

   There are other features of IPv6 that some applications will need to
   access: interface identification (specifying the outgoing interface
   and determining the incoming interface) and IPv6 extension headers
   that are not addressed in [1]: hop-by-hop options, destination
   options, and the routing header (source routing).

Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the internet-drafts Shadow
   Directories on: ftp.is.co.za (Africa), nic.nordu.net (Europe),
   ds.internic.net (US East Coast), ftp.isi.edu (US West Coast), and
   munnari.oz.au (Pacific Rim).

Table of Contents

   1. Introduction
   2. Common Structures and Definitions
      2.1. The ip6hdr Structure
         2.1.1. IPv6 Next Header Values
      2.2. The icmp6hdr Structure
         2.2.1. ICMPv6 Type and Code Values
         2.2.2. ICMPv6 Neighbor Discovery Type and Code Values
      2.3. Address Testing Macros
   3. IPv6 Raw Sockets
      3.1. Checksums
      3.2. ICMPv6 Type Filtering
   4. Ancillary Data
      4.1. The msghdr Structure
      4.2. The cmsghdr Structure
         4.2.1. CMSG_FIRSTHDR
         4.2.2. CMSG_NXTHDR
         4.2.3. CMSG_DATA
         4.2.4. CMSG_SPACE
         4.2.5. CMSG_LENGTH
      4.3. Summary of Options Described Using Ancillary Data
      4.4. TCP Access to Ancillary Data
   5. Interface Identification
      5.1. Obtaining the Interface Index
      5.2. The ifreq Structure
      5.3. Returning Received Interface and Destination IPv6 Address
      5.4. Specifying Outgoing Interface and Source IPv6 Address
         5.4.1. Additional Errors with sendmsg()
   6. Hop-By-Hop Options
      6.1. Receiving Hop-by-Hop Options
      6.2. Sending Hop-by-Hop Options
   7. Destination Options
      7.1. Receiving Destination Options
      7.2. Sending Destination Options
   8. Source Route Option
      8.1. inet6_srcrt_space
      8.2. inet6_srcrt_init
      8.3. inet6_srcrt_add
      8.4. inet6_srcrt_reverse
      8.5. inet6_srcrt_segments
      8.6. inet6_srcrt_getaddr
      8.7. inet6_srcrt_getflags
   9. Ordering of Ancillary Data and IPv6 Extension Headers
   10. Additional Items
      10.1. Path MTU Discovery and UDP
      10.2. Neighbor Reachability and UDP
      10.3. Reading the Routing Table
      10.4. Obtaining Interface and Address Information
   11. References
   12. Acknowledgments
   13. Authors' Addresses

1. Introduction

   Specifications are in progress for changes to the sockets API to
   support IP version 6 [2]. These changes are for TCP and UDP-based
   applications. The current document defines some of the "advanced"
   features of the sockets API that are required for applications to
   take advantage of additional features of IPv6.

   Today, the portability of applications using IPv4 raw sockets is
   quite high, but this is mainly because most IPv4 implementations
   started from a common base (the Berkeley source code) or at least
   started with the Berkeley headers. This allows programs such as
   Ping and Traceroute, for example, to compile with minimal effort on
   many hosts that support the sockets API. With IPv6, however, there
   is no common source code base that implementors are starting from,
   and the possibility for divergence at this level between different
   implementations is high. To avoid a complete lack of portability
   amongst applications that use raw IPv6 sockets, some standardization
   is necessary.

   There are also features from the basic IPv6 specification that are
   not addressed in [2]: sending and receiving hop-by-hop options,
   destination options, and routing headers; specifying the outgoing
   interface; and being told of the receiving interface.

   This document can be divided into the following main sections.

   1.  Definitions of the basic constants and structures required for
       applications to use raw IPv6 sockets. This includes structure
       definitions for the IPv6 and ICMPv6 headers and all associated
       constants (e.g., values for the next header field).

   2.  Some basic semantic definitions for IPv6 raw sockets. For
       example, a raw ICMPv4 socket requires the application to
       calculate and store the ICMPv4 header checksum.
       But with IPv6 this would require the application to choose the
       source IPv6 address, because the source address is part of the
       pseudo-header that ICMPv6 now uses for its checksum computation.
       It should be defined that with a raw ICMPv6 socket the kernel
       always calculates and stores the ICMPv6 header checksum.

   3.  Interface identification: how applications specify the outgoing
       interface and are told of the incoming interface. There is a
       class of applications that need this capability and the
       technique should be portable.

   4.  Access to the optional hop-by-hop, destination, and routing
       headers.

   The final two items (interface identification and access to the IPv6
   extension headers) are specified using the "ancillary data" fields
   that were added to the 4.3BSD Reno sockets API in 1990. The reason
   is that these ancillary data fields are part of the Posix.1g
   standard (which should be approved in 1997) and should therefore be
   adopted by most vendors.

   This document does not address application access to either the
   authentication header or the encapsulating security payload header.

   All examples in this document omit error checking in favor of
   brevity and clarity.

   Datatypes in this document follow the Posix.1g format: u_intN_t
   means an unsigned integer of exactly N bits (e.g., u_int16_t) and
   u_intNm_t means an unsigned integer of at least N bits (e.g.,
   u_int32m_t).

2. Common Structures and Definitions

   Many advanced applications examine fields in the IPv6 header and set
   and examine fields in the various ICMPv6 headers. Common structure
   definitions for these headers are required, along with common
   constant definitions for the structure members.

   When an include file is specified, that include file is allowed to
   include other files that do the actual declaration or definition.

2.1. The ip6hdr Structure

   The following structure is defined as a result of including
   <netinet/ip6.h>. Note that this is a new header.

      struct ip6hdr {
        union {
          struct ip6hdrctl {
            u_int32_t  ctl6_flow;   /* 24 bits of flow-ID */
            u_int16_t  ctl6_plen;   /* payload length */
            u_int8_t   ctl6_nxt;    /* next header */
            u_int8_t   ctl6_hlim;   /* hop limit */
          } un_ctl6;
          u_int8_t  un_vfc;         /* 4 bits version, 4 bits priority */
        } ip6_ctlun;
        struct in6_addr  ip6_src;   /* source address */
        struct in6_addr  ip6_dst;   /* destination address */
      };

      #define ip6_vfc   ip6_ctlun.un_vfc
      #define ip6_flow  ip6_ctlun.un_ctl6.ctl6_flow
      #define ip6_plen  ip6_ctlun.un_ctl6.ctl6_plen
      #define ip6_nxt   ip6_ctlun.un_ctl6.ctl6_nxt
      #define ip6_hlim  ip6_ctlun.un_ctl6.ctl6_hlim
      #define ip6_hops  ip6_ctlun.un_ctl6.ctl6_hlim

2.1.1. IPv6 Next Header Values

   IPv6 defines many new values for the next header field. The
   following constants are defined as a result of including
   <netinet/in.h>.

      #define IPPROTO_HOPOPTS    0  /* IPv6 hop-by-hop options */
      #define IPPROTO_IPV6      41  /* IPv6 header */
      #define IPPROTO_ROUTING   43  /* IPv6 routing header */
      #define IPPROTO_FRAGMENT  44  /* IPv6 fragmentation header */
      #define IPPROTO_ESP       50  /* encapsulating security payload */
      #define IPPROTO_AH        51  /* authentication header */
      #define IPPROTO_ICMPV6    58  /* ICMPv6 */
      #define IPPROTO_NONE      59  /* IPv6 no next header */
      #define IPPROTO_DSTOPTS   60  /* IPv6 destination options */

   Berkeley-derived IPv4 implementations also define IPPROTO_IP to be
   0. This should not be a problem since IPPROTO_IP is used only with
   IPv4 sockets and IPPROTO_HOPOPTS only with IPv6 sockets.

2.2. The icmp6hdr Structure

   The ICMPv6 header is needed by numerous IPv6 applications including
   Ping, Traceroute, router discovery daemons, and neighbor discovery
   daemons. The following structure is defined as a result of
   including <netinet/icmp6.h>. Note that this is a new header.

      struct icmp6hdr {
        u_int8_t   icmp6_type;    /* type field */
        u_int8_t   icmp6_code;    /* code field */
        u_int16_t  icmp6_cksum;   /* checksum field */
        union {
          u_int32_t  un_data32[1];  /* type-specific field */
          u_int16_t  un_data16[2];  /* type-specific field */
          u_int8_t   un_data8[4];   /* type-specific field */
        } icmp6_dataun;
      };

      #define icmp6_data32    icmp6_dataun.un_data32
      #define icmp6_data16    icmp6_dataun.un_data16
      #define icmp6_data8     icmp6_dataun.un_data8
      #define icmp6_pptr      icmp6_data32[0]  /* parameter prob */
      #define icmp6_mtu       icmp6_data32[0]  /* packet too big */
      #define icmp6_id        icmp6_data16[0]  /* echo request/reply */
      #define icmp6_seq       icmp6_data16[1]  /* echo request/reply */
      #define icmp6_maxdelay  icmp6_data16[0]  /* mcast group membership */

2.2.1. ICMPv6 Type and Code Values

   In addition to a common structure for the ICMPv6 header, common
   definitions are required for the ICMPv6 type and code fields. The
   following constants are also defined as a result of including
   <netinet/icmp6.h>.

      #define ICMPV6_DEST_UNREACH      1
      #define ICMPV6_PKT_TOOBIG        2
      #define ICMPV6_TIME_EXCEED       3
      #define ICMPV6_PARAMPROB         4

      #define ICMPV6_INFOMSG_MASK   0x80  /* all informational messages */

      #define ICMPV6_ECHORQST        128
      #define ICMPV6_ECHORPLY        129
      #define ICMPV6_MGM_QUERY       130
      #define ICMPV6_MGM_REPORT      131
      #define ICMPV6_MGM_REDUCTION   132

      #define ICMPV6_DEST_UNREACH_NOROUTE     0 /* no route to destination */
      #define ICMPV6_DEST_UNREACH_ADMIN       1 /* communication with destination */
                                                /* administratively prohibited */
      #define ICMPV6_DEST_UNREACH_NOTNEIGHBOR 2 /* not a neighbor */
      #define ICMPV6_DEST_UNREACH_ADDR        3 /* address unreachable */
      #define ICMPV6_DEST_UNREACH_NOPORT      4 /* bad port */
      #define ICMPV6_TIME_EXCEED_HOPS         0 /* Hop Limit == 0 in transit */
      #define ICMPV6_TIME_EXCEED_REASSEMBLY   1 /* Reassembly time out */

      #define ICMPV6_PARAMPROB_HDR            0 /* erroneous header field */
      #define ICMPV6_PARAMPROB_NXT_HDR        1 /* unrecognized Next Header */
      #define ICMPV6_PARAMPROB_OPTS           2 /* unrecognized IPv6 option */

   The five ICMP message types defined by IPv6 neighbor discovery
   (133-137) are defined in the next section.

2.2.2. ICMPv6 Neighbor Discovery Type and Code Values

   The following constants are defined as a result of including a new
   header.

      #define ND6_ROUTER_SOLICITATION     133
      #define ND6_ROUTER_ADVERTISEMENT    134
      #define ND6_NEIGHBOR_SOLICITATION   135
      #define ND6_NEIGHBOR_ADVERTISEMENT  136
      #define ND6_REDIRECT                137

      enum nd6_option {
        ND6_OPT_SOURCE_LINKADDR=1,
        ND6_OPT_TARGET_LINKADDR=2,
        ND6_OPT_PREFIX_INFORMATION=3,
        ND6_OPT_REDIRECTED_HEADER=4,
        ND6_OPT_MTU=5,
        ND6_OPT_ENDOFLIST=256
      };

      struct nd_router_solicit {      /* router solicitation */
        struct icmp6hdr  rsol_hdr;
      };

      #define rsol_type      rsol_hdr.icmp6_type
      #define rsol_code      rsol_hdr.icmp6_code
      #define rsol_cksum     rsol_hdr.icmp6_cksum
      #define rsol_reserved  rsol_hdr.icmp6_data32[0]

      struct nd_router_advert {       /* router advertisement */
        struct icmp6hdr  radv_hdr;
        u_int32_t        radv_reachable;   /* reachable time */
        u_int32_t        radv_retransmit;  /* reachable retransmit time */
      };

      #define radv_type             radv_hdr.icmp6_type
      #define radv_code             radv_hdr.icmp6_code
      #define radv_cksum            radv_hdr.icmp6_cksum
      #define radv_maxhoplimit      radv_hdr.icmp6_data8[0]
      #define radv_m_o_res          radv_hdr.icmp6_data8[1]
      #define ND6_RADV_M_BIT        0x80
      #define ND6_RADV_O_BIT        0x40
      #define radv_router_lifetime  radv_hdr.icmp6_data16[1]

      struct nd6_nsolicitation {      /* neighbor solicitation */
        struct icmp6hdr  nsol6_hdr;
        struct in6_addr  nsol6_target;
      };

      struct nd6_nadvertisement {     /* neighbor advertisement */
        struct icmp6hdr  nadv6_hdr;
        struct in6_addr  nadv6_target;
      };

      #define nadv6_flags               nadv6_hdr.icmp6_data32[0]
      #define ND6_NADVERFLAG_ISROUTER   0x80
      #define ND6_NADVERFLAG_SOLICITED  0x40
      #define ND6_NADVERFLAG_OVERRIDE   0x20

      struct nd6_redirect {           /* redirect */
        struct icmp6hdr  redirect_hdr;
        struct in6_addr  redirect_target;
        struct in6_addr  redirect_destination;
      };

      struct nd6_opt_prefix_info {    /* prefix information */
        u_int8_t   opt_type;
        u_int8_t   opt_length;
        u_int8_t   opt_prefix_length;
        u_int8_t   opt_l_a_res;
        u_int32_t  opt_valid_life;
        u_int32_t  opt_preferred_life;
        u_int32_t  opt_reserved2;
        struct in6_addr  opt_prefix;
      };

      #define ND6_OPT_PI_L_BIT  0x80
      #define ND6_OPT_PI_A_BIT  0x40

      struct nd6_opt_mtu {            /* MTU option */
        u_int8_t   opt_type;
        u_int8_t   opt_length;
        u_int16_t  opt_reserved;
        u_int32_t  opt_mtu;
      };

2.3. Address Testing Macros

   Some basic macros are needed for testing IPv6 addresses for certain
   properties. Many applications, both elementary and advanced, can
   benefit from these macros.

      int  IN6_IS_ADDR_UNSPECIFIED(const struct in6_addr *);
      int  IN6_IS_ADDR_LOOPBACK(const struct in6_addr *);
      int  IN6_IS_ADDR_MULTICAST(const struct in6_addr *);
      int  IN6_IS_ADDR_LINKLOCAL(const struct in6_addr *);
      int  IN6_IS_ADDR_SITELOCAL(const struct in6_addr *);
      int  IN6_IS_ADDR_V4MAPPED(const struct in6_addr *);
      int  IN6_IS_ADDR_V4COMPAT(const struct in6_addr *);

      int  IN6_IS_ADDR_MC_NODELOCAL(const struct in6_addr *);
      int  IN6_IS_ADDR_MC_LINKLOCAL(const struct in6_addr *);
      int  IN6_IS_ADDR_MC_SITELOCAL(const struct in6_addr *);
      int  IN6_IS_ADDR_MC_ORGLOCAL(const struct in6_addr *);
      int  IN6_IS_ADDR_MC_GLOBAL(const struct in6_addr *);

3. IPv6 Raw Sockets

   Raw sockets are used to bypass the transport layer (TCP or UDP).
   With IPv4, raw sockets are used to access ICMPv4 and IGMPv4, and to
   read and write IPv4 datagrams containing a protocol field that the
   kernel does not process. An example of the latter is a routing
   daemon for OSPF, since it uses IPv4 protocol field 89. With IPv6,
   raw sockets will be used for ICMPv6 and to read and write IPv6
   datagrams containing a next header field that the kernel does not
   process. An example of the latter is a routing daemon for IDRP.

   All data sent via raw sockets MUST be in network byte order and all
   data received via raw sockets will be in network byte order.
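   The following C fragment is a minimal sketch, not part of this API,
   of what the network byte order rule means for an application that
   builds ICMPv6 echo requests. It assumes a hypothetical structure,
   echo6_hdr, that mirrors the first eight bytes of the icmp6hdr
   structure and is declared locally so the sketch compiles without the
   new headers; the type value 128 is that of ICMPV6_ECHORQST. The
   identifier and sequence number are converted with htons() before
   sending and with ntohs() after receiving. The checksum is left zero
   because the kernel computes it for ICMPv6 raw sockets.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of the icmp6hdr layout for echo messages. */
struct echo6_hdr {
    uint8_t  type;    /* ICMPv6 type */
    uint8_t  code;    /* ICMPv6 code */
    uint16_t cksum;   /* left 0: the kernel fills it in for ICMPv6 raw sockets */
    uint16_t id;      /* echo identifier, network byte order */
    uint16_t seq;     /* echo sequence number, network byte order */
};

#define ECHO6_REQUEST 128   /* same value as ICMPV6_ECHORQST */

/* Build an echo request header in buf; returns the number of bytes written. */
size_t build_echo_request(unsigned char *buf, size_t buflen,
                          uint16_t id, uint16_t seq)
{
    struct echo6_hdr h;

    assert(buflen >= sizeof(h));
    memset(&h, 0, sizeof(h));
    h.type = ECHO6_REQUEST;
    h.code = 0;
    h.id   = htons(id);    /* host -> network byte order */
    h.seq  = htons(seq);
    memcpy(buf, &h, sizeof(h));
    return sizeof(h);
}

/* Extract the sequence number from a received echo message. */
uint16_t echo_seq(const unsigned char *buf)
{
    struct echo6_hdr h;

    memcpy(&h, buf, sizeof(h));
    return ntohs(h.seq);   /* network -> host byte order */
}
```

   The bytes built this way would be written with sendto() on a socket
   created as socket(PF_INET6, SOCK_RAW, IPPROTO_ICMPV6), which
   typically requires superuser privileges; a received echo reply is
   decoded the same way, converting each multi-byte field with ntohs()
   or ntohl() before use.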

   This network byte order requirement differs from IPv4 raw sockets,
   which did not specify a byte ordering and typically used the host's
   byte order.

   Another difference from IPv4 raw sockets is that complete packets
   (that is, IPv6 packets with extension headers) cannot be transferred
   via the IPv6 raw sockets API. Instead, ancillary data objects are
   used to transfer the extension headers, as described later in this
   document.

   All fields in the IPv6 header that an application might want to
   change (i.e., everything other than the version number) can be
   modified by the application. Hence there is probably no need for a
   socket option similar to the IPv4 IP_HDRINCL socket option.

   When we say "an ICMPv6 raw socket" we mean a socket created by
   calling the socket function with the three arguments PF_INET6,
   SOCK_RAW, and IPPROTO_ICMPV6.

3.1. Checksums

   The kernel will calculate and insert the ICMPv6 checksum for ICMPv6
   raw sockets.

   For other raw IPv6 sockets (that is, for raw IPv6 sockets created
   with a third argument other than IPPROTO_ICMPV6), the application
   must set the new IPV6_CHECKSUM socket option to have the kernel
   compute and store a checksum. This option prevents applications
   from having to perform source address selection on the packets they
   send. The checksum will incorporate the IPv6 pseudo-header. This
   new socket option also specifies an integer offset into the user
   data at which the checksum is to be placed.

      int  offset = 2;

      setsockopt(fd, IPPROTO_IPV6, IPV6_CHECKSUM,
                 &offset, sizeof(offset));

   By default, this socket option is disabled, which means the kernel
   will not calculate and store a checksum.

3.2. ICMPv6 Type Filtering

   ICMPv4 raw sockets receive most ICMPv4 messages received by the
   kernel. (We say "most" and not "all" because Berkeley-derived
   kernels never pass echo requests, timestamp requests, or address
   mask requests to a raw socket.
   Instead, these three messages are processed entirely by the kernel.)
   But ICMPv6 is a superset of ICMPv4, also including the functionality
   of IGMPv4 and ARPv4. This means that an ICMPv6 raw socket can
   potentially receive many more messages than would be received with
   an ICMPv4 raw socket: ICMP messages similar to ICMPv4, along with
   neighbor solicitations, neighbor advertisements, and the three group
   membership messages.

   Most applications using an ICMPv6 raw socket care about only a small
   subset of the ICMPv6 message types. Transferring extraneous ICMPv6
   messages from the kernel to the user process can incur significant
   overhead. Therefore this API includes a method of filtering ICMPv6
   messages by the ICMPv6 type field.

   Each ICMPv6 raw socket has an associated filter whose datatype is
   defined as

      struct icmp6_filter;

   The current filter is fetched and stored using getsockopt() and
   setsockopt() with a level of IPPROTO_ICMPV6 and an option name of
   ICMPV6_FILTER.

   Six macros operate on an icmp6_filter structure:

      void ICMPV6_FILTER_SETPASSALL (struct icmp6_filter *);
      void ICMPV6_FILTER_SETBLOCKALL(struct icmp6_filter *);

      void ICMPV6_FILTER_SETPASS (int, struct icmp6_filter *);
      void ICMPV6_FILTER_SETBLOCK(int, struct icmp6_filter *);

      int  ICMPV6_FILTER_WILLPASS (int, const struct icmp6_filter *);
      int  ICMPV6_FILTER_WILLBLOCK(int, const struct icmp6_filter *);

   The first argument to the last four macros (an integer) is an ICMPv6
   message type, between 0 and 255. The pointer argument to all six
   macros is a pointer to a filter that is modified by the first four
   macros and examined by the last two macros.

   The first two macros, SETPASSALL and SETBLOCKALL, let us specify
   that all ICMPv6 messages are passed to the application or that all
   ICMPv6 messages are blocked from being passed to the application.

   The next two macros, SETPASS and SETBLOCK, let us specify that
   messages of a given ICMPv6 type should be passed to the application
   or not passed to the application (blocked).

   The final two macros, WILLPASS and WILLBLOCK, return true or false
   depending on whether the specified message type is passed to the
   application or blocked from being passed to the application by the
   filter pointed to by the second argument.

   When an ICMPv6 raw socket is created, it will by default pass all
   ICMPv6 message types to the application.

   As an example, a Ping program could execute the following:

      int                  fd;
      struct icmp6_filter  myfilt;

      fd = socket(PF_INET6, SOCK_RAW, IPPROTO_ICMPV6);

      ICMPV6_FILTER_SETBLOCKALL(&myfilt);
      ICMPV6_FILTER_SETPASS(ICMPV6_ECHORPLY, &myfilt);
      setsockopt(fd, IPPROTO_ICMPV6, ICMPV6_FILTER,
                 &myfilt, sizeof(myfilt));

   The filter structure is declared and then initialized to block all
   message types. The filter structure is then changed to allow ICMPv6
   echo reply messages to be passed to the application, and the filter
   is installed using setsockopt().

   The icmp6_filter structure is similar to the fd_set datatype used
   with the select() function in the sockets API. The icmp6_filter
   structure is an opaque datatype and the application should not care
   how it is implemented. All the application does with this datatype
   is allocate a variable of this type, pass a pointer to a variable of
   this type to getsockopt() and setsockopt(), and operate on a
   variable of this type using the six macros that we just defined.

   Nevertheless, it is worth showing a simple implementation of this
   datatype and the six macros.

      #include <string.h>    /* memset() */

      struct icmp6_filter {
        u_int32m_t  data[8];    /* 8*32 = 256 bits */
      };

      #define ICMPV6_FILTER_WILLPASS(type, filterp) \
        ((((filterp)->data[(type) >> 5]) & (1U << ((type) & 31))) != 0)
      #define ICMPV6_FILTER_WILLBLOCK(type, filterp) \
        ((((filterp)->data[(type) >> 5]) & (1U << ((type) & 31))) == 0)
      #define ICMPV6_FILTER_SETPASS(type, filterp) \
        ((((filterp)->data[(type) >> 5]) |= (1U << ((type) & 31))))
      #define ICMPV6_FILTER_SETBLOCK(type, filterp) \
        ((((filterp)->data[(type) >> 5]) &= ~(1U << ((type) & 31))))
      #define ICMPV6_FILTER_SETPASSALL(filterp) \
        memset((filterp), 0xFF, sizeof(struct icmp6_filter))
      #define ICMPV6_FILTER_SETBLOCKALL(filterp) \
        memset((filterp), 0, sizeof(struct icmp6_filter))

4. Ancillary Data

   4.2BSD allowed file descriptors to be transferred between separate
   processes across a UNIX domain socket using the sendmsg() and
   recvmsg() functions. Two members of the msghdr structure,
   msg_accrights and msg_accrightslen, were used to send and receive
   the descriptors. When the OSI protocols were added to 4.3BSD Reno
   in 1990, the names of these two fields in the msghdr structure were
   changed to msg_control and msg_controllen, because they were used by
   the OSI protocols for "control information", although the comments
   in the source code call this "ancillary data".

   Other than the OSI protocols, the use of ancillary data has been
   rare. In 4.4BSD, for example, the only use of ancillary data with
   IPv4 is to return the destination address of a received UDP datagram
   if the IP_RECVDSTADDR socket option is set. With Unix domain
   sockets, ancillary data is still used to send and receive
   descriptors.

   Nevertheless, the ancillary data fields of the msghdr structure
   provide a clean way to pass information in addition to the data that
   is being read or written.
   The inclusion of the msg_control and msg_controllen members of the
   msghdr structure, along with the cmsghdr structure that is pointed
   to by the msg_control member, is required by the Posix.1g sockets
   API standard (which should be completed during 1997).

   Ancillary data is used to exchange the following optional
   information between the application and the kernel:

   1.  specify the outgoing interface and/or source address,
   2.  receive the incoming interface and destination address,
   3.  send and receive hop-by-hop options,
   4.  send and receive destination options, and
   5.  send and receive routing headers.

   Before describing these uses in detail, we review the definition of
   the msghdr structure itself, the cmsghdr structure that defines an
   ancillary data object, and some macros that operate on the ancillary
   data objects.

4.1. The msghdr Structure

   The msghdr structure is used by the recvmsg() and sendmsg()
   functions. Its Posix.1g definition is:

      struct msghdr {
        void          *msg_name;       /* ptr to socket address structure */
        size_t         msg_namelen;    /* size of socket address structure */
        struct iovec  *msg_iov;        /* scatter/gather array */
        size_t         msg_iovlen;     /* # elements in msg_iov */
        void          *msg_control;    /* ancillary data */
        size_t         msg_controllen; /* ancillary data buffer length */
        int            msg_flags;      /* flags on received message */
      };

   The structure is declared as a result of including <sys/socket.h>.

   Most Berkeley-derived implementations limit the amount of ancillary
   data in a call to sendmsg() to no more than 108 bytes (an mbuf).
   This API requires a minimum of 10240 bytes of ancillary data, but it
   is recommended that the amount be limited only by the buffer space
   reserved by the socket (which can be modified by the SO_SNDBUF
   socket option).

4.2. The cmsghdr Structure

   The cmsghdr structure describes ancillary data objects transferred
   by recvmsg() and sendmsg():

      struct cmsghdr {
        size_t  cmsg_len;     /* #bytes, including this header */
        int     cmsg_level;   /* originating protocol */
        int     cmsg_type;    /* protocol-specific type */
                  /* followed by unsigned char cmsg_data[]; */
      };

   This structure is declared as a result of including <sys/socket.h>.

   When ancillary data is sent or received, any number of ancillary
   data objects can be specified by the msg_control and msg_controllen
   members of the msghdr structure, because each object is preceded by
   its length (the cmsg_len member). Historically, Berkeley-derived
   implementations have passed only one object at a time, but this API
   allows multiple objects to be passed in a single call to sendmsg()
   or recvmsg(). The following example shows two ancillary data
   objects in a control buffer.

   |<---------------------- msg_controllen ----------------------->|
   |                                                               |
   |<--- ancillary data object --->|<--- ancillary data object --->|
   |                               |                               |
   |<--------- cmsg_len ---------->|<--------- cmsg_len ---------->|
   |                               |                               |
   +---------------------------------------------------------------+
   |cmsg_|cmsg_|cmsg_|             |cmsg_|cmsg_|cmsg_|             |
   |len  |level|type | cmsg_data[] |len  |level|type | cmsg_data[] |
   +---------------------------------------------------------------+
   ^
   |
   msg_control
   points here

   To aid in the manipulation of ancillary data objects, three macros
   from 4.4BSD are defined by Posix.1g: CMSG_DATA(), CMSG_NXTHDR(), and
   CMSG_FIRSTHDR(). Before describing these macros, we show the
   following example of how they might be used with a call to
   recvmsg().
    struct msghdr   msg;
    struct cmsghdr *cmsgptr;

    /* fill in msg */

    /* call recvmsg() */

    if (msg.msg_controllen > 0) {
        for (cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
             cmsgptr = CMSG_NXTHDR(&msg, cmsgptr)) {
            if (cmsgptr->cmsg_level == ... && cmsgptr->cmsg_type == ... ) {
                u_char *ptr;

                ptr = CMSG_DATA(cmsgptr);
                /* process data pointed to by ptr */
            }
        }
    }

We now describe the three Posix.1g macros, followed by two more that
are new with this API: CMSG_SPACE and CMSG_LENGTH.

4.2.1. CMSG_FIRSTHDR

    struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *mhdr);

CMSG_FIRSTHDR returns a pointer to the first cmsghdr structure in
the msghdr structure pointed to by mhdr. The macro returns NULL if
there is no ancillary data pointed to by the msghdr structure.

The application must check that msg_controllen is greater than 0
before calling CMSG_FIRSTHDR, because if the application asks for
control information (by setting msg_control nonnull and
msg_controllen greater than 0 when calling recvmsg()) but there is
none to pass back, the kernel just sets msg_controllen to 0 upon
return.

4.2.2. CMSG_NXTHDR

    struct cmsghdr *CMSG_NXTHDR(struct msghdr *mhdr,
                                struct cmsghdr *cmsg);

CMSG_NXTHDR returns a pointer to the cmsghdr structure describing
the next ancillary data object. mhdr is a pointer to a msghdr
structure and cmsg is a pointer to a cmsghdr structure. If there is
no other ancillary data object, the return value is NULL.

The following behavior of this macro is new to this API
specification: if the value of the cmsg pointer is NULL, a pointer
to the cmsghdr structure describing the first ancillary data object
is returned. If there are no ancillary data objects, the return
value is NULL.

4.2.3. CMSG_DATA

    unsigned char *CMSG_DATA(struct cmsghdr *cmsg);

CMSG_DATA returns a pointer to the data (what is called the
cmsg_data[] member, even though such a member is not defined in the
structure) following a cmsghdr structure.

4.2.4. CMSG_SPACE

    unsigned int CMSG_SPACE(unsigned int length);

This macro is new with this API. Given the length of an ancillary
data object, CMSG_SPACE returns the space required by the object and
its cmsghdr structure, including any padding needed to satisfy
alignment requirements. This macro should not be used to initialize
the cmsg_len member of a cmsghdr structure; use the CMSG_LENGTH
macro instead.

4.2.5. CMSG_LENGTH

    unsigned int CMSG_LENGTH(unsigned int length);

This macro is new with this API. Given the length of an ancillary
data object, CMSG_LENGTH returns the value to store in the cmsg_len
member of the cmsghdr structure, taking into account any padding
needed to satisfy alignment requirements. Unlike CMSG_SPACE, this
value does not include any padding that follows the data.

4.3. Summary of Options Described Using Ancillary Data

We mentioned that five pieces of optional information are passed
between the application and the kernel using ancillary data:

   1. specify the outgoing interface and/or source address,
   2. receive the incoming interface and destination address,
   3. send and receive hop-by-hop options,
   4. send and receive destination options, and
   5. send and receive routing headers.
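Each of the five items above travels as one or more ancillary data
objects built and parsed with the macros of Section 4.2. The
following minimal sketch, not a normative implementation, packs two
objects into one control buffer and then walks them the way a
receiver would. It assumes a system that provides the Posix.1g
macros together with CMSG_SPACE and CMSG_LEN (CMSG_LEN is the name
later systems use for what this draft calls CMSG_LENGTH). The values
DEMO_LEVEL, DEMO_TYPE_A, DEMO_TYPE_B, and DEMO_BLOB, the union
demo_control, and both helper functions are hypothetical
placeholders, not part of this API.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical placeholder values for illustration only; they are not
 * constants defined by this API. */
#define DEMO_LEVEL  1000
#define DEMO_TYPE_A 1
#define DEMO_TYPE_B 2
#define DEMO_BLOB   16

/* Control buffer sized for one int object plus one 16-byte object.
 * CMSG_SPACE includes the cmsghdr and any trailing alignment padding;
 * the union member forces cmsghdr alignment of the byte array. */
union demo_control {
    struct cmsghdr align;
    unsigned char  buf[CMSG_SPACE(sizeof(int)) + CMSG_SPACE(DEMO_BLOB)];
};

/* Pack two ancillary data objects into msg's control buffer.
 * Returns the number of control bytes used. */
static size_t build_two_objects(struct msghdr *msg, union demo_control *ctl,
                                int value, const unsigned char *blob)
{
    struct cmsghdr *cmsg;

    memset(ctl, 0, sizeof(*ctl));
    msg->msg_control = ctl->buf;
    msg->msg_controllen = sizeof(ctl->buf);

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = DEMO_LEVEL;
    cmsg->cmsg_type = DEMO_TYPE_A;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));    /* CMSG_LEN, never CMSG_SPACE */
    memcpy(CMSG_DATA(cmsg), &value, sizeof(int));

    cmsg = CMSG_NXTHDR(msg, cmsg);             /* skips the padded first object */
    cmsg->cmsg_level = DEMO_LEVEL;
    cmsg->cmsg_type = DEMO_TYPE_B;
    cmsg->cmsg_len = CMSG_LEN(DEMO_BLOB);
    memcpy(CMSG_DATA(cmsg), blob, DEMO_BLOB);

    msg->msg_controllen = CMSG_SPACE(sizeof(int)) + CMSG_SPACE(DEMO_BLOB);
    return msg->msg_controllen;
}

/* Walk the control buffer as a receiver would. Returns the number of
 * objects and copies out the int carried by the DEMO_TYPE_A object. */
static int walk_objects(struct msghdr *msg, int *value_out)
{
    struct cmsghdr *cmsg;
    int n = 0;

    if (msg->msg_controllen == 0)    /* check before CMSG_FIRSTHDR */
        return 0;
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == DEMO_LEVEL && cmsg->cmsg_type == DEMO_TYPE_A)
            memcpy(value_out, CMSG_DATA(cmsg), sizeof(int));
        n++;
    }
    return n;
}
```

The two size macros differ only by trailing padding: CMSG_LEN(n) is
the value stored in cmsg_len, while CMSG_SPACE(n) is what the object
consumes in the buffer, which is why the buffer is sized with
CMSG_SPACE.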
First, to receive the optional information (items 2-5), the
application must call setsockopt() to turn on a flag:

    int on = 1;

    setsockopt(fd, IPPROTO_IPV6, IPV6_RXINFO,    &on, sizeof(on));
    setsockopt(fd, IPPROTO_IPV6, IPV6_RXHOPOPTS, &on, sizeof(on));
    setsockopt(fd, IPPROTO_IPV6, IPV6_RXDSTOPTS, &on, sizeof(on));
    setsockopt(fd, IPPROTO_IPV6, IPV6_RXSRCRT,   &on, sizeof(on));

When any of these options is enabled, the corresponding data is
returned as control information by recvmsg(), as one or more
ancillary data objects.

Nothing special need be done to send any of this optional
information (items 1 and 3-5 in the list above); the application
just calls sendmsg() and specifies one or more ancillary data
objects as control information.

We also summarize the three cmsghdr fields that describe each of the
five ancillary data objects:

    cmsg_level        cmsg_type      cmsg_data[]
    ---------------   ------------   ------------------------
    IPPROTO_IPV6      IPV6_RXINFO    in6_pktinfo structure
    IPPROTO_IPV6      IPV6_TXINFO    in6_pktinfo structure
    IPPROTO_HOPOPTS   option_type    actual option
    IPPROTO_DSTOPTS   option_type    actual option
    IPPROTO_ROUTING   routing_type   implementation dependent

These are described in detail in the following sections.

4.4. TCP Access to Ancillary Data

The summary in the previous section assumes a UDP socket. Sending
and receiving ancillary data is easy with UDP: the application calls
sendmsg() and recvmsg() instead of sendto() and recvfrom().

But there might be cases where a TCP application wants to send or
receive this optional information. For example, a TCP client might
want to specify a source route, and this needs to be done before
calling connect(). Similarly, a TCP server might want to know the
received interface after accept() returns, along with any
destination options.
One new socket option is defined to allow easy TCP access to these
optional fields. Setting the socket option specifies any of the
optional output fields:

    setsockopt(fd, IPPROTO_IPV6, IPV6_PKTOPTIONS, &buf, len);

The fourth argument points to a buffer containing one or more
ancillary data objects, and the fifth argument is the total length
of all these objects. The application fills in this buffer exactly
as if the buffer were being passed to sendmsg() as control
information.

The corresponding receive option

    getsockopt(fd, IPPROTO_IPV6, IPV6_PKTOPTIONS, &buf, &len);

returns a buffer with one or more ancillary data objects for all the
optional receive information that the application has previously
specified that it wants to receive. The fourth argument points to
the buffer that is filled in by the call. The fifth argument is a
pointer to a value-result integer: when the function is called, the
integer specifies the size of the buffer pointed to by the fourth
argument, and upon return this integer contains the actual number of
bytes that were returned. The application processes this buffer
exactly as if the buffer were returned by recvmsg() as control
information.

The options set by calling setsockopt() for IPV6_PKTOPTIONS are
called "sticky" options because once set they apply to all packets
sent on that socket. They may, however, be overridden with
ancillary data specified in a call to sendmsg().

But the following three options are considered a set: hop-by-hop,
destination, and routing header options. If any of these three
options is specified in a call to sendmsg(), then none of these
three from the socket's sticky options are sent for this packet.
For example, if the application calls setsockopt() for
IPV6_PKTOPTIONS and sets sticky values for the hop-by-hop and
destination options, but then calls sendmsg() specifying just a
routing header as an ancillary data object, then only the routing
header is sent with this packet. The two sticky options, hop-by-hop
and destination, are not sent for this packet.

5. Interface Identification

Some applications need to know the interface on which a packet was
received, and some applications need to specify the interface on
which a packet is to be transmitted. Thus a technique is required to
identify the interfaces on a system.

On Berkeley-derived implementations, when an interface is made known
to the system, the kernel assigns a unique positive integer value
(called the interface index) to that interface. These are small
positive integers that start at 1. There may be gaps, so that there
is no current interface for a particular interface index.

5.1. Obtaining the Interface Index

Currently, there is no simple way to get the index of an interface.
(4.4BSD returns the index as part of the datalink socket address
structures returned by the ioctl() of SIOCGIFCONF, but not all
systems support the AF_LINK socket address structure.) Since the
interface index is widely used throughout this API, a new ioctl()
command is defined to retrieve it: SIOCGIFINDEX. This command uses
the standard ifreq structure (shown below); when supplied with an
interface name, it returns the interface index in the ifr_ifindex
member of the ifreq structure. Note that ifr_ifindex is a new
addition to the ifreq structure and should have a type of "int".

5.2. The ifreq Structure

The ifreq structure is used by many of the existing interface ioctls
to specify or obtain information or attributes of an interface.
For example, given the name of an interface (e.g., "de0" or "le0"),
the SIOCGIFADDR command returns the primary IPv4 address of the
interface. The ifreq structure is declared as a result of including
the header, and on many implementations looks like the following:

    struct ifreq {
    #define IFNAMSIZ 16
        char ifr_name[IFNAMSIZ];   /* if name, e.g., "en0" */
        union {
            struct sockaddr ifru_addr;
            struct sockaddr ifru_dstaddr;
            struct sockaddr ifru_broadaddr;
            short           ifru_svalue;
            int             ifru_ivalue;
            caddr_t         ifru_data;
        } ifr_ifru;
    };

To fetch some property of the interface, the application stores the
interface name into the ifr_name[] array and calls ioctl() with a
command such as SIOCGIFADDR. The returned value is in one member of
the union, the exact member depending on the specific command.
Numerous names are defined to access the members of the union:

    #define ifr_addr      ifr_ifru.ifru_addr      /* address */
    #define ifr_dstaddr   ifr_ifru.ifru_dstaddr   /* other end of p-to-p link */
    #define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */
    #define ifr_flags     ifr_ifru.ifru_svalue    /* flags */
    #define ifr_metric    ifr_ifru.ifru_ivalue    /* metric */
    #define ifr_mtu       ifr_ifru.ifru_ivalue    /* mtu */
    #define ifr_ifindex   ifr_ifru.ifru_ivalue    /* interface index */
    #define ifr_data      ifr_ifru.ifru_data      /* for use by interface */

What this API specifies is the new SIOCGIFINDEX command and a
definition of the name ifr_ifindex. Issuing this command for an
interface returns the index of the interface. For example,

    struct ifreq ifr;

    strncpy(ifr.ifr_name, "de0", IFNAMSIZ);
    ioctl(fd, SIOCGIFINDEX, &ifr);
    /* on success, ifr.ifr_ifindex holds the interface index */

5.3. Returning Received Interface and Destination IPv6 Address

An application may need to know the destination IPv6 address and the
received interface.
This information is returned in an in6_pktinfo structure as
ancillary data if the IPV6_RXINFO socket option is enabled. This
structure is defined as a result of including the header.

    struct in6_pktinfo {
        int             ipi6_ifindex;  /* interface index */
        struct in6_addr ipi6_addr;     /* IPv6 address */
    };

In the cmsghdr structure containing this ancillary data, the
cmsg_level member will be IPPROTO_IPV6, the cmsg_type member will be
IPV6_RXINFO, and the first byte of cmsg_data[] will be the first
byte of the in6_pktinfo structure.

Note that this structure is defined only with IPv6 address formats.
Use of this option with the IPv4 API is beyond the scope of this
document. (Note that most 4.4BSD-based implementations support the
IP_RECVDSTADDR socket option, which returns the destination IPv4
address as ancillary data.)

The application must set the IPV6_RXINFO socket option for this
information to be returned:

    int on = 1;
    setsockopt(fd, IPPROTO_IPV6, IPV6_RXINFO, &on, sizeof(on));

5.4. Specifying Outgoing Interface and Source IPv6 Address

An application may need to specify the outgoing interface, the
source address, or both. Note that the source address can also be
specified by calling bind() before each output operation, but
supplying the source address together with the data requires less
overhead (i.e., fewer system calls) and requires less state to be
stored and protected in a multithreaded application.

The in6_pktinfo structure defined in the previous section is also
used to specify the outgoing interface and the source address. The
structure is passed as ancillary data to sendmsg() with a cmsg_level
of IPPROTO_IPV6, a cmsg_type of IPV6_TXINFO, and the first byte of
cmsg_data[] being the first byte of the in6_pktinfo structure.

No socket option need be set to use this feature.
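The following minimal sketch shows how an application might attach
the in6_pktinfo object described above to a sendmsg() call. Because
IPV6_TXINFO is specific to this draft, the sketch assumes a
hypothetical placeholder value, DRAFT_IPV6_TXINFO, and declares its
own struct draft_in6_pktinfo with the member order given above (real
systems may lay out their own in6_pktinfo differently). The union
txinfo_control and the function attach_txinfo() are likewise
hypothetical. The sketch only builds the control buffer; the
sendmsg() call itself is left to the caller.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical local mirror of this draft's in6_pktinfo layout. */
struct draft_in6_pktinfo {
    int             ipi6_ifindex;  /* 0: let the kernel choose */
    struct in6_addr ipi6_addr;     /* all zeros (::): let the kernel choose */
};

/* Hypothetical placeholder; not the value of any real IPV6_TXINFO. */
#define DRAFT_IPV6_TXINFO 99

union txinfo_control {
    struct cmsghdr align;          /* forces cmsghdr alignment */
    unsigned char  buf[CMSG_SPACE(sizeof(struct draft_in6_pktinfo))];
};

/* Attach one TXINFO object to msg. src may be NULL, meaning the
 * unspecified address. Returns 0, or -1 if src is not valid IPv6 text. */
static int attach_txinfo(struct msghdr *msg, union txinfo_control *ctl,
                         int ifindex, const char *src)
{
    struct draft_in6_pktinfo pi;
    struct cmsghdr *cmsg;

    memset(&pi, 0, sizeof(pi));    /* zeroed ipi6_addr is :: */
    pi.ipi6_ifindex = ifindex;
    if (src != NULL && inet_pton(AF_INET6, src, &pi.ipi6_addr) != 1)
        return -1;

    memset(ctl, 0, sizeof(*ctl));
    msg->msg_control = ctl->buf;
    msg->msg_controllen = sizeof(ctl->buf);

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = DRAFT_IPV6_TXINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(pi));
    memcpy(CMSG_DATA(cmsg), &pi, sizeof(pi));
    return 0;
}
```

Passing an ifindex of 0 and a NULL src leaves both the interface and
the source address to the kernel, following the rules for a zero
ipi6_ifindex and the unspecified address.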
If ipi6_ifindex is 0, the kernel will choose the outgoing interface.
If ipi6_addr is the unspecified address (IN6ADDR_ANY_INIT), then
(a) if an address is currently bound to the socket, it is used as
the source address, or (b) if no address is currently bound to the
socket, the kernel will choose the source address.

5.4.1. Additional Errors with sendmsg()

With the IPV6_RXINFO socket option there are no additional errors
possible with the call to recvmsg(). But when specifying the
outgoing interface or the source address, additional errors are
possible from sendmsg():

    ENXIO          The interface specified by ipi6_ifindex does not
                   exist.

    ENETDOWN       The interface specified by ipi6_ifindex is not
                   enabled for IPv6 use.

    EADDRNOTAVAIL  ipi6_ifindex specifies an interface but the
                   address ipi6_addr is not available for use on that
                   interface.

    EHOSTUNREACH   No route to the destination exists over the
                   interface specified by ipi6_ifindex.

6. Hop-By-Hop Options

A variable number of hop-by-hop options can appear in a single
hop-by-hop options header. Each option in the header is TLV-encoded
with a type, length, and value.

Today only three hop-by-hop options are defined for IPv6 [1]: jumbo
payload, Pad1, and PadN. None of these three should be passed back
to an application, and an application should receive an error if it
attempts to set any of these three options. The jumbo payload
option is processed entirely by the kernel. It is indirectly
specified by datagram-based applications as the size of the datagram
to send and indirectly passed back to these applications as the
length of the received datagram. The two pad options are for
alignment purposes and are automatically inserted by a sending
kernel when needed and ignored by the receiving kernel.
This section of the API is therefore defined for future hop-by-hop
options that an application may need to specify and receive.

6.1. Receiving Hop-by-Hop Options

To receive hop-by-hop options, the application must enable the
IPV6_RXHOPOPTS socket option:

    int on = 1;
    setsockopt(fd, IPPROTO_IPV6, IPV6_RXHOPOPTS, &on, sizeof(on));

Each individual option is returned as an ancillary data object
described by a cmsghdr structure. The cmsg_level member will be
IPPROTO_HOPOPTS, the cmsg_type member will be the option type, and
the first byte of cmsg_data[] is the first byte of the option data.

6.2. Sending Hop-by-Hop Options

To send one or more hop-by-hop options, the application just
specifies them as ancillary data in a call to sendmsg(). No socket
option need be set.

Each option is specified as an ancillary data object by a cmsghdr
structure. The cmsg_level member is set to IPPROTO_HOPOPTS, the
cmsg_type member is set to the option type, and the first byte of
cmsg_data[] is the first byte of the option data.

Additional errors may be possible from sendmsg() if the specified
option is in error.

7. Destination Options

A variable number of destination options can appear in one or more
destination options headers. As stated in [1], a destination options
header appearing before a routing header is processed by the first
destination plus any subsequent destinations specified in the
routing header, while a destination options header appearing after a
routing header is processed only by the final destination. As with
the hop-by-hop options, each option in a destination options header
is TLV-encoded with a type, length, and value.

Today no destination options are defined for IPv6 [1], although
proposals exist to use destination options with mobility and
anycasting.

7.1. Receiving Destination Options

To receive destination options, the application must enable the
IPV6_RXDSTOPTS socket option:

    int on = 1;
    setsockopt(fd, IPPROTO_IPV6, IPV6_RXDSTOPTS, &on, sizeof(on));

Each individual option is returned as an ancillary data object
described by a cmsghdr structure. The cmsg_level member will be
IPPROTO_DSTOPTS, the cmsg_type member will be the option type, and
the first byte of cmsg_data[] is the first byte of the option data.

7.2. Sending Destination Options

To send one or more destination options, the application just
specifies them as ancillary data in a call to sendmsg(). No socket
option need be set.

Each option is specified as an ancillary data object by a cmsghdr
structure. The cmsg_level member is set to IPPROTO_DSTOPTS, the
cmsg_type member is set to the option type, and the first byte of
cmsg_data[] is the first byte of the option data.

Additional errors may be possible from sendmsg() if the specified
option is in error.

8. Source Route Option

Source routing in IPv6 is accomplished by specifying a routing
header as an extension header. There can be different types of
routing headers, but IPv6 currently defines only the type 0 routing
header. This type supports up to 24 intermediate destinations, each
of which is defined as a loose or a strict hop.

Source routing with the IPv4 sockets API (the IP_OPTIONS socket
option) requires the application to build the source route in the
format that appears as the IPv4 header option, requiring intimate
knowledge of the IPv4 options format. This API, however, defines
seven functions that the application calls to build and examine a
routing header.
Three functions build a routing header:

    inet6_srcrt_space()    - return #bytes required for ancillary data
    inet6_srcrt_init()     - initialize ancillary data for routing header
    inet6_srcrt_add()      - add IPv6 address & flags to routing header

Four functions deal with a returned routing header:

    inet6_srcrt_reverse()  - reverse a routing header
    inet6_srcrt_segments() - return #segments in a routing header
    inet6_srcrt_getaddr()  - fetch one address from a routing header
    inet6_srcrt_getflags() - fetch one flag from a routing header

A routing header is passed between the application and the kernel as
ancillary data. The cmsg_level member has a value of IPPROTO_ROUTING
and the cmsg_type member specifies the routing header type (e.g., 0
for a type 0 routing header). The contents of the cmsg_data[] member
are implementation dependent and should not be accessed directly by
the application, but should be accessed using the seven functions
that we are about to describe. The implementation-dependent contents
of the cmsg_data[] member can maintain state information between
successive calls to the functions below when building a routing
header, to make these functions thread safe. For example,
implementations could store the "number of segments left" field for
the type 0 routing header here, initialize it to 0 when
inet6_srcrt_init() is called, and increment it each time
inet6_srcrt_add() is called.

The following constants are defined in the header:

    #define IPV6_SRCRT_LOOSE   0  /* this hop need not be a neighbor */
    #define IPV6_SRCRT_STRICT  1  /* this hop must be a neighbor */

    #define IPV6_SRCRT_TYPE_0  0  /* IPv6 routing header type 0 */

We note that when a routing header is specified, the destination
address specified for connect(), sendto(), or sendmsg() is the
address of the first hop in the source route.
The routing header then contains the addresses of all subsequent
hops, and the last entry in the routing header is the address of the
final destination.

8.1. inet6_srcrt_space

    size_t inet6_srcrt_space(int type, int segments);

This function returns the maximum number of bytes required to hold a
routing header of the specified type containing the specified number
of segments (addresses). The return value includes the size of the
cmsghdr structure that precedes the routing header.

If the return value is 0, then either the type of the routing header
is not supported by this implementation or the number of segments is
invalid for this type of routing header.

8.2. inet6_srcrt_init

    struct cmsghdr *inet6_srcrt_init(void *bp, int type);

This function initializes the buffer pointed to by bp to contain a
cmsghdr structure followed by a routing header of the specified
type. The cmsg_len member of the cmsghdr structure is initialized to
the size of the structure plus the amount of space required by the
routing header. The cmsg_level and cmsg_type members are initialized
as required by the type of routing header.

The return value is the pointer to the cmsghdr structure. The caller
must allocate the buffer, and its size can be determined by calling
inet6_srcrt_space().

If the type of routing header is not supported by the
implementation, the return value is NULL.

8.3. inet6_srcrt_add

    int inet6_srcrt_add(struct cmsghdr *cmsg,
                        const struct in6_addr *addr, unsigned int flags);

This function adds the address pointed to by addr to the end of the
routing header being constructed and sets the type of this hop to
the value of flags. For an IPv6 type 0 routing header, flags must be
either IPV6_SRCRT_LOOSE or IPV6_SRCRT_STRICT.
If successful, the cmsg_len member of the cmsghdr structure is
updated to account for the new address in the routing header and the
return value of the function is 0.

If the address would exceed the limits of the routing header, the
return value of the function is ENOSPC. If flags specifies an
invalid value for the routing header, the return value of the
function is EINVAL.

8.4. inet6_srcrt_reverse

    int inet6_srcrt_reverse(const struct cmsghdr *in, struct cmsghdr *out);

This function takes a routing header that was received as ancillary
data (pointed to by the first argument) and writes a new routing
header that sends datagrams along the reverse of that route. Both
arguments are allowed to point to the same buffer (that is, the
reversal can occur in place). The return value of the function is 0
on success.

If the type of routing header is not supported by the
implementation, the return value of the function is EOPNOTSUPP. If
the routing header information is invalid, the return value of the
function is EINVAL.

8.5. inet6_srcrt_segments

    int inet6_srcrt_segments(const struct cmsghdr *cmsg);

This function returns the number of segments (addresses) contained
in the routing header described by cmsg. The return value is -1 if
the cmsghdr structure does not describe a valid routing header or
describes a routing header of an unsupported type.

8.6. inet6_srcrt_getaddr

    struct in6_addr *inet6_srcrt_getaddr(struct cmsghdr *cmsg, int offset);

This function returns a pointer to the IPv6 address indexed by
offset (which starts at 0) in the routing header described by cmsg.
An application should first call inet6_srcrt_segments() to obtain
the number of segments in the routing header.

If offset refers to an address beyond the end of the routing header,
the return value is NULL.

8.7. inet6_srcrt_getflags

    int inet6_srcrt_getflags(const struct cmsghdr *cmsg, int offset);

This function returns the flags value indexed by offset (which
starts at 0) in the routing header described by cmsg. For an IPv6
type 0 routing header, the return value will be either
IPV6_SRCRT_LOOSE or IPV6_SRCRT_STRICT.

If offset refers to a segment beyond the end of the routing header,
the return value is -1.

9. Ordering of Ancillary Data and IPv6 Extension Headers

Three IPv6 extension headers can be specified by the application and
returned to the application using ancillary data with sendmsg() and
recvmsg(): hop-by-hop options, destination options, and the routing
header. When multiple ancillary data objects are transferred via
sendmsg() or recvmsg() and these objects represent any of these
three extension headers, their placement in the control buffer is
directly tied to their location in the corresponding IPv6 datagram.
This API imposes some ordering constraints when using multiple
ancillary objects with sendmsg().

When multiple IPv6 hop-by-hop options having the same option type
are specified, these options will be inserted into the hop-by-hop
options header in the same order as they appear in the control
buffer. But when multiple hop-by-hop options having different option
types are specified, these options may be reordered by the kernel to
reduce padding in the hop-by-hop options header. Hop-by-hop options
may appear anywhere in the control buffer and will always be
collected by the kernel and placed into a single hop-by-hop options
header that appears immediately following the IPv6 header.

Similar rules apply to the destination options: (1) those of the
same type will appear in the same order as they are specified, and
(2) those of differing types may be reordered.
But the kernel will build up to two destination options headers: one
to precede the routing header and one to follow the routing header.
If the application specifies a routing header, then all destination
options that appear in the control buffer before the routing header
will appear in a destination options header before the routing
header, and these options might be reordered, subject to the two
rules that we just stated. Similarly, all destination options that
appear in the control buffer after the routing header will appear in
a destination options header after the routing header, and these
options might be reordered, subject to the same two rules.

As an example, assume that an application specifies control
information to sendmsg() containing six ancillary data objects: two
hop-by-hop options (of different types), three destination options
(all of different types), and a routing header. We number these 1-6
corresponding to their order in the control buffer. We then show the
final arrangement of the options in the extension headers built by
the kernel:

    Ancillary Data Objects    -->    IPv6 Extension Headers
    ----------------------           ----------------------
    HOPOPT-1 (first)                 HOPHDR(5,1)
    DSTOPT-2                         DSTHDR(3,2)
    DSTOPT-3                         RTGHDR(4)
    SRCRT-4                          DSTHDR(6)
    HOPOPT-5
    DSTOPT-6 (last)

The two hop-by-hop options are reordered, as are the first two
destination options. The first two destination options must appear
in a destination header before the routing header, and the final
destination option must appear in a destination header after the
routing header.

If destination options are specified in the control buffer after a
routing header, or if destination options are specified without a
routing header, the kernel will place those destination options
after an authentication header and/or an encapsulating security
payload header, if present.
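The placement rules above can be sketched as a small classification
routine. This is a minimal illustration, not kernel code: the enum
names and the function place_objects() are hypothetical, and the
sketch keeps control-buffer order within each header, which the
rules permit, rather than modelling the padding-driven reordering a
kernel may perform. Applied to the six-object example, it places
objects 1 and 5 in the hop-by-hop header, 2 and 3 in the destination
header before the routing header, 4 in the routing header, and 6 in
the destination header after it.

```c
#include <stddef.h>

/* Hypothetical labels for ancillary data objects in a control buffer. */
enum obj_kind { K_HOPOPT, K_DSTOPT, K_SRCRT };

/* Hypothetical labels for the extension headers, in datagram order. */
enum hdr_slot { S_HOPHDR, S_DSTHDR_BEFORE, S_RTGHDR, S_DSTHDR_AFTER };

/* For each of the n objects, record which extension header it lands in.
 * Hop-by-hop options always go to the single hop-by-hop header.
 * Destination options that precede the routing header go to the first
 * destination header; those that follow it, or all of them when there
 * is no routing header, go to the final destination header. */
static void place_objects(const enum obj_kind *objs, size_t n,
                          enum hdr_slot *slot)
{
    int has_rt = 0, seen_rt = 0;
    size_t i;

    for (i = 0; i < n; i++)          /* is there a routing header at all? */
        if (objs[i] == K_SRCRT)
            has_rt = 1;

    for (i = 0; i < n; i++) {
        switch (objs[i]) {
        case K_HOPOPT:
            slot[i] = S_HOPHDR;
            break;
        case K_SRCRT:
            slot[i] = S_RTGHDR;
            seen_rt = 1;
            break;
        case K_DSTOPT:
            slot[i] = (has_rt && !seen_rt) ? S_DSTHDR_BEFORE : S_DSTHDR_AFTER;
            break;
        }
    }
}
```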
10. Additional Items

Discussion is needed on whether or not the following items should be
included in this advanced API specification.

10.1. Path MTU Discovery and UDP

Should a standard method be defined for a UDP application to
determine the "maximum send transport-message size" [3; Section 5.1]
to a given destination? This would let the UDP application send
smaller datagrams to the destination, avoiding fragmentation.

10.2. Neighbor Reachability and UDP

Should a standard method be defined for a UDP application to tell
the kernel that it is making forward progress with a given peer
[4; Section 7.3.1]? This could save unneeded neighbor solicitations
and neighbor advertisements.

10.3. Reading the Routing Table

There are currently two techniques used by advanced applications on
Unix systems to read the kernel's routing table.

   1. Applications can grovel through the kernel's memory (/dev/kmem
      on Unix) to read the routing table. This requires intimate
      knowledge of the internal routing table format, requires
      permission to read the kernel memory, and is nonportable. (Note
      that the two common routing table ioctl() commands, SIOCADDRT
      and SIOCDELRT, only add and delete routing table entries. There
      is no common ioctl() command to return routing table entries.)

   2. 4.3BSD Reno introduced a new function, sysctl(), and one of its
      commands, NET_RT_DUMP, returns the routing table. The caller
      can optionally specify a protocol family (e.g., AF_INET for
      IPv4 or AF_INET6 for IPv6) so that only routing table entries
      for that address family are returned.

Should this API specify a higher-level set of functions to return
the routing table that can be implemented on a wide range of
systems?

10.4. Obtaining Interface and Address Information

Most applications that need to obtain a list of all the interfaces
on the system call ioctl() with a command of SIOCGIFCONF after
filling in an ifconf structure with a pointer to a buffer (ifc_buf)
in which the information is returned, and the size of that buffer
(ifc_len). On return, the buffer is filled in with the interface
information and ifc_len is updated to indicate how much data is in
the buffer.

But when using this command, there is no way for the application to
know how large a buffer to allocate before calling ioctl(). Also,
what is returned is information for all interfaces and for all
addresses across all address families for those interfaces.

Should a new ioctl() command (SIOCGIFINFO) be defined with a new
structure that is similar to the ifconf structure, but with two new
members: ifc_index and ifc_family? The changes from the existing
SIOCGIFCONF would be:

   1. If the ifc_buf member is NULL, then nothing is returned other
      than setting the ifc_len member to the amount of space required
      to hold the requested interface information.

   2. If the ifc_index member is nonzero, information is returned for
      only the interface with that index.

   3. If the ifc_family member is not AF_UNSPEC, the only addresses
      returned are those for the specified address family (AF_INET
      or AF_INET6, for example).

11. References

[1] Deering, S., Hinden, R., "Internet Protocol, Version 6 (IPv6)
    Specification", RFC 1883, Dec. 1995.

[2] Gilligan, R. E., Thomson, S., Bound, J., "Basic Socket Interface
    Extensions for IPv6", Internet-Draft,
    draft-ietf-ipngwg-bsd-api-05.txt, April 1996.

[3] McCann, J., Deering, S., Mogul, J., "Path MTU Discovery for IP
    version 6", RFC 1981, Aug. 1996.
[4] Narten, T., Nordmark, E., Simpson, W., "Neighbor Discovery for
    IP Version 6 (IPv6)", RFC 1970, Aug. 1996.

12. Acknowledgments

Matt Thomas and Jim Bound have been working on the technical details
in this draft for over a year. Keith Sklower is the original
implementor of ancillary data in the BSD networking code.

13. Authors' Addresses

    W. Richard Stevens
    1202 E. Paseo del Zorro
    Tucson, AZ 85718
    Email: rstevens@kohala.com

    Matt Thomas
    Digital Equipment Corporation
    550 King St, LKG2-2/Q5
    Littleton, MA 01460
    Email: thomas@lkg.dec.com