SIPPING                                                     J. Rosenberg
Internet-Draft                                                     Cisco
Intended status: Informational                            August 4, 2008
Expires: February 5, 2009


  Identification of Communications Services in the Session Initiation
                             Protocol (SIP)
              draft-ietf-sipping-service-identification-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 5, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document considers the problem of service identification in the
   Session Initiation Protocol (SIP).  Service identification is the
   process of determining the user-level use case that is driving the
   signaling being utilized by the user agent.  This document discusses
   the uses of service identification, and outlines several
   architectural principles behind the process.  It identifies perils
   when service identification is not done properly - including fraud,
   interoperability failures and stifling of innovation.

Table of Contents

   1.  Introduction
   2.  Services and Service Identification
   3.  Example Services
     3.1.  IPTV vs. Multimedia
     3.2.  Gaming vs. Voice Chat
     3.3.  Gaming vs. Voice Chat #2
     3.4.  Configuration vs. Pager Messaging
   4.  Using Service Identification
     4.1.  Application Invocation in the User Agent
     4.2.  Application Invocation in the Network
     4.3.  Network Quality of Service Authorization
     4.4.  Service Authorization
     4.5.  Accounting and Billing
     4.6.  Negotiation of Service
     4.7.  Dispatch to Devices
   5.  Key Principles of Service Identification
     5.1.  Services are a By-Product of Signaling
     5.2.  Identical Signaling Produces Identical Services
     5.3.  Do What I Say, not What I Mean
     5.4.  Explicit Service Identifiers are Redundant
     5.5.  URIs are Key for Differentiated Signaling
   6.  Perils of Declarative Service Identification
     6.1.  Fraud
     6.2.  Systematic Interoperability Failures
     6.3.  Stifling of Service Innovation
   7.  Recommendations
     7.1.  Use Derived Service Identification
     7.2.  Design for Heterogeneity
     7.3.  Presence
     7.4.  Intra-Domain
     7.5.  Device Dispatch
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. Informational References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The Session Initiation Protocol (SIP) [RFC3261] defines mechanisms
   for initiating and managing communications sessions between agents.
   SIP allows for a broad array of session types between agents.  It can
   manage audio sessions, ranging from low bitrate voice-only up to
   multi-channel hi fidelity music.  It can manage video sessions,
   ranging from small, "talking-head" style video chat, up to high
   definition multipoint video conferencing, to low bandwidth user-
   generated content, up to high definition movie and TV content.
   SIP endpoints can be anything - adaptors that convert an old analog
   telephone to Voice over IP (VoIP), dedicated hardphones, fancy
   hardphones with rich displays and user entry capabilities, softphones
   on a PC, buddylist and presence applications on a PC, dedicated
   videoconferencing peripherals, and speakerphones.

   This breadth of applicability is SIP's greatest asset, but it also
   introduces numerous challenges.  One of these is that, when an
   endpoint generates a SIP INVITE for a session, or receives one, that
   session can potentially be within the context of any number of
   different use cases and endpoint types.  For example, a SIP INVITE
   with a single audio stream could represent a Push-To-Talk session
   between mobile devices, a VoIP session between softphones, or audio-
   based access to stored content on a server.

   Each of these different use cases represents a different service.
   The service is the user-visible use case that is driving the behavior
   of the user-agents and servers in the SIP network.

   The differing services possible with SIP have driven implementors and
   system designers to seek techniques for service identification.
   Service identification is the process of determining and/or signaling
   the specific use case that is driving the signaling being generated
   by a user agent.  At first glance, this seems harmless and easy
   enough.  It is tempting to define a new header, "Service-ID", for
   example, and have a user agent populate it with any number of well-
   known tokens which define what the service is.  It could then be
   consumed for any number of purposes.  A token placed into the
   signaling to identify the service is called a service identifier.

   Service identification and service identifiers, when used properly,
   can be beneficial.
   However, when done improperly, service identification can lead to
   fraud, systemic interoperability failures, and a complete stifling of
   the innovation that SIP was meant to achieve.  The purpose of this
   document is to describe service identification in more detail and
   describe how these problems arise.

   Section 2 begins by defining a service and the service identification
   problem.  Section 3 gives some concrete examples of services and why
   they can be challenging to identify.  Section 4 explores the ways in
   which service identification can be utilized within a network.
   Next, Section 5 discusses the key architectural principles of service
   identification.  Section 6 describes what explicit service invocation
   is, and how it can lead to fraud, interoperability failures, and
   stifling of service innovation.

2.  Services and Service Identification

   The problem of identifying services within SIP is not a new one.  The
   problem has been considered extensively in the context of presence.
   In particular, the presence data model for SIP [RFC4479] defines the
   concept of a service as one of the core notions that presence
   describes.  Services are described in Section 3.3 of RFC 4479.

   Essentially, the service is the user-visible use case that is driving
   the behavior of the user-agents and servers in the SIP network.
   Being user-visible means that there is a difference in user
   experience between two services that are different.  That user
   experience can be part of the call, or outside of the call.  Within a
   call, the user experience can be based on different media types (an
   audio call vs. a video chat), different content within a particular
   media type (stored content, such as a movie or TV session), different
   devices (a wireless device for "telephony" vs. a PC application for
   "voice-chat"), different user interfaces (a buddy list view of voice
   on a PC application vs.
   a software emulation of a hard phone),
   different communities that can be accessed (voice chat with other
   users that have the same voice chat client, vs. voice communications
   with any endpoint on the PSTN), or different applications that are
   invoked by the user (manually selecting a push-to-talk application
   from a wireless phone vs. a telephony application).  Outside of a
   call, the difference in user experience can be a billing one (cheaper
   for one service than another), a notification feature for one and not
   another (for example, an IM that gets sent whenever a user makes a
   call), and so on.

   In some cases, there is very little difference in the underlying
   technology that will support two different services, and in other
   cases, there are big differences.  However, for purposes of this
   discussion, the key definition is that two services are distinct when
   there is a perceived difference by the user in the two services.

   This leads naturally to the desire to perform service identification.
   Service identification is defined as the process of:

   1.  determining the underlying service which is driving a
       particular signaling exchange,

   2.  associating that service with a service identifier, and

   3.  attaching that identifier to a signaling message (typically a SIP
       INVITE)

   Once service identification is performed, the service identifier can
   then be used for various purposes within the network.  Service
   identification can be done in the endpoints, in which case the UA
   would insert the identifier directly into the signaling message based
   on its awareness of the service.  Or, it can be done within a server
   in the network (such as a proxy), based on inspection of the SIP
   message, or based on hints placed into the message by the user.

   When service identification is performed entirely by inspecting the
   signaling, this is called derived service identification.
   When it is done based on knowledge known only by the invoking user
   agent, it is called declarative service identification.  Declarative
   service identification can only be done in user agents, by
   definition.

3.  Example Services

   It is very useful to consider several example services, especially
   ones that appear difficult to differentiate from each other.  In
   cases where it is hard to differentiate, service identification - and
   in particular, declarative service identification - appears highly
   attractive (and indeed, required).

3.1.  IPTV vs. Multimedia

   IP Television (IPTV) is the usage of IP networks to access
   traditional television content, such as movies and shows.  SIP can be
   utilized to establish a session to a media server in a network, which
   then serves up multimedia content and streams it as an audio and
   video stream towards the client.  Whether SIP is ideal for IPTV is,
   in itself, a good question.  However, such a discussion is outside
   the scope of this document.

   Consider multimedia conferencing.  The user accesses a voice and
   video conference at a conference server.  The user might join in
   listen-only mode, in which case the user receives audio and video
   streams, but does not send.

   These two services - IPTV and listen-only multimedia conferencing -
   clearly appear as different services.  They have different user
   experiences and applications.  A user is unlikely to ever be confused
   about whether a session is IPTV or listen-only multimedia
   conferencing.  Indeed, they are likely to have different software
   applications or endpoints for the two services.

   However, these two services look remarkably alike based on the
   signaling.  Both utilize audio and video.  Both could utilize the
   same codecs.  Both are unidirectional streams (from a server in the
   network to the client).
   Thus, it would appear on the surface that
   there is no way to differentiate them, based on inspection of the
   signaling alone.

3.2.  Gaming vs. Voice Chat

   Consider an interactive game, played between two users from their
   mobile devices.  The game involves the users sending each other game
   moves, using a messaging channel, in addition to voice.  In another
   service, users have a voice and IM chat conversation using a buddy
   list application on their PC.

   In both services, there are two media streams - audio and messaging.
   The audio uses the same codecs.  Both use the Message Session Relay
   Protocol (MSRP) [RFC4975].  In both cases, the caller would send an
   INVITE to the Address of Record (AOR) of the target user.  However,
   these represent fairly different services, in terms of user
   experience.

3.3.  Gaming vs. Voice Chat #2

   Consider a variation on the example in Section 3.2.  In this
   variation, two users are playing an interactive game between their
   phones.  However, the game itself is set up and controlled using a
   proprietary mechanism - not using SIP at all.  The client
   application nevertheless allows the user to chat with their opponent.
   The chat session is a simple voice session set up between the
   players.

   Compare this with a basic telephone call between the two users.  Both
   involve a single audio session.  Both use the same codecs.  They
   appear to be identical.  However, different user experiences are
   needed.  For example, we desire traditional telephony features (such
   as call forwarding and call screening) to be applied in the telephone
   service, but not in the gaming chat service.

3.4.  Configuration vs. Pager Messaging

   The SIP MESSAGE method [RFC3428] provides a way to send one-shot
   messages to a particular AOR.  This specification is primarily aimed
   at Short Message Service (SMS) style messaging, commonly found in
   wireless phones.
   Receipt of a MESSAGE request would cause the
   messaging application on a phone to launch, allowing the user to
   browse message history and respond.

   However, MESSAGE is sometimes used for the delivery of content to a
   device for other purposes.  For example, some providers use it to
   deliver configuration updates, such as new phone settings or
   parameters, or to indicate that a new version of firmware is
   available.  Though not designed for this purpose, MESSAGE gets used
   since, in existing wireless networks, SMS is used for this purpose,
   and MESSAGE is the SIP equivalent of SMS.

   Consequently, the MESSAGE request sent to a phone can be for two
   different services.  One would require invocation of a messaging app,
   whereas the other would be consumed by the software in the phone,
   without any user interaction at all.

4.  Using Service Identification

   It is important to understand what the service identity would be
   utilized for, if known.  This section discusses the primary uses.
   These are application invocation in user agents and the network,
   Quality of Service authorization, service authorization, accounting
   and billing, service negotiation, and device dispatch.

4.1.  Application Invocation in the User Agent

   In some of the examples above, there were multiple software
   applications executing on the host.  One common way of achieving this
   is to utilize a common SIP user agent implementation that listens for
   requests on a single port.  When an incoming INVITE or MESSAGE
   arrives, it must be delivered to the appropriate application
   software.  When each service is bound to a distinct software
   application, it would seem that the service identity is needed to
   dispatch the message to the appropriate piece of software.  This is
   shown in Figure 1.

   +---------------------------------+
   |                                 |
   | +-------------+ +-------------+ |
   | |     UI      | |     UI      | |
   | +-------------+ +-------------+ |
   | +-------------+ +-------------+ |
   | |             | |             | |
   | |  Service 1  | |  Service 2  | |
   | |             | |             | |
   | +-------------+ +-------------+ |
   | +-----------------------------+ |
   | |                             | |
   | |             SIP             | |
   | |            Layer            | |
   | |                             | |
   | +-----------------------------+ |
   |                                 |
   +---------------------------------+

             Physical Device

                Figure 1

   The role of the SIP layer is to parse incoming messages, handle the
   SIP state machinery for transactions and dialogs, and then dispatch
   the request to the appropriate service.  This software architecture
   is analogous to the way web servers frequently work.  An HTTP server
   listens on port 80 for requests, and based on the HTTP Request-URI,
   dispatches the request to a number of disparate applications.  The
   same is happening here.  For the example services in Section 3.2, an
   incoming INVITE for the gaming service would be delivered to the
   gaming application software.  An incoming INVITE for the voice chat
   service would be delivered to the voice chat application software.
   The example in Section 3.3 is similar.  For the examples in
   Section 3.4, a MESSAGE request for user to user messaging would be
   delivered to the messaging or SMS app, and a MESSAGE request
   containing configuration data would be delivered to a configuration
   update application.

   Unlike the web, however, in all three use cases, the user initiating
   communications has (or appears to have - more below) only a single
   identifier for the recipient - their AOR.  Consequently, the SIP
   Request-URI cannot be used for dispatching, as it is identical in all
   three cases.

4.2.  Application Invocation in the Network

   Another usage of a service identifier would be to cause servers in
   the SIP network to provide additional processing, based on the
   service.  For example, an INVITE issued by a user agent for IPTV
   would pass through a server that does some kind of content rights
   management, authorizing whether the user is allowed to access that
   content.  On the other hand, an INVITE issued by a user for
   multimedia conferencing would pass through a server providing
   "traditional" telephony features, such as outbound call screening and
   call recording.  It would make no sense for the INVITE associated
   with IPTV to have outbound call screening and call recording applied,
   and it would make no sense for the multimedia conferencing INVITE to
   be processed by the content rights management server.  Indeed, in
   these cases, it's not just an efficiency issue (invoking servers when
   not needed), but rather, truly incorrect behavior can occur.  For
   example, if an outbound call screening application is set to block
   outbound calls to everything except for the phone numbers of friends
   and family, an IPTV request that gets processed by such a server
   would be blocked (as it's not targeted to the AOR of a friend or
   family member).  This would block a user's attempt to access IPTV
   services, when that was not the goal at all.

   Similarly, a MESSAGE request from Section 3.4 might need to pass
   through a message server for filtering when it is associated with
   chat, but not when it is associated with config.  Consider a filter
   which gets applied to MESSAGE requests, and that filter runs in a
   server in the network.  The filter operation prevents user Joe from
   sending messages to user Bob that contain the words "stock" or
   "purchase", due to some regulations that disallow Joe and Bob from
   discussing stock trading.
   However, a MESSAGE for configuration
   purposes might contain an XML document that uses the token "stock" as
   some kind of attribute.  This configuration update would be discarded
   by the filtering server, when it should not have been.

4.3.  Network Quality of Service Authorization

   The IP network can provide differing levels of Quality of Service
   (QoS) to IP packets.  This service can include guaranteed throughput,
   latency, or loss characteristics.  Typically, the user agent will
   make some kind of QoS request, either using explicit signaling
   protocols (such as RSVP) or through marking of the Diffserv value in
   packets.  The network will need to make a policy decision on
   whether these QoS treatments are authorized or not.  One common
   authorization policy is to check if the user has invoked a service
   using SIP that they are authorized to invoke, and that this service
   requires the level of QoS treatment the user has requested.

   For example, consider IPTV and multimedia conferencing as described
   in Section 3.1.  IPTV is a non-real time service.  Consequently,
   media traffic for IPTV would be authorized for bandwidth guarantees,
   but not for latency or loss guarantees.  On the other hand,
   multimedia conferencing is real time.  Its traffic would require
   bandwidth, loss and latency guarantees from the network.

   Consequently, if a user should make an RSVP reservation for a media
   stream, and ask for latency guarantees for that stream, the network
   would like to be able to authorize it if the service was multimedia
   conferencing, but not if it was IPTV.  This would require the server
   performing the QoS authorization to know the service associated with
   the INVITE that set up the session.

4.4.  Service Authorization

   Frequently, a network administrator will want to authorize whether a
   user is allowed to invoke a particular service.
   Not all users will
   be authorized to use all services that are provided.  For example, a
   user may not be authorized to access IPTV services, whereas they are
   authorized to utilize multimedia conferencing.  A user might not be
   able to utilize a multiplayer gaming service, whereas they are
   authorized to utilize voice chat services.

   Consequently, when an INVITE arrives at a server in the network, the
   server will need to determine what the requested service is, so that
   the server can make an authorization decision.

4.5.  Accounting and Billing

   Service authorization and accounting/billing go hand in hand.  One of
   the primary reasons for authorizing that a user can utilize a service
   is that they are being billed differently based on the type of
   service.  Consequently, one of the goals of a service identity is to
   be able to include it in accounting records, so that the appropriate
   billing model can be applied.

   For example, in the case of IPTV, a service provider can bill based
   on the content (US $5 per movie, perhaps), whereas for multimedia
   conferencing, they can bill by the minute.  This requires the
   accounting streams to indicate which service was invoked for the
   particular session.

4.6.  Negotiation of Service

   In some cases, when the caller initiates a session, they don't
   actually know which service will be utilized.  Rather, they might
   like to offer up all of the services they have available to the
   called party, and then let the called party decide, or let the system
   make a decision based on overlapping service capabilities.

   As an example, a user can do both the game and the voice chat service
   of Section 3.2.  They initiate a session to a target AOR, but the
   devices used by that user can only support voice chat.  The called
   device returns, in its call acceptance, an indication that only voice
   chat can be used.
   Consequently, voice chat gets utilized for the
   session.

4.7.  Dispatch to Devices

   When a user has multiple devices, each with varying capabilities in
   terms of service, it is useful to dispatch an incoming request to the
   right device based on whether the device can support the service that
   has been requested.

   For example, if a user initiates a gaming session with voice chat,
   and the target user has two devices - one that can support the gaming
   service, and the other that cannot, the INVITE should be dispatched
   to the device which supports the gaming session.

5.  Key Principles of Service Identification

   In this section, we describe several key principles of service
   identification:

   1.  Services are a by-product of signaling

   2.  Identical signaling produces identical services

   3.  Declarative service identification is an example of Do-What-I-
       Mean (DWIM)

   4.  Explicit service identifiers are redundant

   5.  URIs are a key mechanism for producing differentiated signaling

5.1.  Services are a By-Product of Signaling

   Declarative service identification - the addition of a service
   identifier by clients in order to inform other entities what the
   service is - is a very compelling solution to solving the use cases
   described above.  It provides a clear way for each of the use cases
   to be differentiated.  On the other hand, derived service
   identification appears "hard" since the signaling appears to be the
   same for these different services.

   Declarative service identification misses a key point, which cannot
   be stressed enough, and which represents the core architectural
   principle to be understood here:

      A service is the by-product of the signaling and the context
      around it (the user profile, time-of-day and so on) - the effects
      of the signaling message once launched into the network.
      The service identity is therefore always derivable from the
      signaling and its context without additional identifiers.  In
      other words, derived service identification is always possible
      when signaling is being properly handled.

   When a user sends an INVITE request to the network, and targets that
   request at an IPTV server, and includes SDP for audio and video
   streaming, the *result* of sending such an INVITE is that an IPTV
   session occurs.  The entire purpose of the INVITE is to establish
   such a session, and therefore, invoke the service.  Thus, a service
   is not something that is different from the rest of the signaling
   message.  A service is what the user gets after the network and other
   user agents have processed a signaling message.

   It may seem that delayed offers (SIP INVITE requests that lack SDP)
   make it impossible to perform derived service identification.  After
   all, in some of the cases above, the differentiation was done using
   the SDP in the request.  What if it's not there?  The answer is
   simple - if it's not there, and the SDP is being offered by the
   called party, you cannot in fact know the service at the time of the
   INVITE.  That's the whole point of delayed offer - to give the called
   party the chance to offer up what it wants for the session.  In cases
   where service identification is needed at request time, delayed offer
   cannot be used.

5.2.  Identical Signaling Produces Identical Services

   This principle is a natural conclusion of the previous assertion.  If
   a service is the byproduct of signaling, how can a user have
   different experiences and different services when the signaling
   message is the same?  They cannot.

   But how can that be?  From the examples in Section 3, it would seem
   that there are services which are different, but have identical
   signaling.
   If we hold true to the assertion, there is in fact only
   one logical conclusion:

      If two services are different, but their signaling appears to be
      the same, it is because one or more of the following is true:

      1.  there is in fact something different that has been overlooked

      2.  something has been implied from the signaling which should
          have been signaled explicitly

      3.  the signaling mechanism should be changed so that there is, in
          fact, something that is different

   To illustrate this, let us take each of the example services in
   Section 3 and investigate whether there is, or should be, something
   different in the signaling in each case.

   IPTV vs. Multimedia Conferencing:  The two services in Section 3.1
      appear to have identical signaling.  They both involve audio and
      video streams, both of which are unidirectional.  Both might
      utilize the same codecs.  However, there is another important
      difference in the signaling - the target URI.  In the case of
      IPTV, the request is targeted at a media server or to a particular
      piece of content to be viewed.  In the case of multimedia
      conferencing, the target is a conference server.  The
      administrator of the domain can therefore examine the two Request-
      URIs, and figure out whether each is targeted for a conference
      server or a content server, and use that to derive the service
      associated with the request.

   Gaming vs. Voice Chat:  Though both sessions involve MSRP and voice,
      and both are targeted to the same AOR of the called user, there is
      a difference.  The MSRP messages for the gaming session carry
      content which is game specific, whereas the MSRP messages for the
      voice chat are just regular text, meant for rendering to a user.
      Thus, the MSRP session in the SDP will indicate the specific
      content type that MSRP is carrying, and this type will differ
      between the two cases.
Even if the game moves look like text, since they are 586 being consumed by an automaton there is an underlying schema that 587 dictates their content, and therefore, this schema represents the 588 actual content type that should be signaled. 590 Gaming vs. Voice Chat #2: In this case, both sessions involve only 591 voice, and both are targeted at the same AOR. Indeed, there truly 592 is nothing different - if indeed the signaling works this way. 593 However, there is an alternative mechanism for performing the 594 signaling. For the gaming session, the proprietary protocol can 595 be used to exchange a URI that can be used to identify the voice 596 chat function on the phone that is associated with the game (for 597 example, a GRUU can be used [I-D.ietf-sip-gruu]). Indeed, the 598 gaming chat is not targeting the USER - it's targeting the gaming 599 instance on the phone. Thus, if a special GRUU is used for the 600 gaming chat, this makes the signaling different between these two 601 services. 603 Configuration vs. Pager Messaging: Just as in the case of gaming vs. 604 voice chat, the content type of the messages differentiates the 605 service that occurs as a consequence of the messages. 607 5.3. Do What I Say, not What I Mean 609 "Do What I Mean", abbreviated as DWIM, is a concept in computer 610 science. It is sometimes used to describe a function which tries to 611 intelligently guess at what the user intended. It is in contrast to "Do 612 What I Say", or DWIS, which describes a function that behaves 613 concretely based on the inputs provided. Systems built on the DWIM 614 concept can have unexpected behaviors because they are driven by 615 unstated rules. 617 Declarative service identification is an example of DWIM. The 618 service identifier has no well-defined impact on the state machinery 619 or protocols in the system; it has various side-effects based on an 620 assumption of what is meant by the service identifier.
Derived 621 service identification, on the other hand, is an expression of the 622 principle of DWIS - the behavior of the system is based entirely on 623 the specifics of the protocol and is well defined by the protocol 624 specification. The service identifier is just a shorthand for 625 summarizing things that are well defined by signaling. 627 As a litmus test to differentiate the two cases, consider the 628 following question. If a request contained a service identifier, and 629 that request were processed by a domain which didn't understand the 630 concept of service identifiers at all, would the request be rejected 631 if that service were not supported, or would it complete but do the 632 wrong thing? If it is the latter case, it's DWIM. If it's the former, 633 it's DWIS. 635 5.4. Explicit Service Identifiers are Redundant 637 Because an explicit service identifier is, by definition, inside of 638 the signaling message, and because the signaling itself completely 639 defines the behavior of the service, another natural conclusion is 640 that an explicit service identifier is redundant with the signaling 641 itself. It says nothing that could not or should not otherwise be 642 derived from examination of the signaling. 644 5.5. URIs are Key for Differentiated Signaling 646 In the IPTV example and in the second gaming example, it was 647 ultimately the Request-URI that was (or should be) different between 648 the two services. This is important. In many cases where services 649 appear the same, it is because the resource which is being targeted 650 is not, in fact, the user. Rather, it is a resource that is linked 651 with the user. This resource might be an instance of a software 652 application on the particular device of a user, or a resource in the 653 network which acts on behalf of the user. 655 The Request-URI is an infinitely large namespace for identifying 656 these resources.
It is an ideal mechanism for providing 657 differentiation when there would otherwise be none. 659 Returning again to the example in Section 3.3, we can see that it 660 does make more sense to target the gaming chat session at a software 661 instance on the user's phone, rather than at the user themselves. 662 The gaming chat session should really only go to the phone on which 663 the user is playing the game. The software instance does indeed live 664 only on that phone, whereas the user themselves can be contacted many 665 ways. We don't want telephony features invoked for the gaming chat 666 session because those features only make sense when someone is trying 667 to communicate with the USER. When someone is trying to communicate 668 with a software instance that acts on behalf of the user, a different 669 set of rules apply since the target of the request is completely 670 different. 672 6. Perils of Declarative Service Identification 674 Based on these principles, several perils of declarative service 675 identification can be described. They are: 677 1. Declarative service identification can be used for fraud 679 2. Declarative service identification can hurt interoperability 681 3. Declarative service identification can stifle service innovation 683 6.1. Fraud 685 Declarative service identification can lead to fraud. If a provider 686 uses the service identifier for billing and accounting purposes, or 687 for authorization purposes, it opens an avenue for attack. The user 688 can construct the signaling message so that its actual effect (which 689 is the service the user will receive), is what the user desires, but 690 the user places a service identifier into the request (which is what 691 is used for billing and authorization) that identifies a cheaper 692 service, or one that the user is authorized to receive. In such a 693 case, the user will be billed for something they did not receive. 
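As a rough illustration of the exposure described above, the following Python sketch bills the same INVITE twice: once from a service identifier declared by the user agent, and once from a service derived from the Request-URI and the offered media. Everything in it is an assumption made for illustration and does not come from any specification: the Request class, the host names, the service names, and the rates are all hypothetical, and the URI handling is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical, heavily simplified view of a SIP INVITE."""
    request_uri: str                        # target of the request
    media_types: Tuple[str, ...]            # media offered in the SDP
    declared_service: Optional[str] = None  # identifier inserted by the UA


# Assumed local knowledge of one domain; all values are illustrative.
CONTENT_SERVER_HOSTS = {"tv.example.com"}
CONFERENCE_SERVER_HOSTS = {"conf.example.com"}
RATES = {"iptv": 5.0, "multimedia-conferencing": 1.0, "unknown": 0.0}


def host_of(uri: str) -> str:
    """Extract the host from a SIP URI (simplified: no IPv6, no headers)."""
    rest = uri.split(":", 1)[1]               # drop the "sip:" scheme
    hostport = rest.rsplit("@", 1)[-1]        # drop the user part, if any
    hostport = hostport.split(";", 1)[0]      # drop URI parameters
    return hostport.split(":", 1)[0].lower()  # drop the port


def derive_service(req: Request) -> str:
    """Derived identification: look only at what the signaling will do."""
    host = host_of(req.request_uri)
    has_audio_video = {"audio", "video"} <= set(req.media_types)
    if has_audio_video and host in CONTENT_SERVER_HOSTS:
        return "iptv"
    if has_audio_video and host in CONFERENCE_SERVER_HOSTS:
        return "multimedia-conferencing"
    return "unknown"


def bill_declarative(req: Request) -> float:
    """Trusts the identifier the user agent chose to insert."""
    return RATES.get(req.declared_service or "unknown", 0.0)


def bill_derived(req: Request) -> float:
    """Ignores the declared identifier and bills what the signaling does."""
    return RATES[derive_service(req)]


# An INVITE aimed at a content server, labeled as the cheaper service.
invite = Request(
    request_uri="sip:channel7@tv.example.com",
    media_types=("audio", "video"),
    declared_service="multimedia-conferencing",
)
print(bill_declarative(invite))  # charged at the conferencing rate
print(bill_derived(invite))      # charged at the IPTV rate
```

With these assumed rates, the declared identifier yields a charge of 1.0 for a request whose signaling establishes an IPTV session, while the derived identifier yields 5.0. The only way for the user to lower the derived charge is to change the signaling itself, which also changes the service received.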
695 If, however, the domain administrator derives the service identifier 696 from the signaling itself (derived service identification), the user 697 cannot lie. If they did lie, they wouldn't get the desired service. 699 Consider the example of IPTV vs. multimedia conferencing. If 700 multimedia conferencing is cheaper, the user could send an INVITE for 701 an IPTV session, but include a service identifier which indicates 702 multimedia conferencing. The user gets the service associated with 703 IPTV, but at the cost of multimedia conferencing. 705 This same principle shows up in other places, for example, in the 706 identification of an emergency services call 707 [I-D.ietf-ecrit-framework]. It is desirable to give emergency 708 services calls special treatment, such as being free, being authorized even 709 when the user cannot otherwise make calls, and being given priority. 710 If emergency calls were indicated through something other than the 711 target of the call being an emergency services URN [RFC5031], it 712 would open an avenue for fraud. The user could place any desired URI 713 in the Request-URI, and indicate that the call is an emergency 714 services call. This call would then get special treatment, but of 715 course get routed to the target URI. The only way to prevent this 716 fraud is to consider an emergency call as any call whose target is an 717 emergency services URN. Thus, the service identification here is 718 based on the target of the request. When the target is an emergency 719 services URN, the request can get special treatment. The user cannot 720 lie, since there is no way to separately indicate this is an 721 emergency call, besides targeting it to an emergency URN. 723 6.2. Systematic Interoperability Failures 725 How can declarative service identification cause loss of 726 interoperability?
When an identifier is used to drive functionality 727 - such as dispatch on the phones, in the network, or QoS 728 authorization - it means that the wrong thing can happen when this 729 field is not set properly. Consider a user in domain 1, calling a 730 user in domain 2. Domain 1 provides the user with a service they 731 call "voice chat", which utilizes voice and IM for real time 732 conversation, driven off of a buddy list application on a PC. Domain 733 2 provides their users with a service they call "text telephony", 734 which is a voice service on a wireless device that also allows the 735 user to send text messages. Consider the case where domain 1 and 736 domain 2 both have their user agents insert a service identifier 737 into the request, and then use that to perform QoS authorization, 738 accounting, and invocation of applications in the network and in the 739 device. The user in domain 1 calls the user in domain 2, and inserts 740 the identifier "Voice Chat" into the INVITE. When this arrives at 741 the server in domain 2, the service identifier is unknown. 742 Consequently, the request does not get the proper QoS treatment, even 743 if the call itself succeeds. 745 If, on the other hand, derived service identification were used, the 746 service identifier could be removed by domain 2, and then recomputed 747 based on the signaling to match its own notion of services. In this 748 case, domain 2 could derive the "text telephony" identifier, and the 749 request would complete successfully. 751 Declarative service identification, used between domains, causes 752 interoperability failures unless all interconnected domains agree on 753 exactly the same set of services and how to name them. Of course, 754 lack of service identifiers does not guarantee service 755 interoperability. However, SIP was built with rich tools for 756 negotiation of capabilities at a finely granular level.
One user 757 agent can make a call using audio and video, but if the receiving UA 758 only supports audio, SIP allows both sides to negotiate down to the 759 lowest common denominator. Thus, communication is still provided. 760 As another example, if one agent initiates a Push-To-Talk session 761 (which is audio with a companion floor control mechanism), and the 762 other side only supports regular audio, SIP would be able to negotiate 763 back down to a regular voice call. As another example, if a calling 764 user agent is running a high-definition video conferencing endpoint, 765 and the called user agent supports just a regular video endpoint, the 766 codecs themselves can negotiate downward to a lower rate, picture 767 size, and so on. Thus, interoperability is achieved. Interestingly, 768 the final "service" may no longer be well characterized by the 769 service identifier that would have been placed in the original 770 INVITE. For example, in this case, if the original INVITE from the 771 caller had contained the service identifier "hi-fi video", but the 772 video gets negotiated down to a lower rate and picture size, the 773 service identifier is no longer really appropriate. That is why 774 services need to be derived from signaling - because the signaling 775 itself provides negotiation and interoperability between different 776 domains. 778 This illustrates another key aspect of the interoperability problem. 779 Declarative service identification will result in inconsistencies 780 between its service identifiers and the results of any SIP 781 negotiation that might otherwise be applied in the session. 783 When a service identifier becomes something that both proxies and the 784 user agent need to understand in order to properly treat a request 785 (which is the case for declarative service identification), it 786 becomes equivalent to including a token in the Proxy-Require and 787 Require header fields of every single SIP request.
The very reason 788 that [RFC4485] frowns upon usage of Require and certainly Proxy- 789 Require is the huge impact on interoperability it causes. It is for 790 this same reason that declarative service identification needs to be 791 avoided. 793 6.3. Stifling of Service Innovation 795 The probability that any pair of service providers ends up with 796 the same set of services, and gives them the same names, becomes 797 increasingly small as the number of providers grows. Indeed, it would 798 almost certainly require a centralized authority to identify what the 799 services are, how they work, and what they are named. This, in turn, 800 leads to a requirement for complete homogeneity in order to 801 facilitate interconnection. Two providers cannot usefully 802 interconnect unless they agree on the set of services they are 803 offering to their customers, and each does the same thing. This is 804 because each provider has become dependent on inclusion of the proper 805 service identifier in the request, in order for the overall treatment 806 of the request to proceed correctly. This is, in a very real sense, 807 anathema to the entire notion of SIP, which is built on the idea that 808 heterogeneous domains can interconnect and still get 809 interoperability. 811 Declarative service identification leads to a requirement for 812 homogeneity in service definitions across providers that 813 interconnect, ruining the very service heterogeneity that SIP was 814 meant to bring. 816 Indeed, Metcalfe's law says that the value of a network grows with 817 the square of the number of participants. As a consequence of this, 818 once a group of large domains did get together and agree on a set of 819 services, and then on a set of well-known identifiers for those 820 services, it would force other providers to also deploy the same 821 services, in order to obtain the value that interconnection brings.
822 This, in turn, will stifle innovation, and quickly force the set of 823 services in SIP to become fixed and never expand beyond the ones 824 initially agreed upon. This, too, is anathema to the very framework 825 on which SIP is built, and defeats much of the reason why 826 providers have chosen to deploy SIP in their own networks. 828 Consider the following example. Several providers get together, and 829 standardize on a set of service identifiers. One of these uses 830 audio and video (say, "multimedia conversation"). This service is 831 successful, and is widely utilized. Endpoints look for this 832 identifier to dispatch calls to the right software applications, and 833 the network looks for it to invoke features, perform accounting, and 834 QoS. A new provider gets the idea for a new service, say, avatar- 835 enhanced multimedia conversation. In this service, there is audio 836 and video, but there is a third stream, which renders an avatar. A 837 caller can press buttons on their phone, to cause the avatar on the 838 other person's device to show emotion, make noise, and so on. This 839 is similar to the way emoticons are used today in IM. This service 840 is enabled by adding a third media stream (and consequently, third 841 m-line) to the SDP. 843 Normally, this service would be backwards compatible with a regular 844 audio-video endpoint, which would just reject the third media stream. 845 However, because a large network has been deployed that is expecting 846 to see the token "multimedia conversation" and its associated audio+ 847 video service, it is nearly impossible for the new provider to roll 848 out this new service. If they did, it would fail completely, or 849 partially fail, when their users call users in other provider 850 domains. 852 7. Recommendations 854 From these principles, several recommendations can be made. 856 7.1.
Use Derived Service Identification 858 Derived service identification - where an identifier for a service is 859 obtained by inspection of the signaling and other contextual data 860 (such as subscriber profile) - is reasonable, and when done properly, 861 does not lead to the perils described above. However, declarative 862 service identification - where user agents indicate what the service 863 is, separate from the rest of the signaling - leads to the perils 864 described above. 866 If it appears that the signaling currently defined in standards is 867 not sufficient to identify the service, it may be due to lack of 868 sufficient signaling to convey what is needed, or may be because 869 request URIs should be used for differentiation and they are not 870 being used. By applying the litmus test described in Section 5.3, 871 network designers can determine if the system is attempting to 872 perform declarative service identification or not. 874 7.2. Design for Heterogeneity 876 When performing derived service identification, domains should be 877 aware that sessions may arrive from different networks and different 878 endpoints. Consequently, the service identification algorithm must 879 be complete - meaning it computes the best answer for any possible 880 signaling message that might be received. 882 In a homogeneous environment, the process of service identification 883 is easy. The service provider will know the set of services they are 884 providing, and based on the specific call flows for each specific 885 service, can construct rules to differentiate one service from 886 another. However, when two different providers interconnect, 887 assumptions about what services are used, and how they are signaled, 888 no longer apply.
To provide the best user experience possible, a 889 provider doing service identification needs to perform a 'best-match' 890 operation, such that any legal SIP signaling - not just the specific 891 call flows running within their own network - is mapped to the 892 appropriate service. 894 7.3. Presence 896 Presence can help a great deal with providing unique URIs for 897 different services. When a user wishes to contact another user, and 898 knows only the AOR for the target (which is usually the case), the 899 user can fetch the presence document for the target. That document, 900 in turn, can contain numerous service URIs for contacting the target 901 with different services. Those URIs can then be used in the Request- 902 URI for differentiation. When possible, this is the best solution to 903 the problem. 905 7.4. Intra-Domain 907 Service identifiers themselves are not bad; derived service 908 identification allows each domain to cache the results of the service 909 identification process for usage by another network element within 910 the same domain. However, service identifiers are fundamentally 911 useful within a particular domain, and any such header must be 912 stripped at a network boundary. Consequently, the process of service 913 identification and their associated service identifiers is always an 914 intra-domain operation. 916 7.5. Device Dispatch 918 Device dispatch should be done following the principles of [RFC3841], 919 using implicit preferences based on the signaling. For example, 920 [I-D.rosenberg-sip-app-media-tag] defines a new UA capability that 921 can be used to dispatch requests based on different types of 922 application media streams. 924 However, it is a mistake to try to use a service identifier as a 925 UA capability. Consider a service called "multimedia telephony" 926 which adds video to the existing PSTN experience.
A user has two 927 devices, one of which is used for multimedia telephony, and the other 928 is used strictly for a voice-assisted game. It is tempting to have 929 the telephony device include a UA capability [RFC3840] called 930 "multimedia telephony" in its registration. A calling 931 multimedia telephony device can then include the Accept-Contact 932 header field [RFC3841] containing this feature tag. The proxy 933 serving the called party, applying the basic algorithms of [RFC3841], 934 will correctly route the call to the terminating device. 936 However, if the calling party is not within the same domain, and the 937 calling domain does not know about or use this feature tag, there 938 will be no Accept-Contact header field, even if the calling party was 939 using a service that is a good match for 'multimedia telephony'. In 940 such a case, the call may be delivered to both devices, yielding a 941 poorer user experience. That's because device dispatch was done using 942 declarative service identification. 944 The best way to avoid this problem is to use feature tags which can 945 be matched to well-defined signaling features - media types, required 946 SIP extensions and so on. In particular, the golden rule is that the 947 granularity of feature tags must be equivalent to the granularity of 948 individual features that can be signaled in SIP. 950 8. Security Considerations 952 Oftentimes, the service associated with a request is utilized for 953 purposes such as authorization, accounting, and billing. When 954 service identification is not done properly, the possibility of 955 unauthorized service use and network fraud is introduced. It is for 956 this reason, discussed extensively in Section 6.1, that the usage of 957 explicit service identifiers inserted by a UA is not recommended. 959 9. IANA Considerations 961 There are no IANA considerations associated with this specification. 963 10.
Acknowledgements 965 This document is based on discussions with Paul Kyzivat and Andrew 966 Allen, who contributed significantly to the ideas here. Much of the 967 content in this draft is a result of discussions amongst participants 968 in the SIPPING mailing list, including Dean Willis, Tom Taylor, Eric 969 Burger, Dale Worley, Christer Holmberg, and John Elwell, amongst many 970 others. Thanks to Spencer Dawkins, Tolga Asveren, Mahesh Anjanappa 971 and Claudio Allochio for reviews of this document. 973 11. Informational References 975 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 976 A., Peterson, J., Sparks, R., Handley, M., and E. 977 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 978 June 2002. 980 [RFC4479] Rosenberg, J., "A Data Model for Presence", RFC 4479, 981 July 2006. 983 [RFC4485] Rosenberg, J. and H. Schulzrinne, "Guidelines for Authors 984 of Extensions to the Session Initiation Protocol (SIP)", 985 RFC 4485, May 2006. 987 [RFC4975] Campbell, B., Mahy, R., and C. Jennings, "The Message 988 Session Relay Protocol (MSRP)", RFC 4975, September 2007. 990 [RFC5031] Schulzrinne, H., "A Uniform Resource Name (URN) for 991 Emergency and Other Well-Known Services", RFC 5031, 992 January 2008. 994 [I-D.ietf-ecrit-framework] 995 Rosen, B., Schulzrinne, H., Polk, J., and A. Newton, 996 "Framework for Emergency Calling using Internet 997 Multimedia", draft-ietf-ecrit-framework-05 (work in 998 progress), February 2008. 1000 [I-D.ietf-sip-gruu] 1001 Rosenberg, J., "Obtaining and Using Globally Routable User 1002 Agent (UA) URIs (GRUU) in the Session Initiation Protocol 1003 (SIP)", draft-ietf-sip-gruu-15 (work in progress), 1004 October 2007. 1006 [I-D.rosenberg-sip-app-media-tag] 1007 Rosenberg, J., "A Session Initiation Protocol (SIP) Media 1008 Feature Tag for MIME Application Sub-Types", 1009 draft-rosenberg-sip-app-media-tag-02 (work in progress), 1010 November 2007. 
1012 [RFC3428] Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C., 1013 and D. Gurle, "Session Initiation Protocol (SIP) Extension 1014 for Instant Messaging", RFC 3428, December 2002. 1016 [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller 1017 Preferences for the Session Initiation Protocol (SIP)", 1018 RFC 3841, August 2004. 1020 [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, 1021 "Indicating User Agent Capabilities in the Session 1022 Initiation Protocol (SIP)", RFC 3840, August 2004. 1024 Author's Address 1026 Jonathan Rosenberg 1027 Cisco 1028 Edison, NJ 1029 US 1031 Email: jdrosen@cisco.com 1032 URI: http://www.jdrosen.net 1034 Full Copyright Statement 1036 Copyright (C) The IETF Trust (2008). 1038 This document is subject to the rights, licenses and restrictions 1039 contained in BCP 78, and except as set forth therein, the authors 1040 retain all their rights. 1042 This document and the information contained herein are provided on an 1043 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1044 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1045 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1046 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1047 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1048 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1050 Intellectual Property 1052 The IETF takes no position regarding the validity or scope of any 1053 Intellectual Property Rights or other rights that might be claimed to 1054 pertain to the implementation or use of the technology described in 1055 this document or the extent to which any license under such rights 1056 might or might not be available; nor does it represent that it has 1057 made any independent effort to identify any such rights. Information 1058 on the procedures with respect to rights in RFC documents can be 1059 found in BCP 78 and BCP 79. 
1061 Copies of IPR disclosures made to the IETF Secretariat and any 1062 assurances of licenses to be made available, or the result of an 1063 attempt made to obtain a general license or permission for the use of 1064 such proprietary rights by implementers or users of this 1065 specification can be obtained from the IETF on-line IPR repository at 1066 http://www.ietf.org/ipr. 1068 The IETF invites any interested party to bring to its attention any 1069 copyrights, patents or patent applications, or other proprietary 1070 rights that may cover technology that may be required to implement 1071 this standard. Please address the information to the IETF at 1072 ietf-ipr@ietf.org. 1074 Acknowledgment 1076 Funding for the RFC Editor function is provided by the IETF 1077 Administrative Support Activity (IASA).