idnits 2.17.1 draft-ietf-sipping-service-identification-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 17. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 860. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 871. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 878. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 884. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 30 instances of too long lines in the document, the longest one being 3 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but does not include the phrase in its RFC 2119 key words list. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 1, 2007) is 6110 days in the past. Is this intentional? Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '14' is mentioned on line 199, but not defined == Missing Reference: '15' is mentioned on line 201, but not defined == Outdated reference: A later version (-13) exists of draft-ietf-ecrit-framework-01 == Outdated reference: A later version (-07) exists of draft-ietf-ecrit-service-urn-06 Summary: 2 errors (**), 0 flaws (~~), 6 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 SIPPING J. Rosenberg 3 Internet-Draft Cisco 4 Intended status: Best Current August 1, 2007 5 Practice 6 Expires: February 2, 2008 8 Identification of Communications Services in the Session Initiation 9 Protocol (SIP) 10 draft-ietf-sipping-service-identification-00 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. 
It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on February 2, 2008. 37 Copyright Notice 39 Copyright (C) The IETF Trust (2007). 41 Abstract 43 This document considers the problem of service identification in the 44 Session Initiation Protocol (SIP). Service identification is the 45 process of determining the user-level use case that is driving the 46 signaling being utilized by the user agent. While seemingly simple, 47 this process is quite complex, and when not addressed properly, can 48 lead to fraud, interoperability problems, and stifling of innovation. 50 This document discusses these problems and makes recommendations on 51 how to address them. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 56 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 57 3. Services and Service Identification . . . . . . . . . . . . . 4 58 4. Example Services . . . . . . . . . . . . . . . . . . . . . . . 6 59 4.1. IPTV vs. Multimedia . . . . . . . . . . . . . . . . . . . 6 60 4.2. Gaming vs. Voice Chat . . . . . . . . . . . . . . . . . . 7 61 4.3. Configuration vs. Pager Messaging . . . . . . . . . . . . 7 62 5. Using Service Identification . . . . . . . . . . . . . . . . . 7 63 5.1. Application Invocation in the User Agent . . . . . . . . . 8 64 5.2. Application Invocation in the Network . . . . . . . . . . 9 65 5.3. Network Quality of Service Authorization . . . . . . . . . 9 66 5.4. Service Authorization . . . . . . . . . . . . . . . . . . 10 67 5.5. Accounting and Billing . . . . . . . . . . . . . . . . . . 10 68 5.6. Negotiation of Service . . . . . . . . . . . . . . . . . . 10 69 5.7. Dispatch to Devices . . 
. . . . . . . . . . . . . . . . . 11 70 6. Key Principles of Service Identification . . . . . . . . . . . 11 71 6.1. Services are a By-Product of Signaling . . . . . . . . . . 11 72 6.2. Perils of Explicit Identifiers . . . . . . . . . . . . . . 13 73 6.2.1. Fraud . . . . . . . . . . . . . . . . . . . . . . . . 13 74 6.2.2. Systematic Interoperability Failures . . . . . . . . . 14 75 6.2.3. Stifling of Service Innovation . . . . . . . . . . . . 16 76 7. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 17 77 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 78 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 79 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 80 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 81 11.1. Normative References . . . . . . . . . . . . . . . . . . . 18 82 11.2. Informational References . . . . . . . . . . . . . . . . . 18 83 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 18 84 Intellectual Property and Copyright Statements . . . . . . . . . . 20 86 1. Introduction 88 The Session Initiation Protocol (SIP) [2] defines mechanisms for 89 initiating and managing communications sessions between agents. SIP 90 allows for a broad array of session types between agents. It can 91 manage audio sessions, ranging from low bitrate voice-only up to 92 multi-channel hi fidelity music. It can manage video sessions, 93 ranging from small, "talking-head" style video chat, up to high 94 definition multipoint video conferencing, to low bandwidth user- 95 generated content, up to high definition movie and TV content. SIP 96 endpoints can be anything - adaptors that convert an old analog 97 telephone to Voice over IP (VoIP), dedicated hardphones, fancy 98 hardphones with rich displays and user entry capabilities, softphones 99 on a PC, buddylist and presence applications on a PC, dedicated 100 videoconferencing peripherals, and speakerphones. 
102 This breadth of applicability is SIP's greatest asset, but it also 103 introduces numerous challenges. One of these is that, when an 104 endpoint generates a SIP INVITE for a session, or receives one, that 105 session can potentially be within the context of any number of 106 different use cases and endpoint types. For example, a SIP INVITE 107 with a single audio stream could represent a Push-To-Talk session 108 between mobile devices, a VoIP session between softphones, or audio- 109 based access to stored content on a server. 111 These differing use cases have driven implementors and system 112 designers to seek techniques for service identification. Service 113 identification is the process of determining and/or signaling the 114 specific use case that is driving the signaling being generated by a 115 user agent. At first glance, this seems harmless and easy enough. 116 It is tempting to define a new header, "Service-ID", for example, and 117 have a user agent populate it with any number of well-known tokens 118 which define what the service is. This information could then be 119 consumed for any number of purposes. 121 However, as this document will demonstrate, service identification is 122 a very complex and difficult process, and can very easily lead to 123 fraud, systemic interoperability failures, and a complete stifling of 124 the innovation that SIP was meant to achieve. 126 Section 3 begins by defining a service and the service identification 127 problem. Section 4 gives some concrete examples of services and why 128 they can be challenging to identify. Section 5 explores the ways in 129 which service identification can be utilized within a network. 130 Next, Section 6 discusses the key architectural principles of service 131 identification, and how explicit service identifiers can lead to 132 fraud, interoperability failures, and stifling of service innovation. 134 2.
Terminology 136 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 137 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 138 document are to be interpreted as described in RFC 2119 [1]. 140 3. Services and Service Identification 142 The problem of identifying services within SIP is not a new one. The 143 problem has been considered extensively in the context of presence. 144 In particular, the presence data model for SIP [3] defines the 145 concept of a service as one of the core notions that presence 146 describes. Services are described in Section 3.3 of RFC 4479, which 147 has this to say on the topic: 149 3.3. Service 151 Each presentity has access to a number of services. Each of these 152 represents a point of reachability for communications that can be 153 used to interact with the user. Examples of services are telephony 154 (that is, traditional circuit-based telephone service), push-to-talk, 155 instant messaging, Short Message Service (SMS), and Multimedia 156 Message Service (MMS). 158 It is difficult to give a precise definition for service. One 159 reasonable approach is to model each software or hardware agent in 160 the system as a service. If a user starts a softphone application on 161 their PC, then that represents a service. If a user has a videophone 162 device, then that represents another service. This is effectively a 163 physical view of services. This definition, however, starts to fall 164 apart when a service is spread across multiple software agents or 165 devices. For example, a SIP URI representing an address-of-record 166 can be routed to a softphone or a videophone, or both. In that case, 167 one might attempt instead to define a service based on its address on 168 the network. This definition also falls apart when modeling devices 169 or applications that receive calls and dispatch them to different 170 "helpers" based on potentially complex logic. 
For example, a 171 cellular telephone might house multiple SIP applications, each of 172 which can "register" different handlers based on the method or even 173 body type of the request. Each of those applications or handlers can 174 rightfully be considered a service, but it doesn't have an address on 175 the network distinct from the others. 177 Because of this inherent difficulty in precisely defining a service, 178 the data model doesn't try to constrain what can be considered a 179 service. Rather, anything can be considered a service so long as it 180 exhibits a set of key properties defined by this model. In 181 particular, each service is associated with characteristics that 182 identify the nature and capabilities of that service, with reach 183 information that indicates how to connect to the service, with status 184 information representing the state of that service, and relative 185 information that describes the ways in which that service relates to 186 others associated with the presentity. 188 As a consequence, in this model, services are not explicitly 189 enumerated. There is no central registry where one finds identifiers 190 for each service. Consequently, each service does not have a single 191 "service" attribute with values such as "ptt" or "telephony". That 192 doesn't mean that these consolidated monikers aren't useful; indeed, 193 they represent an essential summary of what the service is. Such 194 summarization is useful in creating icons that allow a user to choose 195 one service over another. A watcher is free to create such 196 summarization information from any of the information associated with 197 a service. The reach information often provides valuable information 198 for creating such a summarization. Oftentimes, the scheme of the URI 199 is synonymous with the view of what a service is. An "sms" URI [14] 200 clearly indicates SMS, for example. 
For some URIs, there may be many 201 services available, for example, SIP or tel [15], in which case the 202 scheme is less meaningful as a way of creating a summary. The reach 203 information could also indicate that certain application software has 204 to be invoked (such as a videogame), in which case that aspect of the 205 reach information would be useful for generating an iconic 206 representation of the game. 208 Essentially, the service is the user-visible use case that is driving 209 the behavior of the user-agents and servers in the SIP network. 210 Being user-visible means that there is a difference in user 211 experience between two services that are different. That user 212 experience can be part of the call, or outside of the call. Within a 213 call, the user experience can be based on different media types (an 214 audio call vs. a video chat), different content within a particular 215 media type (stored content, such as a movie or TV session), different 216 devices (a wireless device for "telephony" vs. a PC application for 217 "voice-chat"), different user interfaces (a buddy list view of voice 218 on a PC application vs. a software emulation of a hard phone), 219 different communities that can be accessed (voice chat with other 220 users that have the same voice chat client, vs. voice communications 221 with any endpoint on the PSTN), or different applications that are 222 invoked by the user (manually selecting a push-to-talk application 223 from a wireless phone vs. a telephony application). Outside of a 224 call, the difference in user experience can be a billing one (cheaper 225 for one service than other), a notification feature for one and not 226 another (for example, an IM that gets sent whenever a user makes a 227 call), and so on. 229 In some cases, there is very little difference in the underlying 230 technology that will support two different services, and in other 231 cases, there are big differences. 
However, for purposes of this 232 discussion, the key definition is that two services are distinct when 233 there is a perceived difference by the user in the two services. 235 This leads naturally to the desire to perform service identification. 236 Service identification is defined as the process of (1) determination 237 of the underlying service which is driving a particular signaling 238 exchange, (2) associating that service with some kind of moniker, and 239 (3) attaching that moniker to a signaling message (typically a SIP 240 INVITE), and then utilizing it for various purposes within the 241 network. Service identification can be done in the endpoints, in 242 which case the UA would insert the moniker directly into the 243 signaling message based on its awareness of the service. Or, it can 244 be done within a proxy in the network, based on inspection of the SIP 245 message, or based on hints placed into the message by the user. 247 4. Example Services 249 It is very useful to consider several example services, especially 250 ones that appear difficult to differentiate from each other. 252 4.1. IPTV vs. Multimedia 254 IP Television (IPTV) is the usage of IP networks to access 255 traditional television content, such as movies and shows. SIP can be 256 utilized to establish a session to a media server in a network, which 257 then serves up multimedia content and streams it as an audio and 258 video stream towards the client. Whether SIP is ideal for IPTV is, 259 in itself, a good question. However, such a discussion is outside 260 the scope of this document. 262 Consider multimedia conferencing. The user accesses a voice and 263 video conference at a conference server. The user might join in 264 listen-only mode, in which case the user receives audio and video 265 streams, but does not send. 267 These two services - IPTV and multimedia conferencing, clearly appear 268 as different services. They have different user experiences and 269 applications. 
A user is unlikely to ever be confused about whether a 270 session is IPTV or multimedia conferencing. Indeed, they are likely 271 to have different software applications or endpoints for the two 272 services. 274 However, these two services look remarkably alike based on the 275 signaling. Both utilize audio and video. Both could utilize the 276 same codecs. Both are unidirectional streams (from a server in the 277 network to the client). Thus, it would appear on the surface that 278 there is no way to differentiate them, based on inspection of the 279 signaling alone. 281 4.2. Gaming vs. Voice Chat 283 Consider an interactive game, played between two users from their 284 mobile devices. The game involves the users sending each other game 285 moves, using a messaging channel, in addition to voice. In another 286 service, users have a voice and IM chat conversation using a buddy 287 list application on their PC. 289 In both services, there are two media streams - audio and messaging. 290 The audio uses the same codecs. Both use the Message Session Relay 291 Protocol (MSRP) [5]. In both cases, the caller would send an INVITE 292 to the AOR of the target user. However, these represent fairly 293 different services, in terms of user experience. 295 4.3. Configuration vs. Pager Messaging 297 The SIP MESSAGE method [8] provides a way to send one-shot messages 298 to a particular AOR. This specification is primarily aimed at Short 299 Message Service (SMS) style messaging, commonly found in wireless 300 phones. Receipt of a MESSAGE request would cause the messaging 301 application on a phone to launch, allowing the user to browse message 302 history and respond. 304 However, MESSAGE is sometimes used for the delivery of content to a 305 device for other purposes. For example, some providers use it to 306 deliver configuration updates, such as new phone settings or 307 parameters, or to indicate that a new version of firmware is 308 available. 
Though not designed for this purpose, MESSAGE gets used 309 since, in existing wireless networks, SMS is used for this purpose, 310 and MESSAGE is the SIP equivalent of SMS. 312 Consequently, the MESSAGE request sent to a phone can be for two 313 different services. One would require invocation of a messaging app, 314 whereas the other would be consumed by the software in the phone, 315 without any user interaction at all. 317 5. Using Service Identification 319 It is important to understand what the service identity would be 320 utilized for, if known. The discussions in Section 4 give some hints 321 to the possible usages. Here, we explicitly discuss them. 323 5.1. Application Invocation in the User Agent 325 In some of the examples above, there were multiple software 326 applications running within a single user agent. When an incoming 327 INVITE or MESSAGE arrives, it must be delivered to the appropriate 328 application software. When each service is bound to a distinct 329 software application, it would seem that the service identity is 330 needed to dispatch the message to the appropriate piece of software. 331 This is shown in Figure 2.
333 +---------------------------------+
334 |                                 |
335 | +-------------+ +-------------+ |
336 | |     UI      | |     UI      | |
337 | +-------------+ +-------------+ |
338 | +-------------+ +-------------+ |
339 | |             | |             | |
340 | |  Service 1  | |  Service 2  | |
341 | |             | |             | |
342 | +-------------+ +-------------+ |
343 | +-----------------------------+ |
344 | |                             | |
345 | |             SIP             | |
346 | |            Layer            | |
347 | |                             | |
348 | +-----------------------------+ |
349 |                                 |
350 +---------------------------------+
352 Physical Device 354 Figure 2 356 The role of the SIP layer is to parse incoming messages, handle the 357 SIP state machinery for transactions and dialogs, and then dispatch 358 each request to the appropriate service.
For the example services in 359 Section 4.2, an incoming INVITE for the gaming service would be 360 delivered to the gaming application software. An incoming INVITE for 361 the voice chat service would be delivered to the voice chat 362 application software. For the examples in Section 4.3, a MESSAGE 363 request for user-to-user messaging would be delivered to the 364 messaging or SMS app, and a MESSAGE request containing configuration 365 data would be delivered to a configuration update application. 367 5.2. Application Invocation in the Network 369 Another usage of a service identifier would be to cause servers in 370 the SIP network to provide additional processing, based on the 371 service. For example, an INVITE issued by a user agent for IPTV 372 would pass through a server that does some kind of content rights 373 management, authorizing whether the user is allowed to access that 374 content. On the other hand, an INVITE issued by a user for 375 multimedia conferencing would pass through a server providing 376 "traditional" telephony features, such as outbound call screening and 377 call recording. It would make no sense for the INVITE associated 378 with IPTV to have outbound call screening and call recording applied, 379 and it would make no sense for the multimedia conferencing INVITE to 380 be processed by the content rights management server. Indeed, in 381 these cases, it's not just an efficiency issue (invoking servers when 382 not needed), but rather, truly incorrect behavior can occur. For 383 example, if an outbound call screening application is set to block 384 outbound calls to everything except for the phone numbers of friends 385 and family, an IPTV request that gets processed by such a server 386 would be blocked (as it's not targeted to the AOR of a friend or 387 family member). This would block a user's attempt to access IPTV 388 services, when that was not the goal at all.
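The misrouting described above can be made concrete with a small sketch. The Python below is illustrative only: the server names, the SERVER_CHAINS table, the example URIs and the process function are hypothetical and are not taken from any SIP specification. It assumes the service has already been determined by some means, and shows only why the choice of server chain matters.

```python
# Hypothetical allowlist used by one user's outbound call screening.
FRIENDS_AND_FAMILY = {"sip:mom@example.com", "sip:friend@example.com"}

# Hypothetical mapping from a service to the servers an INVITE traverses.
SERVER_CHAINS = {
    "iptv": ["content-rights-management"],
    "conferencing": ["outbound-call-screening", "call-recording"],
}


def screening_allows(target):
    """Outbound call screening: permit only allowlisted targets."""
    return target in FRIENDS_AND_FAMILY


def process(service, target):
    """Run an INVITE for `target` through the chain chosen for `service`."""
    for server in SERVER_CHAINS[service]:
        if server == "outbound-call-screening" and not screening_allows(target):
            return "blocked"
    return "forwarded"


iptv_target = "sip:movie42@iptv.example.com"
# Correct chain: the IPTV request never reaches call screening.
print(process("iptv", iptv_target))
# Wrong chain: the same request is rejected by call screening.
print(process("conferencing", iptv_target))
```

The second call shows the incorrect behavior: an IPTV request sent through the telephony chain is blocked, even though blocking IPTV was never the intent of the screening rule.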
390 Similarly, a MESSAGE request from Section 4.3 might need to pass 391 through a message server for filtering when it is associated with 392 chat, but not when it is associated with config. Consider a filter 393 which gets applied to MESSAGE requests, and that filter runs in a 394 server in the network. The filter operation prevents user Joe from 395 sending messages to user Bob that contain the words "stock" or 396 "purchase", due to some regulations that disallow Joe and Bob from 397 discussing stock trading. However, a MESSAGE for configuration 398 purposes might contain an XML document that uses the token "stock" as 399 some kind of attribute. This configuration update would be discarded 400 by the filtering server, when it should not have been. 402 5.3. Network Quality of Service Authorization 404 The IP network can provide differing levels of Quality of Service 405 (QoS) to IP packets. This service can include guaranteed throughput, 406 latency, or loss characteristics. Typically, the user agent will 407 make some kind of QoS request, either using explicit signaling 408 protocols (such as RSVP) or through marking of Diffserv value in 409 packets. The network will need to make a policy decision based on 410 whether these QoS treatments are authorized or not. One common 411 authorization policy is to check if the user has invoked a service 412 using SIP that they are authorized to invoke, and that this service 413 requires the level of QoS treatment the user has requested. 415 For example, consider IPTV and multimedia conferencing as described 416 in Section 4.1. IPTV is a non-real time service. Consequently, 417 media traffic for IPTV would be authorized for bandwidth guarantees, 418 but not for latency or loss guarantees. On the other hand, 419 multimedia conferencing is real time. Its traffic would require 420 bandwidth, loss and latency guarantees from the network. 
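The authorization policy in this example amounts to a subset check, sketched below in Python. The ALLOWED_GUARANTEES table and the authorize_qos function are hypothetical names chosen for illustration, and the set of guarantees per service simply restates the IPTV and multimedia conferencing example above. A real policy server would use whatever data model its QoS protocol defines.

```python
# Guarantees each service may be granted, per the example in this section.
ALLOWED_GUARANTEES = {
    "iptv": {"bandwidth"},
    "conferencing": {"bandwidth", "latency", "loss"},
}


def authorize_qos(service, requested):
    """Authorize only if every requested guarantee is allowed for the service.

    Unknown services are granted nothing.
    """
    return set(requested) <= ALLOWED_GUARANTEES.get(service, set())


print(authorize_qos("conferencing", {"bandwidth", "latency"}))
print(authorize_qos("iptv", {"bandwidth", "latency"}))
```

The check depends entirely on the `service` argument, which is why the entity performing QoS authorization needs to know the service behind the session.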
422 Consequently, if a user should make an RSVP reservation for a media 423 stream, and ask for latency guarantees for that stream, the network 424 would like to be able to authorize it if the service was multimedia 425 conferencing, but not if it was IPTV. This would require the server 426 performing the QoS authorization to know the service associated with 427 the INVITE that set up the session. 429 5.4. Service Authorization 431 Frequently, a network administrator will want to control whether a 432 user is allowed to invoke a particular service. Not all users will 433 be authorized to use all services that are provided. For example, a 434 user may not be authorized to access IPTV services, whereas they are 435 authorized to utilize multimedia conferencing. A user might not be 436 able to utilize a multiplayer gaming service, whereas they are 437 authorized to utilize voice chat services. 439 Consequently, when an INVITE arrives at a proxy in the network, the 440 proxy will need to determine what the requested service is, so that 441 the proxy can make an authorization decision. 443 5.5. Accounting and Billing 445 Service authorization and accounting/billing go hand in hand. 446 Presumably, one of the primary reasons for authorizing that a user 447 can utilize a service is that they are being billed differently based 448 on the type of service. Consequently, one of the goals of a service 449 identity is to be able to include it in accounting records, so that 450 the appropriate billing model can be applied. 452 For example, in the case of IPTV, a service provider can bill based 453 on the content (US $5 per movie, perhaps), whereas for multimedia 454 conferencing, they can bill by the minute. This requires the 455 accounting streams to indicate which service was invoked for the 456 particular session. 458 5.6. Negotiation of Service 460 In some cases, when the caller initiates a session, they don't 461 actually know which service will be utilized.
Rather, they might 462 like to offer up all of the services they have available to the 463 called party, and then let the called party decide, or let the system 464 make a decision based on overlapping service capabilities. 466 As an example, a user can do both the game and the voice chat service 467 of Section 4.2. They initiate a session to a target AOR, but the 468 devices used by the target user can only support voice chat. Consequently, 469 voice chat gets utilized for the session. 471 5.7. Dispatch to Devices 473 When a user has multiple devices, each with varying capabilities in 474 terms of service, it is useful to dispatch an incoming request to the 475 right device based on whether the device can support the service that 476 has been requested. 478 For example, if a user initiates a gaming session with voice chat, 479 and the target user has two devices - one that can support the gaming 480 service, and the other that cannot, the INVITE should be dispatched 481 to the device which supports the gaming session. 483 6. Key Principles of Service Identification 485 In this section, we describe some of the key principles of performing 486 service identification. 488 6.1. Services are a By-Product of Signaling 490 Almost always, the first solution that people consider is to add some 491 kind of field to the signaling messages which indicates what the 492 service is. This field would then be inserted by the user agent, and 493 then can be used by the proxies and other user agents as a service 494 identifier. 496 This approach, however, misses a key point, which cannot be stressed 497 enough, and which represents the core architectural principle to be 498 understood here: 500 A service is the by-product of the signaling and the context 501 around it (the user profile, time-of-day and so on) - the effects 502 of the signaling message once launched into the network.
The 503 service identity is therefore always derivable from the signaling 504 and its context without additional identifiers. 506 When a user sends an INVITE request to the network, and targets that 507 request at an IPTV server, and includes SDP for audio and video 508 streaming, the *result* of sending such an INVITE is that an IPTV 509 session occurs. The entire purpose of the INVITE is to establish 510 such a session, and therefore, invoke the service. Thus, a service 511 is not something that is different from the rest of the signaling 512 message. A service is what the user gets after the network and other 513 user agents have processed a signaling message. 515 This principle leads to another important conclusion: 517 If two services are different, but their signaling appears to be 518 the same, it is because there is in fact something different that 519 has been overlooked, or something has been implied from the 520 signaling which should have been signaled explicitly. 522 This makes sense; if a service is the byproduct of signaling, how can 523 a user have different experiences and different services when the 524 signaling message is the same? There has to be something different 525 in the messages, if the user experience was in fact different. 527 To illustrate this, let us take each of the example services in 528 Section 4 and investigate whether there is, or should be, something 529 different in the signaling in each case. 531 IPTV vs. Multimedia Conferencing: The two services in Section 4.1 532 appear to have identical signaling. They both involve audio and 533 video streams, both of which are unidirectional. Both might 534 utilize the same codecs. However, there is another important 535 difference in the signaling - the target URI. In the case of 536 IPTV, the request is targeted at a media server or to a particular 537 piece of content to be viewed. In the case of multimedia 538 conferencing, the target is a conference server. 
The 539 administrator of the domain can therefore examine the Request- 540 URI of each request, determine whether it is targeted for a conference server 541 or a content server, and use that to derive the service associated 542 with the request. 544 Gaming vs. Voice Chat: Though both sessions involve MSRP and voice, 545 and both are targeted to the same AOR of the called user, there is 546 a difference. The MSRP messages for the gaming session carry 547 content which is game specific, whereas the MSRP messages for the 548 voice chat are just regular text, meant for rendering to a user. 549 Thus, the MSRP session in the SDP will indicate the specific 550 content type that MSRP is carrying, and this type will differ between 551 the two cases. Even if the game moves look like text, since they are 552 being consumed by an automaton, there is an underlying schema that 553 dictates their content, and therefore, this schema represents the 554 actual content type that should be signaled. 556 Configuration vs. Pager Messaging: Just as in the case of gaming vs. 557 voice chat, the content type of the messages differentiates the 558 service that occurs as a consequence of the messages. 560 This is ultimately an expression of the principle of DWIM vs. DWIS 561 (Do-What-I-Mean vs. Do-What-I-Say). Explicit signaling is DWIS - the 562 user is asking for a service by invoking the signaling that results 563 in the desired effect. A service identifier is DWIM - an unspecific 564 request for something that is ill-defined and non-interoperable. 566 6.2. Perils of Explicit Identifiers 568 Given that the information in the signaling message always conveys 569 enough information to identify the service, another important 570 conclusion can be drawn: 572 Inclusion of an explicit service identifier within a message is, 573 at best, redundant, and at worst, an avenue for fraud, loss of 574 interoperability, and stifling of service innovation.
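As a rough illustration of how a domain might derive the service from the signaling alone, in the manner described for the three example pairs in Section 6.1, consider the Python sketch below. Everything in it is an assumption made for illustration: the host names, the two vendor media type names, the service labels and the derive_service function are hypothetical, and the content types are assumed to have been extracted already (for an MSRP stream, for example, from the accept-types attribute in the SDP). The URI handling is deliberately naive and ignores URI parameters.

```python
# Hypothetical deployment facts known to the domain administrator.
CONTENT_HOST = "iptv.example.com"
CONFERENCE_HOST = "conference.example.com"
GAME_TYPE = "application/vnd.example.game-moves+xml"
CONFIG_TYPE = "application/vnd.example.phone-config+xml"


def derive_service(method, request_uri, content_types):
    """Derive a local service label from the request, with no explicit token."""
    host = request_uri.rsplit("@", 1)[-1]  # naive: assumes no URI parameters
    types = set(content_types)
    if method == "INVITE":
        if host == CONTENT_HOST:
            return "iptv"
        if host == CONFERENCE_HOST:
            return "conferencing"
        if GAME_TYPE in types:
            return "gaming"
        if "text/plain" in types:
            return "voice-chat"
    if method == "MESSAGE":
        if CONFIG_TYPE in types:
            return "configuration"
        if "text/plain" in types:
            return "pager-messaging"
    return "unknown"


print(derive_service("INVITE", "sip:movie42@iptv.example.com", []))
print(derive_service("INVITE", "sip:bob@example.com", [GAME_TYPE]))
print(derive_service("MESSAGE", "sip:bob@example.com", ["text/plain"]))
```

Note that no service token appears among the inputs. The label is computed locally from the target and the content types, so it never needs to be carried in the message, and a user cannot change the label without also changing what the request actually does.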
576 By "explicit service identifier", we mean a field included in the 577 signaling message that contains a token whose value indicates the 578 specific service invoked by the calling user. This would be "IPTV" 579 or "voice chat" or "shoot-em game" or "short message service". This 580 explicit identifier would typically be inserted by the originating 581 user agent, and carried in the signaling message. 583 Clearly, if the signaling message itself contains enough information 584 to identify the service, inclusion of an extra field to say the same 585 thing is going to be redundant. Redundancy by itself is not a big 586 deal. However, redundancy can lead to other, more significant 587 problems. 589 6.2.1. Fraud 591 First and foremost, it can lead to fraud. If a provider uses the 592 service identifier for billing and accounting purposes, or for 593 authorization purposes, it opens an avenue for attack. The user can 594 construct the signaling message so that its actual effect (which is 595 the service the user will receive), is what the user desires, but the 596 service identity (which is what is used for billing and 597 authorization) doesn't match, and indicates a cheaper service, or one 598 that the user is authorized to receive. If, however, the service 599 identity used by the domain administrator is derived from the signaling 600 itself, the user cannot lie. If they did lie, they wouldn't get the 601 desired service. 603 Consider the example of IPTV vs. multimedia conferencing. If 604 multimedia conferencing is cheaper, the user could send an INVITE for 605 an IPTV session, but include a service identifier which indicates 606 multimedia conferencing. They get the service associated with IPTV, 607 but at the cost of multimedia conferencing. 609 This same principle shows up in other places. For example, in the 610 identification of an emergency services call [6].
It is desirable to 611 give emergency services calls special treatment, such as being free, 612 authorized even when the user cannot otherwise make calls, and to 613 give them priority. If emergency calls were indicated through 614 something other than the target of the call being an emergency 615 services URN [7], it would open an avenue for fraud. The user could 616 place any desired URI in the request-URI, and indicate that the call 617 is an emergency services call. This call would then get special 618 treatment, but of course get routed to the target URI. The only way 619 to prevent this fraud is to consider an emergency call as any call 620 whose target is an emergency services URN. Thus, the service 621 identification here is based on the target of the request. When the 622 target is an emergency services URN, the request can get special 623 treatment. The user cannot lie, since there is no way to separately 624 indicate this is an emergency call, besides targeting it to an 625 emergency URN. 627 6.2.2. Systematic Interoperability Failures 629 How can inclusion of an explicit service identifier cause loss of 630 interoperability? When such an identifier is used to drive 631 functionality - such as dispatch on the phones, in the network, or 632 QoS authorization, it means that the wrong thing can happen when this 633 field is not set properly. Consider a user in domain 1, calling a 634 user in domain 2. Domain 1 provides the user with a service they 635 call "voice chat", which utilizes voice and IM for real time 636 conversation, driven off of a buddy list application on a PC. Domain 637 2 provides their users with a service they call, "text telephony", 638 which is a voice service on a wireless device that also allows the 639 user to send text messages.
Consider the case where domain 1 and 640 domain 2 both have their user agents insert a service identifier 641 into the request, and then use that to derive QoS authorization, 642 accounting, and invocation of applications in the network and in the 643 device. The user in domain 1 calls the user in domain 2, and inserts 644 the identifier "Voice Chat" into the INVITE. When this arrives at 645 the proxy in domain 2, the service is unknown. Consequently, the 646 request does not get the proper QoS treatment. When it gets 647 delivered to the User Agent of the user in domain 2, the user agent 648 does not see a service it understands, and consequently does not 649 know to dispatch the request to the right application software. 650 Thus, this call has completely failed, even when it could have 651 succeeded. This illustrates the following key point: 653 Explicit service identifiers, used between domains, cause 654 interoperability failures unless all interconnected domains agree 655 on exactly the same set of services and how to name them. 657 Of course, lack of service identifiers does not guarantee service 658 interoperability. However, SIP was built with rich tools for 659 negotiation of capabilities at a finely granular level. One user 660 agent can make a call using audio and video, but if the receiving UA 661 only supports audio, SIP allows both sides to negotiate down to the 662 lowest common denominator. Thus, communication is still provided. 663 As another example, if one agent initiates a Push-To-Talk session 664 (which is audio with a companion floor control mechanism), and the 665 other side only supports regular audio, SIP would be able to negotiate 666 back down to a regular voice call. As another example, if a calling 667 user agent is running a high-definition video conferencing endpoint, 668 and the called user agent supports just a regular video endpoint, the 669 codecs themselves can negotiate downward to a lower rate, picture 670 size, and so on.
Thus, interoperability is achieved. Interestingly, 671 the final "service" may no longer be well characterized by the 672 service identifier that would have been placed in the original 673 INVITE. For example, in this case, if the original INVITE from the 674 caller had contained the service identifier, "hi-fi video", but the 675 video gets negotiated down to a lower rate and picture size, the 676 service identifier is no longer really appropriate. 678 This illustrates another key aspect of the interoperability problem: 680 Usage of explicit service identifiers in the request will result 681 in inconsistencies with results of any SIP negotiation that might 682 otherwise be applied in the session. 684 Of course, there are cases where negotiating to a common baseline is 685 not what is desired. SIP provides tools (such as Require), to force 686 the call to fail unless the desired capabilities are supported. 687 However, this is not recommended as a general rule [4]. 689 When a service identifier becomes something that both proxies and the 690 user agent need to understand in order to properly treat a request, 691 it becomes equivalent to including a token in the Proxy-Require and 692 Require header fields of every single SIP request. The very reason 693 that RFC 4485 frowns upon usage of Require and certainly Proxy- 694 Require is the huge impact on interoperability it causes. It is for 695 this same reason that explicit service identifiers need to be 696 avoided: 698 The usage of explicit service identifiers is equivalent to the 699 usage of Require and Proxy-Require in the request, and has the 700 same negative impact on interoperability as those headers have. 702 6.2.3. Stifling of Service Innovation 704 The probability that any two service providers end up with 705 the same set of services, and give them the same names, becomes 706 vanishingly small as the number of providers grows.
Indeed, it would 707 almost certainly require a centralized authority to identify what the 708 services are, how they work, and what they are named. This, in turn, 709 leads to a requirement for complete homogeneity in order to 710 facilitate interconnection. Two providers cannot usefully 711 interconnect unless they agree on the set of services they are 712 offering to their customers, and each do the same thing. This is, in 713 a very real sense, anathema to the entire notion of SIP, which is 714 built on the idea that heterogeneous domains can interconnect and 715 still get interoperability: 717 Explicit service identifiers lead to a requirement for homogeneity 718 in service definitions across providers that interconnect, ruining 719 the very service heterogeneity that SIP was meant to bring. 721 Indeed, Metcalfe's law says that the value of a network grows with 722 the square of the number of participants. As a consequence of this, 723 once a bunch of large domains did get together, agree on a set of 724 services, and then a set of well-known identifiers for those 725 services, it would force other providers to also deploy the same 726 services, in order to obtain the value that interconnection brings. 727 This, in turn, will stifle innovation, and quickly force the set of 728 services in SIP to become fixed and never expand beyond the ones 729 initially agreed upon. This, too, is anathema to the very framework 730 on which SIP is built, and defeats much of the purpose of why 731 providers have chosen to deploy SIP in their own networks: 733 Metcalfe's law, when combined with explicit service identifiers, 734 will stifle the ability of providers to develop new SIP services, 735 since they have no hope of interconnecting them with anyone else. 737 Consider the following example. Several providers get together, and 738 standardize on a bunch of service identifiers. One of these uses 739 audio and video (say, "multimedia conversation"). 
This service is 740 successful, and is widely utilized. Endpoints look for this 741 identifier to dispatch calls to the right software applications, and 742 the network looks for it to invoke features, perform accounting, and 743 QoS. A new provider gets the idea for a new service, say, avatar- 744 enhanced multimedia conversation. In this service, there is audio 745 and video, but there is a third stream, which renders an avatar. A 746 caller can press buttons on their phone, to cause the avatar on the 747 other person's device to show emotion, make noise, and so on. This 748 is similar to the way emoticons are used today in IM. This service 749 is enabled by adding a third media stream (and consequently, third 750 m-line) to the SDP. 752 Normally, this service would be backwards compatible with a regular 753 audio-video endpoint, which would just reject the third media stream. 754 However, because a large network has been deployed that is expecting 755 to see the token, "multimedia conversation" and its associated audio+ 756 video service, it is nearly impossible for the new provider to roll 757 out this new service. If they did, it would fail completely, or 758 partially fail, when their users call users in other provider 759 domains. 761 7. Recommendations 763 From these principles, several recommendations can be made: 765 o Systems needing to perform service identification must examine 766 existing signaling constructs to identify the service based on 767 fields that exist within the signaling message already. 769 o If it appears that the signaling currently defined in standards is 770 not sufficient to identify the service, it may be due to lack of 771 sufficient signaling to convey what is needed, and new standards 772 work should be undertaken to fill this gap. 774 o The usage of an explicit service identifier does make sense as a 775 way to cache a decision made by a network element, for usage by 776 another network element within the same domain.
However, service 777 identifiers are fundamentally useful within a particular domain, 778 and any such header must be stripped at a network boundary. 780 8. Security Considerations 782 Oftentimes, the service associated with a request is utilized for 783 purposes such as authorization, accounting, and billing. When 784 service identification is not done properly, the possibility of 785 network fraud is introduced. It is for this reason, discussed 786 extensively in Section 6.2.1, that the usage of explicit service 787 identifiers inserted by a UA is NOT RECOMMENDED. 789 9. IANA Considerations 791 There are no IANA considerations associated with this specification. 793 10. Acknowledgements 795 This document is based on discussions with Paul Kyzivat and Andrew 796 Allen, who contributed significantly to the ideas here. Much of the 797 content in this draft is a result of discussions amongst participants 798 in the SIPPING mailing list, including Dean Willis, Tom Taylor, Eric 799 Burger, Dale Worley, Christer Holmberg, and John Elwell, amongst many 800 others. 802 11. References 804 11.1. Normative References 806 [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 807 Levels", BCP 14, RFC 2119, March 1997. 809 [2] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., 810 Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: 811 Session Initiation Protocol", RFC 3261, June 2002. 813 11.2. Informational References 815 [3] Rosenberg, J., "A Data Model for Presence", RFC 4479, July 2006. 817 [4] Rosenberg, J. and H. Schulzrinne, "Guidelines for Authors of 818 Extensions to the Session Initiation Protocol (SIP)", RFC 4485, 819 May 2006. 821 [5] Campbell, B., "The Message Session Relay Protocol", 822 draft-ietf-simple-message-sessions-19 (work in progress), 823 February 2007. 825 [6] Rosen, B., "Framework for Emergency Calling in Internet 826 Multimedia", draft-ietf-ecrit-framework-01 (work in progress), 827 March 2007. 
829 [7] Schulzrinne, H., "A Uniform Resource Name (URN) for Services", 830 draft-ietf-ecrit-service-urn-06 (work in progress), March 2007. 832 [8] Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C., and 833 D. Gurle, "Session Initiation Protocol (SIP) Extension for 834 Instant Messaging", RFC 3428, December 2002. 836 Author's Address 838 Jonathan Rosenberg 839 Cisco 840 Edison, NJ 841 US 843 Email: jdrosen@cisco.com 844 URI: http://www.jdrosen.net 846 Full Copyright Statement 848 Copyright (C) The IETF Trust (2007). 850 This document is subject to the rights, licenses and restrictions 851 contained in BCP 78, and except as set forth therein, the authors 852 retain all their rights. 854 This document and the information contained herein are provided on an 855 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 856 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 857 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 858 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 859 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 860 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 862 Intellectual Property 864 The IETF takes no position regarding the validity or scope of any 865 Intellectual Property Rights or other rights that might be claimed to 866 pertain to the implementation or use of the technology described in 867 this document or the extent to which any license under such rights 868 might or might not be available; nor does it represent that it has 869 made any independent effort to identify any such rights. Information 870 on the procedures with respect to rights in RFC documents can be 871 found in BCP 78 and BCP 79. 
873 Copies of IPR disclosures made to the IETF Secretariat and any 874 assurances of licenses to be made available, or the result of an 875 attempt made to obtain a general license or permission for the use of 876 such proprietary rights by implementers or users of this 877 specification can be obtained from the IETF on-line IPR repository at 878 http://www.ietf.org/ipr. 880 The IETF invites any interested party to bring to its attention any 881 copyrights, patents or patent applications, or other proprietary 882 rights that may cover technology that may be required to implement 883 this standard. Please address the information to the IETF at 884 ietf-ipr@ietf.org. 886 Acknowledgment 888 Funding for the RFC Editor function is provided by the IETF 889 Administrative Support Activity (IASA).