SIPPING                                                     J. Rosenberg
Internet-Draft                                                     Cisco
Intended status: Best Current Practice                       May 7, 2007
Expires: November 8, 2007


  Identification of Communications Services in the Session Initiation
                             Protocol (SIP)
           draft-rosenberg-sipping-service-identification-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 8, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document considers the problem of service identification in the
   Session Initiation Protocol (SIP). Service identification is the
   process of determining the user-level use case that is driving the
   signaling being utilized by the user agent. While seemingly simple,
   this process is quite complex, and when not addressed properly, can
   lead to fraud, interoperability problems, and stifling of innovation.

   This document discusses these problems and makes recommendations on
   how to address them.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Services and Service Identification
   4.  Example Services
     4.1.  IPTV vs. Multimedia
     4.2.  Gaming vs. Voice Chat
     4.3.  Configuration vs. Pager Messaging
   5.  Using Service Identification
     5.1.  Application Invocation in the User Agent
     5.2.  Application Invocation in the Network
     5.3.  Network Quality of Service Authorization
     5.4.  Service Authorization
     5.5.  Accounting and Billing
     5.6.  Negotiation of Service
     5.7.  Dispatch to Devices
   6.  Key Principles of Service Identification
     6.1.  Services are a By-Product of Signaling
     6.2.  Perils of Explicit Identifiers
       6.2.1.  Fraud
       6.2.2.  Systematic Interoperability Failures
       6.2.3.  Stifling of Service Innovation
   7.  Recommendations
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informational References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The Session Initiation Protocol (SIP) [2] defines mechanisms for
   initiating and managing communications sessions between agents. SIP
   allows for a broad array of session types between agents. It can
   manage audio sessions, ranging from low bitrate voice-only up to
   multi-channel high-fidelity music. It can manage video sessions,
   ranging from small "talking-head" style video chat up to high
   definition multipoint video conferencing, and from low bandwidth
   user-generated content up to high definition movie and TV content.
   SIP endpoints can be anything: adaptors that convert an old analog
   telephone to Voice over IP (VoIP), dedicated hardphones, fancy
   hardphones with rich displays and user entry capabilities, softphones
   on a PC, buddylist and presence applications on a PC, dedicated
   videoconferencing peripherals, and speakerphones.

   This breadth of applicability is SIP's greatest asset, but it also
   introduces numerous challenges. One of these is that, when an
   endpoint generates a SIP INVITE for a session, or receives one, that
   session can potentially be within the context of any number of
   different use cases and endpoint types. For example, a SIP INVITE
   with a single audio stream could represent a Push-To-Talk session
   between mobile devices, a VoIP session between softphones, or audio-
   based access to stored content on a server.

   These differing use cases have driven implementors and system
   designers to seek techniques for service identification. Service
   identification is the process of determining and/or signaling the
   specific use case that is driving the signaling being generated by a
   user agent. At first glance, this seems harmless and easy enough.
   It is tempting to define a new header, "Service-ID", for example, and
   have a user agent populate it with any number of well-known tokens
   which define what the service is. This information could then be
   consumed for any number of purposes.

   However, as this document will demonstrate, service identification is
   a very complex and difficult process, and can very easily lead to
   fraud, systemic interoperability failures, and a complete stifling of
   the innovation that SIP was meant to achieve.

   Section 3 begins by defining a service and the service identification
   problem. Section 4 gives some concrete examples of services and why
   they can be challenging to identify. Section 5 explores the ways in
   which a service identification can be utilized within a network.
   Next, Section 6 discusses the key architectural principles of service
   identification, and how explicit service identifiers can lead to
   fraud, interoperability failures, and stifling of service innovation.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

3.  Services and Service Identification

   The problem of identifying services within SIP is not a new one. The
   problem has been considered extensively in the context of presence.
   In particular, the presence data model for SIP [3] defines the
   concept of a service as one of the core notions that presence
   describes. Services are described in Section 3.3 of RFC 4479, which
   has this to say on the topic:

      3.3. Service

      Each presentity has access to a number of services. Each of these
      represents a point of reachability for communications that can be
      used to interact with the user. Examples of services are telephony
      (that is, traditional circuit-based telephone service), push-to-talk,
      instant messaging, Short Message Service (SMS), and Multimedia
      Message Service (MMS).

      It is difficult to give a precise definition for service. One
      reasonable approach is to model each software or hardware agent in
      the system as a service. If a user starts a softphone application on
      their PC, then that represents a service. If a user has a videophone
      device, then that represents another service. This is effectively a
      physical view of services. This definition, however, starts to fall
      apart when a service is spread across multiple software agents or
      devices. For example, a SIP URI representing an address-of-record
      can be routed to a softphone or a videophone, or both. In that case,
      one might attempt instead to define a service based on its address on
      the network. This definition also falls apart when modeling devices
      or applications that receive calls and dispatch them to different
      "helpers" based on potentially complex logic.
      For example, a
      cellular telephone might house multiple SIP applications, each of
      which can "register" different handlers based on the method or even
      body type of the request. Each of those applications or handlers can
      rightfully be considered a service, but it doesn't have an address on
      the network distinct from the others.

      Because of this inherent difficulty in precisely defining a service,
      the data model doesn't try to constrain what can be considered a
      service. Rather, anything can be considered a service so long as it
      exhibits a set of key properties defined by this model. In
      particular, each service is associated with characteristics that
      identify the nature and capabilities of that service, with reach
      information that indicates how to connect to the service, with status
      information representing the state of that service, and relative
      information that describes the ways in which that service relates to
      others associated with the presentity.

      As a consequence, in this model, services are not explicitly
      enumerated. There is no central registry where one finds identifiers
      for each service. Consequently, each service does not have a single
      "service" attribute with values such as "ptt" or "telephony". That
      doesn't mean that these consolidated monikers aren't useful; indeed,
      they represent an essential summary of what the service is. Such
      summarization is useful in creating icons that allow a user to choose
      one service over another. A watcher is free to create such
      summarization information from any of the information associated with
      a service. The reach information often provides valuable information
      for creating such a summarization. Oftentimes, the scheme of the URI
      is synonymous with the view of what a service is. An "sms" URI [14]
      clearly indicates SMS, for example.
      For some URIs, there may be many
      services available, for example, SIP or tel [15], in which case the
      scheme is less meaningful as a way of creating a summary. The reach
      information could also indicate that certain application software has
      to be invoked (such as a videogame), in which case that aspect of the
      reach information would be useful for generating an iconic
      representation of the game.

   Essentially, the service is the user-visible use case that is driving
   the behavior of the user agents and servers in the SIP network.
   Being user-visible means that there is a difference in user
   experience between two services that are different. This user
   experience can be based on different media types (an audio call vs. a
   video chat), different content within a particular media type (stored
   content, such as a movie or TV session), different devices (a
   wireless device for "telephony" vs. a PC application for "voice-
   chat"), different user interfaces (a buddy list view of voice on a PC
   application vs. a software emulation of a hard phone), different
   communities that can be accessed (voice chat with other users that
   have the same voice chat client, vs. voice communications with any
   endpoint on the PSTN), or different applications that are invoked by
   the user (manually selecting a push-to-talk application from a
   wireless phone vs. a telephony application).

   In some cases, there is very little difference in the underlying
   technology that will support two different services, and in other
   cases, there are big differences. However, for purposes of this
   discussion, the key definition is that two services are distinct when
   there is a perceived difference by the user in the two services.

   This leads naturally to the desire to perform service identification.
   Service identification is defined as the process of (1) determining
   the underlying service that is driving a particular signaling
   exchange, (2) associating that service with some kind of moniker, and
   (3) attaching that moniker to a signaling message (typically a SIP
   INVITE), so that it can be utilized for various purposes within the
   network. Service identification can be done in the endpoints, in
   which case the UA would insert the moniker directly into the
   signaling message based on its awareness of the service. Or, it can
   be done within a proxy in the network, based on inspection of the SIP
   message, or based on hints placed into the message by the user.

4.  Example Services

   It is very useful to consider several example services, especially
   ones that appear difficult to differentiate from each other.

4.1.  IPTV vs. Multimedia

   IP Television (IPTV) is the usage of IP networks to access
   traditional television content, such as movies and shows. SIP can be
   utilized to establish a session to a media server in a network, which
   then serves up multimedia content and streams it as an audio and
   video stream towards the client. Whether SIP is ideal for IPTV is,
   in itself, a good question. However, such a discussion is outside
   the scope of this document.

   Consider multimedia conferencing. The user accesses a voice and
   video conference at a conference server. The user might join in
   listen-only mode, in which case the user receives audio and video
   streams, but does not send.

   These two services, IPTV and multimedia conferencing, clearly appear
   as different services. They have different user experiences and
   applications. A user is unlikely to ever be confused about whether a
   session is IPTV or multimedia conferencing. Indeed, they are likely
   to have different software applications or endpoints for the two
   services.

   However, these two services look remarkably alike based on the
   signaling. Both utilize audio and video. Both could utilize the
   same codecs. Both are unidirectional streams (from a server in the
   network to the client). Thus, it would appear on the surface that
   there is no way to differentiate them, based on inspection of the
   signaling alone.

4.2.  Gaming vs. Voice Chat

   Consider an interactive game, played between two users from their
   mobile devices. The game involves the users sending each other game
   moves, using a messaging channel, in addition to voice. In another
   service, users have a voice and IM chat conversation using a buddy
   list application on their PC.

   In both services, there are two media streams: audio and messaging.
   The audio uses the same codecs. Both use the Message Session Relay
   Protocol (MSRP) [5]. In both cases, the caller would send an INVITE
   to the AOR of the target user. However, these represent fairly
   different services, in terms of user experience.

4.3.  Configuration vs. Pager Messaging

   The SIP MESSAGE method [8] provides a way to send one-shot messages
   to a particular AOR. This specification is primarily aimed at Short
   Message Service (SMS) style messaging, commonly found in wireless
   phones. Receipt of a MESSAGE request would cause the messaging
   application on a phone to launch, allowing the user to browse message
   history and respond.

   However, MESSAGE is sometimes used for the delivery of content to a
   device for other purposes. For example, some providers use it to
   deliver configuration updates, such as new phone settings or
   parameters, or to indicate that a new version of firmware is
   available. Though not designed for this purpose, MESSAGE gets used
   since, in existing wireless networks, SMS is used for this purpose,
   and MESSAGE is the SIP equivalent of SMS.

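   The ambiguity just described can be made concrete with a minimal
   sketch. It assumes two heavily simplified MESSAGE requests in which
   most mandatory SIP headers are omitted; the AOR, the bodies, and the
   configuration media type are invented for illustration and do not
   come from this document. Judged by method and target alone, the two
   requests are indistinguishable.

```python
# Hypothetical, simplified MESSAGE requests (not complete SIP messages).
# The AOR, bodies, and the config media type are illustrative assumptions.
PAGER_MESSAGE = (
    "MESSAGE sip:bob@example.com SIP/2.0\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Lunch at noon?"
)
CONFIG_MESSAGE = (
    "MESSAGE sip:bob@example.com SIP/2.0\r\n"
    "Content-Type: application/vnd.example.phone-config+xml\r\n"
    "\r\n"
    "<config><ringtone>classic</ringtone></config>"
)


def request_line(raw):
    """Return (method, request_uri) taken from the first line of a request."""
    method, request_uri, _version = raw.split("\r\n", 1)[0].split(" ")
    return method, request_uri


# Both requests carry the same method and the same target AOR.
print(request_line(PAGER_MESSAGE) == request_line(CONFIG_MESSAGE))
```

   A receiver that looks only at the request line cannot tell which
   application on the phone should consume the request.
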
   Consequently, the MESSAGE request sent to a phone can be for two
   different services. One would require invocation of a messaging app,
   whereas the other would be consumed by the software in the phone,
   without any user interaction at all.

5.  Using Service Identification

   It is important to understand what the service identity would be
   utilized for, if known. The discussions in Section 4 give some hints
   to the possible usages. Here, we explicitly discuss them.

5.1.  Application Invocation in the User Agent

   In some of the examples above, there were multiple software
   applications running within a single user agent. When an incoming
   INVITE or MESSAGE arrives, it must be delivered to the appropriate
   application software. When each service is bound to a distinct
   software application, it would seem that the service identity is
   needed to dispatch the message to the appropriate piece of software.
   This is shown in Figure 2.

             +---------------------------------+
             |                                 |
             | +-------------+ +-------------+ |
             | |     UI      | |     UI      | |
             | +-------------+ +-------------+ |
             | +-------------+ +-------------+ |
             | |             | |             | |
             | |  Service 1  | |  Service 2  | |
             | |             | |             | |
             | +-------------+ +-------------+ |
             | +-----------------------------+ |
             | |                             | |
             | |             SIP             | |
             | |            Layer            | |
             | |                             | |
             | +-----------------------------+ |
             |                                 |
             +---------------------------------+

                       Physical Device

                          Figure 2

   The role of the SIP layer is to parse incoming messages, handle the
   SIP state machinery for transactions and dialogs, and then dispatch
   requests to the appropriate service. For the example services in
   Section 4.2, an incoming INVITE for the gaming service would be
   delivered to the gaming application software. An incoming INVITE for
   the voice chat service would be delivered to the voice chat
   application software.
   For the examples in Section 4.3, a MESSAGE
   request for user-to-user messaging would be delivered to the
   messaging or SMS app, and a MESSAGE request containing configuration
   data would be delivered to a configuration update application.

5.2.  Application Invocation in the Network

   Another usage of a service identifier would be to cause servers in
   the SIP network to provide additional processing, based on the
   service. For example, an INVITE issued by a user agent for IPTV
   would pass through a server that does some kind of content rights
   management, authorizing whether the user is allowed to access that
   content. On the other hand, an INVITE issued by a user for
   multimedia conferencing would pass through a server providing
   "traditional" telephony features, such as outbound call screening and
   call recording. It would make no sense for the INVITE associated
   with IPTV to have outbound call screening and call recording applied,
   and it would make no sense for the multimedia conferencing INVITE to
   be processed by the content rights management server. Indeed, in
   these cases, it's not just an efficiency issue (invoking servers when
   not needed), but rather, truly incorrect behavior can occur. For
   example, if an outbound call screening application is set to block
   outbound calls to everything except for the phone numbers of friends
   and family, an IPTV request that gets processed by such a server
   would be blocked (as it's not targeted to the AOR of a friend or
   family member). This would block a user's attempt to access IPTV
   services, when that was not the goal at all.

   Similarly, a MESSAGE request from Section 4.3 might need to pass
   through a message server for filtering when it is associated with
   chat, but not when it is associated with configuration. Consider a
   filter which gets applied to MESSAGE requests, and that filter runs
   in a server in the network.
   The filter operation prevents user Joe from
   sending messages to user Bob that contain the words "stock" or
   "purchase", due to some regulations that disallow Joe and Bob from
   discussing stock trading. However, a MESSAGE for configuration
   purposes might contain an XML document that uses the token "stock" as
   some kind of attribute. This configuration update would be discarded
   by the filtering server, when it should not have been.

5.3.  Network Quality of Service Authorization

   The IP network can provide differing levels of Quality of Service
   (QoS) to IP packets. This service can include guaranteed throughput,
   latency, or loss characteristics. Typically, the user agent will
   make some kind of QoS request, either using explicit signaling
   protocols (such as RSVP) or through marking of Diffserv values in
   packets. The network will need to make a policy decision on
   whether these QoS treatments are authorized or not. One common
   authorization policy is to check if the user has invoked a service
   using SIP that they are authorized to invoke, and that this service
   requires the level of QoS treatment the user has requested.

   For example, consider IPTV and multimedia conferencing as described
   in Section 4.1. IPTV is a non-real-time service. Consequently,
   media traffic for IPTV would be authorized for bandwidth guarantees,
   but not for latency or loss guarantees. On the other hand,
   multimedia conferencing is real time. Its traffic would require
   bandwidth, loss and latency guarantees from the network.

   Consequently, if a user should make an RSVP reservation for a media
   stream, and ask for latency guarantees for that stream, the network
   would like to be able to authorize it if the service was multimedia
   conferencing, but not if it was IPTV.
   This would require the server
   performing the QoS authorization to know the service associated with
   the INVITE that set up the session.

5.4.  Service Authorization

   Frequently, a network administrator will want to authorize whether a
   user is allowed to invoke a particular service. Not all users will
   be authorized to use all services that are provided. For example, a
   user may not be authorized to access IPTV services, whereas they are
   authorized to utilize multimedia conferencing. A user might not be
   able to utilize a multiplayer gaming service, whereas they are
   authorized to utilize voice chat services.

   Consequently, when an INVITE arrives at a proxy in the network, the
   proxy will need to determine what the requested service is, so that
   the proxy can make an authorization decision.

5.5.  Accounting and Billing

   Service authorization and accounting/billing go hand in hand.
   Presumably, one of the primary reasons for authorizing that a user
   can utilize a service is that they are being billed differently based
   on the type of service. Consequently, one of the goals of a service
   identity is to be able to include it in accounting records, so that
   the appropriate billing model can be applied.

   For example, in the case of IPTV, a service provider can bill based
   on the content (US $5 per movie, perhaps), whereas for multimedia
   conferencing, they can bill by the minute. This requires the
   accounting streams to indicate which service was invoked for the
   particular session.

5.6.  Negotiation of Service

   In some cases, when the caller initiates a session, they don't
   actually know which service will be utilized. Rather, they might
   like to offer up all of the services they have available to the
   called party, and then let the called party decide, or let the system
   make a decision based on overlapping service capabilities.

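   A decision based on overlapping capabilities can be sketched as a
   simple intersection. This is only an illustration: the function name
   and the service labels are hypothetical placeholders, and the
   document does not define any negotiation mechanism or label scheme.

```python
def negotiate(offered, supported):
    """Return the offered services the called party also supports,
    preserving the caller's order of preference."""
    return [service for service in offered if service in supported]


# Hypothetical labels: the caller can do two services, while the called
# party's devices support only one of them.
caller_offer = ["game", "voice-chat"]
callee_support = {"voice-chat"}

print(negotiate(caller_offer, callee_support))  # ['voice-chat']
```

   An empty result would mean that the two parties share no service at
   all; how that case is handled is left open here.
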
   As an example, a user can do both the game and the voice chat service
   of Section 4.2. They initiate a session to a target AOR, but the
   devices used by that user can only support voice chat. Consequently,
   voice chat gets utilized for the session.

5.7.  Dispatch to Devices

   When a user has multiple devices, each with varying capabilities in
   terms of service, it is useful to dispatch an incoming request to the
   right device based on whether the device can support the service that
   has been requested.

   For example, if a user initiates a gaming session with voice chat,
   and the target user has two devices, one that can support the gaming
   service and one that cannot, the INVITE should be dispatched
   to the device which supports the gaming session.

6.  Key Principles of Service Identification

   In this section, we describe some of the key principles of performing
   service identification.

6.1.  Services are a By-Product of Signaling

   Almost always, the first solution that people consider is to add some
   kind of field to the signaling messages which indicates what the
   service is. This field would then be inserted by the user agent, and
   then can be used by the proxies and other user agents as a service
   identifier.

   This approach, however, misses a key point, which cannot be stressed
   enough, and which represents the core architectural principle to be
   understood here:

      A service is the by-product of the signaling - the effects of the
      signaling message once launched into the network. The service
      identity is therefore always derivable from the signaling without
      additional identifiers.

   When a user sends an INVITE request to the network, and targets that
   request at an IPTV server, and includes SDP for audio and video
   streaming, the *result* of sending such an INVITE is that an IPTV
   session occurs.
   The entire purpose of the INVITE is to establish
   such a session, and therefore, invoke the service. Thus, a service
   is not something that is different from the rest of the signaling
   message. A service is what the user gets after the network and other
   user agents have processed a signaling message.

   This principle leads to another important conclusion:

      If two services are different, but their signaling appears to be
      the same, it is because there is in fact something different that
      has been overlooked, or something has been implied from the
      signaling which should have been signaled explicitly.

   This makes sense; if a service is the by-product of signaling, how can
   a user have different experiences and different services when the
   signaling message is the same? There has to be something different
   in the messages, if the user experience was in fact different.

   To illustrate this, let us take each of the example services in
   Section 4 and investigate whether there is, or should be, something
   different in the signaling in each case.

   IPTV vs. Multimedia Conferencing: The two services in Section 4.1
      appear to have identical signaling. They both involve audio and
      video streams, both of which are unidirectional. Both might
      utilize the same codecs. However, there is another important
      difference in the signaling - the target URI. In the case of
      IPTV, the request is targeted at a media server or to a particular
      piece of content to be viewed. In the case of multimedia
      conferencing, the target is a conference server. The
      administrator of the domain can therefore examine the two
      Request-URIs, figure out whether each is targeted for a conference
      server or a content server, and use that to derive the service
      associated with the request.

   Gaming vs. Voice Chat: Though both sessions involve MSRP and voice,
      and both are targeted to the same AOR of the called user, there is
      a difference. The MSRP messages for the gaming session carry
      content which is game specific, whereas the MSRP messages for the
      voice chat are just regular text, meant for rendering to a user.
      Thus, the MSRP session in the SDP will indicate the specific
      content type that MSRP is carrying, and this type will differ
      between the two cases. Even if the game moves look like text,
      since they are being consumed by an automaton there is an
      underlying schema that dictates their content, and therefore, this
      schema represents the actual content type that should be signaled.

   Configuration vs. Pager Messaging: Just as in the case of gaming vs.
      voice chat, the content type of the messages differentiates the
      service that occurs as a consequence of the messages.

   This is ultimately an expression of the principle of DWIM vs. DWIS
   (Do-What-I-Mean vs. Do-What-I-Say). Explicit signaling is DWIS - the
   user is asking for a service by invoking the signaling that results
   in the desired effect. A service identifier is DWIM - an unspecific
   request for something that is ill-defined and non-interoperable.

6.2.  Perils of Explicit Identifiers

   Given that the information in the signaling message always conveys
   enough information to identify the service, another important
   conclusion can be drawn:

      Inclusion of an explicit service identifier within a message is,
      at best, redundant, and at worst, an avenue for fraud, loss of
      interoperability, and stifling of service innovation.

   By "explicit service identifier", we mean a field included in the
   signaling message that contains a token whose value indicates the
   specific service invoked by the calling user. This would be "IPTV"
   or "voice chat" or "shoot-em game" or "short message service".
This explicit identifier would typically be inserted by the originating user agent, and carried in the signaling message.

Clearly, if the signaling message itself contains enough information to identify the service, inclusion of an extra field to say the same thing is going to be redundant. Redundancy by itself is not a big deal. However, redundancy can lead to other, more significant problems.

6.2.1.  Fraud

First and foremost, it can lead to fraud. If a provider uses the service identifier for billing and accounting purposes, or for authorization purposes, it opens an avenue for attack. The user can construct the signaling message so that its actual effect (which is the service the user will receive) is what the user desires, but the service identity (which is what is used for billing and authorization) doesn't match, and indicates a cheaper service, or one that the user is authorized to receive. If, however, the service identity used by the domain administrator is derived from the signaling itself, the user cannot lie. If they did lie, they wouldn't get the desired service.

Consider the example of IPTV vs. multimedia conferencing. If multimedia conferencing is cheaper, the user could send an INVITE for an IPTV session, but include a service identifier which indicates multimedia conferencing. They get the service associated with IPTV, but at the cost of multimedia conferencing.

This same principle shows up in other places, for example, in the identification of an emergency services call [6]. It is desirable to give emergency services calls special treatment, such as being free, being authorized even when the user cannot otherwise make calls, and receiving priority. If emergency calls were indicated through something other than the target of the call being an emergency services URN [7], it would open an avenue for fraud.
The user could place any desired URI in the Request-URI, and indicate that the call is an emergency services call. This call would then get special treatment, but of course get routed to the target URI. The only way to prevent this fraud is to consider an emergency call as any call whose target is an emergency services URN. Thus, the service identification here is based on the target of the request. When the target is an emergency services URN, the request can get special treatment. The user cannot lie, since there is no way to separately indicate this is an emergency call, besides targeting it to an emergency URN.

6.2.2.  Systematic Interoperability Failures

How can inclusion of an explicit service identifier cause loss of interoperability? When such an identifier is used to drive functionality, such as dispatch on the phones or in the network, or QoS authorization, the wrong thing can happen when this field is not set properly. Consider a user in domain 1, calling a user in domain 2. Domain 1 provides the user with a service it calls "voice chat", which utilizes voice and IM for real time conversation, driven off of a buddy list application on a PC. Domain 2 provides its users with a service it calls "text telephony", which is a voice service on a wireless device that also allows the user to send text messages. Consider the case where domain 1 and domain 2 both have their user agents insert a service identifier into the request, and then use that to derive QoS authorization, accounting, and invocation of applications in the network and in the device. The user in domain 1 calls the user in domain 2, and inserts the identifier "voice chat" into the INVITE. When this arrives at the proxy in domain 2, the service is unknown. Consequently, the request does not get the proper QoS treatment.
When it gets delivered to the user agent of the user in domain 2, the user agent does not see a service it understands, and consequently does not know to dispatch the request to the right application software. Thus, this call has completely failed, even when it could have succeeded. This illustrates the following key point:

   Explicit service identifiers, used between domains, cause interoperability failures unless all interconnected domains agree on exactly the same set of services and how to name them.

Of course, lack of service identifiers does not guarantee service interoperability. However, SIP was built with rich tools for negotiation of capabilities at a finely granular level. One user agent can make a call using audio and video, but if the receiving UA only supports audio, SIP allows both sides to negotiate down to the lowest common denominator. Thus, communication is still provided. As another example, if one agent initiates a Push-To-Talk session (which is audio with a companion floor control mechanism), and the other side only supports regular audio, SIP is able to negotiate back down to a regular voice call. As another example, if a calling user agent is running a high-definition video conferencing endpoint, and the called user agent supports just a regular video endpoint, the codecs themselves can negotiate downward to a lower rate, picture size, and so on. Thus, interoperability is achieved. Interestingly, the final "service" may no longer be well characterized by the service identifier that would have been placed in the original INVITE. For example, in this case, if the original INVITE from the caller had contained the service identifier "hi-fi video", but the video gets negotiated down to a lower rate and picture size, the service identifier is no longer really appropriate.
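
As a rough illustration of the negotiation described above, the following Python sketch models an offer as a list of media streams and an answerer as a table of supported codecs. It is a simplification made for this example, not actual SDP offer/answer processing: the negotiate function, the dictionary layout, and the codec labels "hd-video" and "sd-video" are all hypothetical.

```python
# Illustrative sketch only. The data layout, the function name, and the
# "hd-video"/"sd-video" labels are hypothetical and do not come from any
# SIP or SDP library. Real offer/answer processing has many more rules.

def negotiate(offered, supported):
    """Keep each offered stream the answerer supports, using the first
    codec (in the offerer's preference order) that both sides share."""
    accepted = []
    for stream in offered:
        answerer_codecs = supported.get(stream["media"], [])
        common = [c for c in stream["codecs"] if c in answerer_codecs]
        if common:
            accepted.append({"media": stream["media"], "codec": common[0]})
    return accepted

# Offer from a high-definition video conferencing endpoint.
offer = [
    {"media": "audio", "codecs": ["G722", "PCMU"]},
    {"media": "video", "codecs": ["hd-video", "sd-video"]},
]

# Two possible answerers: an audio-only UA and a regular video endpoint.
audio_only_ua = {"audio": ["PCMU"]}
standard_video_ua = {"audio": ["G722", "PCMU"], "video": ["sd-video"]}

audio_call = negotiate(offer, audio_only_ua)
video_call = negotiate(offer, standard_video_ua)

print(audio_call)
print(video_call)
```

In both cases a session is still established, but neither result matches what a label such as "hi-fi video" in the original request would have claimed. The negotiated streams, not the label, describe the session the users actually get.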

This illustrates another key aspect of the interoperability problem:

   Usage of explicit service identifiers in the request will result in inconsistencies with results of any SIP negotiation that might otherwise be applied in the session.

Of course, there are cases where negotiating to a common baseline is not what is desired. SIP provides tools (such as Require) to force the call to fail unless the desired capabilities are supported. However, this is not recommended as a general rule [4].

When a service identifier becomes something that both proxies and the user agent need to understand in order to properly treat a request, it becomes equivalent to including a token in the Proxy-Require and Require header fields of every single SIP request. The very reason that RFC 4485 [4] frowns upon usage of Require, and certainly Proxy-Require, is the huge impact on interoperability it causes. It is for this same reason that explicit service identifiers need to be avoided:

   The usage of explicit service identifiers is equivalent to the usage of Require and Proxy-Require in the request, and has the same negative impact on interoperability as those headers have.

6.2.3.  Stifling of Service Innovation

The probability that every pair of service providers ends up with the same set of services, and gives them the same names, becomes vanishingly small as the number of providers grows. Indeed, it would almost certainly require a centralized authority to identify what the services are, how they work, and what they are named. This, in turn, leads to a requirement for complete homogeneity in order to facilitate interconnection. Two providers cannot usefully interconnect unless they agree on the set of services they are offering to their customers, and each does the same thing.
This is, in a very real sense, anathema to the entire notion of SIP, which is built on the idea that heterogeneous domains can interconnect and still get interoperability:

   Explicit service identifiers lead to a requirement for homogeneity in service definitions across providers that interconnect, ruining the very service heterogeneity that SIP was meant to bring.

Indeed, Metcalfe's law says that the value of a network grows with the square of the number of participants. As a consequence of this, once a group of large domains did get together and agree on a set of services, and then on a set of well-known identifiers for those services, it would force other providers to also deploy the same services, in order to obtain the value that interconnection brings. This, in turn, will stifle innovation, and quickly force the set of services in SIP to become fixed and never expand beyond the ones initially agreed upon. This, too, is anathema to the very framework on which SIP is built, and defeats much of the purpose of why providers have chosen to deploy SIP in their own networks:

   Metcalfe's law, when combined with explicit service identifiers, will stifle the ability of providers to develop new SIP services, since they have no hope of interconnecting them with anyone else.

Consider the following example. Several providers get together, and standardize on a set of service identifiers. One of these services uses audio and video (say, "multimedia conversation"). This service is successful, and is widely utilized. Endpoints look for this identifier to dispatch calls to the right software applications, and the network looks for it to invoke features, perform accounting, and apply QoS. A new provider gets the idea for a new service, say, avatar-enhanced multimedia conversation. In this service, there is audio and video, but there is a third stream, which renders an avatar.
A caller can press buttons on their phone, to cause the avatar on the other person's device to show emotion, make noise, and so on. This is similar to the way emoticons are used today in IM. This service is enabled by adding a third media stream (and consequently, a third m-line) to the SDP.

Normally, this service would be backwards compatible with a regular audio-video endpoint, which would just reject the third media stream. However, because a large network has been deployed that is expecting to see the token "multimedia conversation" and its associated audio+video service, it is nearly impossible for the new provider to roll out this new service. If they did, it would fail completely, or partially fail, when their users call users in other provider domains.

7.  Recommendations

From these principles, several recommendations can be made:

o  Systems needing to perform service identification must examine existing signaling constructs to identify the service based on fields that exist within the signaling message already.

o  If it appears that the signaling currently defined in standards is not sufficient to identify the service, it may be due to lack of sufficient signaling to convey what is needed, and new standards work should be undertaken to fill this gap.

o  The usage of an explicit service identifier does make sense as a way to cache a decision made by a network element, for usage by another network element within the same domain. However, service identifiers are fundamentally useful only within a particular domain, and any such header must be stripped at a network boundary.

8.  Security Considerations

Oftentimes, the service associated with a request is utilized for purposes such as authorization, accounting, and billing. When service identification is not done properly, the possibility of network fraud is introduced.
It is for this reason, discussed extensively in Section 6.2.1, that the usage of explicit service identifiers inserted by a UA is NOT RECOMMENDED.

9.  IANA Considerations

There are no IANA considerations associated with this specification.

10.  Acknowledgements

This document is based on discussions with Paul Kyzivat and Andrew Allen, who contributed significantly to the ideas here.

11.  References

11.1.  Normative References

[1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[2]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

11.2.  Informational References

[3]  Rosenberg, J., "A Data Model for Presence", RFC 4479, July 2006.

[4]  Rosenberg, J. and H. Schulzrinne, "Guidelines for Authors of Extensions to the Session Initiation Protocol (SIP)", RFC 4485, May 2006.

[5]  Campbell, B., "The Message Session Relay Protocol", draft-ietf-simple-message-sessions-19 (work in progress), February 2007.

[6]  Rosen, B., "Framework for Emergency Calling in Internet Multimedia", draft-ietf-ecrit-framework-00 (work in progress), October 2006.

[7]  Schulzrinne, H., "A Uniform Resource Name (URN) for Services", draft-ietf-ecrit-service-urn-05 (work in progress), August 2006.

[8]  Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C., and D. Gurle, "Session Initiation Protocol (SIP) Extension for Instant Messaging", RFC 3428, December 2002.

Author's Address

   Jonathan Rosenberg
   Cisco
   Edison, NJ
   US

   Email: jdrosen@cisco.com
   URI:   http://www.jdrosen.net

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).