SIPPING                                                     J. Rosenberg
Internet-Draft                                               jdrosen.net
Intended status: Informational                            March 23, 2010
Expires: September 24, 2010

Identification of Communications Services in the Session Initiation Protocol (SIP)
draft-ietf-sipping-service-identification-04

Abstract

This document considers the problem of service identification in the Session Initiation Protocol (SIP). Service identification is the process of determining the user-level use case that is driving the signaling being utilized by the user agent. This document discusses the uses of service identification, and outlines several architectural principles behind the process. It identifies perils when service identification is not done properly - including fraud, interoperability failures and stifling of innovation. It then outlines a set of recommended practices for service identification.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 24, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Table of Contents

1. Introduction
2. Services and Service Identification
3. Example Services
   3.1. IPTV vs. Multimedia
   3.2. Gaming vs. Voice Chat
   3.3. Gaming vs. Voice Chat #2
   3.4. Configuration vs. Pager Messaging
4. Using Service Identification
   4.1. Application Invocation in the User Agent
   4.2. Application Invocation in the Network
   4.3. Network Quality of Service Authorization
   4.4. Service Authorization
   4.5. Accounting and Billing
   4.6. Negotiation of Service
   4.7. Dispatch to Devices
5. Key Principles of Service Identification
   5.1. Services are a By-Product of Signaling
   5.2. Identical Signaling Produces Identical Services
   5.3. Do What I Say, not What I Mean
   5.4. Declarative Service Identifiers are Redundant
   5.5. URIs are Key for Differentiated Signaling
6. Perils of Declarative Service Identification
   6.1. Fraud
   6.2. Systematic Interoperability Failures
   6.3. Stifling of Service Innovation
7. Recommendations
   7.1. Use Derived Service Identification
   7.2. Design for SIP's Negotiative Expressiveness
   7.3. Presence
   7.4. Intra-Domain
   7.5. Device Dispatch
8. Security Considerations
9. IANA Considerations
10. Acknowledgements
11. Informational References
Author's Address

1. Introduction

The Session Initiation Protocol (SIP) [RFC3261] defines mechanisms for initiating and managing communications sessions between agents. SIP allows for a broad array of session types between agents. It can manage audio sessions, ranging from low-bitrate voice-only up to multi-channel high-fidelity music. It can manage video sessions, ranging from small "talking-head" style video chat to high-definition multipoint video conferencing, and from low-bandwidth user-generated content to high-definition movie and TV content.
SIP endpoints can be anything: adaptors that convert an old analog telephone to Voice over IP (VoIP), dedicated hardphones, high-end hardphones with rich displays and user entry capabilities, softphones on a PC, buddy list and presence applications on a PC, dedicated videoconferencing peripherals, and speakerphones.

This breadth of applicability is SIP's greatest asset, but it also introduces numerous challenges. One of these is that, when an endpoint generates or receives a SIP INVITE for a session, that session can potentially be within the context of any number of different use cases and endpoint types. For example, a SIP INVITE with a single audio stream could represent a Push-To-Talk session between mobile devices, a VoIP session between softphones, or audio-based access to stored content on a server.

Each of these different use cases represents a different service. The service is the user-visible use case that is driving the behavior of the user agents and servers in the SIP network.

The differing services possible with SIP have driven implementors and system designers to seek techniques for service identification. Service identification is the process of determining and/or signaling the specific use case that is driving the signaling being generated by a user agent. At first glance, this seems harmless and easy enough. It is tempting to define a new header, "Service-ID", for example, and have a user agent populate it with any number of well-known tokens which define what the service is. It could then be consumed for any number of purposes. A moniker placed into the signaling to indicate the service is called a service identifier.

Service identification and service identifiers, when used properly, can be beneficial.
However, when done improperly, service identification can lead to fraud, systemic interoperability failures, and a complete stifling of the innovation that SIP was meant to achieve. The purpose of this document is to describe service identification in more detail and describe how these problems arise.

Section 2 begins by defining a service and the service identification problem. Section 3 gives some concrete examples of services and why they can be challenging to identify. Section 4 explores the ways in which service identification can be utilized within a network. Next, Section 5 discusses the key architectural principles of service identification. Section 6 describes what declarative service identification is, and how it can lead to fraud, interoperability failures, and stifling of service innovation.

Consequently, this document concludes that declarative service identification (the process by which a user agent inserts a moniker into a message that defines the desired service, separate from explicit and well-defined protocol mechanisms) is harmful.

Instead of performing declarative service identification, this document recommends derived service identification, and gives several recommendations around it in Section 7:

1. The identity of a service should always be derived from the explicit signaling in the protocol messages and other contextual information, and never indicated by the user through a separate identifier placed into the message.

2. The process of service identification based on signaling messages must be designed for SIP's negotiative expressiveness, and therefore handle heterogeneity and not assume a fixed set of use cases.

3. Presence can help in providing URIs that can be utilized to connect to specific services, thereby creating explicit indications in the signaling which can be used to derive a service identity.

4. Service identities placed into signaling messages for the purposes of caching the service identity are strictly for intra-domain usage.

5. Device dispatch should be based on feature tags which map to well-defined SIP extensions and capabilities, and not on abstract service identifiers.

2. Services and Service Identification

The problem of identifying services within SIP is not a new one. The problem has been considered extensively in the context of presence. In particular, the presence data model for SIP [RFC4479] defines the concept of a service as one of the core notions that presence describes. Services are described in Section 3.3 of RFC 4479.

Essentially, the service is the user-visible use case that is driving the behavior of the user agents and servers in the SIP network. Being user-visible means that there is a difference in user experience between two services that are different. That user experience can be part of the call, or outside of the call. Within a call, the user experience can be based on different media types (an audio call vs. a video chat), different content within a particular media type (stored content, such as a movie or TV session), different devices (a wireless device for "telephony" vs. a PC application for "voice-chat"), different user interfaces (a buddy list view of voice on a PC application vs. a software emulation of a hard phone), different communities that can be accessed (voice chat with other users that have the same voice chat client, vs. voice communications with any endpoint on the PSTN), or different applications that are invoked by the user (manually selecting a push-to-talk application from a wireless phone vs. a telephony application).
Outside of a call, the difference in user experience can be a billing one (cheaper for one service than the other), a notification feature for one and not the other (for example, an IM that gets sent whenever a user makes a call), and so on.

In some cases, there is very little difference in the underlying technology that will support two different services, and in other cases, there are big differences. However, for purposes of this discussion, the key definition is that two services are distinct when there is a perceived difference by the user in the two services.

This leads naturally to the desire to perform service identification. Service identification is defined as the process of:

1. determining the underlying service which is driving a particular signaling exchange,

2. associating that service with a service identifier, and

3. attaching that service identifier to a signaling message (typically a SIP INVITE).

Once service identification is performed, the service identifier can then be used for various purposes within the network. Service identification can be done in the endpoints, in which case the UA would insert the moniker directly into the signaling message based on its awareness of the service. Or, it can be done within a server in the network (such as a proxy), based on inspection of the SIP message, or based on hints placed into the message by the user.

When service identification is performed entirely by inspecting the signaling, this is called derived service identification. When it is done based on knowledge known only by the invoking user agent, it is called declarative service identification. Declarative service identification can only be done in user agents, by definition.

3. Example Services

It is very useful to consider several example services, especially ones that appear difficult to differentiate from each other.
In cases where it is hard to differentiate, service identification - and in particular, declarative service identification - appears highly attractive (and indeed, required).

3.1. IPTV vs. Multimedia

IP Television (IPTV) is the usage of IP networks to access traditional television content, such as movies and shows. SIP can be utilized to establish a session to a media server in a network, which then serves up multimedia content and streams it as audio and video streams towards the client. Whether SIP is ideal for IPTV is, in itself, a good question. However, such a discussion is outside the scope of this document.

Consider multimedia conferencing. The user accesses a voice and video conference at a conference server. The user might join in listen-only mode, in which case the user receives audio and video streams, but does not send.

These two services, IPTV and listen-only multimedia conferencing, clearly appear to be different services. They have different user experiences and applications. A user is unlikely to ever be confused about whether a session is IPTV or listen-only multimedia conferencing. Indeed, they are likely to have different software applications or endpoints for the two services.

However, these two services look remarkably alike based on the signaling. Both utilize audio and video. Both could utilize the same codecs. Both use unidirectional streams (from a server in the network to the client). Thus, it would appear on the surface that there is no way to differentiate them, based on inspection of the signaling alone.

3.2. Gaming vs. Voice Chat

Consider an interactive game, played between two users from their mobile devices. The game involves the users sending each other game moves, using a messaging channel, in addition to voice.
In another service, users have a voice and IM chat conversation using a buddy list application on their PC.

In both services, there are two media streams: audio and messaging. The audio uses the same codecs. The messaging in both uses the Message Session Relay Protocol (MSRP) [RFC4975]. In both cases, the caller would send an INVITE to the Address of Record (AOR) of the target user. However, these represent fairly different services, in terms of user experience.

3.3. Gaming vs. Voice Chat #2

Consider a variation on the example in Section 3.2. In this variation, two users are playing an interactive game between their phones. However, the game itself is set up and controlled using a proprietary mechanism, not using SIP at all. The client application does, however, allow the user to chat with their opponent. The chat session is a simple voice session set up between the players.

Compare this with a basic telephone call between the two users. Both involve a single audio session. Both use the same codecs. They appear to be identical. However, different user experiences are needed. For example, we desire traditional telephony features (such as call forwarding and call screening) to be applied in the telephone service, but not in the gaming chat service.

3.4. Configuration vs. Pager Messaging

The SIP MESSAGE method [RFC3428] provides a way to send one-shot messages to a particular AOR. This specification is primarily aimed at Short Message Service (SMS) style messaging, commonly found in wireless phones. Receipt of a MESSAGE request would cause the messaging application on a phone to launch, allowing the user to browse message history and respond.

However, MESSAGE is sometimes used for the delivery of content to a device for other purposes.
For example, some providers use it to deliver configuration updates, such as new phone settings or parameters, or to indicate that a new version of firmware is available. Though MESSAGE was not designed for this purpose, it gets used since, in existing wireless networks, SMS is used for this purpose, and MESSAGE is the SIP equivalent of SMS.

Consequently, the MESSAGE request sent to a phone can be for two different services. One would require invocation of a messaging application, whereas the other would be consumed by the software in the phone, without any user interaction at all.

4. Using Service Identification

It is important to understand what the service identity would be utilized for, if known. This section discusses the primary uses. These are application invocation in user agents and the network, Quality of Service authorization, service authorization, accounting and billing, service negotiation, and device dispatch.

4.1. Application Invocation in the User Agent

In some of the examples above, there were multiple software applications executing on the host. One common way of achieving this is to utilize a common SIP user agent implementation that listens for requests on a single port. When an incoming INVITE or MESSAGE arrives, it must be delivered to the appropriate application software. When each service is bound to a distinct software application, it would seem that the service identity is needed to dispatch the message to the appropriate piece of software. This is shown in Figure 1.

      +---------------------------------+
      |                                 |
      | +-------------+ +-------------+ |
      | |     UI      | |     UI      | |
      | +-------------+ +-------------+ |
      | +-------------+ +-------------+ |
      | |             | |             | |
      | |  Service 1  | |  Service 2  | |
      | |             | |             | |
      | +-------------+ +-------------+ |
      | +-----------------------------+ |
      | |                             | |
      | |             SIP             | |
      | |            Layer            | |
      | |                             | |
      | +-----------------------------+ |
      |                                 |
      +---------------------------------+

                Physical Device

                   Figure 1

The role of the SIP layer is to parse incoming messages, handle the SIP state machinery for transactions and dialogs, and then dispatch requests to the appropriate service. This software architecture is analogous to the way web servers frequently work. An HTTP server listens on port 80 for requests, and based on the HTTP Request-URI, dispatches the request to a number of disparate applications. The same is happening here. For the example services in Section 3.2, an incoming INVITE for the gaming service would be delivered to the gaming application software. An incoming INVITE for the voice chat service would be delivered to the voice chat application software. The example in Section 3.3 is similar. For the examples in Section 3.4, a MESSAGE request for user-to-user messaging would be delivered to the messaging or SMS application, and a MESSAGE request containing configuration data would be delivered to a configuration update application.

Unlike the web, however, in all three use cases, the user initiating communications has (or appears to have, as discussed below) only a single identifier for the recipient: their AOR. Consequently, the SIP Request-URI cannot be used for dispatching, as it is identical in all three cases.

4.2. Application Invocation in the Network

Another usage of a service identifier would be to cause servers in the SIP network to provide additional processing, based on the service. For example, an INVITE issued by a user agent for IPTV would pass through a server that does some kind of content rights management, authorizing whether the user is allowed to access that content. On the other hand, an INVITE issued by a user for multimedia conferencing would pass through a server providing "traditional" telephony features, such as outbound call screening and call recording. It would make no sense for the INVITE associated with IPTV to have outbound call screening and call recording applied, and it would make no sense for the multimedia conferencing INVITE to be processed by the content rights management server. Indeed, in these cases, it is not just an efficiency issue (invoking servers when not needed); truly incorrect behavior can occur. For example, if an outbound call screening application is set to block outbound calls to everything except the phone numbers of friends and family, an IPTV request that gets processed by such a server would be blocked (as it is not targeted to the AOR of a friend or family member). This would block a user's attempt to access IPTV services, which was not the goal at all.

Similarly, a MESSAGE request from Section 3.4 might need to pass through a message server for filtering when it is associated with chat, but not when it is associated with configuration. Consider a filter which gets applied to MESSAGE requests, and that filter runs in a server in the network. The filter operation prevents user Joe from sending messages to user Bob that contain the words "stock" or "purchase", due to some regulations that disallow Joe and Bob from discussing stock trading.
However, a MESSAGE for configuration purposes might contain an XML document that uses the token "stock" as some kind of attribute. This configuration update would be discarded by the filtering server, when it should not have been.

4.3. Network Quality of Service Authorization

The IP network can provide differing levels of Quality of Service (QoS) to IP packets. This service can include guaranteed throughput, latency, or loss characteristics. Typically, the user agent will make some kind of QoS request, either using explicit signaling protocols (such as RSVP [RFC2205]) or by marking packets with a Diffserv codepoint. The network will need to make a policy decision based on whether these QoS treatments are authorized or not. One common authorization policy is to check if the user has invoked a service using SIP that they are authorized to invoke, and that this service requires the level of QoS treatment the user has requested.

For example, consider IPTV and multimedia conferencing as described in Section 3.1. IPTV is a non-real-time service. Consequently, media traffic for IPTV would be authorized for bandwidth guarantees, but not for latency or loss guarantees. On the other hand, multimedia conferencing is real time. Its traffic would require bandwidth, loss and latency guarantees from the network.

Consequently, if a user makes an RSVP reservation for a media stream, and asks for latency guarantees for that stream, the network would like to be able to authorize it if the service was multimedia conferencing, but not if it was IPTV. This would require the server performing the QoS authorization to know the service associated with the INVITE that set up the session.

4.4. Service Authorization

Frequently, a network administrator will want to authorize whether a user is allowed to invoke a particular service.
Not all users will be authorized to use all services that are provided. For example, a user may not be authorized to access IPTV services, whereas they are authorized to utilize multimedia conferencing. A user might not be able to utilize a multiplayer gaming service, whereas they are authorized to utilize voice chat services.

Consequently, when an INVITE arrives at a server in the network, the server will need to determine what the requested service is, so that the server can make an authorization decision.

4.5. Accounting and Billing

Service authorization and accounting/billing go hand in hand. One of the primary reasons for authorizing that a user can utilize a service is that they are being billed differently based on the type of service. Consequently, one of the goals of a service identity is to be able to include it in accounting records, so that the appropriate billing model can be applied.

For example, in the case of IPTV, a service provider can bill based on the content (US $5 per movie, perhaps), whereas for multimedia conferencing, they can bill by the minute. This requires the accounting streams to indicate which service was invoked for the particular session.

4.6. Negotiation of Service

In some cases, when the caller initiates a session, they don't actually know which service will be utilized. Rather, they might like to offer up all of the services they have available to the called party, and then let the called party decide, or let the system make a decision based on overlapping service capabilities.

As an example, a user's device supports both the game and the voice chat service of Section 3.2. The user initiates a session to a target AOR, but the devices used by the called user can only support voice chat. The called device returns, in its call acceptance, an indication that only voice chat can be used.
Consequently, voice chat gets utilized for the session.

4.7. Dispatch to Devices

When a user has multiple devices, each with varying capabilities in terms of service, it is useful to dispatch an incoming request to the right device based on whether the device can support the service that has been requested.

For example, if a user initiates a gaming session with voice chat, and the target user has two devices, one that can support the gaming service and one that cannot, the INVITE should be dispatched to the device which supports the gaming session.

5. Key Principles of Service Identification

In this section, we describe several key principles of service identification:

1. Services are a by-product of signaling

2. Identical signaling produces identical services

3. Declarative service identification is an example of Do-What-I-Mean (DWIM)

4. Declarative service identifiers are redundant

5. URIs are a key mechanism for producing differentiated signaling

5.1. Services are a By-Product of Signaling

Declarative service identification - the addition of a service identifier by clients in order to inform other entities what the service is - is a very compelling solution to the use cases described above. It provides a clear way for each of the use cases to be differentiated. On the other hand, derived service identification appears "hard" since the signaling appears to be the same for these different services.

Declarative service identification misses a key point, which cannot be stressed enough, and which represents the core architectural principle to be understood here:

   A service is the by-product of the signaling and the context around it (the user profile, time-of-day and so on) - the effects of the signaling message once launched into the network.
   The service identity is therefore always derivable from the signaling and its context without additional identifiers. In other words, derived service identification is always possible when signaling is being properly handled.

When a user sends an INVITE request to the network, targets that request at an IPTV server, and includes SDP for audio and video streaming, the *result* of sending such an INVITE is that an IPTV session occurs. The entire purpose of the INVITE is to establish such a session, and therefore, invoke the service. Thus, a service is not something that is different from the rest of the signaling message. A service is what the user gets after the network and other user agents have processed a signaling message.

It may seem that delayed offers (SIP INVITE requests that lack SDP) make it impossible to perform derived service identification. After all, in some of the cases discussed in this document (see Section 5.2), the differentiation is done using the SDP in the request. What if it's not there? The answer is simple: if it's not there, and the SDP is being offered by the called party, you cannot in fact know the service at the time of the INVITE. That's the whole point of delayed offer: to give the called party the chance to offer up what it wants for the session. In cases where service identification is needed at request time, delayed offer cannot be used.

5.2. Identical Signaling Produces Identical Services

This principle is a natural conclusion of the previous assertion. If a service is the by-product of signaling, how can a user have different experiences and different services when the signaling message is the same? They cannot.

But how can that be? From the examples in Section 3, it would seem that there are services which are different, but have identical signaling.
   If we hold true to the assertion, there is in fact only one logical conclusion:

      If two services are different, but their signaling appears to be the same, it is because one or more of the following is true:

      1.  there is in fact something different that has been overlooked

      2.  something has been implied from the signaling which should have been signaled explicitly

      3.  the signaling mechanism should be changed so that there is, in fact, something that is different

   To illustrate this, let us take each of the example services in Section 3 and investigate whether there is, or should be, something different in the signaling in each case.

   IPTV vs. Multimedia Conferencing: The two services in Section 3.1 appear to have identical signaling. They both involve audio and video streams, both of which are unidirectional. Both might utilize the same codecs. However, there is another important difference in the signaling - the target URI. In the case of IPTV, the request is targeted at a media server or at a particular piece of content to be viewed. In the case of multimedia conferencing, the target is a conference server. The administrator of the domain can therefore examine the two Request-URIs, determine whether each is targeted at a conference server or a content server, and use that to derive the service associated with the request.

   Gaming vs. Voice Chat: Though both sessions involve MSRP and voice, and both are targeted to the same AOR of the called user, there is a difference. The MSRP messages for the gaming session carry content which is game specific, whereas the MSRP messages for the voice chat are just regular text, meant for rendering to a user. Thus, the MSRP session in the SDP will indicate the specific content type that MSRP is carrying, and this type will differ in the two cases.
      Even if the game moves look like text, they are being consumed by an automaton, so there is an underlying schema that dictates their content; this schema represents the actual content type that should be signaled.

   Gaming vs. Voice Chat #2: In this case, both sessions involve only voice, and both are targeted at the same AOR. Indeed, there truly is nothing different - if the signaling works this way. However, there is an alternative mechanism for performing the signaling. For the gaming session, the proprietary protocol can be used to exchange a URI that identifies the voice chat function on the phone that is associated with the game (for example, a GRUU can be used [RFC5627]). Indeed, the gaming chat is not targeting the USER - it is targeting the gaming instance on the phone. Thus, if a special GRUU is used for the gaming chat, the signaling differs between these two services.

   Configuration vs. Pager Messaging: Just as in the case of gaming vs. voice chat, the content type of the messages differentiates the service that occurs as a consequence of the messages.

5.3. Do What I Say, not What I Mean

   "Do What I Mean", abbreviated as DWIM, is a concept in computer science. It is sometimes used to describe a function which tries to intelligently guess at what the user intended. It is in contrast to "Do What I Say", or DWIS, which describes a function that behaves concretely based on the inputs provided. Systems built on the DWIM concept can have unexpected behaviors because they are driven by unstated rules.

   Declarative service identification is an example of DWIM. The service identifier has no well-defined impact on the state machinery or protocols in the system; it has various side-effects based on an assumption of what is meant by the service identifier.
   Derived service identification, on the other hand, is an expression of the principle of DWIS - the behavior of the system is based entirely on the specifics of the protocol and is well defined by the protocol specification. The service identifier is just a shorthand for summarizing things that are well defined by signaling.

   As a litmus test to differentiate the two cases, consider the following question. If a request contained a service identifier, and that request were processed by a domain which did not understand the concept of service identifiers at all, would the request be rejected if that service were not supported, or would it complete but do the wrong thing? If the latter, it is DWIM. If the former, it is DWIS.

5.4. Declarative Service Identifiers are Redundant

   Because a declarative service identifier is, by definition, inside the signaling message, and because the signaling itself completely defines the behavior of the service, another natural conclusion is that a declarative service identifier is redundant with the signaling itself. It says nothing that could not or should not otherwise be derived from examination of the signaling.

5.5. URIs are Key for Differentiated Signaling

   In the IPTV example and in the second gaming example, it was ultimately the Request-URI that was (or should be) different between the two services. This is important. In many cases where services appear the same, it is because the resource which is being targeted is not, in fact, the user. Rather, it is a resource that is linked with the user. This resource might be an instance of a software application on a particular device of the user, or a resource in the network which acts on behalf of the user.

   The Request-URI is an infinitely large namespace for identifying these resources.
   It is an ideal mechanism for providing differentiation when there would otherwise be none.

   Returning again to the example in Section 3.3, we can see that it makes more sense to target the gaming chat session at a software instance on the user's phone, rather than at the user themselves. The gaming chat session should really only go to the phone on which the user is playing the game. The software instance lives only on that phone, whereas the user can be contacted many ways. We do not want telephony features invoked for the gaming chat session, because those features only make sense when someone is trying to communicate with the USER. When someone is trying to communicate with a software instance that acts on behalf of the user, a different set of rules applies, since the target of the request is completely different.

6. Perils of Declarative Service Identification

   Based on these principles, several perils of declarative service identification can be described. They are:

   1.  Declarative service identification can be used for fraud

   2.  Declarative service identification can hurt interoperability

   3.  Declarative service identification can stifle service innovation

6.1. Fraud

   Declarative service identification can lead to fraud. If a provider uses the service identifier for billing and accounting purposes, or for authorization purposes, it opens an avenue for attack. The user can construct the signaling message so that its actual effect (which is the service the user will receive) is what the user desires, but place a service identifier into the request (which is what is used for billing and authorization) that identifies a cheaper service, or one that the user is authorized to receive. In such a case, the user will be billed for something they did not receive.
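
   To make the attack concrete, the following minimal Python sketch (not taken from any SIP stack) contrasts a declared identifier with one derived from the signaling. The header name X-Service-Id, the hosts iptv.example.com and conf.example.com, the service labels, and the simplified Request type are all hypothetical; only the emergency services URN prefix urn:service:sos comes from [RFC5031]. The derived classifier looks only at the Request-URI and the SDP media types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical hosts operated by the billing domain (illustrative only).
IPTV_CONTENT_HOSTS = {"iptv.example.com"}
CONFERENCE_HOSTS = {"conf.example.com"}


@dataclass
class Request:
    """Simplified view of an INVITE: Request-URI, SDP media types, headers."""
    request_uri: str
    media: List[str]  # media types from the SDP m-lines, e.g. ["audio", "video"]
    headers: Dict[str, str] = field(default_factory=dict)


def uri_host(uri: str) -> str:
    """Host part of a simple SIP URI such as sip:user@host:port;params."""
    rest = uri.split(":", 1)[1]   # drop the scheme
    rest = rest.split("@")[-1]    # drop the user part, if any
    return rest.split(";")[0].split(":")[0].lower()  # drop params and port


def declared_service(req: Request) -> Optional[str]:
    """Declarative identification: whatever the UA claims (hypothetical header)."""
    return req.headers.get("X-Service-Id")


def derived_service(req: Request) -> str:
    """Derived identification: computed only from the Request-URI and the SDP."""
    if req.request_uri.lower().startswith("urn:service:sos"):
        return "emergency"  # RFC 5031 emergency services URN
    host = uri_host(req.request_uri)
    if host in IPTV_CONTENT_HOSTS:
        return "iptv"
    if host in CONFERENCE_HOSTS:
        return "multimedia-conferencing"
    return "video-call" if "video" in req.media else "voice-call"


# A UA requests IPTV content but labels the request as the cheaper service.
spoofed = Request(
    request_uri="sip:movie42@iptv.example.com",
    media=["audio", "video"],
    headers={"X-Service-Id": "multimedia-conferencing"},
)
print("declared:", declared_service(spoofed))  # what the UA claims
print("derived:", derived_service(spoofed))    # what the session actually is
```

   Because derived_service never consults the header, the misleading label changes only what the UA claims; the request is still classified, and would be billed, as "iptv". Likewise, no header can mark a request as an emergency call; only targeting an emergency services URN does. A real classifier would need a full SIP and SDP parser and the domain's own policy; the host match here is deliberately minimal.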

   If, however, the domain administrator derived the service identifier from the signaling itself (derived service identification), the user cannot lie. If they did lie, they would not get the desired service.

   Consider the example of IPTV vs. multimedia conferencing. If multimedia conferencing is cheaper, the user could send an INVITE for an IPTV session, but include a service identifier which indicates multimedia conferencing. The user gets the service associated with IPTV, but at the cost of multimedia conferencing.

   This same principle shows up elsewhere, for example in the identification of an emergency services call [I-D.ietf-ecrit-framework]. It is desirable to give emergency services calls special treatment, such as making them free, authorizing them even when the user cannot otherwise make calls, and giving them priority. If emergency calls were indicated through something other than the target of the call being an emergency services URN [RFC5031], it would open an avenue for fraud. The user could place any desired URI in the Request-URI, and indicate separately, through a declarative identifier, that the call is an emergency services call. This call would then get special treatment, but would of course be routed to the target URI. The only way to prevent this fraud is to consider an emergency call to be any call whose target is an emergency services URN. Thus, the service identification here is based on the target of the request. When the target is an emergency services URN, the request can get special treatment. The user cannot lie, since there is no way to separately indicate that this is an emergency call, besides targeting it to an emergency URN.

6.2. Systematic Interoperability Failures

   How can declarative service identification cause loss of interoperability?
   When an identifier is used to drive functionality - such as dispatch on the phones, dispatch in the network, or QoS authorization - the wrong thing can happen when this field is not set properly. Consider a user in domain 1 calling a user in domain 2. Domain 1 provides the user with a service they call "voice chat", which utilizes voice and IM for real-time conversation, driven off of a buddy list application on a PC. Domain 2 provides their users with a service they call "text telephony", which is a voice service on a wireless device that also allows the user to send text messages. Consider the case where domain 1 and domain 2 both have their user agents insert service identifiers into the request, and then use those identifiers to perform QoS authorization, accounting, and invocation of applications in the network and in the device. The user in domain 1 calls the user in domain 2, and inserts the identifier "Voice Chat" into the INVITE. When this arrives at the server in domain 2, the service identifier is unknown. Consequently, the request does not get the proper QoS treatment, even if the call itself will succeed.

   If, on the other hand, derived service identification were used, the service identifier could be removed by domain 2, and then recomputed based on the signaling to match its own notion of services. In this case, domain 2 could derive the "text telephony" identifier, and the request completes successfully.

   Declarative service identification, used between domains, causes interoperability failures unless all interconnected domains agree on exactly the same set of services and how to name them. Of course, lack of service identifiers does not guarantee service interoperability. However, SIP was built with rich tools for negotiation of capabilities at a finely granular level.
   One user agent can make a call using audio and video, but if the receiving UA only supports audio, SIP allows both sides to negotiate down to the lowest common denominator. Thus, communication is still provided. As another example, if one agent initiates a Push-To-Talk session (which is audio with a companion floor control mechanism), and the other side supports only regular audio, SIP would be able to negotiate back down to a regular voice call. As yet another example, if a calling user agent is running a high-definition video conferencing endpoint, and the called user agent supports just a regular video endpoint, the codecs themselves can negotiate downward to a lower rate, picture size, and so on. Thus, interoperability is achieved. Interestingly, the final "service" may no longer be well characterized by the service identifier that would have been placed in the original INVITE. For example, if the original INVITE from the caller had contained the service identifier "hi-fi video", but the video gets negotiated down to a lower rate and picture size, the service identifier is no longer really appropriate. That is why services need to be derived from signaling - because the signaling itself provides negotiation and interoperability between different domains.

   This illustrates another key aspect of the interoperability problem. Declarative service identification will result in inconsistencies between its service identifiers and the results of any SIP negotiation that might otherwise be applied in the session.

   When a service identifier becomes something that both proxies and the user agent need to understand in order to properly treat a request (which is the case for declarative service identification), it becomes equivalent to including a token in the Proxy-Require and Require header fields of every single SIP request.
   The very reason that [RFC4485] frowns upon usage of Require, and certainly of Proxy-Require, is the huge impact on interoperability it causes. It is for this same reason that declarative service identification needs to be avoided.

6.3. Stifling of Service Innovation

   The probability that any pair of service providers ends up with the same set of services, and gives them the same names, becomes smaller and smaller as the number of providers grows. Indeed, it would almost certainly require a centralized authority to identify what the services are, how they work, and what they are named. This, in turn, leads to a requirement for complete homogeneity in order to facilitate interconnection. Two providers cannot usefully interconnect unless they agree on the set of services they are offering to their customers, and each does the same thing. This is because each provider has become dependent on inclusion of the proper service identifier in the request in order for the overall treatment of the request to proceed correctly. This is, in a very real sense, anathema to the entire notion of SIP, which is built on the idea that heterogeneous domains can interconnect and still get interoperability.

   Declarative service identification leads to a requirement for homogeneity in service definitions across providers that interconnect, ruining the very service heterogeneity that SIP was meant to bring.

   Indeed, Metcalfe's law says that the value of a network grows with the square of the number of participants. As a consequence, once a group of large domains got together, agreed on a set of services, and then on a set of well-known identifiers for those services, other providers would be forced to deploy the same services in order to obtain the value that interconnection brings.
   This, in turn, will stifle innovation, and quickly force the set of services in SIP to become fixed and never expand beyond the ones initially agreed upon. This, too, is anathema to the very framework on which SIP is built, and defeats much of the purpose for which providers have chosen to deploy SIP in their own networks.

   Consider the following example. Several providers get together and standardize a set of service identifiers. One of these uses audio and video (say, "multimedia conversation"). This service is successful and is widely utilized. Endpoints look for this identifier to dispatch calls to the right software applications, and the network looks for it to invoke features, perform accounting, and apply QoS. A new provider gets the idea for a new service, say, avatar-enhanced multimedia conversation. In this service, there is audio and video, but there is also a third stream, which renders an avatar. A caller can press buttons on their phone to cause the avatar on the other person's device to show emotion, make noise, and so on. This is similar to the way emoticons are used today in IM. This service is enabled by adding a third media stream (and consequently, a third m-line) to the SDP.

   Normally, this service would be backwards compatible with a regular audio-video endpoint, which would just reject the third media stream. However, because a large network has been deployed that is expecting to see the token "multimedia conversation" and its associated audio+video service, it is nearly impossible for the new provider to roll out this new service. If they did, it would fail completely, or partially fail, when their users call users in other provider domains.

7. Recommendations

   From these principles, several recommendations can be made.

7.1. Use Derived Service Identification

   Derived service identification - where an identifier for a service is obtained by inspection of the signaling and other contextual data (such as the subscriber profile) - is reasonable, and when done properly, does not lead to the perils described above. However, declarative service identification - where user agents indicate what the service is, separate from the rest of the signaling - leads to the perils described above.

   If it appears that the signaling currently defined in standards is not sufficient to identify the service, it may be due to a lack of signaling to convey what is needed, or because Request-URIs should be used for differentiation and they are not being used. By applying the litmus test described in Section 5.3, network designers can determine whether the system is attempting to perform declarative service identification.

7.2. Design for SIP's Negotiative Expressiveness

   One of SIP's key strengths is its ability to negotiate a common view of a session between participants. This means that the service that is ultimately received can vary wildly, depending on the type of endpoints in the call and their capabilities. Indeed, this fact becomes even more evident when calls are set up between domains.

   As such, when performing derived service identification, domains should be aware that sessions may arrive from different networks and different endpoints. Consequently, the service identification algorithm must be complete - meaning it computes the best answer for any possible signaling message that might be received and any session which might be set up.

   In a homogeneous environment, the process of service identification is easy.
   The service provider will know the set of services they are providing, and based on the specific call flows for each specific service, can construct rules to differentiate one service from another. However, when different providers interconnect, or when different endpoints are introduced, assumptions about what services are used, and how they are signaled, no longer apply. To provide the best user experience possible, a provider doing service identification needs to perform a 'best-match' operation, such that any legal SIP signaling - not just the specific call flows running within their own network amongst a limited set of endpoints - is mapped to the appropriate service.

7.3. Presence

   Presence can help a great deal with providing unique URIs for different services. When a user wishes to contact another user, and knows only the AOR for the target (which is usually the case), the user can fetch the presence document for the target. That document, in turn, can contain numerous service URIs for contacting the target with different services. Those URIs can then be used in the Request-URI for differentiation. When possible, this is the best solution to the problem.

7.4. Intra-Domain

   Service identifiers themselves are not bad; derived service identification allows each domain to cache the results of the service identification process for usage by other network elements within the same domain. However, service identifiers are fundamentally useful only within a particular domain, and any such header must be stripped at a network boundary. Consequently, the process of service identification, and the service identifiers it produces, is always an intra-domain operation.

7.5. Device Dispatch

   Device dispatch should be done following the principles of [RFC3841], using implicit preferences based on the signaling.
   For example, [RFC5688] defines a new UA capability that can be used to dispatch requests based on different types of application media streams.

   However, it is a mistake to try to use a service identifier as a UA capability. Consider a service called "multimedia telephony" which adds video to the existing PSTN experience. A user has two devices, one of which is used for multimedia telephony, and the other strictly for a voice-assisted game. It is tempting to have the telephony device include a UA capability [RFC3840] called "multimedia telephony" in its registration. A calling multimedia telephony device can then include the Accept-Contact header field [RFC3841] containing this feature tag. The proxy serving the called party, applying the basic algorithms of [RFC3841], will correctly route the call to the terminating device.

   However, if the calling party is not within the same domain, and the calling domain does not know about or use this feature tag, there will be no Accept-Contact header field, even if the calling party was using a service that is a good match for 'multimedia telephony'. In such a case, the call may be delivered to both devices, yielding a poorer user experience. That is because device dispatch was done using declarative service identification.

   The best way to avoid this problem is to use feature tags which can be matched to well defined signaling features - media types, required SIP extensions and so on. In particular, the golden rule is that the granularity of feature tags must be equivalent to the granularity of individual features that can be signaled in SIP.

8. Security Considerations

   Oftentimes, the service associated with a request is utilized for purposes such as authorization, accounting, and billing.
   When service identification is not done properly, the possibility of unauthorized service use and network fraud is introduced. It is for this reason, discussed extensively in Section 6.1, that the usage of declarative service identifiers inserted by a UA is not recommended.

9. IANA Considerations

   There are no IANA considerations associated with this specification.

10. Acknowledgements

   This document is based on discussions with Paul Kyzivat and Andrew Allen, who contributed significantly to the ideas here. Much of the content in this draft is a result of discussions amongst participants in the SIPPING mailing list, including Dean Willis, Tom Taylor, Eric Burger, Dale Worley, Christer Holmberg, and John Elwell, amongst many others. Thanks to Spencer Dawkins, Tolga Asveren, Mahesh Anjanappa and Claudio Allochio for reviews of this document.

11. Informational References

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

   [RFC4479]  Rosenberg, J., "A Data Model for Presence", RFC 4479, July 2006.

   [RFC4485]  Rosenberg, J. and H. Schulzrinne, "Guidelines for Authors of Extensions to the Session Initiation Protocol (SIP)", RFC 4485, May 2006.

   [RFC4975]  Campbell, B., Mahy, R., and C. Jennings, "The Message Session Relay Protocol (MSRP)", RFC 4975, September 2007.

   [RFC5031]  Schulzrinne, H., "A Uniform Resource Name (URN) for Emergency and Other Well-Known Services", RFC 5031, January 2008.

   [I-D.ietf-ecrit-framework]  Rosen, B., Schulzrinne, H., Polk, J., and A. Newton, "Framework for Emergency Calling using Internet Multimedia", draft-ietf-ecrit-framework-10 (work in progress), July 2009.

   [RFC5627]  Rosenberg, J., "Obtaining and Using Globally Routable User Agent URIs (GRUUs) in the Session Initiation Protocol (SIP)", RFC 5627, October 2009.

   [RFC5688]  Rosenberg, J., "A Session Initiation Protocol (SIP) Media Feature Tag for MIME Application Subtypes", RFC 5688, January 2010.

   [RFC3428]  Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C., and D. Gurle, "Session Initiation Protocol (SIP) Extension for Instant Messaging", RFC 3428, December 2002.

   [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller Preferences for the Session Initiation Protocol (SIP)", RFC 3841, August 2004.

   [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating User Agent Capabilities in the Session Initiation Protocol (SIP)", RFC 3840, August 2004.

   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S. Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification", RFC 2205, September 1997.

Author's Address

   Jonathan Rosenberg
   jdrosen.net
   Monmouth, NJ
   US

   Email: jdrosen@jdrosen.net
   URI:   http://www.jdrosen.net