SAFE summary
------------

SAFE BOF
Dec 3, 2007, 0900 - 1130
Chairs: Colin Perkins (csp@csperkins.org), Markus Isomäki (markus.isomaki@nokia.com)
Summary by Markus Isomäki

BOF proposal is available at:
http://www3.ietf.org/proceedings/07dec/agenda/safe.txt

The purpose of the SAFE BOF was to discuss two things:

1. Is there interest in the IETF to work on solutions to reduce keep-alive traffic for NAT and firewall traversal?
2. Is the newly-proposed technique for using STUN to discover, query and control firewalls and NATs a reasonable approach to pursue in this space?

The intent of the BOF was not to form a new WG at this point, but to give guidance on how to continue work in this area.

The current scope/properties of STUN control are that it works for UDP only, supports nested NATs and firewalls, detects non-supporting NATs and fails safely with them, and operates on one binding/pinhole at a time using the transport source address for authorization. Its main purpose is to query and adjust the refresh period for the binding/pinhole.

The main discussion topics that were raised during the BOF were:

- Is there a problem with keepalives: It was commented that a 20-30 sec keepalive for UDP will dramatically affect battery lifetime, e.g. in a device with a WCDMA radio. A reference to a paper was later sent to the SAFE and BEHAVE lists.

- Generality: The initial scope is only UDP. Should this work also for TCP? TCP got a lot of support as it also has unpredictable timeouts in middleboxes. On the other hand, a UDP-only solution would be simpler and easier to deploy. IPv4/IPv6 translators were also brought into the discussion, but declared out of scope.

- Future path: If we adopt this protocol now, is there a path to the future problems we want to solve? Arguments were made for incremental deployment vs. a generic solution.

- Which applications benefit from a STUN/UDP solution: SIP/UDP, real-time media, IPSec/UDP, Teredo, Mobile IP. Some of these don't use STUN. Some, like SIP, could run over TCP as well.
  In general the benefit is biggest for "push" type applications that need to maintain connectivity over long periods with little actual traffic to send/receive.

- Security model: In STUN-control the middlebox is controlled in-band with the application traffic and the transport source address is used for authorization. No bulk provisioning of mappings/pinholes is possible. This simplifies the security properties compared to third-party/out-of-band control models.

- Scope and difference from existing mechanisms: There was a presentation surveying existing mechanisms. This caused a lot of clarifying questions on how the existing protocols were classified.

- What is the incentive for the vendors to implement this: None of the previous control protocols have been deployed. STUN-control has some simplicity and incremental deployment benefits, at least for those applications that already use STUN.

- Problem with overlapping address spaces and nested NATs: Can STUN-control deal with this? A couple of proposals exist, but they are complicated. This is a generic problem with NATs; should it be solved regardless of STUN-control?

At the end of the session several polls were taken:

- Question: "Are some functional requirements (for avoiding frequent keepalive) or deployment considerations left unsatisfied by existing protocols?" Yes/No
  Result: Majority agrees, but a substantial minority disagrees.

- Question: "Should the IETF try to solve the problem?" Yes/No
  Result: Clear majority support.

- Question: "Is the NAT control STUN usage a reasonable approach to NAT control, addressing the above requirements?" Yes/Maybe/No
  (At this point many people commented that this is not a clear question, as some people would also want to support TCP etc.)
  Result: Seems to be weighted toward "yes" and "maybe", "no" slightly quieter.
- Question: "Given that we have a number of proposals in this space, has our understanding of this problem space changed enough that we can build something that people actually will deploy?" Yes/No
  Result: Response judged 1/3 "yes", 2/3 "no".

SAFE minutes
------------

Notes, SAFE Dec 3 2007 0900
recorded by Dean Willis
Chaired by Colin Perkins, Markus Isomaki

---
Topic: Agenda Bash
slides presented by chairs

Agenda accepted as proposed by chairs. "Note Well" statement and IPR notice reviewed. Chairs present slides.

NOTE: THIS IS NOT A "WORKING GROUP FORMING BOF" -- we are attempting to decide whether there is a need for work in this area. We are not discussing charters or any formative process issues in this meeting.

---
Topic: Problem Statement and Scope
led by Dan Wing

Slides presented

Problem: Current NAT traversal approaches require keepalives. This produces traffic and power consumption issues, especially for wireless battery-powered devices.

Scope: Create a NAT control technique that solves keepalive and nesting, detects and fails safely with non-upgraded NATs, and uses source transport address for authorization.

Discussion follows ...

Question: Need to clarify the relationship between determining and adjusting a NAT keepalive interval. Do we need to do both, or will a system do just one? Concluded that the critical piece is determination. We would also like to be able to adjust the timing.

Question: It seems that constraining the solution space to use the source transport address might be excessive. Do we want to constrain to this level up front? Are there other possible techniques that should be used?

Response by JDR: The idea here is to emulate what NATs already do, which is 5-tuple address based, which we understand well. Things that do protocol inspection or out-of-path controls raise lots of security and deployability issues.
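
As an illustration of the 5-tuple, source-address-based authorization described in the response above, here is a minimal sketch. It is not the STUN control wire protocol and not taken from the slides: the class `ToyMiddlebox`, its method names, and the 30-second default and 3600-second cap are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (protocol, src_ip, src_port, dst_ip, dst_port)
FiveTuple = Tuple[str, str, int, str, int]


@dataclass
class Binding:
    lifetime_s: int  # seconds the middlebox keeps an idle binding


class ToyMiddlebox:
    """Hypothetical NAT/firewall state table; not the STUN control protocol."""

    def __init__(self, default_lifetime_s: int = 30,
                 max_lifetime_s: int = 3600) -> None:
        # Both values are assumptions made up for this sketch.
        self.default_lifetime_s = default_lifetime_s
        self.max_lifetime_s = max_lifetime_s
        self.bindings: Dict[FiveTuple, Binding] = {}

    def outbound_packet(self, flow: FiveTuple) -> None:
        # Ordinary traffic creates the binding if it does not exist yet.
        self.bindings.setdefault(flow, Binding(self.default_lifetime_s))

    def request_lifetime(self, sender: Tuple[str, int], flow: FiveTuple,
                         wanted_s: int) -> Optional[int]:
        """Return the granted lifetime, or None if the request is refused."""
        binding = self.bindings.get(flow)
        _, src_ip, src_port, _, _ = flow
        # Authorization: the request must come from the transport source
        # address that owns the binding; no other credential is checked.
        if binding is None or sender != (src_ip, src_port):
            return None
        # One binding at a time, capped by the middlebox's own policy.
        binding.lifetime_s = min(wanted_s, self.max_lifetime_s)
        return binding.lifetime_s
```

With a binding created for the flow `("udp", "10.0.0.5", 5060, "192.0.2.10", 5060)`, a request for 600 seconds sent from `("10.0.0.5", 5060)` is granted, while the same request from `("10.0.0.9", 5060)` is refused. This mirrors the point that no out-of-path party can adjust a binding.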
Suggestion: Rephrase as requirements: 1) be easy to deploy, 2) confirm that signaling for NAT control is authenticated, to at least the level of normal TCP, as being from the endpoint involved.

Question (Philip Matthews): Are we focusing only on UDP? This problem may not exist for TCP. Does it affect anything else? Henning noted that his current hotel seems to time out on IMAP in about a minute, breaking IMAP.

Suggestion: Requirement: needs to work on large deployments.

Conclusion: Will focus on UDP initially. IPSEC support uses the IPSEC NAT traversal mode using UDP. Native IPSEC appears to be out of scope.

Suggestion: It would be good if the result can be extended to support TCP, but the measure of success is to succeed well enough to get deployed.

Comment: Key difference is that TCP has an explicit teardown that can be seen by NATs. Perhaps we could state the scope as protocols that do not have explicit teardowns.

---
Topic: Survey of Existing Protocols
led by Mary Barnes

Slides presented

Slide: Categorization of Protocols

Question: What is the difference between two-party and multiparty? The distinction is based on whether there's an intermediary relay node apart from the NAT itself. Noted that this may not be a useful distinction for some people.

Slides: Protocol Summaries

Question: Diameter Gq', Rx+, Gx+: These are DIAMETER-based approaches, primarily from 3GPP.

Comment: There is also a Megaco-based H.348 protocol.

Question: Where is ICE? It probably needs to be included in this summary.

Question: What does "Supports Incremental Deployment" mean? We think this means whether the protocol is needed in every middlebox on a path or only some. An alternative: Can someone who is interested in deploying a VoIP service make this stuff work by putting the protocol into just the small part of the network they control, or does it require putting boxes into parts they don't control?
For example, enterprise IT people generally won't put MIDCOM in their firewall just so end users can access outside applications. This seems to be a very controversial characteristic. Noted that one of the primary goals of SAFE is getting incremental deployability, so we need to understand this better.

Slides on Protocol Comparison

Noted that NAT-PMP requires direct interaction with the middlebox.

Slide on topology/environments: Much discussion over the "Topology Aware" column of the slide. Conclusion that this needs some re-thinking. Perhaps the column should be labeled "Topology Unaware".

Suggestion for 4th column on this slide: Identify where "end-to-end" breaks if you use each protocol, i.e. UNSAFE considerations.

Discussion of the "Nested NATs" column: Suggested that the MIDCOM/SIMCO/DIAMETER series may not really support nested NATs.

Discussion of the "diverse endpoints" column: Can you give an example of yes and no by function? JDR: For example, UPnP is designed as a residential protocol, lacking nesting, authentication, etc. So this is a consequence of other properties. Perhaps this could be better phrased as discrete function layers than as the broad categorization being attempted.

Summary (2) slide:

Comment from Keith Moore: It seems like many of the problems with NAT protocols stem from the assumption that interactions with the middlebox are bad, so don't start with this assumption.

Question from Henning Schulzrinne: We seem to have a long-term goal in mind. There's a danger that we're always incrementally fixing something. For any of these things to be truly useful, avoiding the probing problem seems to require some interaction with NATs. Should we look further ahead and start mapping out the things that we're going to want?

Response from Colin: The current intent here is to explore a tightly-focused solution to the immediate problem.
Suggested by Henning that we at least track the problems we aren't solving, and occasionally consider whether there are other intermediate steps that we should be taking.

Counterargument from Philip Matthews: The big problem in getting past solutions to deploy is that the complexity isn't worth the return to the NAT vendors. We need something tightly focused that can get deployed quickly and easily.

Comment from Lars: Questions for this BOF to answer: There are lots of solutions in this area. Do we need something else? Is STUN Control a reasonable thing to do? Is the IETF the right place to do this?

---
Topic: NAT Control STUN Usage "STUN Control"
led by Dan Wing

Slides presented

Slide: Tagging procedure with firewalls

Question: This seems to indicate that the firewall wants to be seen. What happens when they don't?
Ans: They don't tag and remain invisible.

Question: Is there an assumption that the tagging firewall is the closest middlebox?
Ans: No, they can be stacked, or there may be other layers.

Slide: Communicate to NAT's embedded STUN server

Question: How is signaling directed to the STUN server in the NAT?
Ans: The source 3-tuple is reused along with the new destination address of the STUN server in the NAT. The STUN server in the NAT correlates based on the source 3-tuple, establishing a binding on the whole 5-tuple.

Slide set: Nested NATs

Comment: There are a lot of arrows here. Is there a way that a P2P app might use one command to open a lot of bindings?
Ans: No.

Comment: Is there a way to do the overlap bindings in parallel? Noted that since this binding adjustment can be done after the media flow starts, there's no real setup delay.

Slide set: Overlapping address spaces

Question: It seems likely that overlapping addresses will occur every time someone stacks up generic same-brand NAT boxes. Perhaps we should look at a fix instead of a detection.
JDR notes that he proposed something on the list last night relating to stacked DHCP-obtained addresses, so that routers can detect the conflict and re-request IP addresses to prevent it. Much groaning ensued in the audience. Noted that this is a real issue and we need to think about it some more.

Comment from Philip Matthews: We don't need a perfect solution, just something in STUN control that suggests address randomization. That is, new boxes that support this would randomize in net 10 instead of using 192.168.0.0/24 for their private-side addresses.

Suggestion from Keith Moore: We need to find a way to detect this sort of brokenness and report it to the end user (or somebody else who can do something) so that they can do something about it.

Discussion about issues of randomization in address exhaustion continued with no clear conclusion.

---
General Discussion:

Comment: We initially saw a lot of wrongness with NAT implementations that included ALGs. How does this not happen here?
Ans: The protocol suggested does not do any transparent functions. It only does things by explicit interaction. Of course, there can always be bugs.

Question (Aki Niemi): Have we enumerated the applications that would use this?
Ans: Any UDP applications that have long periods of no data transmission.

Re-discussion of the power-management keepalive issue followed.

Noted that the suggested approach requires changes to existing endpoints to gain its advantages. For applications that aren't currently using STUN, adding STUN support is not an incremental change, even if adding STUN control afterwards would be.

Noted that there is a larger question. This sort of keepalive problem applies not only to STUN-enabled NATs, but to stateful firewalls and other things. Do we want to solve this problem once or many times?

Comment from Keith Moore: These partial solutions for specific applications may actually hurt deployability.
It would be nice to have a more general solution.

Open question: Should we split the discovery protocol from the control protocol?

Noted by Hannes Tschofenig that there are people looking at STUN for other protocols that need NAT traversal.

Discussion from Cullen: I think of generality in terms of which transport protocols it works with. There are things, like firewalls, where we need to manage something besides UDP.

Comment from Keith Moore: Two approaches: One is how to fix up the infrastructure so that it works. The second is how to establish application-layer protocols that work on an existing infrastructure. We seem to be focused on making applications work with existing IPv4 static-addressed NATs. Perhaps we should be looking at the infrastructure instead.

Comment: We may need to take v4-v6 protocol translators into account as well. Perhaps we can define them to be better up front, as they have identical issues to NATs and firewalls. Some respondents think that anything that really solves v4-v4 apps will be directly applicable even without considering v6 up front.

Directive from Lars: Would like to suggest v4-v6 is out of scope for this discussion.

Ongoing discussion ensued as to the merits of narrow targeted solutions vs. broadly applicable solutions. (All known arguments repeated several times.) Key discussion is the incentive to equipment providers. Noted that the UDP applications being discussed here tend to drive equipment recommendation and purchase.

Question: Is it reasonable to solve the problem in-the-small in one group and in-the-large elsewhere?

Concern from Aki: This proposal seems to be targeted at SIP Outbound. It's much easier to fix SIP Outbound using TCP.

Comment from Keith: There seems to be an assumption that a general solution would be expensive. This may not be valid. We need to move towards explicit guidance on where and what to upgrade.
Noted that in addition to SIP and RTP, P2PSIP control connections and HIP are candidates for the proposed STUN control solution.

Comment from JDR: The market breeds complexity and incremental single-issue solutions on its own. This is as general a solution as we might ever get deployed.

---
Topic: Future Directions

Is there a problem that needs to be solved? Are some functional requirements or deployment considerations left unsatisfied by existing protocols?

Noted that there are studies that show existing UDP keepalive reduces battery life by about 50%.

Answers range from clearly yes to clearly no, with at least one "can not be determined from this BOF".

Discussion by JDR: The real problem is the "push" class of problems. Solutions like SIP Outbound convert these to client-server problems, but at significant cost.

Keith notes that this sort of transformation creates barriers to deployment of many midrange protocols that can't pay for the massive rendezvous servers needed to use the current approaches.

Aki reiterated arguments that fixing NAT keepalive intervals will not occur in the timeframe needed to make W-CDMA work, and that the only reasonable solution is to move to TCP for SIP immediately.

---
Poll from chairs on "are requirements left unsatisfied" (this question): Profound majority believes there are unsatisfied requirements?

Poll question rephrased as "Are some functional requirements (for avoiding frequent keepalive) or deployment considerations left unsatisfied by existing protocols?"
Result: Majority agrees, but a substantial minority disagrees.

Question: Is there agreement that the IETF should consider developing a new NAT control mechanism to address these requirements?

Discussion on whether this should be a new solution, a fix to an existing solution, be protocol agnostic (aka work for TCP), etc. Suggested that the word "new" be deleted from the question.

Poll: Should the IETF try to solve the above problem?
Result: Clear majority support.
Question: Is the NAT control STUN usage a reasonable approach to NAT control, addressing the above requirements?

Derek and others suggested that it would be reasonable if it includes TCP support.

Francois argues that this question is premature.

Keith believes this is a reasonable protocol, but that it would be nice to not have a separate protocol for every knob that might be tweaked on a NAT or a firewall.

Philip suggests that the approach would be to bring it in as an individual contribution and follow the usual process.

Remi and Aki re-suggest that the approach needs to solve TCP and UDP, but that we should clearly focus on keepalive and not tweaking NAT knobs.

It seems that we are trying to control the answer by controlling the question here.

Poll: Is the NAT control STUN usage a reasonable approach to NAT control, addressing the above requirements?
Result (Yes/No/Maybe): Seems to be weighted toward yes and maybe, no slightly quieter.

Poll: Given that we have a number of proposals in this space, has our understanding of this problem space changed enough that we can build something that people actually will deploy?
Result: Response judged 1/3 yes, 2/3 no.