SAFE summary
------------

SAFE BOF Dec 3, 2007, 0900 - 1130
Chairs: Colin Perkins (csp@csperkins.org), 
        Markus Isomäki (markus.isomaki@nokia.com) 
Summary by Markus Isomäki

BOF proposal is available at:
http://www3.ietf.org/proceedings/07dec/agenda/safe.txt

The purpose of SAFE BOF was to discuss two things:
1. Is there interest in the IETF to work on solutions to reduce keep-alive
   traffic for NAT and firewall traversal. 
2. Is the newly-proposed technique for using STUN to discover, query and
   control firewalls and NATs, a reasonable approach to pursue in this
   space.

The intent of the BOF was not to form a new WG at this point, but to give
guidance on how to continue work in this area.

STUN control as currently scoped works for UDP only, supports nested NATs
and firewalls, detects non-supporting NATs and fails safely with them, and
operates on one binding/pinhole at a time using the transport source
address for authorization. Its main purpose is to query and adjust the
refresh period for the binding/pinhole.

The main discussion topics that were raised during the BOF were:
- Is there a problem with keepalives: It was commented that a 20-30 sec
  keepalive for UDP will dramatically affect battery lifetime, e.g. in a
  device with a WCDMA radio. A reference to a paper was later sent to the
  SAFE and BEHAVE lists.
- Generality: The initial scope is only UDP. Should this work also for TCP?
  TCP got a lot of support as it also has unpredictable timeouts in
  middleboxes. On the other hand, a UDP-only solution would be simpler and
  easier to deploy. IPv4/IPv6 translators were also brought into the
  discussion, but declared out of scope.
- Future path: If we adopt this protocol now, is there a path to the future
  problems we want to solve? Arguments for incremental deployment vs.
  generic solution.
- Which applications benefit from a STUN/UDP solution: SIP/UDP, real-time
  media, IPSec/UDP, Teredo, Mobile IP. Some of these don't use STUN. Some,
  like SIP, could run over TCP as well. In general the benefit is biggest
  for "push" type applications that need to maintain connectivity over
  long periods with little actual traffic to send/receive.
- Security model: In STUN-control the middlebox is controlled in-band with
  the application traffic and the transport source address is used for
  authorization. No bulk provisioning of mappings/pinholes is possible.
  This simplifies the security properties compared to
  third-party/out-of-band control models.
- Scope and difference to existing mechanisms: There was a presentation
  surveying existing mechanisms. This prompted many clarifying questions
  on how the existing protocols were classified.
- What is the incentive for the vendors to implement this: None of the
  previous control protocols have been deployed. STUN-control has some
  simplicity and incremental deployment benefits, at least for those
  applications that already use STUN. 
- Problem with overlapping address spaces and nested NATs: Can STUN-control
  deal with this? A couple of proposals exist, but are complicated. This is
  a generic problem with NATs. Should it be solved regardless of
  STUN-control?

At the end of the session several polls were taken:
- Question: "Are some functional requirements (for avoiding frequent
  keepalive) or deployment considerations left unsatisfied by existing
  protocols?" Yes/No Result: Majority agrees, but a substantial minority
  disagrees.
- Question: "Should the IETF try to solve the problem?" Yes/No Result:
  Clear majority support.
- Question: "Is the NAT control STUN usage a reasonable approach to NAT
  control, addressing the above requirements?" Yes/Maybe/No (At this point
  many people commented that this is not a clear question, as some people
  would also want to support TCP etc.) Result: Seems to be weighted toward
  "yes" and "maybe", "no" slightly quieter.
- Question: "Given that we have a number of proposals in this space, has
  our understanding of this problem space changed enough that we can build
  something that people actually will deploy?" Yes/No Result: Response
  judged 1/3 "yes", 2/3 "no".


SAFE minutes
------------

Notes, SAFE Dec 3 2007 0900
recorded by Dean Willis
Chaired by Colin Perkins, Markus Isomaki

---

Topic: Agenda Bash
slides presented by chairs

Agenda accepted as proposed by chairs.

"Note Well" statement and IPR notice reviewed.

Chairs present slides.

NOTE: THIS IS NOT A "WORKING GROUP FORMING BOF" -- we are attempting to
decide whether there is a need for work in this area. We are not discussing
charters or any formative process issues in this meeting.

---

Topic: Problem Statement and Scope
led by Dan Wing
Slides presented

Problem: current NAT traversal approaches require keepalives. This produces
traffic and power consumption issues, especially for wireless battery
powered devices.
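
As a rough illustration of the traffic side of this problem, the sketch
below counts how many keepalive packets an otherwise idle flow sends per
hour. The 20 and 30 second intervals are the ones mentioned in the BOF
summary; the 300 and 3600 second values are hypothetical longer refresh
periods chosen only for comparison, and the function name is illustrative.

```python
def keepalives_per_hour(interval_s: float) -> float:
    """Number of keepalive packets sent during one idle hour."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 3600 / interval_s


# 20 s and 30 s come from the discussion; 300 s and 3600 s are assumed values.
for interval in (20, 30, 300, 3600):
    rate = keepalives_per_hour(interval)
    print(f"{interval:>5} s interval: {rate:g} keepalives/hour")
```

The sketch says nothing about power draw, which depends on the radio.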

Scope: Create a NAT control technique that solves keepalive and nesting,
detects and fails safely with non-upgraded NATs, and uses source transport
address for authorization.

Discussion follows...

Question: Need to clarify the relationship between determining and adjusting
a NAT keepalive interval. Do we need to do both, or will a system do
just one? Concluded that the critical piece is determination. We would also
like to be able to do adjustment of the timing.

Question: It seems that constraining the solution space to use source
transport address might be excessive. Do we want to constrain to this level
up front? Are there other possible techniques that should be used? 

Response by JDR: The idea here is to emulate what NATs already do, which is
5-tuple address based, and which we understand well. Things that do protocol
inspection or out-of-path controls raise lots of security and deployability
issues.

Suggestion: Rephrase as requirements: 1) be easy to deploy, 2) confirm that
signaling for NAT control is authenticated, to at least the level of normal
TCP, as being from the endpoint involved.

Question (Philip Matthews): Are we focusing only on UDP?  This problem may
not exist for TCP. Does it affect anything else? Henning noted that his
current hotel seems to time out on IMAP in about a minute, breaking IMAP. 

Suggestion: Requirement: needs to work on large deployments.

Conclusion: Will focus on UDP initially. IPSEC support uses the IPSEC NAT
traversal mode using UDP. Native IPSEC appears to be out of scope.

Suggestion: It would be good if the result can be extended to support TCP,
but the measure of success is to succeed well enough to get deployed.

Comment: Key difference is that TCP has an explicit teardown that can be
seen by NATs. Perhaps we could state the scope as protocols that do not
have explicit teardowns.

---

Topic: Survey of Existing Protocols
led by Mary Barnes
Slides presented

Slide: Categorization of Protocols

Question: What is the difference between two-party and multiparty? The
distinction is based on whether there's an intermediary relay node apart
from the NAT itself. Noted that this may not be a useful distinction for
some people.

Slides: Protocol Summaries

Question: Diameter Gq', Rx+, Gx+: These are DIAMETER-based approaches
primarily from 3GPP.

Comment: There is also a Megaco-based H.348 protocol.

Question: Where is ICE? It probably needs to be included in this summary.

Question: What does "Supports Incremental Deployment" mean? We think this
means whether the protocol is needed in every middle box on a path or only
some. An alternative: Can someone who is interested in deploying a VoIP
service make this stuff work by putting the protocol into just the small
part of the network they control, or does it require putting boxes into
parts they don't control. For example, enterprise IT people generally won't
put MIDCOM in their firewall just so end users can access outside
applications. This seems to be a very controversial characteristic. Noted
that one of the primary goals of SAFE is getting incremental deployability,
so we need to understand this better.

Slides on Protocol Comparison

Noted that NAT-PMP requires direct interaction with the middlebox.

Slide on topology/environments:

Much discussion over the "Topology Aware" column of the slide. Conclusion
that this needs some re-thinking. Perhaps column should be labeled
"Topology Unaware".

Suggestion for 4th column on this slide: Identify where "end-to-end" breaks
if you use each protocol, i.e. UNSAFE considerations.  

Discussion of the "Nested NATS" column: Suggested that the
MIDCOM/SIMCO/DIAMETER series may not really support nested NATs.

Discussion of "diverse endpoints" column: Can you give an example of yes
and no by function? JDR: For example, UPnP is designed as a residential
protocol, lacking nesting, authentication, etc. So this is a consequence of
other properties. Perhaps this could be better phrased as discrete function
layers than as the broad categorization being attempted.

Summary (2) slide:  Comment from Keith Moore: It seems like many of the
problems with NAT protocols stem from assumption that interactions with the
middle box are bad, so don't start with this assumption.

Question from Henning Schulzrinne: We seem to have a long-term goal in
mind. There's a danger that we're always incrementally fixing something.
For any of these things to be truly useful, avoiding the probing
problem seems to require some interaction with NATs. Should we look further
ahead and start mapping out the things that we're going to want? Response
from Colin: The current intent here is to explore a tightly-focused
solution to the immediate problem. Suggested by Henning that we at least
track the problems we aren't solving, and occasionally consider whether
there are other intermediate steps that we should be taking.
Counterargument from Philip Matthews: The big problem in getting past
solutions to deploy is that the complexity isn't worth the return to the
NAT vendors. We need something tightly focused that can get deployed
quickly and easily.

Comment from Lars: Questions for this BOF to answer: There are lots of
solutions in this area. Do we need something else? Is STUN Control a
reasonable thing to do? Is the IETF the right place to do this?

---

Topic: NAT Control STUN Usage "STUN Control"
led by Dan Wing
Slides presented

Slide: Tagging procedure with firewalls

Question: This seems to indicate that the firewall wants to be seen. What
happens when they don't? Ans: They don't tag and remain invisible.

Question: Is there an assumption that the tagging firewall is the closest
middlebox? Ans: No, they can be stacked, or there may be other layers.

Slide: Communicate to NAT's embedded STUN server

Question: How is signaling directed to the STUN server in the NAT? Ans: The
source 3-tuple is reused along with the new destination address of the STUN
server in the NAT. The STUN server in the NAT correlates based on the
source 3-tuple, establishing a binding on the whole 5-tuple.
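
The correlation step described in this answer can be sketched as a lookup
over a binding table. This is only an illustration under assumptions: the
`FiveTuple` and `BindingTable` names, the table layout, and the example
addresses are hypothetical and are not taken from the STUN control
proposal. The sketch shows a request arriving at the NAT's own STUN server
being matched to existing bindings by its source 3-tuple (protocol, source
address, source port).

```python
from typing import Dict, List, NamedTuple


class FiveTuple(NamedTuple):
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class BindingTable:
    """Hypothetical NAT binding table keyed by the full 5-tuple."""

    def __init__(self) -> None:
        # Maps each flow to its refresh period in seconds.
        self._refresh_s: Dict[FiveTuple, int] = {}

    def add(self, flow: FiveTuple, refresh_s: int) -> None:
        self._refresh_s[flow] = refresh_s

    def find_by_source(self, proto: str, src_ip: str, src_port: int) -> List[FiveTuple]:
        """Return every binding whose source 3-tuple matches the request."""
        key = (proto, src_ip, src_port)
        return [f for f in self._refresh_s if (f.proto, f.src_ip, f.src_port) == key]


table = BindingTable()
table.add(FiveTuple("udp", "192.168.0.10", 5060, "203.0.113.7", 5060), 30)
table.add(FiveTuple("udp", "192.168.0.10", 40000, "203.0.113.9", 40002), 30)

# A control request reaches the NAT's STUN server from 192.168.0.10:5060.
matches = table.find_by_source("udp", "192.168.0.10", 5060)
print(matches)
```

Because only the sender's own source 3-tuple is used as the key, a request
can only reach bindings that the same endpoint and port created.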

Slide set, nested NATs

Comment: There are a lot of arrows here. Is there a way that a P2P app
might use one command to open a lot of bindings? Ans: no.

Comment: Is there a way to do the overlap bindings in parallel? Noted that
since this binding adjustment can be done after the media flow starts, then
there's no real setup delay.

Slideset: Overlapping address spaces

Question: It seems likely that overlapping addresses will occur every time
someone stacks up generic same-brand NAT boxes. Perhaps we should look at a
fix instead of a detection. JDR notes that he proposed something on the
list last night relating to stacked DHCP-obtained addresses so that routers
can detect the conflict and re-request IP addresses to prevent conflict.
Much groaning ensued in the audience. Noted that this is a real issue and
we need to think about it some more.

Comment from Philip Matthews: We don't need a perfect solution, just
something in STUN control that suggests address randomization. That is, new
boxes that support this would randomize in net 10 instead of using
192.168.0.0/24 for their private-side addresses.
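
A minimal sketch of the randomization idea, assuming a box picks one /24
at random out of 10.0.0.0/8 for its private side. The function name and
the choice of a /24 prefix length are illustrative assumptions, not
something specified in the discussion.

```python
import ipaddress
import random
from typing import Optional


def random_private_subnet(rng: Optional[random.Random] = None,
                          prefix_len: int = 24) -> ipaddress.IPv4Network:
    """Pick a random subnet of the given length inside 10.0.0.0/8."""
    net10 = ipaddress.ip_network("10.0.0.0/8")
    if not net10.prefixlen <= prefix_len <= 32:
        raise ValueError("prefix_len must be between 8 and 32")
    if rng is None:
        rng = random.Random()
    # Number of candidate subnets of this size inside net 10.
    count = 2 ** (prefix_len - net10.prefixlen)
    index = rng.randrange(count)
    base = int(net10.network_address) + index * 2 ** (32 - prefix_len)
    return ipaddress.ip_network((base, prefix_len))


print(random_private_subnet())
```

With a /24 there are 65536 candidates, so two independently configured
boxes stacked together would rarely pick the same range, although a
collision remains possible.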

Suggestion from Keith Moore: We need to find a way to detect this sort of
brokenness and report it to the end user (or somebody else who can do
something) so that they can do something about it.
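
One way such detection might look on a single box is sketched below: the
box compares the address it obtained on its outside interface with the
subnet it serves on its inside interface. The function name and the
example addresses are assumptions for illustration; how the result would
be reported to a user was not discussed.

```python
import ipaddress


def has_overlap(outside_address: str, inside_subnet: str) -> bool:
    """True if the outside interface's network overlaps the inside subnet."""
    outside = ipaddress.ip_interface(outside_address)  # e.g. "192.168.0.5/24"
    inside = ipaddress.ip_network(inside_subnet)
    return outside.network.overlaps(inside)


# Two stacked boxes that both default to 192.168.0.0/24 on the private side.
if has_overlap("192.168.0.5/24", "192.168.0.0/24"):
    print("warning: outside and inside address ranges overlap")
```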

Discussion about issues of randomization in address exhaustion continued
with no clear conclusion.

---

General Discussion: 

Comment: We initially saw a lot of wrongness with NAT implementations that
included ALGs. How does this not happen here? Ans: The protocol suggested
does not do any transparent functions. It only does things by explicit
interaction. Of course, there can always be bugs.

Question (Aki Niemi): Have we enumerated the applications that would use
this? Ans: Any UDP applications that have long periods of no data
transmission.  Re-discussion of power-management keepalive issue followed.
Noted that the suggested approach requires changes to existing end points
to gain the advantages of the suggested approach.  For applications that
aren't currently using STUN, adding STUN support is not an incremental
change, even if adding STUN control afterwards would be.

Noted that there is a larger question. This sort of keepalive problem
applies not only to STUN-enabled NATs, but to stateful firewalls and other
things. Do we want to solve this problem once or many times?

Comment from Keith Moore: These partial solutions for specific applications
may actually hurt deployability. It would be nice to have a more general
solution.

Open question: Should we split discovery protocol from control protocol? 

Noted by Hannes Tschofenig that there are people looking at STUN for other
protocols that need NAT traversal.

Discussion from Cullen: I think of generality in terms of which transport
protocols it works with. There are things, like firewalls, where we need to
manage something besides UDP. Keith Moore

Comment: Two approaches: One is how to fix up the infrastructure so that it
works. The second is how to establish application layer protocols that work
on an existing infrastructure.  We seem to be focused on making
applications work with existing IPv4 static-addressed NATs. Perhaps we
should be looking at the infrastructure instead.

Comment: We may need to take v4-v6 protocol translators into account as
well. Perhaps we can define them to be better up-front, as they have
identical issues to NATs and firewalls. Some respondents think that
anything that really solves v4-v4 apps will be directly applicable even
without considering v6 up front.

Directive from Lars: Would like to suggest v4-v6 is out of scope for this
discussion.

Ongoing discussion ensued as to merits of narrow targeted solutions vs.
broadly applicable solutions. (All known arguments repeated several times).
Key discussion is incentive to equipment providers. Noted that the UDP
applications being discussed here tend to drive equipment recommendation
and purchase.

Question: Is it reasonable to solve the problem in-the-small in one group
and in-the-large elsewhere? 

Concern from Aki: This proposal seems to be targeted to SIP Outbound. It's
much easier to fix SIP Outbound using TCP. 

Comment from Keith: There seems to be an assumption that a general solution
would be expensive. This may not be valid. We need to move towards explicit
guidance on where and what to upgrade.

Noted that in addition to SIP and RTP, P2PSIP control connections and HIP
are candidates for the proposed STUN control solution.

Comment from JDR: The market breeds complexity and incremental single-issue
solutions on its own.  This is as general a solution as we might ever get
deployed.

---

Topic: Future Directions

Is there a problem that needs to be solved? Are some functional
requirements or deployment considerations left unsatisfied by existing
protocols? Noted that there are studies that show existing UDP keepalive
reduces battery life by about 50%.

Answers range from clearly yes to clearly no, with at least one "can not be
determined from this BOF".

Discussion by JDR: The real problem is the "push" class problems. Solutions
like SIP Outbound convert these to client-server problems, but at
significant cost. 

Keith notes that this sort of transformation creates barriers to deployment
of many midrange protocols that can't pay for the massive rendezvous
servers needed to use the current approaches. 

Aki reiterated arguments that fixing NAT keepalive intervals will not occur
in the timeframe needed to make W-CDMA work, and that the only reasonable
solution is to move to TCP for SIP immediately.

---

Poll from chairs: Are requirements left unsatisfied (this question)?
Profound majority believes there are unsatisfied requirements.

Poll question rephrased as "Are some functional requirements (for avoiding
frequent keepalive) or deployment considerations left unsatisfied by
existing protocols?" Ans: Majority agrees, but a substantial minority
disagrees.

Question: Is there agreement that the IETF should consider developing
a new NAT control mechanism to address these requirements? Discussion on
whether this should be a new solution, a fix to an existing solution, be
protocol agnostic (aka work for TCP), etc.

Suggested that word "new" be deleted from question.

Poll: Should the IETF try to solve the above problem? Result: Clear
majority support.

Question: Is the NAT control STUN usage a reasonable approach to NAT
control, addressing the above requirements?

Derek and others suggested that it would be reasonable if it includes TCP
support. Francois argues that this question is premature. Keith believes
this is a reasonable protocol, but that it would be nice to not have a
separate protocol for every knob that might be tweaked on a NAT or a
firewall. Philip suggests that the approach would be to bring it in as an
individual contribution and follow the usual process. Remi and Aki
re-suggest that the approach needs to solve TCP and UDP, but that we should
clearly focus on keepalive and not tweaking NAT knobs.

It seems that we are trying to control the answer by controlling the
question here.

Poll:  Is the NAT control STUN usage a reasonable approach to NAT control,
addressing the above requirements?

Yes:
No:
Maybe:

Seems to be weighted toward yes and maybe, no slightly quieter.

Poll: Given that we have a number of proposals in this space, has our
understanding of this problem space changed enough that we can build
something that people actually will deploy?

Response judged 1/3 yes, 2/3 no.