Network Working Group                                  Dave Thaler
Internet-Draft                                   Christian Huitema
Expires: January 2002                                    Microsoft
                                                      19 July 2001


                Multi-link Subnet Support in IPv6
          <draft-thaler-ipngwg-multilink-subnets-01.txt>


Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as "work
in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Copyright Notice

Copyright (C) The Internet Society (2001).  All Rights Reserved.


Expires January 2002                                      [Page 1]


Draft                    Multilink Subnets               July 2001


Abstract

Bridging multiple links into a single entity has several
operational advantages.  A single subnet prefix is sufficient to
support multiple physical links.  There is no need to allocate
subnet numbers to the different networks, simplifying management.
This document introduces the concept of a "multilink subnet",
defined as a collection of independent links, connected by
routers, but sharing a common subnet prefix.  It then specifies
the behavior of multilink subnet routers so that no changes to
host behavior are needed.


1.  Introduction

Bridging multiple links into a single entity has several
operational advantages.  A single subnet prefix is sufficient to
support multiple physical links.  There is no need to allocate
subnet numbers to the different networks, simplifying management.

However, not all link-layer media can be easily bridged.  Classic
IEEE 802 bridging technology fails when the media does not
naturally support IEEE 802 addressing.  Furthermore, the operation
becomes problematic when the different links don't support the
same MTU size.  Finally, bridging cannot be easily implemented
when the network interface cannot be easily placed in
"promiscuous" mode.

This document introduces the concept of a "multilink subnet",
defined as a collection of independent links, connected by
routers, but sharing a common subnet prefix.  Herein we discuss
many of the problems and possible solutions surrounding this
concept.  The initial version of this draft will not specify
behavior, but merely discuss the tradeoffs.  A later version will
narrow the solution space to a recommended approach.


2.  Terminology

multilink subnet:
      a collection of independent links, connected by routers, but
      sharing a common subnet prefix.

subnet scope:
      multicast SCOP value 3, as specified in [ADDRARCH], which
      covers a (potentially multilink) subnet.  This is the next
      larger multicast scope above link scope.

multilink-subnet router (MSR):
      a router which has interfaces attached to different links in
      a multilink subnet, and which implements the rules in this
      document.

Class 1 multilink subnet:
      a multilink subnet with only one MSR.

Class 2 multilink subnet:
      a multilink subnet composed of multiple MSRs and links in a
      tree topology.  That is, there is only one possible path
      within the subnet between any pair of nodes in the subnet.

Class 3 multilink subnet:
      a multilink subnet composed of multiple MSRs and links
      connected together in an arbitrary topology.

Class 1 MSR:
      an MSR which only works in a Class 1 multilink subnet.

Class 2 MSR:
      an MSR which only works in a Class 1 or 2 multilink subnet.

Class 3 MSR:
      an MSR which works in all types of multilink subnets.


3.  Design Goals

Multilink subnets are designed with the following goals in mind:

o    Existing IPv6 end hosts should continue to work when
     connected to a multilink subnet, without requiring any change
     to their behavior.  For example, the host behavior parts of
     Router Discovery, Neighbor Discovery [ND], and Multicast
     Listener Discovery [MLD] must be supported.

o    Leave link-local address behavior unchanged.  Link-local
     behavior continues to function only within a link, not across
     a multilink subnet.  That is, sending and receiving unicast,
     anycast, and multicast traffic within the link should be
     supported in the normal fashion.


o    Also support sending and receiving unicast and anycast
     traffic at the site and global scopes.

o    Also support sending and receiving multicast traffic at the
     subnet scope and above.

o    Prevent routing loops.

o    Support nodes moving between links within the subnet, with a
     reasonably fast convergence time (on the same order as
     Neighbor Unreachability Detection).

o    In a Class 3 multilink subnet, exploit richer connectivity
     than just using a spanning tree.


4.  Overview

This section gives an overview of multilink subnets.  We describe
the behavior of hosts (which is normal IPv6 host behavior with no
changes), and the resulting requirements for routers.


4.1.  Router Discovery

Router Discovery continues to work on a per-link basis, as
specified in [ND].  When sending Router Advertisements (RAs) with
a Prefix Information Option, there are two possibilities for how
an MSR can influence the Neighbor Discovery procedure used.


4.1.1.  Making hosts not use ND

If the MSR sets the A (autonomous address-configuration) flag on,
and the L (on-link) flag off, then hosts on the link will attempt
stateless address configuration [ADDRCONF] in the given prefix,
but will not treat the prefix as being on-link.  As a result,
neighbor discovery is effectively disabled and packets to new
destinations always go to the router first, which will then either
forward them if the destination is off-link, or redirect them if
the destination is on-link.

In the remainder of this document, we will refer to this model as
the "off-link" model, since hosts initially treat all addresses in
the subnet as being off-link.


4.1.2.  Making hosts use ND

If the MSR sets both the A and the L flags, then hosts on the link
will perform stateless address configuration and neighbor
discovery as usual.  However, since Neighbor Solicitations (NSs)
from existing hosts are sent to a link-scoped solicited-node
multicast address, they will never reach nodes on other links
within the subnet.  Instead, MSRs must either know the location of
the destination a priori, or else be able to relay such NS's to
other links, either using link-scoped NS's relayed link-by-link,
or using a subnet-scoped NS.

In the remainder of this document, we will refer to this model as
the "on-link" model, since hosts treat all addresses in the subnet
as being on-link.
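
As a non-normative illustration, the two models differ only in the
L flag of the Prefix Information Option.  The sketch below packs
that option using the layout given in [ND].  The function name, the
example prefix, and the lifetime values are assumptions made for
this example only.

```python
import ipaddress
import struct

# Flag bits of the Prefix Information Option, per [ND].
FLAG_ON_LINK = 0x80      # L flag
FLAG_AUTONOMOUS = 0x40   # A flag


def prefix_information_option(prefix, model,
                              valid=2592000, preferred=604800):
    """Pack a Prefix Information Option for the given model.

    model is "off-link" (A set, L clear) or "on-link" (A and L set).
    The lifetimes are illustrative values, not taken from this draft.
    """
    net = ipaddress.IPv6Network(prefix)
    flags = FLAG_AUTONOMOUS
    if model == "on-link":
        flags |= FLAG_ON_LINK
    elif model != "off-link":
        raise ValueError("unknown model: %r" % (model,))
    # Type 3, Length 4 (units of 8 octets), Prefix Length, flags,
    # Valid Lifetime, Preferred Lifetime, Reserved2, Prefix.
    return struct.pack("!BBBBIII16s", 3, 4, net.prefixlen, flags,
                       valid, preferred, 0,
                       net.network_address.packed)
```

An MSR using the off-link model would call this with "off-link" for
every link in the subnet; mixing models across links is not
considered in this sketch.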


4.1.3.  Effects on Duplicate Address Detection

In either approach above, existing nodes will still do Duplicate
Address Detection using the link-scoped solicited-node multicast
address.
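
For reference, the solicited-node multicast address is formed from
the low-order 24 bits of the address being tested, as defined in
[ADDRARCH].  The following minimal sketch computes it; the function
name and the example addresses are assumptions for illustration.

```python
import ipaddress


def solicited_node_address(unicast):
    """Return the link-scoped solicited-node multicast address for a
    unicast or anycast address: the prefix FF02::1:FF00:0/104 plus
    the low-order 24 bits of that address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)
```

Because the group has link scope (FF02), a solicitation sent to it
is never forwarded, which is why nodes on other links of the subnet
do not see it without help from an MSR.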

Two important issues arise that must be addressed:

1) If two nodes on different links simultaneously attempt DAD for
   the same address, care must be taken so that the collision is
   detected correctly.

2) If a node moves from one link to another link in the same
   multilink subnet, and performs DAD in its new location, care
   must be taken so that MSRs can distinguish between such a move,
   and a legitimate duplicate, so that after the move, the node
   can retain its address.

Because of these issues, routers cannot use cached information to
respond on behalf of off-link nodes.

Another problem arises from the statement in [ADDRCONF] that "the
link-local address MUST be tested for uniqueness, and if no
duplicate address is detected, an implementation MAY choose to
skip Duplicate Address Detection for additional addresses derived
from the same interface identifier".

Collisions would result if the interface identifier were unique on
the link, but not across the entire multilink subnet.  To avoid
this, MSRs must get involved in duplicate address detection even
for link-local addresses, to ensure that all addresses are unique
across a multilink subnet.


4.2.  Neighbor Discovery

Neighbor Discovery is used differently, depending on whether the
on-link or off-link model is used, as described in the previous
section.


Off-link model
     If the subnet is treated as being off-link, all packets are
     sent to a default router.  It is then the default router's
     responsibility to figure out the next-hop of the packets.  If
     the next-hop is on-link, it sends a Redirect to the source.

On-link model
     If the subnet is treated as being on-link, nodes will send
     NS's to the solicited node multicast address.  (If a node has
     interfaces attached to multiple links in the subnet, NS's MAY
     be sent on each link.)  If the next-hop is off-link, a router
     will respond with a proxy Neighbor Advertisement (NA)
     containing its own link-layer address.

In either case, it is the router's responsibility to determine
whether a destination in the subnet is on-link.

In this version of this draft, we will describe the rules for both
of the above models.  A future version of this draft may choose
only one of them.


5.  Basic (Class 1) Behavior

In a Class 1 multilink subnet, only one router exists.  This might
be the case, for example, in a home network where a router
connects a wired and a wireless link together to form a single
subnet.


5.1.  Basic Unicast

In this section, we step through an example of basic unicast
communication, assuming that address configuration has already
completed, and the router's routing table and neighbor cache
already have any required information.

In the simple scenario depicted in Figure 1 below, two links, (1)
and (2), on a common subnet with global prefix G are connected by
an MSR B.  Node A has link-layer address a on link 1, and has
acquired global IPv6 address Ga.  Similarly, MSR B has link-layer
address b1 and IPv6 address Gb1 on link 1, and link-layer address
b2 and IPv6 address Gb2 on link 2.  Node C has link-layer address
c on link 2, and IPv6 address Gc.  Node D has link-layer address d
on link 1, and IPv6 address Gd.

+---+                      +---+
| A |                      | D |
+-+-+                      +-+-+
  |                          |
--+------------+-------------+--------------(1)--
               |
             +-+-+
             | B |
             +-+-+
               |
---------------+-------------+--------------(2)--
                             |
                           +-+-+
                           | C |
                           +---+

                    Figure 1: Class 1 Scenario


Off-link model
     When A wants to start communication with Gc, it finds that
     the destination address matches no on-link prefix, and so
     sends the packet directly to its default router B.  B first
     applies its usual packet validation rules (including
     decrementing the Hop Limit in the IPv6 header).  B knows that
     C is on-link to link 2, with link-layer address c, and so it
     forwards the packet to C.


     When A wants to communicate with Gd, it again finds that the
     destination address matches no on-link prefix, and so sends
     the packet directly to its default router B.  B knows that D
     is on-link to the same link as A, and so responds with a
     Redirect.


On-link model
     When A wants to start communication with Gc, it finds that
     the destination address matches an on-link prefix, and so
     sends an NS to the solicited-node multicast address Sc
     constructed from Gc.  The NS message is received by the MSR
     B, which listens on all multicast groups.  B knows that C is
     on-link to link 2, and responds to A with an NA containing
     its own link-layer address b1 as the Target Link-Layer
     Address.

     After this, A can send packets to the address Gc.  The
     packets will be sent to the link address b1; they will be
     received by B, which will apply its usual validation rules
     (including decrementing the Hop Limit in the IPv6 header),
     and forward them to the address c on link 2.

     When A wants to communicate with Gd, it again finds that the
     destination address matches an on-link prefix, and so sends
     an NS to its solicited-node multicast address.  D receives
     the NS and responds.  B also receives the NS, but knows that
     D is on the same link as A, and so does not respond.

Note that we did not assume that the links had to use IEEE 802
addresses, or in fact any form of consistent addressing.  B can
also handle MTU discovery procedures, returning an ICMP Packet Too
Big message if either A or C sends a packet that is too large for
the outgoing link.
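
The decisions made by B in the walk-through above can be summarized
in a short sketch.  It assumes, as this section does, that B's
neighbor cache is already populated.  The function name, the tuple
return values, and the use of the symbolic names Ga, Gc, and Gd as
dictionary keys are assumptions made for illustration.

```python
from collections import namedtuple

# One entry of the MSR's (conceptual) neighbor cache: the attached
# link on which the node lives and its link-layer address there.
Neighbor = namedtuple("Neighbor", ["link", "lladdr"])


def class1_decision(model, in_link, target, neighbors, own_lladdr):
    """Return what MSR B does for input arriving on in_link.

    model      -- "off-link" (a data packet arrived) or
                  "on-link" (an NS arrived)
    target     -- packet destination (off-link) or NS target (on-link)
    neighbors  -- dict: address -> Neighbor, assumed already populated
    own_lladdr -- dict: link -> B's link-layer address on that link
    """
    entry = neighbors[target]
    same_link = entry.link == in_link
    if model == "off-link":
        if same_link:
            return ("redirect", entry.lladdr)
        return ("forward", entry.link, entry.lladdr)
    if same_link:
        return ("ignore",)   # the real target answers the NS itself
    return ("proxy-na", own_lladdr[in_link])
```

With the Figure 1 topology, a packet from A to Gc yields
("forward", 2, "c") in the off-link model, and an NS from A for Gc
yields ("proxy-na", "b1") in the on-link model.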


5.2.  Router Configuration

The previous section assumed that the router's routing table and
neighbor cache already had any required information.  We now
describe how this can be done.

Like any other router, an MSR can acquire routes (including the
subnet prefix) by using manual configuration or a routing
protocol.  An MSR with all interfaces in the same subnet MAY
acquire its information solely based on RAs received from another
router (which is not an MSR), in the same way a host would.  It
can then advertise the same prefix/route information on other
links in the subnet, using either the on-link or off-link model.

When an MSR needs to resolve a target address to a next-hop (when
a host performs ND or DAD), it sends a Neighbor Solicitation on
each attached link in the subnet, except that in the on-link model
no NS is sent back on the link from which an NS was just received.
After sending an NS, the router suppresses sending of any other
NS's for the same target address for a short interval (which must
be less than ND's RetransTimer).  While it is resolving a next-
hop, the router also remembers each node sending an NS for the
same target address.


A Neighbor Advertisement would be sent in response to an NS only
by (a) the actual node with the target address, or (b) an MSR
which has received an NA in response to a relayed NS it sent as a
result of receiving the first NS.  Specifically, an NA is not sent
just because the MSR has a neighbor cache entry for the target.
When an MSR receives an NA, it sends an NA to all nodes from which
it received NSs above.

As specified in [ND], proxy Neighbor Advertisements sent by MSR's
on behalf of remote targets always have the Override bit clear.
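
The relay, suppression, and reply rules above might be tracked with
state along the following lines.  This is a sketch for the on-link
model only.  The class name, the 0.5 second suppression interval,
and the injectable clock are assumptions; the text above requires
only that the interval be less than ND's RetransTimer.

```python
import time


class Resolver:
    """Tracks pending resolutions on a Class 1 MSR (on-link model)."""

    def __init__(self, links, suppress_interval=0.5,
                 clock=time.monotonic):
        self.links = list(links)    # attached links in the subnet
        self.suppress_interval = suppress_interval
        self.clock = clock
        self.last_sent = {}         # target -> time of last relayed NS
        self.waiting = {}           # target -> {(link, solicitor)}

    def on_ns(self, in_link, solicitor, target):
        """Handle an NS from a node; return the links to relay it on."""
        self.waiting.setdefault(target, set()).add((in_link, solicitor))
        now = self.clock()
        last = self.last_sent.get(target)
        if last is not None and now - last < self.suppress_interval:
            return []               # suppressed: an NS was just sent
        self.last_sent[target] = now
        return [link for link in self.links if link != in_link]

    def on_na(self, target):
        """Handle an NA for target; return who gets a proxy NA."""
        return sorted(self.waiting.pop(target, set()))
```

Note that a proxy NA is produced only from on_na(), that is, only
after an NA has actually been received.  A neighbor cache entry
alone never triggers a reply, in keeping with the rule above.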


5.3.  Multicast

Most current multicast routing protocols are based on a "Reverse-
Path Forwarding" check.  That is, they drop a packet if the packet
does not arrive on the link towards a given address (e.g., the
source address, or a Rendezvous Point address associated with the
group address).  Thus, multicast will work as long as a router can
tell which link is towards any address within the subnet.  Note
that in particular, simply using the subnet route is not
sufficient in a multilink subnet.  If an MSR's longest-match RPF
lookup matches the subnet route for the multilink subnet, it means
the source is in the subnet, and the neighbor cache is consulted
(as for unicast) to find the link towards the source.
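
A minimal sketch of such an RPF lookup follows.  The route list
format, the use of None as the link for the subnet route, and the
function names are assumptions made for illustration.

```python
import ipaddress


def rpf_link(source, routes, neighbor_links, subnet):
    """Return the link on which packets from source are expected.

    routes         -- list of (IPv6Network, link) pairs; the route for
                      the multilink subnet itself carries link None
    neighbor_links -- dict: in-subnet IPv6Address -> attached link
    subnet         -- IPv6Network of the multilink subnet
    """
    src = ipaddress.IPv6Address(source)
    matches = [(net, link) for net, link in routes if src in net]
    if not matches:
        return None
    best_net, best_link = max(matches, key=lambda m: m[0].prefixlen)
    if best_net == subnet:
        # The source is inside the subnet.  The subnet route alone
        # does not identify a link, so consult the neighbor cache.
        return neighbor_links.get(src)
    return best_link


def rpf_accept(source, in_link, routes, neighbor_links, subnet):
    """Accept only if the packet arrived on the expected link."""
    expected = rpf_link(source, routes, neighbor_links, subnet)
    return expected is not None and expected == in_link
```

A source inside the subnet that is missing from the neighbor cache
fails the check in this sketch; a real MSR might instead trigger
resolution first.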


5.4.  Disabling Class 1 MSRs

The above rules assume that the MSR is the only MSR in the subnet.
Consequently, a Class 1 MSR MUST disable itself if it detects that
another MSR is present.  This can be done by assigning a flag
(say, bit 0x4) in the RA that is set by all MSRs.
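
The self-disabling rule could be sketched as follows.  The bit
value simply follows the "say, bit 0x4" suggestion above and is not
an assigned flag; the class and method names are assumptions.  The
sketch assumes it is fed the flags octet of RAs received from other
routers only, never the MSR's own RAs.

```python
RA_FLAG_MSR = 0x04   # hypothetical bit, per the suggestion above


class Class1Msr:
    def __init__(self):
        self.enabled = True

    def on_router_advertisement(self, ra_flags):
        """Disable multilink operation if another MSR's RA is seen.

        ra_flags is the flags octet of a received RA.  Returns
        whether this MSR is still enabled.
        """
        if ra_flags & RA_FLAG_MSR:
            self.enabled = False
        return self.enabled
```

Once disabled, the MSR stays disabled in this sketch; whether and
when it may re-enable itself is not addressed here.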

TBD: If an MSR only sends RAs on links other than the one from
which it got an RA from the "real" (non-MSR) router, then it seems
that there can safely be multiple MSR's in parallel on that same
link, and the rule above will not disable them either.  This is
really a Class 2 subnet.

TBD: Do the rules above work in transit multilink subnets or not?
If not, it also needs to disable itself if any RAs are seen on
multiple links in the subnet.


6.  Class 2 Behavior

                           to Internet
                                |
                              +-+-+
                              | R |
                              +-+-+
                                |                       |
   --+------------+-------------+----------+---(1)--    |  +---+
     |            |                        |            +--+ F |
   +-+-+        +-+-+                    +-+-+          |  +---+
   | C |        | A |                    | B +----------+
   +---+        +-+-+                    +-+-+          |
                  |                        |            |  +---+
      --(2)-------+-------+----   ---+-----+---(3)--    +--+ G |
                          |          |                  |  +---+
                        +-+-+      +-+-+                |
                        | D |      | E |               (4)
                        +---+      +---+                |

                    Figure 2: Class 2 Scenario


Figure 2 shows a sample tree topology with MSRs A and B connecting
four links into a single subnet with hosts C, D, E, F, and G.  R
is a normal router that provides connectivity to an internet, and
sends RAs on link 1.


TBD: Fill this in.  Is there actually any difference between Class
1 and Class 2 behavior, if a single specific link is configured as
the "upstream" (towards outside of the subnet) link?  We may be
able to collapse Class 1 and 2 into the same thing.

TBD: What about transit subnets with exit points on different
links?


7.  Security Considerations

TBD.


8.  Appendix A: Class 3 Behavior

In the network depicted in Figure 3, we now have three links, and
also three multilink-subnet routers (MSRs), B, E, and F.

+---+                      +---+
| A |                      | D |
+-+-+                      +-+-+
  |                          |
--+------------+-------------+----------+---(1)--
               |                        |
             +-+-+                    +-+-+
             | B |                    | E |
             +-+-+                    +-+-+
               |                        |
    -----------+-------------+----------+---(2)--
                             |
                           +-+-+
                           | F |
                           +-+-+
                             |
            ------+----------+--------------(3)--
                  |
                +-+-+
                | C |
                +---+

                    Figure 3: Class 3 Scenario


The network is sufficiently complex to expose several problems:


o    If A sends an NS packet, that packet is received by both B
     and E.  Depending on the inter-router communication
     mechanism, this could lead to duplicate transmissions on link
     2, and possibly to random behaviors, or to loops.

o    If A sends a multicast packet, and that packet is relayed by
     both B and E, it would lead to duplicate traffic, or even
     potential loops.  It may not be relayed at all, if neither B
     nor E realize there is a group member hidden behind F.

There are multiple possible approaches to solving the above
problems which might meet our design goals.  We discuss each
approach in turn below, with examples using Figure 3 when no
previous state is known.


8.1.  Method A: Flooding Neighbor Solicitations

Neighbor Solicitations and Advertisements are proxied as
described earlier, with the following additional rules.

Since multiple paths may exist, to assist in loop prevention and
provide shortest paths, a new "Local Distance" option in NA's can
be defined:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |    Reserved   |  Hop Count    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Timestamp                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The option contains five fields, encoded in 8 octets.

The Hop Count field contains an 8-bit unsigned integer being the
number of hops between the advertising station and the source or
the target address.  It is used to assist in loop prevention and
provide shortest paths.

The Timestamp is a 32-bit integer (in seconds) that describes the
time at which the source or target address was last advertised by
the actual node with that address.  It is used to ensure that
neighbor discovery messages do not loop forever if the propagation
delay across the subnet is significant.  (Authors' note: is there
a way to make this work without synchronized clocks?  Is a
Timestamp really required?)

If this option is used, it is expected that an MSR's neighbor
cache entries would also contain the Hop Count and Timestamp
information associated with the link-layer address used.

The absence of this option implies a Hop Count value of 0.
When proxying an NA, an MSR would include the Local Distance
option with an incremented value.  Legacy nodes will ignore the
option, but MSRs (and new nodes if they wish) can use the option
to prefer link-layer addresses with a lower Local Distance.
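
The encoding and the increment-on-proxy rule can be sketched as
follows.  The option Type value 253 is a placeholder, since no type
is assigned in this document.  Stamping the Timestamp at the first
MSR, when the NA arrives from the actual node without the option,
is likewise an assumption of this sketch, as are the function
names.

```python
import struct

LOCAL_DISTANCE_TYPE = 253      # placeholder: no type is assigned
# Type, Length, Reserved, Hop Count, Timestamp
LOCAL_DISTANCE_FMT = "!BBBBI"


def pack_local_distance(hop_count, timestamp):
    # Length is 1, since ND option lengths are in units of 8 octets.
    return struct.pack(LOCAL_DISTANCE_FMT, LOCAL_DISTANCE_TYPE, 1, 0,
                       hop_count, timestamp)


def unpack_local_distance(option):
    """Return (hop_count, timestamp).  A missing option (None)
    implies a Hop Count of 0 and no timestamp."""
    if option is None:
        return 0, None
    opt_type, length, _reserved, hop_count, timestamp = struct.unpack(
        LOCAL_DISTANCE_FMT, option)
    if opt_type != LOCAL_DISTANCE_TYPE or length != 1:
        raise ValueError("not a Local Distance option")
    return hop_count, timestamp


def proxied_option(received_option, now):
    """Build the option an MSR attaches when proxying an NA."""
    hop_count, timestamp = unpack_local_distance(received_option)
    if timestamp is None:
        timestamp = now    # NA came from the actual node (assumption)
    return pack_local_distance(min(hop_count + 1, 255), timestamp)
```

The Hop Count is capped at 255 because the field is 8 bits wide;
how an MSR should treat an NA that has reached the cap is left
open here.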

To route actual packets, an MSR's route lookup would determine
that the longest matching route is on-link to multiple links.  The
router would consult its (conceptual) neighbor cache, and use the
next-hop with the lowest Local Distance.  The same procedure would
apply to multicast packets as well, when the router would look up
the RPF address.
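
The next-hop choice described above amounts to taking a minimum
over the neighbor cache candidates for the destination.  In the
sketch below, the tuple layout and the tie-breaking rule are
assumptions made for illustration.

```python
def best_next_hop(candidates):
    """Pick the candidate with the lowest Local Distance.

    candidates -- iterable of (link, lladdr, local_distance) tuples
                  for one destination, one per link on which the
                  destination was resolved
    Ties are broken by link and then by link-layer address, an
    arbitrary but deterministic choice made for this sketch.
    """
    candidates = list(candidates)
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[2], c[0], c[1]))
```

For example, if E in Figure 3 has resolved Gc both via F on link 2
(distance 1) and via B on link 1 (distance 2), it would select the
entry for F.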


8.1.1.  On-link model example

In Figure 3, when A wishes to communicate with Gc, both B and E
will receive an NS from A.  Each will originate an NS for Gc on
link 2.  B, E, and F will receive the NS's on link 2.  B and E
will ignore each other's NS's since they have just sent an NS for
the same address.

F will receive the NS's and the first one will cause it to create
a neighbor cache entry in the INCOMPLETE state, and originate its
own NS on link 3.  When C receives this NS, it will respond with
an NA.

When F receives the NA from C, it will respond to B and E with an
NA with its own link-layer address f2 as the Target Link Layer
Address, and a "Local Distance = 1" option.  B and E will then
respond to A with NAs containing b1 and e1, respectively, as the
Target Link Layer Address, and a "Local Distance = 2" option.
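
The Local Distance values in this example can be reproduced with a
small relaxation over the topology of Figure 3.  This is a sketch
of the resulting values only, not of the message exchange itself;
the function name and the data layout are assumptions.

```python
def proxy_na_distances(links, msrs, target):
    """Return {msr: Local Distance in that MSR's proxy NA for target}.

    links  -- dict: link name -> set of attached node names
    msrs   -- set of node names that are MSRs
    target -- name of the node being resolved
    """
    advertised = {}
    changed = True
    while changed:
        changed = False
        for msr in sorted(msrs):
            heard = []
            for nodes in links.values():
                if msr not in nodes:
                    continue
                if target in nodes:
                    heard.append(0)   # NA from the actual node
                heard.extend(advertised[m] for m in nodes
                             if m != msr and m in advertised)
            if heard and advertised.get(msr) != min(heard) + 1:
                advertised[msr] = min(heard) + 1
                changed = True
    return advertised


FIGURE_3 = {1: {"A", "D", "B", "E"}, 2: {"B", "E", "F"}, 3: {"F", "C"}}
```

Calling proxy_na_distances(FIGURE_3, {"B", "E", "F"}, "C") gives a
distance of 1 for F and 2 for both B and E, matching the values in
the example above.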


8.1.2.  Off-link model example

In Figure 3, when A wishes to communicate with Gc, it will send
packets to a default router, say, B.  B will send an NS on link 1,
which will be received by E, and on link 2 which will be received
by E and F.  Depending on timing, E may send an NS on link 1 or
link 2 or neither.  (If a short delay were inserted before
sending, both could be suppressed.)  F will send an NS on link 3,
to which C will reply with an NA.  Upon receiving the NA, F sends
an NA to all nodes from which it has seen an NS for Gc, namely B
and possibly E.

B (and possibly E as well) will then send an NA on link 1, after
which A can communicate with C.


8.2.  Method B: Proactively populate host routes

The basic idea here is that MSR's would inject host routes into a
routing protocol used within (at least) the subnet upon detecting
a new node on a directly-connected link (e.g., when DAD completes
on an Ethernet, or when IPv6CP completes on a PPP link).

Once host routes exist, either the off-link or the on-link model
could be used.  In addition, multicast works with no changes,
since host routes would be used for RPF checks.

Another advantage is that since all resolution is done by MSR's "a
priori", no additional delay is incurred when A wants to
communicate with C.  If the on-link model is used, no neighbor
discovery delay exists at all.  Packets are immediately forwarded
along the correct path.  This approach avoids all bursty-source
problems.
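
A minimal sketch of a forwarding table with injected host routes
follows.  The routing protocol that distributes the routes between
MSRs is out of scope here; the class name, method names, and
next-hop representation are assumptions made for illustration.

```python
import ipaddress


class HostRouteTable:
    """Table holding a subnet route plus injected /128 host routes."""

    def __init__(self):
        self.routes = {}   # IPv6Network -> next hop (opaque value)

    def add_route(self, prefix, next_hop):
        self.routes[ipaddress.IPv6Network(prefix)] = next_hop

    def node_detected(self, address, link):
        """Called when a new node appears on a directly connected
        link (e.g., when DAD completes); installs a host route."""
        self.add_route(address + "/128", ("on-link", link))

    def lookup(self, destination):
        """Longest-prefix match; returns the next hop or None."""
        dst = ipaddress.IPv6Address(destination)
        best = None
        for net, next_hop in self.routes.items():
            if dst in net and (best is None
                               or net.prefixlen > best[0].prefixlen):
                best = (net, next_hop)
        return best[1] if best else None
```

Because the /128 route is always the longest match, both unicast
forwarding and the multicast RPF lookup resolve to the correct link
without any on-demand resolution.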


Since host routes are cached state, they cannot, however, be used
for duplicate address detection, due to the issues described in
Section 4.1.3.  That is, the presence of a host route does not
imply a duplicate, since the node may have just moved.  The lack
of a host route does not imply uniqueness, since another node may
be simultaneously choosing the same address.  As a result, DAD
requires additional mechanisms, such as flooding neighbor
discovery messages as in Method A, or mechanisms provided by a
specialized routing protocol.

Work in progress in the Mobile Ad-hoc Networks (manet) WG may
provide solutions to the above problems in the future.


9.  Acknowledgements

Steve Deering, Brian Zill, Hesham Soliman, and Karim ElMalki
participated in discussions that led to this draft.  The term
"multilink subnet" was coined by Steve Deering.


10.  Authors' Addresses

     Dave Thaler
     Microsoft Corporation
     One Microsoft Way
     Redmond, WA  98052-6399
     Phone: +1 425 703 8835
     EMail: dthaler@microsoft.com

     Christian Huitema
     Microsoft Corporation
     One Microsoft Way
     Redmond, WA  98052-6399
     EMail: huitema@microsoft.com


11.  References

[ADDRARCH]
     Hinden, R., and S. Deering, "IP Version 6 Addressing
     Architecture", RFC 2373, July 1998.

[ADDRCONF]
     Thomson, S., and T. Narten, "IPv6 Stateless Address
     Autoconfiguration", RFC 2462, December 1998.

[MLD]
     Deering, S., Fenner, W., and B. Haberman, "Multicast Listener
     Discovery (MLD) for IPv6", RFC 2710, October 1999.

[ND]
     Narten, T., Nordmark, E., and W. Simpson, "Neighbor Discovery
     for IP Version 6 (IPv6)", RFC 2461, December 1998.


12.  Full Copyright Statement

Copyright (C) The Internet Society (2001).  All Rights Reserved.


This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise
explain it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without
restriction of any kind, provided that the above copyright notice
and this paragraph are included on all such copies and derivative
works.  However, this document itself may not be modified in any
way, such as by removing the copyright notice or references to the
Internet Society or other Internet organizations, except as needed
for the purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards
process must be followed, or as required to translate it into
languages other than English.

The limited permissions granted above are perpetual and will not
be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

