[rrg] IRON-RANGER, an interesting Core-Edge Separation (CES) architecture
Robin Whittle <rw@firstpr.com.au> Wed, 31 March 2010 05:26 UTC
Message-ID: <4BB2DB52.1070901@firstpr.com.au>
Date: Wed, 31 Mar 2010 16:19:14 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>, "Templin, Fred L" <Fred.L.Templin@boeing.com>
Subject: [rrg] IRON-RANGER, an interesting Core-Edge Separation (CES) architecture
List-Id: IRTF Routing Research Group <rrg.irtf.org>
*** 104 kbyte message ahead. ***

Short version:

Fred's IRON-RANGER CES architecture has been significantly redesigned in the last few weeks.  Here I present an explanation of a subset of this architecture, as I understand it, since I think this subset is novel and interesting.  I call this subset IRON-RANGER-lite (I-R-lite) - and I suggest some improvements to this which may be different from what Fred has in mind.

I compare the scaling properties of this modified subset of I-R to Ivip, which is a more complex architecture.  I think Ivip scales better - because I-R's overly simple architecture does not allow any "cached concentration", or "aggregation", of the querying demands of ITR-like functions - so all these queries need to be handled directly by the authoritative query servers: IR(VP) routers.

These improvements simplify the I-R system - by having End User Networks (EUNs) tell VP companies directly which IR(EID) routers their EID should be mapped to.  This removes the need for IR(EID) routers to register the EIDs with IR(VP) routers.

With a relatively minor addition to this - Map Update messages, which are similar or identical to Map Reply messages - I-R would have a real-time mapping system which would be as fast as Ivip's.

Real-time mapping distribution to the ITR-like devices - IR(LF/GW) routers - enables the mapping to be a single IR(EID) (AKA "ETR") address.  If the system is then designed to allow only a single such address, then IR(LF/GW) devices can be much simpler, since they don't need to test reachability or choose between multiple IR(EID) routers.  EUNs would then be responsible (as they are in Ivip) for changing the mapping in real-time to achieve their portability, multihoming, inbound TE or other goals.

(This would enable I-R to support TTR Mobility as efficiently as Ivip - and better than LISP - since the MN could drop the tunnel to the old TTR earlier than if non-real-time mapping was used.
Note: TTR Mobility only requires mapping changes when the MN moves more than about 1000km.)

This "I-R-lite" subset of the current IRON-RANGER design, and its modified version with real-time mapping, could be used for dynamic TE on a minute-by-minute basis - irrespective of whether the mapping is simplified to always be a single IR(EID) address.

Still, this souped-up real-time version of I-R won't scale as well as Ivip.  The reason is I-R's simplicity: the ITR-like devices - IR(LF/GW) - work directly with the authoritative query servers - IR(VP).  In Ivip, the ITRs do not work directly with the authoritative QSAs - their query burden is "aggregated" by the use of QSR intermediaries.  Optional QSCs between the ITRs and QSRs further reduce the burden on QSRs.

Another scaling problem of I-R is the "scattergun" approach of tunneling initial traffic packets to all (I assume 3 or so) IR(VP) routers.  This is somewhat inefficient, but more importantly sets strict limits on the number of IR(VP) routers there can be for each VP.  Ivip has no such inefficiencies or restrictions on the number of authoritative QSA query servers.

I guess few people have been closely following the discussions between Fred and me on his proposal, originally "RANGER", but now known as "IRON-RANGER" (I-R).  I wouldn't want to be examined on all we have discussed.  Fred responded to my critiques with clarifications and design changes, and the result is a Core-Edge Separation architecture which I think is interesting.

I still think Ivip is the best CES architecture (msg06219), but I think the current I-R design, or at least my understanding of the following subset of it, is superior to LISP-ALT or to any other LISP approach I know of.

Here is my description of my understanding of what to me are the most interesting aspects of Fred's design, which I will refer to as "I-R-lite".  I haven't checked this with Fred.  Please await Fred's response before drawing any inferences about what he is planning.
  - Robin       http://www.firstpr.com.au/ip/ivip/


I-R terminology
===============

Here is a version of the roles which an IRON router can perform, based on the discussions which led to (msg06351).  There is a minor update to the IR(VP) description.

IR(LF)  - Forwarding traffic packets which arrive from its local routing system.  Advertises all I-R "edge" space in its local routing system.  (All IRON routers are capable of doing this, but it is still a distinct role.)

IR(GW)  - Like an IR(LF) role except that the IRON router advertises the I-R "edge" space in the DFZ (that is: in the DFZ, to DFZ routers of ASes other than its own).  (This is similar to LISP's Proxy Tunnel Routers and Ivip's DITRs, though a DITR normally only advertises a subset of the "edge" space.)

IR(VP)  - One of typically 3 or so IRON routers which handle a given Virtual Prefix of I-R "edge" space.  This means it:

          1 - Accepts registrations of EID prefixes which fall within this VP, from the IRON routers which perform the IR(EID) role for these EID prefixes.

          2 - Accepts tunneled traffic packets from IRON routers performing the IR(LF) and IR(GW) functions and then tunnels each packet to the correct IR(EID) role router for the EID prefix which covers the packet's destination address.  (This may be the same router, so no tunneling is required - but usually it will be another IRON router.)

          3 - After 2, sends mapping for the matching EID to the IR(LF) or IR(GW) role router.

          If it has no registered EID prefix for a traffic packet it receives, it drops the packet.

IR(EID) - Accepts tunneled packets from IR(VP), IR(LF) and IR(GW) role routers and then delivers the packet to the destination network.  A multihomed end-user network will have two or more ISPs and there will be an IR(EID) role router for each such ISP.  (Whether this router is in the ISP network or in the end-user network is not fixed, but it is always on a conventional "RLOC" - AKA "core" - address.)
How I-R-lite differs from the full I-R proposal
===============================================

"I-R-lite" is described here as a CES system for either IPv4 or IPv6, where each of these protocols is handled independently by a separate I-R-lite system.  While I-R as Fred plans it is intended to be a single system linking IPv4 and IPv6, for simplicity and clarity, I-R-lite involves one complete independent system for IPv4 and another for IPv6.

(Ivip is the same - two independent systems.  However, I don't rule out there being some kind of linkage between the two if the extra complexity is justified by some important inter-working or transition benefits.)

In this description I won't go into details of encapsulation or of how to solve the resulting Path MTU problems.  I figure they can be solved one way or another.  I am assuming something like Fred's SEAL tunneling would be used, where the outer header's source address is that of the Ingress Tunnel Endpoint.  This is different from what I plan for Ivip.  The I-R-lite subset I describe below does not support (at least in any simple, inexpensive manner) any ISP BR filtering of packets by their source address.

Fred's proposal also apparently involves "recursion".  I don't really understand this and I can't see what purpose it might serve in scalable routing, so it is not part of I-R-lite.

Fred intends I-R to support Mobility in some way - but as far as I know, this is not along the lines of TTR Mobility (Translating Tunnel Router), which is the only way I know of doing global mobility, for IPv4 and IPv6.  Ivip uses TTR Mobility, and so could LISP.  Maybe I-R-lite could use TTR Mobility, but below I ignore Mobility and focus only on providing scalable support for non-mobile end-user networks (EUNs) gaining portability, multihoming and inbound TE.
Fred has discussed his intention that each ETE (Egress Tunnel Endpoint) router, such as the second in these pairs:

   IR(LF/GW) -> IR(VP)
   IR(VP)    -> IR(EID)
   IR(LF/GW) -> IR(EID)

will be able to establish whether the ITE (Ingress Tunnel Endpoint), the first of these pairs, is authorised to handle packets from a given prefix.  I understand this as meaning:

   ITE-A tunnels a traffic packet which has a source address of XXX.

   ETE-B receives this, and assumes it came from ITE-A.  (However, it may not have, since perhaps it has never received a packet from ITE-A, and so this is the first SEAL encapsulation with a new SEAL-ID - which can only later be used to verify packets as having come from the same ITE.)

   ETE-B is somehow able to ascertain which prefixes of address space ITE-A is authorised to handle.  ETE-B drops the traffic packet if its source address does not match one of these prefixes.

I can't imagine how ETE-B could do this - much less how it could do so in a tiny fraction of a second, because any longer would unreasonably delay the traffic packet.  Also, if the ITE is an IR(GW) router, it could be handling packets sent from any network in the world - so how could the ETE decide which source addresses were and were not acceptable for traffic packets tunneled by an IRON router performing the IR(GW) role?

I-R currently has no business plan for how the operators of IR(GW) routers pay for their operation.  If an IR(GW) router advertised all VPs into the DFZ, then it is doing work for all the companies which control VPs.  An IR(GW) could also advertise just some VPs in the DFZ.  In both cases, there are no current plans for the operators to charge the VP companies.  I think this won't work.
In Ivip, companies which run DITRs will charge the MABOCs (Mapped Address Block Operating Companies) whose MABs these are, and furthermore will provide the MABOCs with itemised traffic figures for individual micronets, so the MABOCs can charge their end-user networks according to the DITR traffic which was addressed to the micronets of each end-user network.

I am not promoting my "I-R-lite" idea as a solution to the routing scaling problem.  I am suggesting that this subset of Fred's proposal is interesting and worthy of the attention of RRG folks.  I also discuss some things which I would do to improve I-R-lite which I think differ from Fred's plans.  Then I discuss how well I think this modified version would scale - comparing it to Ivip.


I-R-lite
========

I describe I-R-lite using the current terminology Fred and I developed recently for the various roles an IRON router ("IR") could perform - though perhaps Fred doesn't see the need for every such term.  Any statement below about "I-R" (Fred's IRON-RANGER architecture) also applies to my "I-R-lite" unless there is a note to the contrary.

Traffic packets are tunneled from one IRON router to another.  I-R uses Fred's SEAL approach to tunneling and Path MTU discovery, but in this description I will simply assume encapsulated tunneling can be done, without going into details.

The goals of I-R-lite include:

   Scalable support for end-user networks using an "edge" subset of the global unicast address space to achieve portability, multihoming and inbound traffic engineering (TE).

   Works the same in principle for IPv4 and IPv6.  However, IPv6 address allocation would probably be simpler, since all the "edge" space could come from fresh prefixes, while for IPv4, it must come from potentially very numerous prefixes scattered throughout the global unicast range.

   Complete decentralisation of all crucial aspects of the system.
(I-R, as currently designed, does not achieve this because it relies on a single file to be read by all IRs, and on a centralised source of changes to this file.  So I-R-lite doesn't achieve this either - but I suggest improvements which would do so.)

The non-goals include:

   Mobility - just for this discussion.  (In fact, IRON-RANGER and I-R-lite would support TTR Mobility.  My modified version of I-R-lite with real-time mapping would support it marginally better, as does Ivip.)

   Support for ISP BR source address filtering.

   Real-time control of tunneling.  Therefore, I-R-lite, like LISP (and unlike Ivip) involves mapping for a multihomed EUN consisting of multiple IR(EID) router addresses, with the ITE device having to choose which of these to use.  This raises many problems, and I don't suggest it is the best way to run a CES system, but the LISP team and Fred seem to be happy with it.  (However my modified version of I-R-lite supports real-time control of mapping, like Ivip.)

Both LISP and Ivip use the terms "ITR" and "ETR" in the same way:

   ITR - Ingress Tunnel Router.  Accepts traffic packets and tunnels them to an ETR.

   ETR - Egress Tunnel Router.  Receives these tunneled packets and delivers them to the destination network.

I-R does not use these terms, but the terms ITE and ETE (Ingress / Egress Tunnel Endpoint) mean much the same thing.

LISP, Ivip and I-R are all capable of handling packets sent by hosts in networks which lack ITRs (ITEs).  In order to do this, each architecture has a special subset of these devices which, instead of advertising "edge" space in an ISP or End User Network (EUN), advertise it to neighbouring ASes - they advertise "edge" space in the DFZ.

   Ivip:  DITR (Default ITR in the DFZ).  Previously known as OITRDs and before that, erroneously, as "Anycast ITRs in the Core/DFZ".  DITRs generally only advertise a subset of the "edge" space.
Ivip "edge" space is advertised in the DFZ by these DITRs as separate "Mapped Address Blocks" (MABs), each of which typically covers the space used by many EUNs.  An IPv4 /16 MAB would typically provide "edge" space for thousands or tens of thousands of EUNs.

   LISP:  PTR - Proxy Tunnel Router.  As far as I know, each PTR advertises all the "edge" space.  There is no specific LISP term equivalent to Ivip's "MAB", but such a term is needed.  Dino recently used "coarse prefix" to refer to the same thing.

   I-R:   IRON routers performing the IR(GW) role.  As with LISP, I understand each such router advertises all the "edge" space.  The closest thing to a MAB in I-R is a "Virtual Prefix" - "VP".

In LISP and I-R, for IPv6, if the "edge" space is always in a currently unused part of the address space, then each PTR or IR(GW) router could probably advertise the whole "edge" space to its neighbouring DFZ routers with a single prefix, or just a few prefixes.

In the same circumstance, Ivip's DITRs would not do this, since each one typically only advertises a subset of the total set of MABs.  This is because each DITR site is run by a company which typically works for a subset of the MABOCs.

This means that for IPv6, in the above scenario, both LISP and I-R could have very large numbers of "coarse" prefixes or VPs and still only add one or a very small number of prefixes to "the DFZ routing table" (shorthand for the set of globally advertised prefixes which every DFZ router needs to handle).

Both Ivip and LISP-ALT involve a single tunneling arrangement: from the ITR to the ETR.  LISP-ALT optionally allows the ITR to send (in a different form of tunneling) the initial(*) traffic packets to the ETR, via the ALT network.  But otherwise, LISP-ALT involves ITRs tunneling traffic packets directly to ETRs, just like Ivip.
* "Initial traffic packets" is shorthand for those packets an ITR or ITE device receives before it has mapping for how to properly tunnel such packets to the final destination device - which for Ivip is a single ETR, for LISP is one of several ETRs, and for I-R is one of several IRON routers performing the IR(EID) role for the matching prefix.

I-R is fundamentally different from Ivip and LISP in this regard.  The initial traffic packets are tunneled by an IR(LF/GW) router to an IR(VP) router, which tunnels them to the correct one of potentially several IR(EID) routers.

This is the general process, but it is possible that the IRON router playing the IR(VP) role (and there are typically 2, 3 or 4 such routers for any one VP) is also the correct one of the potentially multiple IR(EID) routers for the EID prefix which the packet is addressed to.  In this case, there is no need for a second stage of tunneling.  Also, if the IR(VP) role router is also acting as an IR(LF) or IR(GW) router, it may accept the traffic packet from its local network or the DFZ in this role, and so have no need to tunnel it to a separate IR(VP) role router, since this router also happens to be one of the IR(VP) role routers for the VP which covers the traffic packet's destination address.

As soon as the IR(VP) router receives such a traffic packet, it regards it as an implicit Map Request, and sends back some "mapping" information to whichever IRON router (performing an IR(LF) or IR(GW) role) tunneled the packet to the IR(VP) router.  This was initially described as a "route redirection" message, but I will refer to it simply as a Map Reply message.

In the following example, I refer to an IR(LF) router - which advertises all the "edge" space in the routing system of whichever ISP or end-user AS network it resides in.  The same example would apply to an IR(GW) router, which does the same thing - except that it advertises all "edge" space in the DFZ.
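To make the "implicit Map Request" behaviour concrete, here is a minimal Python sketch of what an IR(VP) router might do with a tunneled packet.  The function names, message shapes and the example prefix/addresses are all my own inventions for illustration - Fred has not specified any of this.

```python
# Hypothetical sketch of IR(VP) behaviour: a tunneled traffic packet is
# treated as an implicit Map Request.  Everything here (names, formats,
# addresses) is invented for illustration.
import ipaddress

# EID prefixes registered within this router's VP -> IR(EID) addresses.
registrations = {
    ipaddress.ip_network("44.44.1.0/28"): ["99.1.2.3", "99.4.5.6"],
}

def ir_vp_receive(dst, src_ir, tunnel, send_map_reply):
    """Handle a traffic packet tunneled here by an IR(LF) or IR(GW)
    router at address src_ir, destined for "edge" address dst."""
    addr = ipaddress.ip_address(dst)
    for eid, ir_eids in registrations.items():
        if addr in eid:
            # Second stage of tunneling, to one of the IR(EID) routers.
            tunnel(ir_eids[0])
            # The packet doubles as an implicit Map Request, so answer
            # with a Map Reply carrying the mapping for this EID.
            send_map_reply(src_ir, eid, ir_eids)
            return "forwarded"
    # No registered EID prefix covers this address: drop the packet.
    return "dropped"
```

This also shows the rule noted above: a packet whose destination matches no registered EID prefix is simply dropped, not forwarded to another IR(VP) router.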
In this example, an IR(LF) router advertises all the "edge" space in its local routing system and so receives a traffic packet addressed to 44.44.01.02.  The IR(LF) router recognises this as being within the "edge" subset of the global unicast address range, but does not have any mapping for an EID prefix which covers this address.  It does this by knowing already every VP in the system, and recognising that this address falls within one of those VPs.

In this example, there is a VP 44.44.0.0 /16 - though perhaps this is rather large for a VP - maybe Fred intends they be somewhat smaller, such as a /18 or /20, to better share the load over a greater number of IR(VP) routers.  The load depends very much on the traffic volumes, not so much on how much address space is within each VP.

The EID prefix which matches this destination address is 44.44.01.00 /28 - but the IR(LF) router doesn't have any knowledge of this.  If it did, it would have mapping for this EID prefix, and wouldn't need to tunnel the packet to an IR(VP) router.

The IR(LF) router tunnels the packet to *all* of the IRON routers which are playing the IR(VP) role for this VP: 44.44.0.0 /16.  (How they know the addresses of these IR(VP) routers is described below.)

The reason for tunneling to all these IR(VP) routers is to avoid the need for buffering the packet, while ensuring that at least one copy of the packet will arrive at a working IR(VP) router.  (See the most recent discussions with Fred about this.  I referred to this as the "scattergun" approach.  It raises a few inefficiencies - and therefore scaling difficulties - but I think it is a good way of solving the problem of IR(VP) routers being suddenly unreachable or dead.)

Each IR(VP) has (ideally) complete knowledge of the mapping for all the EIDs in its VP.  (How it gains this is discussed below.)  For each traffic packet it receives from an IR(LF) or IR(GW) router, it sends back a Map Reply packet with mapping information for the matching EID.
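The IR(LF) decision just described - tunnel directly if a cached EID prefix covers the destination, otherwise "scattergun" to every IR(VP) router for the covering VP - can be sketched as below.  This is my own illustrative model, not anything from Fred's drafts; the table contents and function names are invented (and I write the example address as 44.44.1.2, without leading zeros).

```python
# Illustrative sketch of the IR(LF) forwarding decision.  The VP table,
# addresses and names are invented for this example.
import ipaddress

# Complete list of VPs, each mapped to the ~3 IR(VP) router addresses.
VP_TABLE = {
    ipaddress.ip_network("44.44.0.0/16"):
        ["22.22.88.88", "33.33.99.99", "55.55.77.77"],
}

# Mapping cache filled in by Map Reply messages: EID prefix -> IR(EID)s.
eid_cache = {}

def handle_traffic_packet(dst, tunnel):
    """Decide where an IR(LF) router tunnels a packet addressed to dst."""
    addr = ipaddress.ip_address(dst)
    # A cached EID prefix from an earlier Map Reply: tunnel directly.
    for eid, ir_eids in eid_cache.items():
        if addr in eid:
            tunnel(ir_eids[0])
            return "direct"
    # No mapping yet: find the covering VP and use the "scattergun"
    # approach - tunnel a copy to *every* IR(VP) router for that VP.
    for vp, ir_vps in VP_TABLE.items():
        if addr in vp:
            for router in ir_vps:
                tunnel(router)
            return "scattergun"
    # Not within any known VP: not I-R "edge" space this router handles.
    return "drop"
```

Before any Map Reply arrives, a packet to 44.44.1.2 goes to all three IR(VP) routers; once the covering /28 is cached, subsequent packets go straight to the IR(EID) router.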
(Whether it should do this for every such packet, or just for the first few, could be debated.  Since there could be a flurry of such traffic packets, it is probably good enough to send mapping for just the first 2 or 3.)

Below, I assume there are always 3 IR(VP) routers for each VP.

Assuming, for simplicity, that there is a single IR(EID) router, the paths taken by an initial traffic packet are:

               /--->--IR(VP)1-->--\
              /                    \
SH-->--IR(LF)---->--IR(VP)2-->----IR(EID)--->--Destination-EUN.
              \                    /
               \->--IR(VP)3-->----/

  Fig 1.

Presumably, there will be a method by which the IR(EID) router can ignore the second and third copies of these packets, and only send the first one to the destination EUN.  I don't know how Fred plans to do this, but I will assume something can be added to the SEAL tunneling protocol to make it easy for the IR(EID) to recognise these three packets as all resulting from the one initial traffic packet.  This can't be done by simply looking at the traffic packet, since perhaps the sending host sent several identical packets.

Probably the way to do this is something like this:  The IR(LF) creates a nonce for this particular traffic packet and includes it in the SEAL headers when it tunnels it to each IR(VP) router.  (This can't be the SEAL-ID, since it maintains separate (random start, monotonically incrementing) SEAL-ID sequences for each IR(VP) router.)  Each IR(VP) router includes this nonce in the packet it sends to the IR(EID) router.  The IR(EID) router uses the nonce to identify unique traffic packets and so to discard the second and subsequent copies it receives, sending only the first to the destination EUN.

This looks OK, but it adds to the length of the SEAL header.  See below for a gotcha when there are two or more IR(EID)s.
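The nonce-based duplicate suppression I suggest above might look something like the following sketch.  The class, the 10-second expiry and the use of a per-packet nonce field are all my own assumptions; SEAL as drafted has no such mechanism.

```python
# Minimal sketch of nonce-based duplicate suppression at an IR(EID)
# router.  The expiry time and the idea of a per-packet nonce field in
# the SEAL header are assumptions for illustration only.
import time

class NonceFilter:
    """Remembers recently seen per-packet nonces so an IR(EID) router
    can deliver only the first of the ~3 copies relayed via IR(VP)s."""

    def __init__(self, ttl=10.0):
        self.ttl = ttl          # how long a nonce counts as "seen"
        self.seen = {}          # nonce -> time first seen

    def first_copy(self, nonce, now=None):
        now = time.monotonic() if now is None else now
        # Purge expired entries so the table stays small.
        self.seen = {n: t for n, t in self.seen.items()
                     if now - t < self.ttl}
        if nonce in self.seen:
            return False        # duplicate copy: drop it
        self.seen[nonce] = now
        return True             # first copy: deliver to the EUN
```

With 3 IR(VP) routers, the second and third copies of an initial packet arrive with the same nonce within a fraction of a second, so even a short expiry keeps the state per IR(EID) router small.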
Such nonces are only needed for these initial packets which get tunneled to all 3 or so IR(VP) routers - the nonce is not needed for traffic packets tunneled directly from an IR(LF) router (which in this section of the discussion includes all IR(GW) routers) to an IR(EID) router.

Before, while or after each IR(VP) router tunnels the traffic packet (perhaps with the nonce I suggested above) to the IR(EID) router, it also sends a Map Reply packet to the IR(LF) router:

            /---<--IR(VP)1
           /
  IR(LF)----<--IR(VP)2
           \
            \-<--IR(VP)3

  Fig 2.

The Map Reply packets are secured with the SEAL-ID the IR(LF) router used when tunneling to each IR(VP) router.

(I recall that SEAL-IDs are 32 bit integers, starting at some random value chosen by any particular ITE router at the time it first tunnels a packet to any particular ETE router.  Then the number is incremented for every packet tunnelled from this ITE to this ETE.  Off-path attackers are assumed not to be able to guess the starting or the current value.  There may be some gotchas in how the routers at each end easily recognise valid and invalid SEAL-IDs without having to cache each one.  Fred and I discussed this in the past month or two, and I recall there was probably a way of doing it.  However I think it would be tricky or perhaps impossible to implement by using simple counters to frame a range of acceptable values, rather than by caching the use of each value along with a caching time, or time of use - to give the effect of maintaining a timer for each individual SEAL-ID.)

All traffic packets this IR(LF) receives before it gets a Map Reply are "initial" packets.  As soon as the IR(LF) has mapping in its cache which covers the EID prefix - in this case 44.44.01.0 /28 - then all further traffic packets it receives which match this EID prefix are no longer "initial" and so are not tunneled to all the IR(VP) routers for the matching VP.
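To make my doubt about counter-framing concrete, here is a naive window check of the kind I mean.  The window size and function are invented; this is not from the SEAL drafts.  It accepts IDs only within a fixed window *ahead* of the highest ID seen, which is exactly what makes it awkward: legitimately re-ordered packets arriving behind the last-seen ID get rejected, which is the case that per-ID caching with timers would handle.

```python
# A naive sketch of window-based SEAL-ID acceptance, to illustrate the
# counter-framing approach questioned above.  WINDOW is an invented
# constant, not a SEAL parameter.
WINDOW = 256    # how far ahead of the last-seen ID we will accept

def seal_id_plausible(last_seen, received, modulus=2**32):
    """Accept a 32-bit SEAL-ID only if it falls within WINDOW values
    after the highest ID seen so far (with wrap-around).  Note this
    rejects legitimate re-ordered packets arriving *behind* last_seen."""
    delta = (received - last_seen) % modulus
    return 0 < delta <= WINDOW
```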
All these subsequent traffic packets are tunneled to the IR(EID) router:

SH-->--IR(LF)---->------------->--IR(EID)--->--Destination-EUN.

  Fig 3.

For a multihomed EUN, there will be two or more IR(EID) routers, so the process could be more complex.  Depending on how the priorities and weightings or whatever are defined, and the individual perspective of each of the three IR(VP) routers, it is possible that one or more IR(VP) routers will choose to tunnel the packet to a different router IR(EID)1 than one or more of the other IR(VP) routers, which tunnel their copies of the traffic packet to IR(EID)2.  Then instead of the initial packet flow of Fig 1, we could get:

               /--->--IR(VP)1-->--\
              /                    \
SH-->--IR(LF)---->--IR(VP)2-->----IR(EID)1--->--Destination-EUN.
              \                              /
               \->--IR(VP)3-->----IR(EID)2-->--/

  Fig 4.

This would result in a duplicate copy of the traffic packet arriving at the destination EUN.  I assume this cannot be tolerated, so I guess Fred will devise a way of preventing this.

Note: below I suggest I-R adopting Ivip's approach of the mapping containing just a single address, that of one IR(EID) router.  If this is adopted, then this duplicate packet problem can't occur.

If both IR(EID)1 and IR(EID)2 were at the destination EUN, then there would be a simple solution, since they could compare notes about nonces and so determine which was the first-arrived copy of the traffic packet and which were duplicates which could be dropped.  For instance, the two functions of IR(EID)1 and IR(EID)2 could be performed by the one router.  Each would have a different "core" address of course, IR(EID)1 from PA space from one ISP and IR(EID)2 from PA space from another ISP.

However, if IR(EID)1 and IR(EID)2 are not in the same place or device - such as if they were in separate ISP networks - then I think the problem of duplicate packet delivery to the destination EUN is a serious "gotcha".
I think it could only be solved by having a special router function at the EUN to distinguish between duplicate packets, and by the IR(EID) routers tunneling packets to it, with the nonce I suggest above which would be generated for each initial traffic packet by the IR(LF) router.

The IR(VP) routers somewhat resemble a "Default Mapper" (DM) in APT.  However, in APT, the DM was within the same (ISP, typically) network as the ITR, and I recall the ITR simply forwarded the traffic packet to one of the potentially multiple DMs in its network.  I also recall that the DM interpreted the mapping information, chose a particular ETR to tunnel the packet to, and then in the Map Reply message to the ITR, told it simply to tunnel packets to this ETR if they matched the EID prefix which was also in the Map Reply message.

I-R's forwarding of initial packets also resembles the use of LISP-ALT where traffic packets are sent on the ALT network.  I think the ALT network is a bad idea, since the paths can easily criss-cross the globe and because there are fundamental difficulties making it scale well, while maintaining generally short paths and also avoiding single points of failure.  In LISP-ALT, there isn't an exact equivalent to an IR(VP) router or an APT DM - since the ALT network would deliver the packet to one of the correct ETRs.

I-R involves tunneling initial packets directly from the IR(LF) router to the IR(VP) router (actually to all the 3 or so IR(VP) routers) and then directly from the IR(VP) router(s) to the IR(EID) router(s).  This tunneling is via the Internet - there are two such tunnels, or really three parallel paths, each with two tunnels in series, as shown in Fig 1 and Fig 4.  LISP-ALT's delivery of initial packets is by forwarding over an overlay network composed of Internet tunnels between ALT routers.
The traffic packet typically has to be forwarded by multiple ALT routers, and this involves multiple Internet tunnels, since there is a tunnel between each pair of routers.  There could be half a dozen or a dozen such tunnels - maybe more; it depends on how the ALT system is structured, which has never been described in detail.

There needs to be a careful choice about how many IRON routers perform the IR(VP) role for each VP:

  1: Too few.  Single point of failure.

  2: Better, but still pretty highly strung, with dependence on only two IR(VP) routers being alive and reachable - and lots of EID prefixes depend on this.

  3: Better still - maybe a good choice.

  4: Maybe a good choice.

  5: Probably an excessive number.

I suggest 3 or 4 as good choices.  More than this and there are efficiency concerns, since each initial packet needs to go to every IR(VP) router.  Also, for a choice of 4, assuming all the IR(VP) routers are up and reachable, this quadruples the workload of the IR(EID) routers in handling initial packets and the nonces I suggest above.  Furthermore, since there are likely to be two or perhaps more IR(EID) routers, and since different IR(VP) routers might choose a different IR(EID) router, there would be more trouble with duplicate initial packets going over the final links to the destination network.

Choosing a higher number of IR(VP) routers has one advantage in addition to increasing robustness.  It will also tend to mean that the shortest path from the IR(LF) router to the IR(EID) router(s) via any of the IR(VP) routers will tend to be shorter - since the more IR(VP) routers there are (presumably scattered widely around the Net), the greater the chance that one will result in a short total path from the IR(LF) to it, and then to the IR(EID).

Below, I will assume there are exactly 3 IR(VP) routers for each VP.  I also assume that these are geographically and topologically located in a diverse manner, so an outage which makes one unreachable is unlikely to affect the other two.
I will also assume that all the EUNs which use I-R "edge" space are multihomed - typically with two ISPs and so with two IR(EID) routers, but occasionally with 3 or 4.

This "I-R-lite" subset of IRON-RANGER is a novel and interesting arrangement.  Every IRON router can perform the IR(LF) role, and some of them are configured to do the same to the DFZ, and so perform the IR(GW) role.  Neither of these roles involves the router initially knowing any mapping at all.  All they need to know is:

  1 - A complete list of VPs.

  2 - For each VP, the 3 (in my example) IRON routers which are performing the IR(VP) role for this VP.

Then, after they tunnel traffic packets to all the IR(VP) routers for a given VP, they get back a Map Reply which covers a specific EID prefix - and don't need to bug the IR(VP) routers again about traffic packets whose destination addresses match this EID, for a period specified by the caching time in the Map Reply message.

Fred suggested that all IRON routers discover the above by downloading a single file which contains all this information - when they boot - and then doing regular checks, which I call "delta checks", with some network of centrally coordinated servers to find out how the file changes over time.

For a full-scale deployment with tens of thousands of VPs, perhaps 100k of them, the size of this file is going to be a few megabytes, and the rate of change is going to be quite slow.  The file would contain either IP addresses or FQDNs of the IRON routers playing the IR(VP) role.  I think this is practical, but there are objections to the centralised nature of this one file, which I discuss below.

For now, I assume a single file and that each IRON router would do a delta check, on its own schedule unsynchronised with other IRON routers, every 10 minutes.  This is my choice for an example.  Exactly what the time would be, or should be, hasn't yet been discussed.
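As a thought experiment, the master file and a delta check could be as simple as the sketch below.  The one-line-per-VP format and the "+/-" delta records are entirely my inventions - Fred has not specified a format - but they show how little machinery the scheme needs.

```python
# Hypothetical sketch of the master VP file and delta checks.  The
# file format and delta record syntax are invented for illustration.

def parse_vp_file(text):
    """Parse 'vp-prefix ir_vp1 ir_vp2 ir_vp3' lines into a dict,
    ignoring blank lines and '#' comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#")[0].strip()
        if not line:
            continue
        vp, *routers = line.split()
        table[vp] = routers
    return table

def apply_delta(table, delta_lines):
    """Apply '+ vp-prefix addrs...' / '- vp-prefix' delta records,
    as an IRON router would every ~10 minutes."""
    for line in delta_lines:
        op, vp, *routers = line.split()
        if op == "+":
            table[vp] = routers     # new or changed IR(VP) set
        elif op == "-":
            table.pop(vp, None)     # VP withdrawn
    return table
```

With ~100k VPs at one line each, such a file is indeed only a few megabytes, and a delta check transfers only the handful of lines changed since the router's last check.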
Let's say that at 3:00 UTC a particular IRON router 22.22.88.88 is configured to be one of the 3 IR(VP) role routers for the VP previously discussed: 44.44.0.0 /16.  By 3:10 UTC, in theory, we can assume that every IRON router knows this.  This information is used for two purposes:

1 - As described above, for an IR(LF) or IR(GW) router to decide
    which 3 (in my example) IRON routers to tunnel an initial packet
    to, based on the traffic packet's destination address matching a
    given VP.

2 - As not so far mentioned, for the 2 (or perhaps 3 or 4) IR(EID)
    routers for a given EID in this VP to register themselves with
    each of the three IR(VP) routers for this VP.

Registration will take some time.  Fred roughly described a way of doing it, involving a message with some crypto stuff (such as using signatures and PKI, I think) so the IR(VP) router could tell from the message sent by the IR(EID) router that it really was authorised as performing this role for a given EID.  Below I suggest some improvements to this, but for now I am discussing the I-R-lite subset of what I understand to be Fred's I-R design.

Let's say it takes each IR(EID) router up to two minutes to register itself with the newly established IR(VP) router.  Now, for Justin (Just In Case), we add a 3 minute fudge factor and decree that within 15 minutes of an IR(VP) role for a router being made available on the master list (and via delta checks), any self-respecting IR(EID) router for an EID within this VP can be assumed to have completed its registration with the new IR(VP) role router.

This enables a simple solution to the problem of IR(LF) and IR(GW) routers tunneling packets to newborn IR(VP) routers before they have had time to get registrations for all their EIDs: IR(LF) and IR(GW) routers will wait 15 minutes after the appearance of a new IR(VP) router before tunneling any traffic packets to it.
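The 15 minute start-up rule can be sketched in a few lines.  The function and variable names are my own illustration; the only substance is the arithmetic from the text: 10 minutes of worst-case delta-check lag, plus 2 minutes to register, plus a 3 minute fudge factor.

```python
# Sketch of the rule: a newly listed IR(VP) router is only sent traffic
# packets once 15 minutes have passed since its role was made available
# on the master list (10 min delta-check lag + 2 min registration +
# 3 min fudge factor).

GRACE_SECONDS = 15 * 60

def irvp_is_usable(listed_utc, now_utc):
    """True once a newly listed IR(VP) router can be assumed to hold
    registrations from all the IR(EID) routers in its VP."""
    return now_utc - listed_utc >= GRACE_SECONDS
```

With the refinement of publishing each role's UTC establishment time in the master file, every IR(LF/GW) router can apply this test against the same reference point, instead of against the (up to 10 minutes later) moment it happened to learn of the new router.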
However, since it could take 10 minutes for any one IR(LF) or IR(GW) router to discover that the new IR(VP) router has been established, this blows out the maximum time before the IR(VP) router is doing its full workload to 25 minutes.  This can be fixed by all IRON routers knowing the current UTC time, and by the master file and its delta updates including the UTC time each IR(VP) role for a given VP was established.  Then there would be a flat 15 minute delay before each IR(VP) role was fully operational.

(Fred did not want an IR(VP) role router to tunnel a traffic packet to any other IR(VP) router just because it lacked a registration for an EID which matches its destination address.  I think this dropping of such packets is a good idea.)

One thing I haven't explored with Fred is the creation of the "mapping" information, and how the 3 or so IR(VP) routers might work together on this.  I understand that each IR(VP) role router would receive an EID registration from each of the typically 2 or more IR(EID) routers for a given EID.  Somehow, the IR(VP) router (perhaps after comparing notes with the other 2 or so IR(VP) role routers) must develop "mapping" which can be sent to the IR(LF) and IR(GW) routers.  I understand from previous discussions that the mapping will resemble that of LISP:

 - The EID: start address and the length of the prefix in bits.

 - The addresses of the typically 2 or perhaps more IR(EID) role
   routers for this EID.

 - Information on preferences and weightings to control load sharing
   between them, including no load sharing: sending all packets to
   IR(EID)1 and not IR(EID)2, unless IR(EID)1 is dead, unreachable or
   unable to get packets to the destination network.

(IRON-RANGER seems to have the same problems as LISP in the ITR / IR(LF) / IR(GW) routers having no direct way of testing the reachability of the destination network through each ETR / IR(EID) router.
Ivip has no such problem, since there is a single ETR address in the mapping, and the end-user network is responsible for changing the mapping in real-time.  The end-user network, or some company they appoint to do so, can easily test reachability of the network through each ETR, since it knows one or more hosts or routers on the network it can try to get a response from via each ETR.  LISP ITRs, APT ITRs/DMs and I-R IR(LF/GW) routers are given no information about a host or router in the destination network by which they could test actual reachability of the network through each ETR / IR(EID) router.)

It is not clear how the IR(EID) role routers give these preferences or weightings or whatever to each of the IR(VP) routers they register with - or how the IR(VP) routers individually or collectively decide on the mapping for this EID.  Somehow, each IR(VP) router must have already computed mapping for each registered EID in its VP.  It needs this when it tunnels a traffic packet addressed to this EID - and it needs it to create the Map Reply it sends to IR(LF/GW) routers.

In recent discussions, there was a difficult problem when the IR(LF) router tunneled a packet to just one IR(GW) role router.  What if the IR(GW) role router was dead, being rebooted, or unreachable?  The traffic packet would be lost, and there is no reliable way of finding out in a fraction of a second that this had occurred.

Fred's "scattergun" approach solves this nicely - the IR(LF) router tunnels the traffic packet to all IR(VP) role routers (subject to the 15 minute start-up time arrangements just described).  So as long as at least one of these IR(VP) routers is reachable and alive, the traffic packet won't be lost.  The first Map Reply from any of these tells the IR(LF) router not to tunnel packets to the IR(VP) routers any more - but to use the mapping to tunnel them directly to the correct IR(EID) router out of the 2 (or perhaps 3 or 4) specified in the mapping.
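The scattergun forwarding decision at an IR(LF/GW) router can be sketched as below.  The class and method names are my own illustrative assumptions; the substance is just the rule from the text: until a Map Reply is cached for a matching EID prefix, every traffic packet is tunneled to all ~3 IR(VP) routers, and afterwards a single copy goes to the preferred IR(EID) router.

```python
# Sketch of "scattergun" forwarding at an IR(LF/GW) router.  Router names
# are opaque strings here; a real router would hold IP addresses and
# encapsulate packets rather than return a list.

class IrLfRouter:
    def __init__(self, vp_table):
        self.vp_table = vp_table        # VP prefix -> list of IR(VP) routers
        self.mapping_cache = {}         # EID prefix -> list of IR(EID) routers

    def forward(self, dest_eid_prefix, vp_prefix, packet):
        """Return the routers this traffic packet is tunneled to."""
        mapping = self.mapping_cache.get(dest_eid_prefix)
        if mapping:
            return [mapping[0]]         # one copy, to the preferred IR(EID)
        return list(self.vp_table[vp_prefix])   # one copy to every IR(VP)

    def on_map_reply(self, eid_prefix, ir_eid_routers):
        # First Map Reply wins; later duplicates just overwrite identically.
        self.mapping_cache[eid_prefix] = ir_eid_routers
```

Note that each tunneled initial packet doubles as a map request, so no separate request message or retry timer is needed.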
(In the above discussion, for brevity, I used "IR(LF)", but all of it applies to routers playing the IR(GW) role too.)

Fred also suggested that the central file should list every IRON router.  I think this is unnecessary, since in the functionality I described above for I-R, all of which is part of my I-R-lite subset, there is no need for IRON routers to know about all other IRON routers.  They only need to know about the subset which is performing the IR(VP) role for one or more VPs.

He mentioned there would only be 100k or so IRON routers, and that these would be larger routers in ISPs, or similar.  Fred's ~100k estimate is subject to the critique that it forces all this ITE work to be done by a relatively small number of routers, when it would be desirable to have the option of spreading it out over more numerous, lower capacity, less expensive devices which are generally closer to sending hosts.  Below, in the improvements section, I envisage many more IRON routers - primarily to perform the IR(LF) role.

This single file arrangement, with a single source of information for the delta checks (even though the delivery of the file and the delta checks themselves might be farmed out in a distributed manner), is subject to the critique of being overly centralised, and so having a single point of failure.  Below, in the improvements section, I suggest some alternative arrangements which would not be subject to these critiques.

I can't remember what Fred's design involves in terms of "Map Updates".  An IR(VP) router may at some stage become aware that a given IR(EID) registration has lapsed or been cancelled.  (Fred hasn't described a cancellation arrangement, but I assume this could be added securely as an extension of the registration arrangement, so an IR(EID) role router could de-register itself.)
Does the IR(VP) role router maintain in its cache a record of every Map Reply it sent, in a form by which it can find those sent in the current caching time which concerned an EID prefix whose mapping has just changed?  If so, can it then send a Map Update message, with the new mapping for this EID, in a similar fashion to the Map Reply message, to each such IR(LF/GW) router which tunneled an initial packet to the IR(VP) router?

With these caveats, the subset of I-R I have presented here - my "I-R-lite" subset - has some interesting properties:

1 - All IRON routers can perform - and generally do perform - the
    IR(LF) or perhaps IR(GW) roles.  To do this, they need only to
    cache the mapping information sent by IR(VP) routers.

2 - There is no central repository of mapping.  (Only a central
    repository of which IRON routers are performing IR(VP) roles for
    which VPs.)

3 - Initial packets get delivered reliably and pretty quickly.  With
    three or so IR(VP) routers, the worst case is that the packet
    will traverse the Earth and come back again.  For instance, if
    the SH and IR(LF/GW) router are in South Africa, the closest
    operational IR(VP) router is in Vancouver, and the destination
    network and its IR(EID) router are in Italy, then the initial
    packets need to go from South Africa to North America, and then
    back to Italy.

The typical initial packet path for I-R will probably be longer than the typical outcome for Ivip, but I think it will typically be better than using LISP-ALT to deliver initial packets.  It will certainly be superior to the current LISP-ALT arrangement of ITRs dropping all initial packets until a new packet arrives after the Map Reply arrives.

Improving I-R-lite
==================

Splitting up the VP file
------------------------

The first obvious improvement is to decentralise the master VP file.  Having a single file, or a single organisation controlling files in multiple locations, represents a single point of failure for the entire system.
It would be better if a failure by one organisation only affected a subset of the I-R "edge" space - the subset which that organisation is paid to be responsible for.

I-R "edge" space is all within VPs.  Presumably each VP has a single organisation which is responsible for it.  That VP organisation is the only one which would want to run this VP's 3 or so IR(VP) routers - and the only organisation which should be allowed to control them.

It seems likely that one organisation will be responsible for more than one VP, at least in IPv4, where various small (long prefix) VPs might be carved out of the address range wherever possible.  In IPv6, there's almost endless space, so all the VPs would be on fresh unused space, and have vast address space within them - so each organisation could have a single VP with huge capacity.  However, if there is a lot of traffic to the VP, as there would be, this would overload a single set of IR(VP) routers - so even with IPv6, VPs will need to be kept small enough that the traffic for each VP is within the scaling limits of its set of 3 or so IR(VP) routers.

One way or another, the VP organisations could decide which of some smaller number of "VP file" companies they wanted their VP to be handled by.  Then, for load sharing and splitting up the whole system into smaller chunks, each IRON router would download a file from each such "VP file" company, and likewise do the 10 minute (or whatever interval) delta checks with each such company.  However, if there were more than a few dozen such "VP file" companies, this means each IRON router has to do quite a lot more delta checks to the servers of each such company.  The servers for the VP file companies could be discovered by DNS in some way.

There's no free lunch here - but the current design of a single file for all VPs is not the only way the same basic architecture could be implemented.  Having some kind of flexibility and multiple smaller files seems like a good idea.
Alternatives to the VP file
---------------------------

It would also be possible for IRON routers to discover all the VPs, and which 3 or so IRON routers perform the IR(VP) role for each VP, by a method similar to what I propose in Distributed Real Time Mapping.  Later, this will be fully described in:

  http://tools.ietf.org/html/draft-whittle-ivip-drtm

At present (version 01) this ID doesn't yet include this, so please refer to the section "Stage 2 needs a DNS-based system so TRs (QSRs) can find DITR-Site-QSDs (QSAs)" of:

  http://www.ietf.org/mail-archive/web/rrg/current/msg06128.html

This would involve each IRON router walking a special part of the DNS which describes VPs and non-VP areas, where the DNS replies for the VPs also contain, by some means, the IP addresses of the IR(VP) routers.  Instead of "delta checks", there would be DNS cache time-outs for the information which was discovered in this way, so there would be a continual series of DNS requests for these items from all IRON routers.

This has some scaling problems, because there are larger numbers of IRON routers than there would be QSRs in Ivip, but it is probably a valid way of replacing the file-download-with-delta-check approach with something more decentralised.  If multiple IRON routers shared a single caching DNS resolver, then the DNS resolver would cache a lot of this, so the authoritative nameservers for these items wouldn't have to handle all the queries.

The delta check to one or a few dozen servers (or more local servers with the same information) would probably be more efficient than the DNS approach.  It depends a lot on how many VPs there are, and how rapidly all the IRON routers need to know about IRON routers taking on new IR(VP) roles.

"Aggregating" the VP discovery process
--------------------------------------

I-R has no intermediate buffer layer between the IRON routers and the source of this VP information.
For I-R to have as many IR(LF) routers as Ivip has ITRs, this would place a much greater load on I-R's VP discovery system (whether with a file and delta checks, or via DNS as suggested above).

This doesn't happen in Ivip, because the ITRs don't need to know very much at all.  All they need to know is how to query one or more local (such as in the same ISP network) QSR resolving query servers, perhaps via local caching QSC query servers.  The QSRs need to know about all the MABs, which are the rough Ivip equivalent of I-R's VPs.

I-R could be enhanced with some local or nearby server which enables multiple IRON routers to more efficiently discover the VP file(s) and changes to it.  This is a form of aggregation, to make it unnecessary for a bunch of IRON routers in one area to all, individually, get files from and send delta checks to some far distant server.  With the DNS model, this can be easily achieved by multiple IRON routers sharing a nearby caching resolving DNS server.  Another approach might be some IRON routers passing on the changes they discovered to neighbours, but this needs to be done securely.

Both these approaches would reduce long-distance communications, but would do little or nothing to reduce the workload of each IRON router in keeping itself up-to-date with the VPs.

Probably the most efficient arrangement would be for multiple IRON routers to connect to a nearby "resolving and notifying" server of some kind, which took responsibility for directly sending all changes to the VP information to each IRON router.  Then, each IRON router doesn't need to ask, or poll (delta check), for changes.  It just sits there and is securely told whenever there is a change.  The resolving server would take responsibility for discovering all the VPs - by the multiple files approach, or the DNS approach - and would be devoted to either polling these (delta checks) or in some other way getting all the changes on a timely basis.
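The "resolving and notifying" arrangement can be sketched as a simple publish-subscribe relationship.  All the class and method names here are my own illustrative assumptions; the point is just the inversion: the resolver polls (or walks the DNS) on behalf of its clients, and the client IRON routers never poll at all.

```python
# Sketch of a nearby "resolving and notifying" server: it alone tracks VP
# changes, and pushes each change to every subscribed IRON router.

class NotifyingResolver:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, router):
        self.subscribers.append(router)

    def on_vp_change(self, vp_prefix, irvp_routers):
        # Push the change to every client IRON router; clients never poll.
        for router in self.subscribers:
            router.update_vp(vp_prefix, irvp_routers)

class IronRouter:
    def __init__(self):
        self.vp_table = {}      # VP prefix -> list of IR(VP) routers

    def update_vp(self, vp_prefix, irvp_routers):
        self.vp_table[vp_prefix] = irvp_routers
```

For robustness each IRON router would subscribe to two or three such resolvers, accepting the duplicate notifications that implies.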
Then it would send out the changes to all its client IRON routers.  Each IRON router would probably use two or perhaps three of these servers for robustness, which would multiply the total workload and the workload of each IRON router.

Note that this arrangement would be somewhat analogous to how multiple (dozens to hundreds) of Ivip ITRs rely on two or three local QSR servers.  However, with I-R, each IRON router is required to know all the VPs, and for each VP the 3 or so IR(VP) routers - which is a more onerous task than that of the Ivip ITRs.

Ivip ITRs only need to know the full subset of the global unicast address space which is "edge" space.  They don't need to know anything about individual MABs.  In IPv6, a whole short prefix, such as a /8 or /16, could contain all the edge space, so this becomes trivially simple.  For IPv4, there will be MABs of various sizes scattered all about.  Even then, the ITRs don't need to know specific MABs, so if four are adjacent and can be covered by a single shorter prefix, then the shorter prefix of "edge" space is all each ITR needs to know.

Ivip ITRs don't need to know where one MAB ends and another starts, since they always send their mapping queries to an upstream QSC or QSR query server which is in the same network, or in an upstream ISP's network.  The QSRs need to know about every MAB, and the addresses of the 2 or so typically nearby authoritative QSA servers for each MAB.

I think this last approach - multiple IRON routers sharing a single "resolving and notifying" server of some kind - is the one which holds most promise, both for reducing the workload of each IRON router and for reducing the burden a larger number of IRON routers would place on whatever centralised or somewhat centralised systems specify the VPs and which routers are performing the IR(VP) role for each VP.

More, cheaper, devices for the IR(LF) role?
-------------------------------------------

Fred envisages about 100k total IRON routers, most of which would be performing the IR(LF) role, many of which would also be performing the IR(EID) role, and, I guess, quite a few of which would be performing IR(VP) roles for one or more VPs.

In order to reduce costs and spread the load better, it would be desirable to have many more devices playing the IR(LF) role.  For instance, if it could be done in software by cheap servers, as well as in large, centrally located, hardware-based routers, then this could be a lot more cost-effective and easier to introduce.  However, to allow for ten times more IR(LF) role devices, there needs to be a pretty lightweight method by which they discover all the VPs and their IR(VP) routers.  As noted above, some kind of nearby "resolving and notifying" server looks most promising.

If the global impact of each IRON router performing only the IR(LF) role could be really minimised, and if the workload for each such device could also be minimised, then the IR(LF) role could be performed in a sending host which is not itself on an I-R "edge" address.

IPv6 allows the provision of vast amounts of address space in each VP, so few VPs are needed to get sufficient address space.  Still, it seems that in order to split up the IR(VP) workload effectively, there needs to be a large number of VPs for both IPv4 and IPv6.

Ivip ITRs only need to know what parts of the global unicast address range are "edge".  This is done by them learning from their local QSR the details of every MAB (Mapped Address Block), each of which is a DFZ-advertised prefix.  They don't need to know anything more about the MAB.  Any traffic packet addressed to "edge" space will be handled by the ITR function.  I-R IR(LF) functions need not only to know the VPs (the closest I-R equivalent to Ivip's MABs) but to know the 3 or so IR(VP) routers for each VP.
Unlike Ivip ITRs, they don't need to buffer traffic packets when awaiting mapping - they simply tunnel the traffic packet to the 3 or so IR(VP) routers, which is a non-trivial amplification of the basic workload of tunneling each traffic packet once.  IR(LF) routers don't need to re-request mapping, as an Ivip ITR might if it gets no response after a while, because each of the traffic packets tunneled to IR(VP) routers is a map request.

I think the biggest scaling bottleneck in I-R is the reliance on a small number of IR(VP) routers to handle initial packets.  The "scattergun" approach means that it is a bad idea to have more than about 3 or 4 such IR(VP) routers per VP.  Due to each such IR(VP) router for a given VP getting every initial packet handled by any IR(LF/GW) router, having more IR(VP) routers per VP doesn't achieve load sharing - it just creates more work for the IR(LF/GW) routers, and for the IR(EID) routers which have to handle 3 or more traffic packets, one from each IR(VP) router.  This means that VPs need to be made small enough not to have so much traffic as to overload the IR(VP) routers.

A MAB is advertised by a set of DITRs around the world, and it is up to the MABOC (MAB Operating Company) how many DITRs it runs, and where, to handle that part of the traffic sent to the MAB's micronet addresses which comes from hosts in networks without ITRs.  The more DITR sites which handle this MAB, the less work the DITR at each site needs to do.  I think this means that I-R's VPs need to be more numerous than Ivip's MABs.

The next bottleneck is that each IR(LF/GW) router needs to know about all of this larger number of VPs, and, for each VP, needs to know the 3 or so IR(VP) routers.  Ivip ITRs need to know only what is "edge" or not ("core").  They don't need to know exactly where MABs begin and end.
When expressing this "edge" space as prefixes, the number of prefixes each Ivip ITR needs to know could be reduced in some circumstances, when neighbouring MABs can be covered by a single shorter prefix.  For instance, if there are two MABs:

  5.5.0.0   /17
  5.5.128.0 /17

the ITR only needs to know that 5.5.0.0 /16 is "edge" space.  ITRs will discover this from their upstream QSCs or QSRs by a simple mechanism I haven't yet designed.

(For IPv4, it might be easier for each ITR to learn what is "edge" space from its local QSR by downloading a single bit-map, with one bit for each /24 in the global unicast address range (a little less than 2^24 bits: 224 x 256 x 256 = 14.6 Mbits ~= 1.83 Mbytes), to flag whether each /24 is "edge" or not.)

A further bottleneck with I-R is that the IR(LF/GW) routers have a greater need of changed information about VPs than Ivip ITRs have about which parts of the address range are "edge".  This is because with I-R, even if VPs remain reasonably stable, as Ivip MABs would, there would be a degree of instability in which 3 or so IRON routers are performing the IR(VP) role for each VP.

An Ivip DITR doesn't need to know about all the MABs in the Ivip system, or what all the "edge" space is.  It only needs to know which MABs it is handling.

Compared to Ivip's MABs (or a simpler expression of what is "edge" space or not), I-R is likely to involve a greater number of VPs which every IR(LF/GW) router needs to know about.  The IR(LF/GW) router needs to know more about each VP, and it needs to get more updates about this information.  This makes it more difficult to introduce larger numbers of IR(LF/GW) routers into I-R than it is to introduce larger numbers of ITRs into Ivip.  This has the effect, in I-R, of concentrating work into a smaller number of IR(LF/GW) routers than the number of ITRs (and DITRs) which would handle the work in Ivip.  To avoid congestion, each of this smaller number of IR(LF/GW) routers needs to be more powerful, with greater bandwidth.
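The /24 bitmap arithmetic above can be checked with a short sketch.  The function names are mine; the substance is the text's figure of 224 x 256 x 256 bits, one per /24 of the global unicast range, coming to about 1.83 Mbytes.

```python
# Sketch of the suggested IPv4 "edge" bitmap: one bit per /24 across the
# 224 /8s of the global unicast range, so an ITR answers "is this
# destination 'edge' space?" with a single bit lookup.

NUM_SLASH24 = 224 * 256 * 256               # 14,680,064 bits
edge_bitmap = bytearray(NUM_SLASH24 // 8)   # ~1.83 Mbytes, all "core" at first

def slash24_index(a, b, c):
    """Index of the /24 a.b.c.0/24 within the bitmap."""
    return (a << 16) | (b << 8) | c

def set_edge(a, b, c):
    i = slash24_index(a, b, c)
    edge_bitmap[i >> 3] |= 1 << (i & 7)

def is_edge(a, b, c):
    i = slash24_index(a, b, c)
    return bool(edge_bitmap[i >> 3] & (1 << (i & 7)))

set_edge(44, 44, 8)   # mark 44.44.8.0/24 as "edge"
```

Note this tells the ITR nothing about where one MAB ends and another starts - exactly the point: an Ivip ITR only needs the edge/core distinction.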
Also, the "scattergun" approach of I-R increases, to some extent, the upstream bandwidth required compared to an Ivip ITR handling the same traffic.

Due to Ivip's real-time mapping system, Ivip ITRs are simple compared to the ITRs of LISP or the IR(LF/GW) routers of I-R.  The lack of real-time mapping in LISP and I-R requires more complex ITRs, to cache more complex mapping, and to do some kind of reachability testing in order to choose for themselves which ETR (LISP) or IR(EID) router to tunnel the packets to.  Also, Ivip ITRs can be on "edge" space, whereas LISP ITRs and IR(LF) routers need to be on "core" addresses.  All these factors make it a lot easier with Ivip to have large numbers of ITRs, closer to sending hosts, than with LISP or I-R.  Ivip ITRs can be in sending hosts (not behind NAT), which cannot be attempted with LISP and I-R as currently described.  The use of large numbers of server-based ITRs in Ivip spreads the load and enables this work to be done with less expense than by concentrating it into a smaller number of expensive, dedicated routers - as is the current plan with I-R and LISP.

But even if this goal is achieved - more IR(LF) routers closer to hosts, each handling less traffic and so being potentially cheaper and/or less congested - this means that there will be more work for the IR(VP) routers than with a smaller number of IR(LF) routers.  For instance, if an ISP had only 3 IR(LF) routers, on average more traffic packets would be covered by currently cached mapping than would be the case for any one of a more numerous arrangement of IR(LF) routers.  If there were 20 such routers, it is less likely that any one of them would already have the mapping for a given traffic packet.

Due to the "unaggregated" exposure of the IR(VP) routers to every single IR(LF/GW) router, I-R involves more map query and reply activity, all other things being equal, than Ivip or perhaps LISP.
(Also, the Map Request in I-R consists of tunneling a potentially bulky traffic packet to each of the 3 or so IR(VP) routers, and continuing to do so until a Map Reply packet comes back from one of them.)

LISP-ALT in its original form doesn't provide any such "aggregation", since each ITR gets its mapping from the authoritative query server (usually an ETR) via the ALT network.  However, LISP-ALT with Map Resolvers would have such aggregation IF the Map Resolvers cached the map reply information they got from the authoritative query servers.  I am not sure whether LISP-ALT's Map Resolvers are caching or not.

With Ivip (Distributed Real Time Mapping) there is a great deal of this "aggregation", since dozens or hundreds of ITRs may use one or a few QSRs, which are caching query servers.  (QSRs in turn will handle fewer queries if there are caching QSCs between the ITRs and these QSRs, but this doesn't alter the degree to which the QSRs reduce the mapping query load on the authoritative QSA query servers.)

Map Updates, resulting in real-time mapping distribution
--------------------------------------------------------

IR(VP) routers - individually or perhaps working with the other IR(VP) routers for the one VP - as part of making up their mapping information to be ready to send to IR(LF/GW) routers, will at various times change the mapping for a given EID prefix.  This would happen when, for instance:

1 - The EID prefix is defined by one or more IR(EID) routers
    registering it.

2 - The EID prefix becomes undefined if no IR(EID) routers register
    it - including by not re-registering it at the rate required by
    the IR(VP) router.

3 - One or more new IR(EID) routers register an existing EID prefix -
    so adding their addresses to the mapping for that EID prefix.

4 - The registration of one or more IR(EID) routers expires due to it
    not re-registering in time.

5 - The registration is cancelled.
(I am not sure if Fred intends an IR(EID) router to be able to cancel its registration, but I guess this could be part of the registration protocol.)

6 - Some other reasons, such as changes to the nature of EID
    registrations which affect the currently undescribed methods by
    which the "preference and weighting" part of the mapping is
    generated.

When the mapping changes, it really needs to change in an identical fashion for all the 3 or so IR(VP) routers - by some inter-router protocol which is yet to be described, and which would be closely related to the yet-to-be-described EID registration protocol.

When the mapping changes, it would be possible for each IR(VP) router to send out a "Map Update" message, similar in form to the Map Reply message, and likewise secured by the SEAL ID of the tunneled traffic packet which gave rise to the original Map Reply.  I don't recall if this is part of Fred's design, but I think it would be possible and desirable.  Then, if the IR(VP) router kept a record of which IR(LF/GW) routers it had sent Map Replies to concerning the prefix whose mapping has just changed, it could send out Map Update messages to those routers.

There is some inefficiency in this, due to the 3 or so IR(VP) routers sending out their own, identical content (different SEAL ID), Map Updates to each IR(LF/GW) router.  But perhaps this "reverse scattergun" effect can be used to advantage - to assume that there is no need for the IR(VP) router to expect a handshake, acknowledgement or whatever from the IR(LF/GW) router, because generally at least one of the Map Updates will arrive.

(There's no need for IR(LF/GW) routers to acknowledge the receipt of a Map Reply message.  If all 3 or so Map Reply messages are lost, the IR(LF/GW) router will continue to tunnel traffic packets to the 3 IR(VP) routers, and presumably these will send back more Map Replies, so the process will be self-limiting without need for ACKs.)
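The proposed Map Update bookkeeping can be sketched as follows.  This is my own illustration, not part of Fred's described design: the IR(VP) router logs which IR(LF/GW) routers received a Map Reply for each EID prefix, and when that prefix's mapping changes it pushes a Map Update to exactly those routers.  All names and the tuple message format are assumptions for the example.

```python
# Sketch of the proposed "Map Update" at an IR(VP) router: remember who
# was sent a Map Reply for each EID prefix, and push updates to them when
# that prefix's mapping changes.  (A real router would also expire log
# entries when the caching time runs out.)

class IrVpRouter:
    def __init__(self):
        self.mapping = {}       # EID prefix -> list of IR(EID) routers
        self.reply_log = {}     # EID prefix -> set of IR(LF/GW) routers

    def send_map_reply(self, eid_prefix, requester):
        """Answer an initial packet; log the requester for later updates."""
        self.reply_log.setdefault(eid_prefix, set()).add(requester)
        return ("MAP_REPLY", eid_prefix, self.mapping.get(eid_prefix, []))

    def change_mapping(self, eid_prefix, new_ir_eids):
        """Install new mapping and emit Map Updates to logged requesters."""
        self.mapping[eid_prefix] = new_ir_eids
        return [("MAP_UPDATE", eid_prefix, new_ir_eids, r)
                for r in sorted(self.reply_log.get(eid_prefix, ()))]
```

The extra state is the reply_log; the payoff, as argued below, is that cached mapping no longer has to expire quickly for the system to converge.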
These proposed "Map Update" messages will cause IR(LF/GW) tunneling to be changed more rapidly to a better arrangement than the only alternative: not sending the Map Update and waiting for the cached mapping in the IR(LF/GW) routers to expire - at which time the routers will request fresh mapping, by sending traffic packets to all 3 or so IR(VP) routers again until the first Map Reply arrives.

These proposed Map Update messages involve extra state in, and effort by, IR(VP) routers.  However, they would enable several benefits, including:

1 - IR(LF/GW) routers could cache their mapping for a much longer
    time, since the cache time no longer controls the ability of the
    system to get fresh mapping to the IR(LF/GW) routers.

2 - This extended cache time reduces the frequency with which
    IR(LF/GW) routers have their cached mapping time out, and so
    reduces their effort in tunneling traffic packets to all 3 or so
    IR(VP) routers.

3 - This in turn will significantly reduce the workload of the IR(VP)
    routers.

4 - Point 3 means that VPs can be bigger and less numerous.

5 - Point 4 means that there will be fewer VPs in the total list,
    enabling less work to be done by each IR(LF/GW) router, and
    therefore making these routers cheaper and more able to be
    installed in greater numbers, closer to hosts.

IF I-R was upgraded to do Map Updates from IR(VP) routers to IR(LF/GW) routers, then this would make its mapping distribution a genuinely real-time arrangement, like Ivip's.  If so, then I-R could adopt Ivip's very simple mapping information - a single ETR address for Ivip, or a single IR(EID) address for I-R.  This would simplify IR(LF/GW) routers considerably.

This would also externalise the detection of reachability and the decisions about how to restore connectivity after a multihoming service failure, making it the responsibility of end-user networks, or of whoever they appoint to do this.  I regard this as one of the major benefits of Ivip over LISP and the current I-R design.
A different approach to "registering EID prefixes"
--------------------------------------------------

The current I-R design appears to be based on the same assumptions as LISP - that the ETR (LISP) or IR(EID) (I-R) function is responsible for the mapping of an EID it handles.  Therefore, in I-R, IR(EID) routers are assumed to be configured securely, and given some authentication items - and then it is the job of the IR(EID) router to register itself with the 3 or so IR(VP) routers for the VP its EID prefix is within.

I have never thought of Ivip this way - and I think there's no reason why I-R needs to work this way either.

The 3 or so IR(VP) routers which handle a given VP will definitely be run by, or run for and controlled by, whatever single organisation is responsible for this VP - the "VP Company".  The rough equivalent in Ivip is the MABOC (MAB Operating Company).  These IR(VP) routers are not going to accept registrations without each registration passing stringent security arrangements - hence the need to give the 2 or so IR(EID) routers the correct authentication items which the IR(VP) routers will accept.

This could be a signed message to the effect that this particular IRON router (specified by its IP address) is authorised to perform the IR(EID) role for a specified EID prefix.  Then the IR(VP) routers will need to use some arrangements (PKI?) to verify these signatures, to cache this verification in some way so they don't need to repeat it for every re-registration, and to occasionally check that the public key used for the signature has not been revoked by the relevant PKI CA (Certification Authority).

So the current design is something like this, for each EID prefix, assuming there are two ISPs and so two IR(EID) routers:

1 - The multihomed EUN (end-user network) chooses which 2 IRON
    routers will perform the IR(EID) role for its EID prefix.  This
    is basically a matter of receiving the IP address of an IR(EID)
    router from each of its ISPs.
2 - The EUN creates two messages, one for each of these IP
    addresses, attesting that its EID prefix is to be registered by
    an IRON router with the given IP address.

    One approach would be for the EUN to sign these messages with
    its own key pair. To do this, it would have to use a key pair
    which is covered by a PKI system which is recognised by the
    IR(VP) routers for the VP its EID prefix is within. This clearly
    involves the company which controls the VP, which is probably
    the company it leases this EID space from.

    Another approach is to obtain these signed messages by having
    them signed with the VP company's key pair. But to do so, the
    EUN will need to authenticate itself with that VP company.

    So either way, the EUN needs to authenticate itself to the VP
    company in some way, and provide the IP addresses of the IR(EID)
    role routers.

3 - These signed messages are now in the possession of the EUN, who
    passes one to one ISP and the other to the other ISP. This may
    involve the EUN authenticating itself to the ISP, but it already
    has a business relationship with the ISP - so this is probably
    trivial.

4 - Each ISP loads its message into the IRON router which is
    performing an IR(EID) role for this VP. Now this IRON router
    will be able to register itself with the 3 or so IR(VP) routers,
    and keep doing so for as long as is required.

This raises a tricky question if the EUN stops using one or both ISPs. How can it prevent these messages still being sent by these IRON routers? It should be able to prevail on the ISPs to take these signed messages out of their routers, but let's say an ISP has gone feral and won't respond to reasonable requests. This would be an intolerable situation, black-holing traffic packets. So there needs to be some additional mechanism by which the EUN can contact the VP company and have the 3 IR(VP) routers ignore the registration messages from the errant router.

I think this is all a lot of work for no good purpose.
A much better idea is for the EUN to get the IP addresses of the routers from its ISPs, and securely communicate these to the VP company. Then the VP company programs them into the 3 IR(VP) routers it controls. That's it - there's no need for the two IR(EID) routers to register anything with the 3 IR(VP) routers.

But now let us consider two scenarios - the original, non-real-time mapping arrangement and my proposed real-time approach.

In the original non-real-time approach, the absence of a registration from a previously registered IR(EID) router would signify to each IR(VP) router that that IRON router should no longer be included in the mapping for the given prefix. Depending on the caching time specified in the Map Reply messages sent from IR(VP) routers to IR(LF/GW) routers, it would take some time for all IR(LF/GW) routers to gain updated mapping which did not mention this IRON router as one of the IR(EID) routers for this EID prefix.

But if, as just suggested, the IR(VP) routers don't figure out the mapping of an EID prefix based on repeated registrations from IR(EID) routers, because they are simply configured with the IP addresses of the supposed IR(EID) routers by the VP company, then how do the IR(VP) routers figure out the mapping, and how would they change this mapping to remove a particular IR(EID) router if it became "unreachable"?

I think the IR(VP) router has no business deciding what the mapping should be once the above suggestion is implemented. In the non-real-time arrangement (the original I-R design), the ITR-like IR(LF/GW) routers are supposed to figure out which IR(EID) routers are reachable, and which ones can be used to reach the destination network, as part of their built-in, independent, per-IR(LF/GW)-router approach to multihoming service restoration.
So with the original non-real-time mapping, and the new, direct (no registration) arrangement where the VP company simply configures its 3 IR(VP) routers with the IP addresses of the IR(EID) routers, I think there is nothing more to do. Just let the IR(LF/GW) routers do their reachability testing however Fred intends this to be done.

With the real-time mapping arrangements (Map Updates), if the EUN chooses a new ISP, or the ISP tells them they have a different IR(EID) router, then the EUN must simply communicate the new IP address securely to the VP company, who will pop it straight into their 3 IR(VP) routers. This removes the need for the just-suggested extra mechanism by which the VP company could prevent its 3 IR(VP) routers from accepting a registration from the errant IRON router in the feral ISP.

In the real-time approach (Map Updates), combined with the just-suggested direct (no registration) "VP company controls the mapping directly via its IR(VP) routers" approach, the moment the VP company changes the mapping in its 3 IR(VP) routers, each one will send a Map Update message to whichever IR(LF/GW) routers it has, within the caching time, sent the mapping for this EID. So this gives the VP company direct real-time control of the mapping in all the ITRs which need it.

Since the EUN directly tells the VP Company which IP addresses to use, and since these real-time updates directly control the cached mapping in all ITRs currently tunneling packets to this EID prefix, the EUN has direct, real-time control of the mapping of its EID prefix, and so direct real-time control of which IR(EID) role router the IR(LF/GW) routers will tunnel packets to. This is just like Ivip.
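The "EUN -> VP Company -> IR(VP) routers" path can be sketched as follows. This is my illustration only: the class names are invented, and the authentication is faked with a shared token purely to show where the real authentication step would sit.

```python
class IRVPRouter:
    """Minimal stand-in for an IR(VP) router that just holds mapping."""
    def __init__(self, name):
        self.name = name
        self.mapping = {}            # EID prefix -> IR(EID) address

class VPCompany:
    """Sketch of the suggested no-registration arrangement: the EUN
    authenticates to the VP company, which directly programs its 3 or
    so IR(VP) routers with the new IR(EID) address."""
    def __init__(self, routers):
        self.routers = routers
        self.tokens = {}             # EID prefix -> token the EUN holds

    def authorise(self, eid, token):
        """Establish the EUN's credential for this EID prefix."""
        self.tokens[eid] = token

    def set_mapping(self, eid, token, ir_eid_addr):
        """EUN requests a mapping change; if authenticated, the change
        goes straight into every IR(VP) router the company controls."""
        if self.tokens.get(eid) != token:
            raise PermissionError("EUN not authorised for this EID")
        for r in self.routers:
            r.mapping[eid] = ir_eid_addr
```

Note that nothing here requires the IR(EID) routers to participate at all, which is the point of the proposal: control of the mapping rests with the EUN and the VP company.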
There's no longer a need for ITRs to do any reachability testing, since the EUN can easily hire a company to do better reachability testing of its actual network via the two or more IR(EID) routers - and give that company the authentication details it needs to be able to tell the VP Company to change the mapping.

So, with these two elaborations - which look feasible to me - IRON-RANGER could have real-time mapping just like Ivip. This is at odds with the LISP tradition and with the expectation of many RRG folks that such things are impossible.

With this proposed pair of modifications:

1 - "EUN -> VP Company -> IR(VP) router" instead of "IR(EID) routers
    registering their EID prefixes with the IR(VP) routers".

2 - Map Updates for real-time mapping.

I-R would still not be as scalable as Ivip if there were a very large number of ITRs tunneling packets to this EID prefix at the time the mapping is changed. With this modified version of I-R, each such change would require each of the 3 or so IR(VP) routers to send a Map Update to each of the IR(LF/GW) routers which is (or might be) tunneling packets - all those IRON routers which, within the caching time, were recently sent Map Replies for this EID prefix.

This is a triplication of total workload for the IR(VP) routers, compared to a single device sending the Map Update to the IR(LF/GW) router. (However, as noted previously, this "reverse scattergun" approach might be robust enough to send the Map Update without any requirement for acknowledgement, retries etc.) It is also an escalation of workload for each affected IR(LF/GW) router, because it will receive not just one Map Update, but 3.

The real problem is for the IR(VP) routers, which must each send a Map Update packet to every IR(LF/GW) router which, within the caching time, was previously sent a Map Reply for this EID. This will not scale well. If there are 100 such IR(LF/GW) routers, it will be bad. If there are 10,000, it will be really bad.
Ivip avoids these scaling problems, because the mapping change goes out to a dozen or more DITR sites, and is received at each site by the authoritative QSA query server there. That QSA sends out any needed Map Updates to its queriers, which likewise were sent a Map Reply for this EID prefix within the current caching time. But these will be far fewer in number per QSA than the total number of ITRs which need to get such a Map Update.

Firstly, the work is spread over a (typically) dozen or more QSAs at this many widely distributed DITR sites. Secondly, each querier - each QSR - which needs to get a Map Update will sometimes, or frequently, pass the information on to multiple ITRs.

So I think the Ivip approach will scale better, and not result in an overly large workload for any server (QSA, QSR or QSC). Also, in Ivip, the ITR gets a single Map Update, which it will acknowledge. This is probably less expensive than the I-R approach, where the IR(LF/GW) router gets 3 Map Updates, one from each IR(VP) router.


Scaling vs. simplicity - and more on real-time mapping distribution
===================================================================

Here I compare Ivip (with Distributed Real Time Mapping) and a version of the I-R-lite subset with two significant modifications, as described above:

1 - No registrations of EIDs from IR(EID) routers. Instead, EUNs
    tell the VP company the IP addresses of the IR(EID) routers
    which will be handling each of their one or more EID prefixes.
    This is inherently a real-time process, since the VP company
    receives the new information and can easily configure the 3 or
    so IR(VP) routers with new mapping, in less than a second.

2 - By adding Map Updates (if they were not already in Fred's
    design), the mapping changes (which always result from the above
    real-time process) can be sent to each IR(LF/GW) router which is
    caching the mapping for this EID prefix.

These two changes would make the I-R mapping distribution system just as real-time as Ivip's.
Then, the mapping could still consist of multiple IR(EID) routers with priorities, weightings etc. to tell IR(LF/GW) routers how to do load sharing, and how to choose between IR(EID) routers when more than one of them appears to be working - as with LISP and the current I-R design.

However, it would also be possible to simplify the system so that, like Ivip, the mapping can only consist of a single address of a single IR(EID) router. This would mean that IR(LF/GW) routers would never be required to choose between IR(EID) routers - and all EUNs would be required to achieve their multihoming service restoration and/or inbound TE goals by changing the mapping in real-time.

Even if this decision is not taken, the above two points make the system real-time - so any EUN which wants to control the mapping of its EIDs in real-time can do so. The only question is whether the system is simplified, as in Ivip - which requires all EUNs to do this. Either way, the real-time control of mapping and tunneling is a major advance on LISP and the original I-R design.

Some EUNs might want to use the system for dynamic (responsive to traffic flows, minute to minute, or even faster) inbound TE. Some such EUNs will find it highly advantageous to steer inbound traffic over their two or more ISP links, being able to maximise the utilization of each link within some acceptable level which avoids congestion. I guess content distribution networks could also steer traffic sent to various EID prefixes to their IR(EID) routers at various separate sites all around the Net, to dynamically load balance the workload of these sites.

For instance, suppose a single-site EUN has 5 hosts, or groups of hosts, each with a differing pattern of incoming traffic - where the level of traffic for each changes hour-to-hour or minute-to-minute.
By splitting each group into two smaller groups, and defining an EID prefix for each small group, the EUN will be able to dynamically steer these 10 streams of traffic between two or more ISP links, perhaps of different capacities, and so maximise the utilization of these expensive links while avoiding the congestion which would occur without this dynamic inbound TE capability.

The next thing which will happen is that the VP Companies will want to charge for mapping changes, or at least for frequent mapping changes. That's fine - they should do so. The EUNs which have a high enough need for inbound TE to pay the fee per mapping change will continue to make those changes. The fee might be between a few cents and a few tens of cents - and still be highly worth the expense for some EUNs, since it is cheaper to do this at peak times than to pay for higher capacity links to their ISPs.

This need to charge for frequent mapping changes results directly from the ability of the system to let EUNs (or companies they appoint) directly control the tunneling behavior of IR(LF/GW) routers all over the world. This applies whether the system uses the current complex mapping, with complex IR(LF/GW) functions - or adopts the simpler Ivip-style single address mapping, and so enables all IR(LF/GW) routers to be significantly simpler.

Still, this modified I-R architecture seems to have more scaling problems than Ivip. It is simpler than Ivip - and I argue below that the extra complexity in Ivip is justified by the way this complexity enables Ivip to scale better.

Anything which increases the number of ITR / IR(LF/GW) routers is assumed to be good, since the more there are of these, the less work each one has to do, and the cheaper each one can be.
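The dynamic inbound-TE idea just described - splitting traffic into small EID prefixes and remapping each one to the link with the most spare capacity - can be sketched with a toy greedy balancer. The function name, the traffic figures and the greedy policy are all my illustrative assumptions, not part of any proposal.

```python
def assign_micronets(traffic, capacities):
    """Toy greedy sketch of dynamic inbound TE: given the current
    traffic rate of each small EID prefix ("micronet"), remap each
    one to whichever ISP link has the most spare capacity, placing
    the largest flows first.  Returns micronet -> link index."""
    free = list(capacities)          # remaining capacity per link
    plan = {}
    for mn, rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        link = max(range(len(free)), key=lambda i: free[i])
        plan[mn] = link
        free[link] -= rate
    return plan
```

In the real system, each remapping would be one (possibly charged-for) mapping change sent to the VP company or MABOC; the sketch only shows why splitting into more, smaller prefixes gives finer-grained balancing.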
This makes it more attractive to implement the ITR or IR(LF/GW) functions in cheap COTS (Commercial Off The Shelf) servers - and this enables the system to be introduced rapidly, with fewer costs and risks than relying on major upgrades to the functionality of routers from the major router manufacturers.

The I-R architecture (or at least the above modified real-time version of I-R-lite) is simpler than the architecture of Ivip with DRTM.

The change "1" listed above - EUNs telling VP companies directly what the mapping should be - is a considerable simplification of the original I-R design. As far as I know, it is superior in every respect to the plan for each IR(EID) router to have a signed message which it uses to repeatedly register an EID with each of 3 or so IR(VP) routers.

The change "2" above adds some complexity to the I-R design, but the Map Update message is similar or identical to the Map Reply, so no more complex protocols are required. The additional complexity is partly in the IR(VP) router needing to retain state about which IR(LF/GW) routers it has, within the current caching time, sent mappings to for each EID in the VP.

The other extra complexity is in the IR(LF/GW) functionality - these routers need to be able to recognise Map Update messages and act on them (which is very similar to how they respond to a Map Reply message). They may need a fancier SEAL-ID recognition system, since the SEAL-ID the IR(LF/GW) router sent with the original traffic packet to the IR(VP) router is used to secure both Map Reply and Map Update messages from the IR(VP) router. These Map Update messages may arrive quite a time (whatever the caching time is) after the initial traffic packet was tunneled, so the "window of acceptable SEAL-IDs" needs to be correspondingly wider in time. Depending on the caching time, this may involve recognising significantly older SEAL-IDs than in the current design.
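The "window of acceptable SEAL-IDs" idea can be illustrated with a toy sliding-window check. This is purely my sketch, not the SEAL specification's actual algorithm: it assumes SEAL-IDs come from a monotonically increasing counter, and accepts a reply or update only if its ID is among the last few issued.

```python
class SealIdWindow:
    """Toy sliding-window acceptance check for SEAL-IDs, illustrating
    how widening the window lets later-arriving Map Updates (secured
    by an older SEAL-ID) still be accepted."""

    def __init__(self, window):
        self.window = window         # how many recent IDs are acceptable
        self.next_id = 0

    def issue(self):
        """Issue the SEAL-ID carried with an outgoing tunneled packet."""
        sid = self.next_id
        self.next_id += 1
        return sid

    def accept(self, sid):
        """Accept a Map Reply/Update secured by 'sid' only if it falls
        within the current window of recently issued IDs."""
        return self.next_id - self.window <= sid < self.next_id
```

A longer caching time for Map Updates would simply mean choosing a larger `window`, at the cost of accepting (and so having to remember as valid) older IDs.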
I think Fred's current plan is to use some kind of sliding window arrangement on SEAL-IDs to recognise them as valid, rather than to cache each one and maintain some kind of timer for each one. Still, I think adding Map Updates will not involve excessive complexity.

I will dub this real-time, souped-up version of my I-R-lite subset "I-R-RT".

I-R-RT has virtues of simplicity. It has only three kinds of network element which handle traffic packets, if we consider the IR(GW) role of IRON routers - like LISP PTRs or Ivip DITRs, receiving traffic packets from the DFZ - to be not significantly different from the IR(LF) role, which does the same thing, but only advertises the "edge" space in a local routing system, and so doesn't attract traffic packets from the DFZ (from other ASes).

All network elements except the newly proposed VP-list servers are roles for IRON routers. There are three types of role, each with different responsibilities. All IRON routers are assumed to be capable of performing the IR(LF/GW) role, and a subset of them will also perform one or both of the other two roles.

IR(LF/GW)

   Must know all the VPs, and for each VP must know the IP addresses
   (or FQDNs?) of the 3 or so IRON routers which are performing the
   IR(VP) role for that VP. It does this by either:

   1 - Downloading file(s) and then doing delta checks, using the
       VP-list servers mentioned below.

   2 - A DNS-based approach, as suggested above.

   Initially, in the absence of its cache containing mapping for a
   matching EID, the IR(LF/GW) router tunnels any traffic packet
   which is addressed to an "edge" address to all 3 IR(VP) routers
   for the matching VP.

   Accepts Map Reply messages from these IR(VP) routers, with a
   caching time. (This time can now be quite long, since we are no
   longer relying on cache time-outs for IR(LF/GW) routers to
   discover mapping changes.)

   Tunnels each subsequent traffic packet matching the EID specified
   in the Map Reply message to one IR(EID) router.
   Also accepts (within the caching time) Map Update messages, which
   are similar or identical to Map Reply messages, and alters its
   tunneling instantly according to the new mapping.

   Works with IR(VP) and IR(EID) role routers to solve PMTUD
   problems.

IR(VP)

   Typically 3 IRON routers will perform the IR(VP) role for a given
   VP. Some IRON routers will do this for many VPs.

   Receives mapping for each EID directly from the VP company which
   is responsible for the VP. Presumably this role router is run by
   - or at least controlled and paid for by - the VP Company.

   When it receives a traffic packet in a tunnel from an IR(LF/GW)
   router, or from its own internal IR(LF/GW) function, it does four
   things:

   1 - Tunnels the packet to a single IR(EID) router according to
       the mapping of the matching EID.

   2 - Sends back (perhaps within itself) to the IR(LF/GW) function
       the mapping for this EID prefix, with a caching time. (This
       is secured by the SEAL-ID in the SEAL tunnel headers of the
       traffic packet just received.)

   3 - Stores the EID and the SEAL-ID, and maintains some kind of
       timer, so it can perform the following if necessary:

   4 - If the mapping for an EID prefix is changed, sends the new
       mapping as a Map Update message to all the IR(LF/GW) routers
       which, in the recent N minutes (whatever the caching time
       is), were sent a Map Reply for this EID.

   Works with IR(LF/GW) and IR(EID) role routers to solve PMTUD
   problems.

IR(EID)

   These accept tunneled packets sent by IR(LF/GW) routers - all but
   the "initial" packets - and by IR(VP) routers - the "initial"
   traffic packets the IR(LF/GW) router received before it got
   mapping for a matching EID.

   Forwards the decapsulated traffic packets to the destination EUN
   (End User Network). Assuming there are 3 IR(VP) routers and that
   all are working, ignores the 2nd and 3rd copies of an initial
   packet from these - only forwards the first to the EUN.

   Works with the IR(LF/GW) and IR(VP) routers to solve PMTUD
   problems.
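The IR(LF/GW) forwarding decision described above - scattergun the initial packet to all 3 or so IR(VP) routers, then tunnel to a single IR(EID) router once a Map Reply is cached - can be sketched as follows. The class is mine, and prefix matching is crudely simplified to fixed /24 EIDs inside /16 VPs so the sketch stays short.

```python
class LFGWRouter:
    """Illustrative sketch of IR(LF/GW) forwarding: no cached mapping
    means the packet is tunneled to every IR(VP) router for the
    matching VP; a cached Map Reply means one IR(EID) router."""

    def __init__(self, vp_list):
        self.vp_list = vp_list    # toy /16 VP prefix -> [IR(VP) addrs]
        self.cache = {}           # toy /24 EID prefix -> IR(EID) addr

    @staticmethod
    def _prefix(dest, n):
        """Crude prefix: keep first n dotted-quad labels, pad with 0s."""
        return ".".join(dest.split(".")[:n]) + ".0" * (4 - n)

    def forward(self, dest):
        """Return the list of tunnel destinations for one packet."""
        eid = self._prefix(dest, 3)          # toy /24 EID prefix
        if eid in self.cache:
            return [self.cache[eid]]         # single IR(EID) router
        vp = self._prefix(dest, 2)           # toy /16 VP prefix
        return list(self.vp_list[vp])        # scattergun to all IR(VP)s

    def map_reply(self, eid, ir_eid):
        """Cache a Map Reply (a Map Update would overwrite the same
        entry in exactly the same way)."""
        self.cache[eid] = ir_eid
```

The point of the sketch is that a Map Update needs no new machinery in the IR(LF/GW) router: it just overwrites the same cache entry a Map Reply created.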
   (If the existing complex mapping is used, then the IR(EID) role
   routers would need to work together in some way to prevent the
   destination EUN receiving duplicate initial traffic packets if
   some IR(VP) role routers tunneled packets to IR(EID)1 and others
   to IR(EID)2. However, if the mapping is only allowed to be a
   single IR(EID) address, like Ivip, then there will be no such
   problem with duplicated traffic packets.)

VP-list servers

   These currently have no formal name or function - and they are
   probably not routers. There needs to be some kind of global,
   redundant, load-shared system by which all IRON routers get to
   know what the current VPs are, and which 3 or so IRON routers are
   playing the IR(VP) role for each VP. This involves providing
   files for download and responses to delta checks. This is only
   needed if a DNS-based approach is not used. Ideally, I think,
   there should be no single file, or single server - but some more
   distributed and fault-tolerant system than the one originally
   proposed by Fred, which involves a single file, from potentially
   multiple servers.

As with Ivip, EUNs must make their own arrangements to control the mapping themselves, or appoint some other organization to do so.

If single address mapping is used, the above is a reasonably complete description of the system. If the existing LISP-like multi-address (two or more IR(EID) address) mapping is retained, then the IR(LF/GW) role and the IR(VP) role must also retain whatever mechanisms Fred proposes for testing reachability and deciding which of the mapping's multiple IR(EID) role routers to tunnel the traffic packet to.

Ivip's network elements are as follows. Please see these for more information:

   http://tools.ietf.org/html/draft-whittle-ivip-arch
   http://www.firstpr.com.au/ip/ivip/drtm/

These could be roles, so a single server or router performed multiple such roles - but generally I consider these to be separate classes of device. All of these could be implemented as software on a server.
ITR

   Learns about all the "edge" space, such as by getting a list of
   MABs from its local QSR (perhaps via one or more intermediate
   QSCs) - or a simpler list which doesn't mention individual MABs
   when two or more are adjacent.

   Advertises this "edge" space in the local routing system, and so
   tunnels each received traffic packet which is addressed to an
   "edge" address to an ETR.

   If the ITR has no cached mapping matching the destination
   address, it buffers the packet and sends a Map Request (which
   includes a nonce and the packet's destination address) to a local
   QSR or QSC. (Each ITR will auto-discover, or be configured with,
   the addresses of 3 or so QSRs or QSCs which it uses for all its
   Map Requests. The ITR needs to resend the Map Request if no Map
   Reply arrives within, say, 80ms.)

   The Map Reply specifies a micronet of SPI (Scalable PI = "edge")
   space and a single ETR address, with a caching time. When this
   mapping arrives, the ITR tunnels the buffered packet to the
   single ETR specified in the mapping. If the mapping is already
   cached when the traffic packet arrives, it tunnels the packet to
   the single ETR specified in the mapping.

   Caches this mapping for the caching time, together with the nonce
   of the original request, and accepts Map Update messages from the
   QSR or QSC, which will be secured by the same nonce. These
   updates will either change the ETR address or tell the ITR to
   flush this micronet from the cache. (The latter would be for when
   the existing micronet is split or joined to some other micronet -
   so if the ITR is still handling packets addressed to the old
   micronet, it will buffer them and make a new Map Request, to
   receive mapping for a different micronet which covers the
   destination address.)

   ITRs can be in ISPs and EUNs, including EUNs using conventional
   edge space and those using SPI "edge" space. So ITRs can be on a
   micronet address - SPI "edge" space. ITRs cannot be behind NAT in
   the current design.

   ITRs work with ETRs to handle PMTUD problems caused by
   encapsulation.
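The ITR's buffer / Map Request / nonce-secured Map Update flow can be sketched like this. Again the class and its in-memory "messages" are my illustrative assumptions; the micronet is treated as an opaque key, eliding longest-match lookup.

```python
import secrets

class IvipITR:
    """Toy sketch of the Ivip ITR query flow: buffer a packet with no
    cached mapping, send a Map Request carrying a nonce, accept a Map
    Reply (micronet -> single ETR), and later accept Map Updates only
    if they are secured by the same nonce."""

    def __init__(self):
        self.cache = {}       # micronet -> (etr, nonce)
        self.buffered = []    # (micronet, payload) awaiting mapping
        self.requests = []    # (micronet, nonce) "sent" Map Requests

    def packet(self, micronet, payload):
        if micronet in self.cache:
            return ("tunnel", self.cache[micronet][0], payload)
        self.buffered.append((micronet, payload))
        nonce = secrets.token_hex(8)
        self.requests.append((micronet, nonce))       # "send" request
        return ("buffered", None, payload)

    def map_reply(self, micronet, etr, nonce):
        """Cache the mapping and release any buffered packets."""
        if (micronet, nonce) not in self.requests:
            return []                   # reply didn't match a request
        self.cache[micronet] = (etr, nonce)
        out = [("tunnel", etr, p) for m, p in self.buffered if m == micronet]
        self.buffered = [(m, p) for m, p in self.buffered if m != micronet]
        return out

    def map_update(self, micronet, new_etr, nonce):
        """Accept a Map Update only if secured by the cached nonce."""
        if micronet in self.cache and self.cache[micronet][1] == nonce:
            self.cache[micronet] = (new_etr, nonce)
            return True
        return False
```

The nonce check is what makes the later Map Update trustworthy without any extra handshake: only the query server that answered the original Map Request knows it.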
DITR

   Like an ITR, but:

   1 - Advertises only a subset of the "edge" space - specific MABs
       (Mapped Address Blocks) - into the DFZ, and so receives
       packets addressed to these MABs. An ordinary ITR receives
       packets addressed to all MABs. A DITR doesn't need to know
       all the MABs or all "edge" space - it just gets the list of
       MABs it needs to advertise from its QSA.

   2 - Looks up the mapping, if it is not already cached, by sending
       a Map Request to a QSA which is in the same rack at the same
       DITR site, so there is a fast, 100% reliable connection, and
       only a few ms delay in being able to tunnel the packet.

   3 - Is located at a DITR site, where the site and its one or more
       DITRs and QSAs typically only handle a subset of the MABs
       (according to which MABOCs this DITR site operator is working
       for).

   4 - Analyses traffic so the company which operates the DITR site
       can bill the MABOCs (MAB Operating Companies) for the traffic
       handled for each MAB. This analysis will include time and
       micronet details so the MABOC can bill its SPI-leasing EUN
       customers for each such customer's DITR traffic.

   5 - May connect to the QSA via a QSC, but most likely just sends
       Map Requests to, and receives Map Replies and Map Updates
       directly from, the QSA which is at the same site, and
       presumably in the same rack. Conceptually, there is a single
       QSA, but in fact there may be two or more for redundancy, and
       perhaps the DITR will be configured to use a QSA at another
       DITR site run by the same company as a backup if its own
       site's QSAs fail. (This last option is not mentioned in the
       DRTM ID or in Ivip-arch.)

   6 - Stops advertising its MABs in the DFZ if its QSA is dead, or
       if it can't get up-to-date mapping.

ITFH

   Like an ITR, but is built into the sending host. Handles traffic
   packets sent to all MABs, but does not "advertise" routes to
   these - it simply intercepts outgoing packets generated by the
   host's otherwise conventional stack. The sending host can be on
   conventional space or on "edge" space (SPI, micronet space).
   In the current Ivip design, it can't be behind NAT.

QSA

   Authoritative Query Server. These are only located at DITR sites.
   In theory a QSA could be authoritative for the mapping of all
   MABs, but in practice, each DITR site will only support a subset
   of MABs.

   Gets, by some means, a real-time feed of mapping changes for all
   its MABs, and so maintains a complete mapping database for each
   MAB. (How this is done is not currently specified, but since this
   is only for the QSAs in a single DITR network, and since each
   such network only handles a subset of MABs and will probably have
   no more than a few dozen DITR sites, this is assumed to be
   possible in a secure, scalable fashion. Private network links
   could be used between these sites.)

   Responds to Map Requests from DITRs at this site - sending them
   Map Reply messages within a few milliseconds, and sending them
   Map Update messages if and when required.

   Conceptually, there is a single QSA at each DITR site. In
   reality, there may be one for the use of the DITRs there and one
   or more for accepting Map Requests from typically nearby QSRs.

   QSAs are in DITR sites. A single DITR network might have one or
   two dozen such sites, each handling the same subset of MABs.
   These would be scattered around the Net to share the load and
   generally minimise total path lengths.

QSR

   Caching Resolving Query Server. ISPs run 1, or more likely 2 or
   3, of these for their own ITRs and for the ITRs in their customer
   networks.

   Auto-discovers, via a DNS mechanism, all the MABs, and provides a
   form of this information - the complete set of "edge" space - to
   all the ITRs it serves. Also, for each MAB, discovers the
   addresses of 2 or 3 typically nearby QSAs which handle that MAB.

   Accepts Map Requests from queriers (ITRs and QSCs) and sends them
   Map Replies and Map Updates. Answers the queries from its own
   cached mapping, or by sending a query to one of the nearby QSAs,
   depending on which MAB the queried address lies within.
   Sends its own Map Requests to QSAs, depending on which MAB the
   queried address fits within. Accepts Map Replies and, later,
   potentially Map Updates from these QSAs.

QSC

   This is an optional device - a Caching Query Server. It accepts
   Map Requests from ITRs and/or other QSCs - and sends them Map
   Replies and Map Updates. It sends its own Map Requests (when it
   receives a Map Request it can't answer from its cache) to one of
   the handful of QSRs and/or QSCs which are "upstream".

   QSCs, when they serve multiple ITRs, can frequently answer Map
   Requests from their cache - since a previous request by another
   ITR filled the cache. So QSCs can reduce the workload of QSRs.

   (The code for the ITR/DITR, QSA, QSR and QSC functions will have
   many common elements.)

ETR

   Egress Tunnel Router. Accepts the tunneled traffic packets from
   ITRs and forwards them to the destination network. May be in the
   ISP network and so shared by multiple destination networks, or
   may be located in the destination network, such as on a PA
   address from the ISP.

   Works with ITRs to handle PMTUD problems caused by encapsulation.

Please see the text and diagrams at:

   http://www.firstpr.com.au/ip/ivip/drtm/

for how all these fit together, and for how initial services and substantial scalable routing benefits will result without any ISP investment, just by using DITRs, QSAs and ETRs.

The network elements are, in their logical groups:

   ITR
   QSC
   QSR
   QSA
   ETR
   DITR
   (Optional) ITFH

All but the ETR would share some common code elements.

Ivip scales better than the above modified version of I-R for a number of reasons:

1 - Ivip ITRs are simpler, since they don't need to know all the
    MABs - just the subset of global unicast space which is "edge"
    (SPI) space. They don't need to know each MAB, or know anything
    about each MAB - while the I-R equivalent - the IR(LF/GW) router
    - needs to know all VPs, and know the 3 or so IR(VP) routers for
    each VP.
    If I-R retains its current multi-address mapping arrangements,
    Ivip ITRs are much simpler, because they use single ETR address
    mapping and so do not do any reachability testing or make any
    choices between multiple ETR addresses. This means ITRs can be
    more numerous, closer to hosts, and even in sending hosts. This
    reduces the load per ITR, enabling them to be cheaper, including
    being implemented with software on an inexpensive server.

2 - An Ivip ITR tunnels all traffic packets to a single ETR, while
    an I-R IR(LF/GW) router tunnels initial packets to all 3 or so
    IR(VP) routers.

3 - An Ivip ETR only receives a single initial packet, while an I-R
    IR(EID) router typically receives 3 or so, and must use only the
    first. This is only true if the current multi-address complex
    mapping is retained. If I-R adopts single address (single
    IR(EID) router) mapping like Ivip, then this problem of
    duplicate traffic packets arriving via two or more IR(EID)
    routers won't occur.

4 - In I-R, the authoritative query servers are the IR(VP) routers.
    Due to the "scattergun" approach to handling their potential
    unreachability - tunneling initial traffic packets to all IR(VP)
    routers - the number of IR(VP) routers needs to be strictly
    limited. I assume a figure of 3 or so. This means that the VPs
    must be made small enough that each IR(VP) router can handle the
    load of all initial traffic packets handled by all the IR(LF/GW)
    routers in the world. Consequently, there will be more VPs in
    I-R than MABs in Ivip.

5 - The current I-R design involves each IR(GW) advertising all the
    VPs in the DFZ. (With IPv6, this might be achieved with a single
    short prefix, if all the "edge" space is within that prefix.)
    This means there must be a lot of these IR(GW) routers, since at
    any one location in the DFZ, it must be assured that no such
    IR(GW) router becomes congested. While an Ivip DITR could
    advertise all MABs, in practice each DITR (each DITR site) will
    only handle a subset of MABs.
    To the extent that a single DITR can't handle all the MABs of a
    site, there can be multiple DITRs there. (A MAB can even be
    split and handled by 2 or more DITRs at a given site, but still
    advertised by a router there as the single MAB prefix.)

6 - Ivip includes plans (charged-for mapping changes and DITR
    traffic) which allow attractive business cases to be made for
    DITRs and the DITR sites, which will be run by, or for, MABOCs.

7 - In I-R, each IR(LF/GW) directly queries the authoritative query
    server(s). It actually queries all 3 or so IR(VP) routers.
    Furthermore, the query is not a short packet, but a potentially
    long traffic packet. These implicit queries continue, in their
    triplicated form, until the IR(LF/GW) router receives at least
    one Map Reply from one of the IR(VP) routers. Also, the
    authoritative query servers - the IR(VP) routers - need to
    tunnel these traffic packets to IR(EID) routers.

    In short, there is no "aggregation", "cached concentration" or
    whatever it might be called, between the querying ITR-like
    devices and the authoritative query servers.

    Likewise, in this modified version of I-R with real-time mapping
    distribution due to Map Update messages, the authoritative query
    servers need to do all the work of sending Map Updates directly
    to each ITR-like IR(LF/GW) router which is caching a Map Reply
    mapping for the EID prefix whose mapping just changed. (Also,
    all 3 or so IR(VP) routers have to do this, so each IR(LF/GW)
    router gets typically 3 Map Updates. This is simple and more
    robust than getting one, but it is less efficient.)

    In Ivip, the authoritative QSAs are not queried directly by the
    ITRs, except by the DITRs at their site. The ITRs may query via
    one or more levels of caching query server (QSC), each level of
    which tends to reduce the workload of the upstream query server,
    which may be another QSC or one of the QSRs.
Even without any QSCs, ITRs always query local QSRs, and when each
QSR serves a large number of ITRs, each QSR will frequently be able
to answer from its cached mapping, thereby reducing the number of Map
Requests the authoritative QSAs must handle.

8 - This reduces the workload of the QSAs.  Firstly, they receive
fewer Map Requests and send fewer Map Replies.  Secondly, when a
micronet's mapping changes, they typically send far fewer Map Updates
than there are ITRs which need them, because they send the Map Update
to a QSR, which will typically send an equivalent Map Update to
multiple ITRs, either directly or via one or more levels of QSCs.

9 - In Ivip, there is no very low limit (such as 3 or so) on the
number of authoritative query servers.  As noted above, for each I-R
VP there can probably be no more than 3 or 4 IR(VP) routers.  With
Ivip, there can be as many QSAs as there are DITR sites - and even
within a DITR site, there can be multiple QSAs to spread the load of
queries concerning multiple MABs.  There may be other scaling
benefits to Ivip as well.

To summarize: I-R is an interesting and in principle relatively
simple CES architecture.  However, this simplicity involves a direct
communication path between the ITR-like devices and the authoritative
query servers - with a further "scattergun" inefficiency in these
interactions.  Ivip has more types of network elements and is more
complex, but this enables the workload to be split up in a manner
which reduces total effort - and in particular reduces the total
effort of the authoritative query servers (QSAs), or of any other
single network element.  With no low limit (3 or so) on the number of
authoritative QSAs, Ivip can have dozens, or in principle hundreds,
of QSAs to share the total load for the MABs they handle - though I
expect most DITR networks to work fine with 10 to 20 sites.
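The Map Update fan-out arithmetic in point 8 can be sketched like
this.  The QSR names and ITR counts below are made-up illustrative
figures; the point is only the ratio between the two message counts
at the QSA.

```python
# Sketch of Map Update fan-out: instead of the authoritative QSA
# sending one update per caching ITR, it sends one per QSR, and each
# QSR relays the update to the ITRs caching that micronet's mapping.
# All names and counts here are illustrative assumptions.

def updates_sent_direct(num_itrs):
    """Messages the QSA sends if it must update every caching ITR
    itself (the I-R IR(VP) situation, before triplication)."""
    return num_itrs

def updates_sent_via_qsrs(itrs_per_qsr):
    """Messages the QSA sends when each QSR relays to its own ITRs.
    itrs_per_qsr maps a QSR name to the number of ITRs behind it
    which are caching the changed mapping."""
    return len(itrs_per_qsr)  # one Map Update per QSR

qsrs = {"qsr-a": 40, "qsr-b": 25, "qsr-c": 35}
total_itrs = sum(qsrs.values())

print(updates_sent_direct(total_itrs))  # 100 messages from the QSA
print(updates_sent_via_qsrs(qsrs))      # 3 messages from the QSA
```

The relaying work does not disappear - it moves to the QSRs and QSCs,
each of which handles only its own share, which is the load-splitting
argument made above.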