idnits 2.17.1 

draft-shyam-real-ip-framework-48.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 11 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 859 has weird spacing: '...lent to  the a...'

  -- The document date (January 06, 2018) is 2302 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: '18' is defined on line 1647, but no explicit
     reference was found in the text

  == Unused Reference: '19' is defined on line 1650, but no explicit
     reference was found in the text

  == Unused Reference: '20' is defined on line 1653, but no explicit
     reference was found in the text

  == Unused Reference: '21' is defined on line 1656, but no explicit
     reference was found in the text

  == Unused Reference: '22' is defined on line 1658, but no explicit
     reference was found in the text

  == Unused Reference: '23' is defined on line 1661, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4893 (ref. '4') (Obsoleted by RFC 6793)

  ** Obsolete normative reference: RFC 5395 (ref. '12') (Obsoleted by RFC
     6195)

  -- Obsolete informational reference (is this intentional?): RFC 1771 (ref.
     '19') (Obsoleted by RFC 4271)

  -- Obsolete informational reference (is this intentional?): RFC 1883 (ref.
     '20') (Obsoleted by RFC 2460)

  -- Obsolete informational reference (is this intentional?): RFC 2460 (ref.
     '22') (Obsoleted by RFC 8200)


     Summary: 2 errors (**), 0 flaws (~~), 9 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	INTERNET DRAFT                                          S. Bandyopadhyay
3	draft-shyam-real-ip-framework-48.txt                    January 06, 2018
4	Intended status: Experimental
5	Expires: July 06, 2018

7	    An Architectural Framework of the Internet for the Real IP World
8	                  draft-shyam-real-ip-framework-48.txt

10	Abstract

12	   This document proposes an architectural framework for the Internet
13	   in the real IP world. It describes how a three-tier mesh structured
14	   hierarchy can be established in a large address space by dividing
15	   it into regions and sub-regions inside each of them. It shows how
16	   to make a transition from private IP to real IP without making
17	   significant changes to the existing network. Building on the useful
18	   work done for IPv6, it provides the necessary inputs from which a
19	   specification of IP with a 64-bit address space may emerge.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on July 06, 2018.

39	Copyright Notice

41	   Copyright (c) 2018 IETF Trust and the persons identified as the
42	   document authors. All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document. Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.

51	Table of Contents
52	   1. Introduction
53	   2. Background
54	   3. A Three tier mesh structured hierarchical network
55	      3.1. Route propagation
56	      3.2. Determination of prefix lengths
57	           3.2.1. A pseudo optimal distribution of prefixes in
58	                  a 64 bit architecture
59	           3.2.2. Whether to go for a two tier or three tier hierarchy
61	      3.3. Issues related to Satellite communications
62	           3.1.1. Setting default route inside VLSM tree
63	           3.1.2. IP VPN with MPLS inside VLSM tree
64	                  3.1.2.1. Extension to RSVP-TE to support IP
65	                           VPN inside VLSM tree
66	   4. Provider Independent addressing, name services and multihoming
67	      4.1. PI address Resolution
68	           4.1.1. Record Format
69	           4.1.2. Messages
70	           4.1.3. Master file and data file
71	           4.1.4. Zone maintenance and transfers
72	   5. Issues related to IP mobility
73	      5.1. Changes expected with the specifications related
74	           to IP mobility
75	   6. Refinements over existing IPv6 specification
76	   7. Distributed processing and Multicasting
77	   8. Transition to real IP from private IP
78	   9. IANA Consideration
79	   10. Security Consideration
80	   11. Acknowledgments
81	   12. Normative References
82	   13. Informative References
83	   14. Author's Address

85	1. Introduction

87	   The transition from IPv4 to IPv6 is in progress. Work has been done
88	   to upgrade individual nodes (workstations) from IPv4 to IPv6. There
89	   are also established documents that describe how routers/switches
90	   can handle IPv4 and IPv6 packets simultaneously in order to make
91	   the transition possible [1]. The CIDR [2] based hierarchical
92	   architecture of the existing 32-bit system is expected to continue
93	   in IPv6 with a larger address space. There are documented concerns
94	   that BGP tables are becoming too large in the existing system [3].
95	   There are also proposals to extend the Autonomous System number
96	   from 16 bits to 32 bits to meet demand [4]. The challenge lies in
97	   making the transition from IPv4 to a real IP world smooth, with
98	   the fewest changes possible.

100	   The term "real IP environment" refers to an environment where
101	   hosts in a customer network possess globally unique IP addresses
102	   and communicate with the rest of the world without the help of
103	   NAT [5]. This document reflects changes required to the BSD 4.4
104	   source code wherever applicable.

106	2. Background

108	   The existing system works with an Autonomous System (AS) layer and
109	   an inter-AS layer using the CIDR approach. In order to meet needs
110	   within the 32-bit address space, Autonomous Systems of various sizes
111	   maintain a CIDR based hierarchical architecture. With the help of
112	   NAT [5], a stub network can maintain a user ID space as large as a
113	   class A network and can meet its need to communicate with the rest
114	   of the world with very few real IP addresses. With the combination
115	   of CIDR and NAT applied in the entire space, most of the 32-bit
116	   address space is effectively used as network ID.

118	   With traditional CIDR based hierarchy, a node with a shorter prefix
119	   can be divided into a number of nodes with longer prefixes. Each
120	   of these nodes can be subdivided further into nodes with still
121	   longer prefixes. This process can continue until no further
122	   division is possible. The point worth noting is that at each step
123	   the designer of the network has to anticipate future expansion so
124	   that the resource is never exhausted. This leads the designer to
125	   allocate far more resources than are needed, which leaves a large
126	   amount of address space unused. The problem is aggravated once a
127	   resource does get exhausted. For example, a node with prefix /16
128	   can be divided into a number of nodes with prefix /24. If any one
129	   of the /24 nodes is exhausted, the resources of the other /24
130	   nodes cannot be used even if they are available.
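
The /16 and /24 example can be sketched with Python's standard
ipaddress module. This is only an illustration under stated
assumptions: the 10.0.0.0/16 block, the usage figures and the helper
names (capacity, can_allocate, stranded) are hypothetical and do not
come from this document.

```python
import ipaddress

# Illustrative parent block; the 10.0.0.0/16 prefix is an assumption.
parent = ipaddress.ip_network("10.0.0.0/16")

# Divide the /16 into fixed /24 child nodes.
children = list(parent.subnets(new_prefix=24))


def capacity(net):
    """Usable host addresses in a subnet (network and broadcast excluded)."""
    return net.num_addresses - 2


# Hypothetical usage: the first /24 is full, every other /24 is empty.
used = {child: 0 for child in children}
used[children[0]] = capacity(children[0])


def can_allocate(child, count):
    """A child may draw only on its own block, never on a sibling's."""
    return used[child] + count <= capacity(child)


# Free addresses that the exhausted child cannot reach.
stranded = sum(capacity(c) - used[c] for c in children[1:])

print(len(children), can_allocate(children[0], 1), stranded)
```

The exhausted /24 cannot accept even one more host although the
sibling blocks are entirely free, which is the waste described above.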

133	   In the IPv4 environment, service providers go to great lengths to
134	   provide Internet services with the help of NAT. For example, a
135	   large educational institute meets its current requirement with 4 real
136	   IP addresses: one for its mail server, one for its web server, one
137	   for its ftp server and another one for its proxy server to provide
138	   web based services to all of its users. In general, these services
139	   are used by an organization of any size (it may be 400 or even
140	   40000). In the current scenario, the CIDR based tree has been built
141	   using these components together. When private IP is replaced with
142	   real IP, each customer network will require IP addresses based on
143	   its size and requirements.

145	   Transitioning from private IP to real IP basically requires the
146	   following components:

148	      o A solution for site multihoming with provider assigned
149	        address space
150	      o A strategy to replace private IP with real IP
151	      o A solution to uniquely identify a host in a real IP environment
152	      o A solution to make individual nodes and routers/switches work
153	        with IPv4 and next generation IP simultaneously.

155	   Solution for site multihoming has been provided in a separate
156	   document [8]. Section 8 shows how to make a transition from private
157	   IP space to real IP space with provider assigned addresses with CIDR
158	   based approach itself without reorganization of the existing provider
159	   network. Section 4 provides a solution for identifying a host
160	   uniquely with a number in a real IP environment. RFC 4213 [1] has
161	   already described the transition mechanism from IPv4 to IPv6 for
162	   individual nodes and routers.

164	   Transitioning to real IP will eliminate the extra routing entries
165	   associated with multihomed sites and thus will reduce the size of the
166	   BGP table substantially. Assignment of addresses requires an
167	   architectural framework. It may continue with the existing CIDR based
168	   architecture (provided transitioning to real IP will be good enough
169	   to handle all routing related issues forever) or may adopt a
170	   different approach. A mesh structured hierarchy will reduce the
171	   growth of routing entries in a CIDR based environment and will be
172	   convenient for distributing network resources in a suitable manner
173	   in the long run.

175	   This document also tries to resolve several issues that were
176	   carried over as part of the deployment of IPv6. It shows that a
177	   64 bit address space is good enough for all practical purposes.
178	   Building on the useful work done for IPv6, it provides the
179	   necessary inputs from which a specification of IP with a 64 bit
180	   address space may emerge.

182	3. A Three-tier mesh structured hierarchical network

184	   As Autonomous Systems of various sizes are supported, Autonomous
185	   Systems and the nodes inside them can be viewed as lying on the
186	   same plane within the address space. If the network can be viewed
187	   as lying on different planes, routing becomes simpler. If the
188	   network is designed with a fixed prefix length for the Autonomous
189	   System everywhere, routing information for the rest is confined to
190	   the other part of the network prefix. This means that the maximum
191	   AS size is assigned to every AS irrespective of its actual size.
192	   This is made possible by using a large address space and dividing
193	   it into a number of regions of fixed size. The entire network can
194	   thus be viewed as a network of inter-AS layer nodes. Each node in
195	   the inter-AS layer can act (a) only as a router in the inter-AS
196	   layer, (b) as a router in the inter-AS layer with an Autonomous
197	   System attached to it at a single point of attachment, or (c) as an
198	   Autonomous System with multiple Autonomous System border routers
199	   (ASBR) appearing like a mesh. Thus a two-tier mesh structured
200	   hierarchy is established between the AS layer and the inter-AS
201	   layer, with each AS having a fixed prefix length.

203	   By definition, an Autonomous System is a small area within the
204	   entire network that maintains its own independent identity and
205	   communicates with the rest of the world through specific border
206	   routers. Similarly, if a larger area (say a region or state) is
207	   considered as a network of Autonomous Systems that maintains its
208	   own identity by communicating with the rest of the world through
209	   some border routers (say, state border routers), a mesh structured
210	   hierarchy can be established within the inter-AS layer. The
211	   inter-AS layer will be split into inter-AS-top and inter-AS-bottom.
212	   To maintain this hierarchy, each node of inter-AS-top needs to have
213	   multiple regional or state border routers (say, SBR) through which
214	   it communicates with the rest of the world, just as an Autonomous
215	   System maintains ASBRs. Thus, the entire network will appear as a
216	   network of nodes of the inter-AS-top layer. To maintain hierarchy,
217	   each node of inter-AS-top needs to have a fixed prefix length;
218	   i.e., each node of inter-AS-top will be assigned a fixed maximum
219	   number of Autonomous Systems.

221	   Thus, with a three-tier mesh structured hierarchy in the network
222	   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
223	   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
224	   layers respectively, there will be 2^pA nodes at the topmost
225	   layer, 2^pB nodes at the inter-AS-bottom layer and 2^pC nodes at
226	   the AS layer. Thus the entire space is divided into a fixed number
227	   of regions and each region into a fixed number of sub-regions.
228	   This division is expected to be based on geography, population
229	   density, demand and related factors.

231	   Let nMaxInterASTopNodes be the maximum possible number of nodes
232	   assigned at the topmost layer, nMaxInterASBottomNodes that at the
233	   inter-AS-bottom layer and nMaxASNodes that at the AS layer, where
234	   nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes <= 2^pB and
235	   nMaxASNodes <= 2^pC.

237	3.1. Route propagation

239	   With the hierarchy established, routing information generated
240	   inside a node of inter-AS-top does not need to be propagated to
241	   another node of inter-AS-top. The entire routing information of the
242	   inter-AS-top layer needs to be propagated to the inter-AS-bottom
243	   layer. So, each router of the inter-AS layer will have two tables,
244	   one for inter-AS-top and another for the inter-AS-bottom of the
245	   inter-AS-top node that it belongs to. BGP (with little
246	   modification) will work very well with one rule applied at the
247	   SBRs: an SBR will not propagate the routing information of the
248	   inter-AS-bottom layer of its domain to an SBR of a neighboring
249	   domain; i.e., the SBR of one top layer node will propagate only
250	   inter-AS-top routing information to the SBR of another top layer
251	   node. Inside a node of inter-AS-top, routing information of both
252	   inter-AS-top and inter-AS-bottom needs to be propagated from one
253	   ASBR to a neighboring ASBR. Inside a top layer node A, routing
254	   information for another top layer node B will have two parts: one
255	   for the list of SBRs through which a packet will traverse from A
256	   to B, and another for the list of ASBRs through which the packet
257	   will traverse from one AS to another inside A. In terms of BGP, the
258	   AS_PATH attribute will be split into two parts, one for the top
259	   layer and another for the bottom layer. Within the same node A,
260	   routing information from one AS to another will carry no top layer
261	   information; i.e., the top layer part will be set to NULL.
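
The SBR rule and the split AS_PATH described above can be sketched as
follows. This is a minimal model, not a BGP implementation: the Route
record, its field names, the "top"/"bottom" labels and the function
export_from_sbr are hypothetical, and an empty tuple is assumed to
stand for a NULL top layer part.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    layer: str                    # "top" or "bottom" (hypothetical label)
    destination: int              # top layer node id, or AS id inside this node
    top_path: Tuple[int, ...]     # top layer half of AS_PATH; () stands for NULL
    bottom_path: Tuple[int, ...]  # bottom layer half of AS_PATH


def export_from_sbr(routes: List[Route], peer_in_same_top_node: bool) -> List[Route]:
    """Routes an SBR would advertise to a peer under the rule above."""
    if peer_in_same_top_node:
        # Inside one top layer node, both layers are propagated.
        return list(routes)
    # Towards an SBR of a neighboring top layer node, only top layer
    # routing information is propagated.
    return [r for r in routes if r.layer == "top"]


table = [
    # Route to another top layer node: SBR list plus ASBR list inside this node.
    Route("top", destination=7, top_path=(3, 7), bottom_path=(41, 12)),
    # Route to an AS in the same top layer node: top layer part is NULL.
    Route("bottom", destination=12, top_path=(), bottom_path=(41, 12)),
]

print(export_from_sbr(table, peer_in_same_top_node=False))
```

With these sample entries only the first route crosses the boundary
between top layer nodes, so the bottom layer table of one region never
appears in another region.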

263	   Similarly, each node of the AS layer will have three tables of
264	   routing entries. One for the inter-AS-top, one for the inter-AS-
265	   bottom and another for the routing information inside the Autonomous
266	   System itself.

268	   Introduction of hierarchy at the inter-AS layer reduces the size of
269	   the routing table substantially. If hardware resources permit a
270	   flat address space to be maintained at each layer, problems
271	   related to CIDR can be avoided. With a flat address space, no
272	   hierarchical relationship needs to be established between any two
273	   nodes in the same layer. So, all the nodes inside each layer can be
274	   used until they are exhausted. With a flat address space (i.e.
275	   without prefix reduction), BGP tables will have at most
276	   nMaxInterASTopNodes + nMaxInterASBottomNodes entries.

278	   An IGP such as OSPF has provision to divide an AS into smaller
279	   areas. OSPF hides the topology of an area from the rest of the
280	   Autonomous System. This information hiding enables a significant
281	   reduction in routing traffic. With the support of subnetting, OSPF
282	   attaches an IP address mask to indicate the range of IP addresses
283	   described by a particular route. This reduces the volume of routing
284	   traffic, since not every node inside the range has to be described,
285	   but it introduces another level of hierarchy. If subnetting can be
286	   avoided in the AS layer (with the additional overhead of
287	   computation inside the SPF tree), each area can be configured
288	   dynamically from a free pool of addresses based on its requirement.
289	   So, an AS can be divided into a number of areas of heterogeneous
290	   sizes with nodes drawn from a free pool of address space.

292	   Similarly, the concept of area can be introduced in the
293	   inter-AS-bottom layer the way it works in OSPF. The area border
294	   routers in the inter-AS-bottom layer have to behave exactly as an ABR
295	   behaves in OSPF; i.e., an area border router will hide the topology
296	   inside an area from the rest of the world and will distribute the
297	   information collected inside the area to the rest. It will also
298	   distribute the routing information collected from outside to the
299	   nodes inside. To implement this, the protocol running in the inter-AS
300	   layer (say BGP) will have to introduce a 'cost' factor. This cost
301	   factor can be interpreted as the cost of propagating a packet from
302	   one AS to another. The protocols running inside the AS layer (RIP,
303	   OSPF, etc.) will have to supply the cost of a packet travelling from
304	   one ASBR to another. All protocols must supply this information
305	   consistently. The cost factor is needed by a remote node sending a
306	   packet to a node inside an area when more than one area border router
307	   is equidistant from that remote node. Thus the inter-AS-bottom layer
308	   (i.e. one inter-AS-top level node) can be divided into a number of
309	   areas of heterogeneous sizes with AS nodes drawn from a free pool of
310	   address space. BGP adopts a technique called route aggregation, which
311	   reduces the routing information within a message. Similarly,
312	   introducing areas inside the inter-AS-bottom layer will not only
313	   reduce the complexity of the protocol, but will also reduce the size
314	   of a BGP packet substantially.

317	   With this architecture, each node (router) inside an AS is
318	   represented as A.B.C. Each node may or may not have a network
319	   attached to it; such a network acts as a leaf node (i.e. it will not
320	   act as a transit). In order to use the user-id space properly and to
321	   support customer networks of heterogeneous sizes, the user-ID space
322	   needs to be divided into subnet-ID and user-ID. Essentially, a VLSM
323	   (variable length subnet mask) type of approach (in the form of a
324	   tree) has to be adopted at each node of an AS. So, each node of the
325	   AS layer will act as the root of a tree whose leaves are independent
326	   small customer networks that act as stubs. As the routing information
327	   of the inter-AS layer and the AS layer need not be passed to any node
328	   inside the VLSM tree, each router inside the tree should maintain a
329	   default route for any address outside its network/domain. With this
330	   approach, the load on each router of the service providers will
331	   become negligible. Protocols that support VLSM with MPLS/VPN have to
332	   be implemented inside the tree. Inside the VLSM tree, all the
333	   physical ports of a switch have to be configured with the subnet
334	   mask. So, plain MPLS on top of a static routing table should suffice.
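
The forwarding behaviour of a router inside the VLSM tree (static
subnets on its ports, plus a default route for everything else) can be
sketched as below. IPv4 documentation prefixes are used only as a
stand-in for the 64 bit format, and the class TreeRouter, its methods
and the port names are hypothetical.

```python
import ipaddress


class TreeRouter:
    """Router inside a VLSM tree: static subnets plus one default route."""

    def __init__(self, uplink):
        self.uplink = uplink  # port towards the root of the tree
        self.static = []      # list of (network, port) pairs

    def add_subnet(self, cidr, port):
        """Configure one physical port with its subnet and mask."""
        self.static.append((ipaddress.ip_network(cidr), port))

    def next_hop(self, dest):
        """Longest matching configured subnet, else the default route."""
        addr = ipaddress.ip_address(dest)
        matches = [(net, port) for net, port in self.static if addr in net]
        if not matches:
            return self.uplink
        return max(matches, key=lambda m: m[0].prefixlen)[1]


router = TreeRouter(uplink="port0")
router.add_subnet("192.0.2.0/25", "port1")    # larger customer network
router.add_subnet("192.0.2.128/26", "port2")  # smaller customer network
router.add_subnet("192.0.2.192/26", "port3")  # smaller customer network

print(router.next_hop("192.0.2.130"), router.next_hop("198.51.100.7"))
```

The router holds no inter-AS or AS layer routes at all; any address
outside its configured subnets simply follows the default route
towards the root.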

336	   The fundamental assumptions on which this architecture rests can
337	   be summarized as follows:

339	   i) Entire network can be viewed as a network of regions or states
340	   where each region or state can have its own identity by communicating
341	   with the rest of the world through some state border routers. Each
342	   region or state is a network of Autonomous Systems. Each region as
343	   well as each Autonomous System inside them will have a fixed
344	   (maximum) length of prefix.

346	   ii) Availability of hardware resources is such that flat address
347	   space can be maintained at the inter-AS layer.

349	   Introduction of mesh-structured hierarchy will have several
350	   advantages:

352	      o  Load at each router will get reduced substantially.
353	      o  Concept of CIDR style approach and complexity related to
354	           prefix reduction can be easily avoided.
355	      o  Mesh structured hierarchy will make traffic evenly distributed.
356	      o  Physical cable connection can be optimized.
357	      o  Administrative issues will become easier.

359	3.2. Determination of prefix lengths

361	   With this architecture, IP address can be described as A.B.C.D where
362	   the D part represents the user id. Each router in the inter-AS layer
363	   will have two tables of information, one for the inter-AS-top and
364	   another for the inter-AS-bottom of the inter-AS-top node that it
365	   belongs to. Whereas, each node of the AS layer will have three tables
366	   of routing entries; one for the inter-AS-top, one for the inter-AS-
367	   bottom and another for the routing information inside the Autonomous
368	   System itself. In the worst case, a node inside an AS needs to
369	   maintain nMaxInterASTopNodes + nMaxInterASBottomNodes + nMaxASNodes
370	   entries in its routing table.
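
The worst-case table sizes follow directly from the prefix lengths. A
small sketch, assuming every layer is fully populated (nMax = 2^p) and
using 10, 16 and 12 bits purely as illustrative values for pA, pB and
pC; the function names are hypothetical.

```python
def max_nodes(prefix_len):
    """Upper bound on the number of nodes for a prefix length (2^p)."""
    return 1 << prefix_len


def inter_as_router_entries(p_a, p_b):
    """Two tables: inter-AS-top plus inter-AS-bottom of the local top node."""
    return max_nodes(p_a) + max_nodes(p_b)


def as_node_entries(p_a, p_b, p_c):
    """Three tables: inter-AS-top, inter-AS-bottom and the local AS."""
    return max_nodes(p_a) + max_nodes(p_b) + max_nodes(p_c)


# Illustrative prefix lengths (assumed values for pA, pB, pC).
P_A, P_B, P_C = 10, 16, 12

print(inter_as_router_entries(P_A, P_B), as_node_entries(P_A, P_B, P_C))
```

Because the layers are independent, the bound grows as a sum of the
three layer sizes rather than as their product.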

372	   Dynamic allocation of an area from a free pool of address space is
373	   more frequent at the AS layer than at the inter-AS-bottom layer. As
374	   OSPF supports all the features needed, it can be considered the
375	   default choice in the AS layer. The existing implementation of OSPF
376	   (Version 2) supports subnetting, by which an entire area can be
377	   represented as a combination of network address and subnet mask. With
378	   this approach, the routing table is reduced substantially. With the
379	   removal of subnetting, every node inside an area will have an entry
380	   in the routing table (OSPF Version 1). So the determining factor is
381	   the maximum number of nodes inside an AS that OSPF can support once
382	   subnetting support is removed. The prefix length of the AS layer will
383	   be determined by this limit of OSPF.

385	   With the introduction of hierarchy in the inter-AS layer, the number
386	   of entries in the BGP routing table will be reduced substantially.
387	   Even if pA and pB are both chosen to be 16, the number of routing
388	   entries stays within the admissible range of the existing BGP
389	   protocol. However, it is the responsibility of IANA to devise a
390	   scheme for selecting nMaxInterASTopNodes and nMaxInterASBottomNodes.
391	   Each top level node will have nMaxInterASBottomNodes nodes. It would
392	   be a waste of address space if each country were assigned one top
393	   level node (e.g. China has a population of 1,306,313,800 whereas
394	   Vatican City has only 920, according to a census of 2006). So a
395	   moderate value of nMaxInterASBottomNodes is desirable, with which
396	   larger countries will have a number of top level nodes; e.g., each
397	   state of the USA can be assigned a top level node. With the
398	   introduction of areas in the inter-AS-bottom layer, each top level
399	   node can be divided into a number of areas of heterogeneous sizes.
400	   So, a group of neighboring countries with smaller populations can
401	   share the address space of a top level node. Similarly, the size of
402	   the user-id space has to be decided based on the largest area that a
403	   VLSM tree should span. All these issues are entirely geopolitical and
404	   have to be decided by IANA.

406	3.2.1. A pseudo optimal distribution of prefixes in a 64 bit
407	   architecture

409	   In order to make optimal use of cable connections, the VLSM tree is
410	   expected to be as short as possible. Also, any single organization
411	   may prefer to have its user-id space under the same network id. So, a
412	   16 bit user-id may be insufficient for places like a large university
413	   campus, whereas 32 bits would be too large. Hence, a 24 bit user-id
414	   is a moderate choice; it equals the class A address space in IPv4
415	   (also used as the space for private IP). As published in 1998 [6],
416	   OSPF can support an area with 1600 routers and 30K external LSAs. So,
417	   11 bits are needed to support this space. On the assumption that OSPF
418	   can support a much larger address space as hardware technology
419	   advances, and to keep room for future expansion, 12 bits are assigned
420	   to the AS layer. 16 bits are assigned to the inter-AS-bottom layer.
421	   So if, on average, a 16 bit equivalent space is used within the
422	   user-id space (i.e. one out of 256) and 8 bit equivalent nodes are
423	   used inside an AS (16% of 1600), a top level node (with 16 bit
424	   equivalent AS nodes) will generate 2^40 IP addresses, which gives
425	   8629 IP addresses per person in Japan (with a population of
426	   127417200; Japan is at the 10th position from the top in the
427	   population list of the world). So, even if every country with a
428	   population less than or equal to that of Japan is assigned a top
429	   level node and every province/state of the countries with larger
430	   populations is assigned a top level node, the total number of nodes
431	   will come well under 1024. If a number of neighboring countries with
432	   smaller populations share a top level node, the total number of top
433	   level nodes will come down further. This suggests that a 62 bit
434	   equivalent space (10(pA)+16(pB)+12(pC)+24(user-id)) will be good
435	   enough for unicast addresses. This distribution expects OSPF to
436	   support 65K (64K+1K) external LSAs.
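
The arithmetic in this section can be checked with a few lines of
integer computation. All input figures are taken from the text above;
the variable names are only illustrative.

```python
# Figures quoted in the text.
ROUTERS_PER_AREA = 1600        # OSPF area size reported in [6]
JAPAN_POPULATION = 127417200

# Bits needed to number 1600 routers (smallest n with 2^n >= 1600).
bits_for_area = (ROUTERS_PER_AREA - 1).bit_length()

# Proposed field widths: pA, pB, pC and user-id.
P_A, P_B, P_C, USER_ID = 10, 16, 12, 24
unicast_bits = P_A + P_B + P_C + USER_ID

# Average usage assumed in the text: 16 bits of the 24 bit user-id space,
# 8 bits' worth of nodes per AS, and 16 bits' worth of AS nodes per
# top level node.
used_user_bits, used_as_bits, used_bottom_bits = 16, 8, 16
addresses_per_top_node = 1 << (used_user_bits + used_as_bits + used_bottom_bits)

user_id_share = (1 << used_user_bits) / (1 << USER_ID)  # one out of 256
as_share = (1 << used_as_bits) / ROUTERS_PER_AREA       # 256 of 1600

per_person = addresses_per_top_node // JAPAN_POPULATION

print(bits_for_area, unicast_bits, addresses_per_top_node, per_person)
```

The integer division reproduces the figure of 8629 addresses per
person, and the field widths add up to the 62 bits quoted for unicast
addresses.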

439	   The distribution of address space will be finalized in
440	   consultation with IANA. Provisionally, it may be as follows:

442	   The 64 bit address space may be divided into two 63 bit blocks:

444	   i. Global unicast addresses with the most significant bit set to 0.
445	   This space is equally divided between provider assigned (PA) address
446	   space and provider independent (PI) address space.

448	   a) Provider assigned address space with prefix 00.

450	   b) Provider independent (PI) address space with prefix 01.  Provider
451	   independent address space will be used for customers who would
452	   like to retain their number even after changing providers. As
453	   routing will be based on PA addresses, each PI address will be
454	   associated with at least one PA address. The most significant
455	   feature of PI addressing is that it is independent of the
456	   architectural framework of the provider network; even if the
457	   architectural framework changes, the same format of PI addressing
458	   can be maintained. Once implemented, the PI address of a node will
459	   be the number generally used by ordinary users. Section 4
460	   describes issues related to PI addressing in detail.

462	   ii. Address space with the MSB set to 1 will be distributed among
463	   the remaining uses. Each of them will have a fixed prefix. This
464	   distribution will be based on the requirements and on the work
465	   that has already been done in connection with IPv6:

467	   a) Address space for multicasting with a prefix set to 1111.

469	   b) Address space for link-local address: Link local addresses will
470	   have a prefix 1110.

472	   c) Router address space: This space will be used by the routers and
473	   will have a prefix 1101.

475	   d) Address space for private IP: Each customer network can maintain
476	   private address space for communication among its users. This space
477	   will be distributed among all the customer sites of a corporation
478	   that maintains VPN services. A 32 bit address space should be good
479	   enough for private IP. Private address space will have a 32 bit
480	   prefix with the leading 4 bits set to 1100 and the rest set to 1.

482	   Rest of the address space has been kept for future use.
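
The prefix assignments listed above can be illustrated with a small
classifier for 64 bit addresses, together with a packing routine for
provider assigned addresses. This is a sketch under assumptions: the
field order (prefix 00, then A, B, C and user-id D with widths 10, 16,
12 and 24 bits) is one possible reading of this section, and the
function names and category labels are hypothetical.

```python
# Assumed field widths for a provider assigned address: 00 | A | B | C | D.
FIELD_BITS = {"A": 10, "B": 16, "C": 12, "D": 24}

# 32 bit private prefix: leading bits 1100 followed by 28 one bits.
PRIVATE_PREFIX = 0xCFFFFFFF


def pack_pa(a, b, c, d):
    """Build a 64 bit PA address; the two leading bits stay 00."""
    value = 0
    for name, field in zip("ABCD", (a, b, c, d)):
        width = FIELD_BITS[name]
        if not 0 <= field < (1 << width):
            raise ValueError(f"field {name} does not fit in {width} bits")
        value = (value << width) | field
    return value


def unpack_pa(addr):
    """Split a PA address back into its (A, B, C, D) fields."""
    fields = []
    for name in "DCBA":
        width = FIELD_BITS[name]
        fields.append(addr & ((1 << width) - 1))
        addr >>= width
    return tuple(reversed(fields))


def classify(addr):
    """Name the address space a 64 bit address falls into."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("not a 64 bit address")
    top2 = addr >> 62
    if top2 == 0b00:
        return "provider-assigned"
    if top2 == 0b01:
        return "provider-independent"
    top4 = addr >> 60
    if top4 == 0b1111:
        return "multicast"
    if top4 == 0b1110:
        return "link-local"
    if top4 == 0b1101:
        return "router"
    if addr >> 32 == PRIVATE_PREFIX:
        return "private"
    return "reserved"  # kept for future use


example = pack_pa(1, 2, 3, 4)
print(hex(example), unpack_pa(example), classify(example))
```

Because every category is identified by a fixed leading bit pattern,
classification needs only a few shifts and comparisons.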

484	3.2.2. Whether to go for a two-tier or three-tier hierarchy

486	   Establishment of hierarchy in the inter-AS layer reduces the size of
487	   BGP entries to a great extent, but leads to an improper use of
488	   address space due to geo-political reason. If hierarchy in the inter-
489	   AS space gets removed, entire 26 bit (10+16) space will be available
490	   for a single layer and use of inter-AS space will be true to its
491	   sense, but will increase external LSA (and/or number of entries in
492	   the BGP table) dramatically. So, it depends on to what extent OSPF
493	   can support external LSAs. BGP expects the packet length to be
494	   limited to 4096 bytes. BGP manages to make it work with this
495	   limitation with the concept of prefix reduction in the CIDR based
496	   environment. As the number of inter-AS nodes increases, BGP has to
497	   change this limit in order to make it work in flat address space. The
498	   alternative will be to divide the inter-AS space into a number of areas
499	   as defined in section 2.1. The area border routers will advertise the
500	   aggregated information to the rest of the world. BGP may have to
501	   incorporate both the options at the same time. As the number of nodes
502	   in the inter-AS layer increases, in order to reduce the number of
503	   entries in the routing table, inter-AS space has to be split into two
504	   separate planes. So, two-tier hierarchy can be considered as an
505	   interim state to go for three-tier hierarchy. If it so happens that
506	   currently available data is good enough to support the present need,
507	   it is worth examining to what extent it can do so in the
508	   future. Assignment of inter-AS nodes in two-tier hierarchy should be
509	   based on the geographical distribution as if it is part of three-tier
510	   hierarchy. Otherwise, introduction of three-tier hierarchy in the
511	   future will become another difficult task to go through. Based on the
512	   report of year 2011, BGP supports ~400,000 entries in the routing
513	   table. With this growing trend, BGP may have to change the limit of
514	   packet length even in a CIDR based environment. With the introduction
515	   of two-tier hierarchy, number of entries in the routing table will
516	   come down drastically and with the three-tier approach, it will come
517	   down further.

519	3.3. Issues related to Satellite communications

521	   Establishment of hierarchy in the inter-AS layer expects the only way
522	   any two autonomous systems in two different top level nodes
523	   communicate is through their SBRs. If two autonomous systems inside
524	   the same top level node communicate through satellite, it will be
525	   considered as a direct link between them. Whenever autonomous system
526	   'ASa' of top level node 'A' communicates with autonomous system 'ASb'
527	   of top level node 'B' through satellite, they have to go through
528	   their state border routers; i.e., a satellite port inside 'A' that
529	   communicates with a satellite port inside 'B' will be considered a
530	   state border router. If multiple such ports exist inside node 'A',
531	   all of them will be equidistant from any port inside 'B'. This
532	   requires any satellite port inside 'B' to have prior knowledge of the
533	   list of autonomous systems that are under the purview of each port
534	   inside 'A'. So, all the satellite ports of 'A' have to exchange this
535	   grouping information with all the satellite ports of 'B' and vice
536	   versa. Each such group of autonomous systems can be considered as a
537	   cluster of autonomous systems inside an area of a top level node. If
538	   number of such ports is small, some heuristics can be applied while
539	   assigning AS numbers in order to reduce the processing time during
540	   the circuit establishment phase.  It will become difficult to
541	   maintain such heuristics once the number of such ports becomes large.
542	   So, in case of satellite communication, the advantage of establishing
543	   hierarchy inside inter-AS layer diminishes as the number of satellite
544	   ports increases. If any private corporate maintains its own satellite
545	   channel to communicate between its offices at distant locations, all
546	   of these offices are going to be considered as under the user-id
547	   space of its network. Service providers that provide satellite
548	   services to the end-site customers, can operate in the usual manner
549	   as they will provide connection to customer networks which will act
550	   as stub.

552	3.4. Setting default route inside VLSM tree

554	   Section 3.1 describes that there is no need to pass down the routing
555	   information of the external world inside VLSM tree that acts as a
556	   stub. Inside a VLSM tree, a node of higher prefix can be divided into
557	   number of nodes with lower prefixes. Each divided node can further be
558	   subdivided with nodes of further lower prefixes. This process can be
559	   continued as long as it is desired or no more division is further
560	   possible.

562	   Following figure shows a typical arrangement of VLSM tree of a
563	   service provider's network with IPv4 address space. Switch SW-A is
564	   connected to the outside world and maintains global routing table. It
565	   acts as the root of a VLSM tree that acts as a stub. It has been
566	   assigned an address block 11.1.16.0/20 which is distributed among its
567	   four children SW-B, SW-C, SW-D and SW-E with the approach of VLSM.
568	   Switch SW-B further divides its address space between switches SW-F
569	   and SW-G. Switch SW-F assigns an address block 11.1.16.0/24 to
570	   customer network CN-A. Switch SW-G assigns address block 11.1.20.0/24
571	   and 11.1.21.0/24 to two customers CN-B and CN-C; where as switch SW-E
572	   assigns address block 11.1.30.0/24 to customer network CN-D.

574	   Routing inside the tree takes place with the following principle.

576	   Inside the tree, if a node (switch/router) that is assigned a domain
577	   (NetAddr/NetMask) receives a packet destined for an address outside
578	   its domain, it needs to forward the packet to its parent in
579	   the hierarchy.

581	                               +--------------+
582	                               |     SW-A     |
583	                               | 11.1.16.0/20 |
584	                               +-+-+------+-+-+
585	                                 | |      | |
586	                 +---------------+ |      | +----------------+
587	                 |                 |      |                  |
588	          +------+-----+ +---------+--+ +-+----------+ +-----+------+
589	          |    SW-B    | |    SW-C    | |    SW-D    | |   SW-E     |
590	          |11.1.16.0/21| |11.1.24.0/22| |11.1.28.0/23| |11.1.30.0/23|
591	          +---+----+---+ +------------+ +------------+ +--+---------+
592	              |    |                                      |
593	              |    +-------+                              |
594	              |            |                           +--+--+
595	      +-------+----+  +----+-------+                   |CN-D |
596	      |   SW-F     |  |    SW-G    |                   +-----+
597	      |11.1.16.0/22|  |11.1.20.0/22|                11.1.30.0/24
598	      +--+---------+  +--+------+--+
599	         |               |      |
600	         |               |      |
601	      +--+--+         +--+--+ +-+---+
602	      |CN-A |         |CN-B | |CN-C |
603	      +-----+         +-----+ +-----+
604	   11.1.16.0/24  11.1.20.0/24 11.1.21.0/24

606	   If a host in CN-A wants to send a packet to an address 11.1.21.116,
607	   CE router of CN-A forwards it to SW-F. SW-F finds the destination
608	   address of the packet to be outside of its domain and forwards the
609	   packet to its parent SW-B. SW-B finds a port that has been
610	   configured with the matching destination address and forwards it to
611	   its child SW-G. Switch SW-G sends the packet to customer network
612	   CN-C, whose block 11.1.21.0/24 contains the destination.

614	   If a host in CN-B wants to send a packet to 11.1.17.120, CE router of
615	   CN-B forwards the packet to SW-G. SW-G finds the destination address
616	   of the packet to be outside of its domain and forwards the packet to
617	   its parent SW-B. SW-B finds a port that has been configured with
618	   the matching destination address and forwards the packet to its child
619	   SW-F. SW-F finds the destination address to be within its domain, but
620	   no port has been configured with the matching destination address and
621	   generates ICMP UNREACHABLE.

623	   If a host in CN-C wants to send a packet to 16.2.22.116, CE router of
624	   CN-C forwards the packet to SW-G. SW-G finds the destination address
625	   of the packet to be outside its domain and forwards the packet to
626	   SW-B. SW-B forwards the packet to its parent SW-A. SW-A finds the
627	   destination address of the packet to be outside its domain and
628	   consults with the global forwarding table and forwards the packet
629	   through the right port.

631	3.4.1. IP VPN with MPLS inside VLSM tree

633	   Section 3.1 describes that there is no need to pass down the routing
634	   information of the external world inside VLSM tree. This section
635	   describes how to make IP VPN work inside VLSM tree without using BGP.

637	   RFC4364 [7] describes "IP VPN" with BGP/MPLS. To support VPN, PE
638	   routers maintain per-site forwarding table. When a packet arrives
639	   from an associated CE router, PE router consults with this forwarding
640	   table to forward the packet. If the packet is supposed to be
641	   forwarded to another site of VPN through the backbone, it uses two-
642	   level label stack. The upper label is used to forward the packet from
643	   ingress PE router to the egress PE router; where as, the inner label
644	   is used for the egress PE router to identify the associated CE router
645	   where the packet is supposed to be forwarded. BGP is used by the
646	   Service Provider to exchange the routes of a particular VPN among the
647	   PE routers that are attached to that VPN. Configuration takes place
648	   on PE routers of both the sides of LSP. The simplest way to achieve
649	   this is to configure these attributes manually on PE routers. In
650	   order to have dynamic allocation of inner label, MPLS signaling
651	   protocols (in place of BGP) need to be extended. Allocation of inner
652	   label has to be done by the egress PE router. Same message that is
653	   used for the assignment of upper label may be used for the assignment
654	   of inner label. Inside the forwarding table, each entry contains the
655	   forwarding destination address based on a set of destination
656	   addresses (NetAddress/NetMask) of the IP packets received from
657	   ingress CE router. While establishing inner label, ingress PE router
658	   needs to send these attributes with the signalling message and the
659	   egress PE router needs to validate those before assigning label.

661	3.4.1.1. Extension to RSVP-TE to support IP VPN inside VLSM tree

663	   This section describes extension to RSVP-TE[17] to support dynamic
664	   allocation of inner label of two-level label stack used to support
665	   VPN services.

667	   In order to establish LSP using RSVP-TE, ingress PE router sends Path
668	   message to the egress PE router. Path message is augmented with a
669	   LABEL_REQUEST object.  Labels are allocated downstream and
670	   distributed (propagated upstream) by means of RSVP Resv message. For
671	   this purpose, the RSVP Resv message is extended with a special LABEL
672	   object. In order to support VPN to establish the inner label, Path
673	   message is augmented with a VPN_ATTRIBUTE object. Similarly, RSVP Resv
674	   message is extended with a VPN_LABEL object. When an egress PE router
675	   receives a Path message, it checks the presence of VPN_ATTRIBUTE
676	   object. On finding this object, egress PE router checks the viability
677	   of assignment of VPN label with the parameters from the VPN_ATTRIBUTE
678	   object and the attributes that are already configured with the egress
679	   PE router. If the test is positive, it assigns a VPN label and does
680	   the rest of the processing of LSP label assignment and sends the RSVP
681	   Resv message with the extension of VPN_LABEL object towards the
682	   ingress PE router. On receiving Resv message with VPN_LABEL object,
683	   ingress PE router assigns VPN label along with the rest of the
684	   processing of Resv message and completes the operation. VPN_ATTRIBUTE
685	   and VPN_LABEL objects are described below.

687	   VPN_LABEL class=208, C-Type=1
688	    0                   1                   2                   3
689	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
690	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
691	   |                         (inner label)                         |
692	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

694	   VPN_ATTRIBUTE  class=209, C-Type=1
695	    0                   1                   2                   3
696	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
697	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
698	   |         Global Unicast Address of Ingress CE Router           |
699	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
700	   |         Global Unicast Address of Egress CE Router            |
701	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
702	   |             Net Address of Destination IP Packet              |
703	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
704	   |             Net Mask of Destination IP Packet                 |
705	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
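
As an illustration of the VPN_ATTRIBUTE layout above, the following sketch serializes the object in network byte order. It assumes the standard 4 octet RSVP object header (16 bit length, Class-Num, C-Type) precedes the four 32 bit words, which the figure does not show; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPN_ATTRIBUTE_CLASS 209
#define VPN_ATTRIBUTE_CTYPE 1
#define VPN_ATTRIBUTE_LEN   20   /* assumed 4 octet header + four words */

struct vpn_attribute {
    uint32_t ingress_ce;  /* global unicast address of ingress CE router */
    uint32_t egress_ce;   /* global unicast address of egress CE router */
    uint32_t dst_net;     /* net address of destination IP packet */
    uint32_t dst_mask;    /* net mask of destination IP packet */
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write the object into 'buf'. Returns the number of octets written,
 * or 0 if 'cap' is too small. */
size_t vpn_attribute_encode(const struct vpn_attribute *a,
                            uint8_t *buf, size_t cap)
{
    assert(a != NULL && buf != NULL);
    if (cap < VPN_ATTRIBUTE_LEN)
        return 0;
    buf[0] = 0;                     /* length, high octet */
    buf[1] = VPN_ATTRIBUTE_LEN;     /* length, low octet */
    buf[2] = VPN_ATTRIBUTE_CLASS;
    buf[3] = VPN_ATTRIBUTE_CTYPE;
    put_be32(buf + 4,  a->ingress_ce);
    put_be32(buf + 8,  a->egress_ce);
    put_be32(buf + 12, a->dst_net);
    put_be32(buf + 16, a->dst_mask);
    return VPN_ATTRIBUTE_LEN;
}
```

An egress PE router would decode the same four words and compare them with its configured attributes before assigning the inner label.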

707	   The format of the Path message is as follows:

709	      <Path Message> ::=       <Common Header> [ <INTEGRITY> ]
710	                               <SESSION> <RSVP_HOP>
711	                               <TIME_VALUES>
712	                               [ <EXPLICIT_ROUTE> ]
713	                               <LABEL_REQUEST>
714	                               [ <VPN_ATTRIBUTE> ]
715	                               [ <SESSION_ATTRIBUTE> ]
716	                               [ <POLICY_DATA> ... ]
717	                               <sender descriptor>

719	      <sender descriptor> ::=  <SENDER_TEMPLATE> <SENDER_TSPEC>
720	                               [ <ADSPEC> ]
721	                               [ <RECORD_ROUTE> ]

723	   The format of the Resv message is as follows:

725	      <Resv Message> ::=       <Common Header> [ <INTEGRITY> ]
726	                               <SESSION>  <RSVP_HOP>
727	                               <TIME_VALUES>
728	                               [ <RESV_CONFIRM> ]  [ <SCOPE> ]
729	                               [ <POLICY_DATA> ... ]
730	                               [ <VPN_LABEL> ]
731	                               <STYLE> <flow descriptor list>

733	      <flow descriptor list> ::= <FF flow descriptor list>
734	                               | <SE flow descriptor>

736	      <FF flow descriptor list> ::= <FLOWSPEC> <FILTER_SPEC> <LABEL>
737	                               [ <RECORD_ROUTE> ]
738	                               | <FF flow descriptor list>
739	                               <FF flow descriptor>

741	      <FF flow descriptor> ::= [ <FLOWSPEC> ] <FILTER_SPEC> <LABEL>
742	                               [ <RECORD_ROUTE> ]

744	      <SE flow descriptor> ::= <FLOWSPEC> <SE filter spec list>

746	      <SE filter spec list> ::= <SE filter spec>
747	                               | <SE filter spec list> <SE filter spec>

749	      <SE filter spec> ::=     <FILTER_SPEC> <LABEL> [ <RECORD_ROUTE> ]

751	   Egress router generates an error with Error Code = 24, sub-code = 116
752	   (VPN label allocation error) if the operation fails.

754	4. Provider Independent addressing, name services and multihoming

756	   Provider independent addressing can be conceived as naming a host
757	   with a number. It can be used by customer networks who would like to
758	   retain their number even after changing their service provider; also
759	   it is useful to designate a host uniquely if the customer network is
760	   multihomed. Just as in name services, where the address corresponding
761	   to a name must be resolved before communication starts, a PI address
762	   must be resolved first. Each globally unique PI address will
763	   be associated with at least one global unicast provider assigned
764	   address. For a host with a single interface, the number of such PA
765	   addresses equals the number of service providers the customer network
766	   is associated with.

768	   As either source or destination or both may be multihomed, there
769	   could be multiple paths to communicate between two hosts. This is
770	   required both for name services as well as for PI addressing.

772	   A system call needs to be introduced to get the source address based
773	   on the destination address. If application program needs to use the
774	   destination address directly, it needs to use this system call.

776	   int getcommaddr(int sockfd, struct in_addr *dst, struct addr_pair
777	   *endpts);

779	   'addr_pair' holds the addresses of communication end points as
780	   follows:

782	   struct addr_pair {
783	       struct in_addr src;
784	       struct in_addr dst;
785	   };

787	   'getcommaddr'[8] returns the number of source-destination pairs for
788	   communication; the parameter 'endpts' will hold the array of these
789	   address pairs. The array will be sorted based on the best
790	   possible route. 'sockfd' is used to get the 'type of service'
791	   assigned. So, an application program needs to set its type of service
792	   before using this call.

794	   'getcommaddr' needs to call a routine 'getmappedaddr' to resolve the
795	   mapped provider assigned addresses of a provider independent address.

797	   int getmappedaddr(struct in_addr *piaddr, struct in_addr *mpiaddr);

799	   'getmappedaddr' will return number of mapped addresses and 'mpiaddr'
800	   will hold their values.

802	   Users may use name instead of IP address to reach the destination. A
803	   new system call needs to be introduced 'gethostbynamewithsrcaddr',
804	   which is an extension to 'gethostbyname' as follows:

806	   struct hostent *gethostbynamewithsrcaddr(int sockfd,const char *name,
807	                  int *nroutes, struct addr_pair *endpts);

809	   'gethostbynamewithsrcaddr'[8] takes 'name' and 'sockfd' as input
810	   parameters and finds out the best possible route to reach the
811	   destination. It returns the pointer to the 'hostent' structure as
812	   returned by 'gethostbyname' system call.  The parameter 'nroutes'
813	   gets the number of possible routes to be used and the corresponding
814	   source and destination addresses get assigned to 'endpts' in sorted
815	   manner. 'sockfd' is used to get the 'type of service' assigned. So,
816	   an application program needs to set its type of service before using
817	   this call.

819	   An application program needs to use these source addresses from the
820	   top (i.e. the 0th) to establish connection with the destination. It
821	   needs to bind source address 'src' and then connect with the
822	   destination address 'dst'.
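
The bind-then-connect procedure described above might look like the following sketch. It assumes the address pairs have already been obtained (for example from the proposed 'getcommaddr' call, which does not exist in current systems), uses TCP over IPv4, and opens a fresh socket for each attempt because a socket is not reliably reusable after a failed connect. The function name 'connect_best' and the 'port_be' parameter are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct addr_pair {
    struct in_addr src;
    struct in_addr dst;
};

/* Try each source-destination pair in order, starting from the 0th.
 * 'port_be' is the destination port in network byte order.
 * Returns a connected socket descriptor, or -1 if every pair fails. */
int connect_best(const struct addr_pair *endpts, int npairs, in_port_t port_be)
{
    assert(endpts != NULL || npairs == 0);
    for (int i = 0; i < npairs; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in sa;
        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_addr = endpts[i].src;   /* port 0: kernel picks the port */
        if (bind(fd, (struct sockaddr *)&sa, sizeof sa) == 0) {
            sa.sin_addr = endpts[i].dst;
            sa.sin_port = port_be;
            if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
                return fd;             /* first working pair wins */
        }
        close(fd);                     /* fall through to the next pair */
    }
    return -1;
}
```

Because the array is sorted with the best route first, stopping at the first successful pair selects the preferred path that is currently usable.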

824	4.1. PI address Resolution

826	   This section tries to come up with a solution for PI address
827	   resolution with the approach of DNS[10] with necessary differences.
828	   Just like name space in DNS, entire address range with prefix 01 will
829	   be the address space used by PI addresses. Servers that will hold the
830	   information of mapping between PI addresses and corresponding PA
831	   addresses will be called as PIMapServers and the programs that will
832	   be used to resolve addresses will be called as PIMapResolvers.

834	   In case of DNS where name is used in hierarchical format to resolve
835	   the addresses, PI address resolution will be based on the prefix of
836	   the PI address used for resolution.  The prefix is determined based
837	   on the architectural model used for the internet. Based on the prefix
838	   information addresses of a list of servers can be found out that will
839	   act as regional servers which will be used to resolve mapped PA
840	   addresses corresponding to that PI address. A prefix will serve a
841	   fixed address space within entire PI address space. Address space
842	   belonging to a prefix will be distributed within customer networks of
843	   heterogeneous sizes. Address space allocation and the mapping of
844	   associated PA address(es) will be assigned by a regional authority.
845	   The regional authority will be fully responsible for the operation of
846	   regional servers in that region.

848	   Like DNS, there are some root servers which will have some fixed
849	   addresses, under which there are some prefixes which will act as top-
850	   level-domains. In case of CIDR based hierarchy, these prefixes may be
851	   of different prefix lengths which are selected based on the
852	   requirements. Each prefix in a top level domain can further be split
853	   into number of prefixes with the approach of CIDR. This tree
854	   structured hierarchy will keep growing until we get prefixes
855	   associated with regional servers. Each prefix associated with a
856	   regional server will be distributed amongst customer networks of
857	   various sizes as well as prefixes that will again be associated with
858	   some regional servers with the approach of CIDR. These regional
859	   servers can be considered as equivalent to the authoritative name
860	   servers of DNS which are associated with zones. As stated earlier,
861	   prefixes starting with "00" will be assigned for provider assigned
862	   addresses and prefix starting with "01" will be assigned for provider
863	   independent addresses where as prefix starting with "1" will be
864	   assigned for addresses of all other types.

866	   As inherent hierarchy is involved in "Mesh structured hierarchy",
867	   this hierarchy goes up to two levels. As usual, there will be some
868	   root servers with fixed assigned addresses. Each root server will
869	   have prefixes with "01.A" that will act like top level domain. Under
870	   each top level domain, there will be entries with prefixes "01.A.B".
871	   Within a region "A.B", every global PA address is represented as
872	   "00.A.B.C.user-id". In order to support customer networks of
873	   heterogeneous sizes with the approach of VLSM, the "user-id" portion
874	   is further divided as "subnet-id.user-id". So, the effective network
875	   prefix of a customer network in PA address space is "00.A.B.C.pa-
876	   subnet-id". Within an "A.B", entire PI address space with prefix
877	   "01.A.B" will be distributed within customer networks of
878	   heterogeneous sizes. So, effective network prefix of a customer
879	   network with PI address will be "01.A.B.pi-subnet-id". A particular
880	   prefix "01.A.B.pi-subnet-id" will be mapped to at least one provider
881	   assigned prefix of same prefix length. For a multihomed customer
882	   network within "A.B" that receives services from two service
883	   providers will have prefixes "00.A.B.C1.pa-subnet-id1" and
884	   "00.A.B.C2.pa-subnet-id2". A PI address prefix "01.A.B.pi-subnet-id"
885	   of same length will be mapped to both these prefixes of PA address
886	   space. Every region "A.B" will have regional server and backup
887	   server(s) with a maximum limit (say 4) with net addresses
888	   "00.A.B.server1", "00.A.B.server2", "00.A.B.server3" and
889	   "00.A.B.server4".

891	   Each PIMapServer will have a database of records that will have
892	   information to resolve PI addresses. In memory copy of a region will
893	   have an array of records where each record will have the following
894	   format:

896	   +------------+---------+------+-----+-------+-----------+
897	   | NetAddress | NetMask | Type | TTL | NAddr | Addr(1-4) |
898	   +------------+---------+------+-----+-------+-----------+

900	   The first two fields "NetAddress/NetMask" represent the PI address
901	   range of a network. "Type" is one of Domain/Referral/Individual/
902	   SingleEntry/Default and determines how a query and the rest of the
903	   fields of a record are processed. A PI address can have at most four
904	   mapped PA addresses. "Addr1", "Addr2", "Addr3", "Addr4" will hold the
905	   corresponding PA addresses and "NAddr" will hold the number of such
906	   addresses. The field "TTL" is a 32 bit integer measured in seconds
907	   which will hold same meaning and approach as defined in the
908	   specification of DNS[10]. When a server receives a query for an
909	   address "X", it extracts the record of the network based on
910	   "NetAddress/NetMask" and "X" from its database. If no matching record
911	   is found, a negative response is sent. Based on the "Type" of the
912	   record, the query is processed in the following manner.
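
The in-memory record and the lookup step described above can be sketched as follows. The struct layout and names are hypothetical, addresses are assumed to be 32 bit values, and choosing the most specific matching record (longest mask) is an assumption made so that individual entries take precedence over a covering range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pi_type {
    PI_DOMAIN, PI_REFERRAL, PI_INDIVIDUAL, PI_SINGLE_ENTRY, PI_DEFAULT
};

struct pi_record {
    uint32_t     net_address;  /* PI NetAddress */
    uint32_t     net_mask;     /* PI NetMask (contiguous ones from the MSB) */
    enum pi_type type;
    uint32_t     ttl;          /* seconds, as in DNS */
    uint8_t      naddr;        /* number of valid entries in addr[], 0..4 */
    uint32_t     addr[4];      /* Addr1..Addr4 */
};

/* Return the most specific record covering address 'x', or NULL when no
 * record matches (the server would then send a negative response). */
const struct pi_record *pi_lookup(const struct pi_record *db, size_t n,
                                  uint32_t x)
{
    const struct pi_record *best = NULL;
    for (size_t i = 0; i < n; i++) {
        assert(db[i].naddr <= 4);
        if ((x & db[i].net_mask) != (db[i].net_address & db[i].net_mask))
            continue;
        /* For contiguous masks, a numerically larger mask is longer. */
        if (best == NULL || db[i].net_mask > best->net_mask)
            best = &db[i];
    }
    return best;
}
```

A linear scan keeps the sketch short; a real server would more likely use a trie or a sorted table.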

914	   Type=Domain:

916	   This is the most common type. A customer network that does not want
917	   to maintain a map server opts for this option. In this case there
918	   will be one to one mapping between a PI address and corresponding PA
919	   addresses. The fields "Addr1"/"Addr2"/"Addr3"/"Addr4" will hold the
920	   PA Net Addresses corresponding to the PI address of the network.
921	   Server will send the matching record to the resolver with
922	   Type=Domain. Resolver will extract the user-id portion of "X" and
923	   find the corresponding mapped PA addresses based on
924	   "Addr1"/"Addr2"/...etc.
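
For Type=Domain, the resolver step described above amounts to keeping the user-id bits of "X" and replacing the network bits with each mapped PA net address. A minimal sketch, assuming 32 bit addresses and PA prefixes of the same length as the PI prefix; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Map PI address 'x' to its PA addresses for a Type=Domain record.
 * 'net_mask' is the record's NetMask, 'pa_nets' holds Addr1..AddrN and
 * 'out' receives 'naddr' mapped PA addresses. Returns 'naddr'. */
int pi_map_domain(uint32_t x, uint32_t net_mask,
                  const uint32_t *pa_nets, int naddr, uint32_t *out)
{
    assert(naddr >= 0 && naddr <= 4);
    uint32_t user_id = x & ~net_mask;          /* host part of X */
    for (int i = 0; i < naddr; i++)
        out[i] = (pa_nets[i] & net_mask) | user_id;
    return naddr;
}
```

For a multihomed network this yields one PA address per provider, each carrying the same user-id as the PI address.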

926	   Theoretically, "A.B" portion of a PI address need not match with the
927	   "A.B" portion of the corresponding PA addresses. Consider a large
928	   corporate that has its corporate office and a branch office within
929	   the same region of a particular "A.B" and some other offices with
930	   different values of "A.B". The corporate can maintain a contiguous
931	   range of PI addresses for the ease of its operation. It needs to
932	   split entire PI address range based on its offices and assign the
933	   corresponding PA addresses. In order to minimize the path of a query
934	   it is desirable that "A.B" of a PI address and its corresponding
935	   mapped PA addresses belong to the same region.

937	   Type=Referral:

939	   This is used when an address within the domain "NetAddress"/"NetMask"
940	   has to be processed by another map server. The map server may itself
941	   be another regional server or a server within a customer network.

943	   When a customer network would like to have a direct control for the
944	   mapping of its addresses it needs to opt for this option.
945	   "Addr1"/"Addr2"/"Addr3"/"Addr4" of the database entry will hold the
946	   pointer to the information associated to each map server. "NAddr"
947	   will hold the number of map servers that can be referred. Information
948	   of each server will hold the following values: PI address of the map
949	   server + Number of PA addresses to reach the map server + PA
950	   addresses of the map server. Any one of these map servers needs to be
951	   queried for further processing. A server may act either in recursive
952	   mode or in iterative mode based on its implementation just like in
953	   DNS. A large corporate may have different offices and each (or some
954	   of them) may maintain a map server based on their policies.

956	   When a server needs to handle a particular address separately, it
957	   needs to set "NetAddress" with that particular address and all the
958	   bits of "NetMask" will be set to "1". The "Type" field has to be set
959	   as "SingleEntry"(which is similar to the Type Address(A) in terms of
960	   DNS). If some of its addresses need to be handled separately but for
961	   the rest common rule may apply (like Type=Domain), records of the
962	   individual entries should be processed first and then for the rest.
963	   In these cases "Type" has to be set as "Default". So, a server of a
964	   customer network may have database entries with Type=Domain/Referral
965	   /SingleEntry/Default. It makes sense for a server (or a master file)
966	   to have entries with Type=Default, but from the point of a resolver,
967	   it does not make any sense. So a server needs to extract the PA
968	   addresses and form a record with Type=SingleEntry and send it back to
969	   the resolver.

971	   For a host having multiple interfaces, each interface may be assigned
972	   PA addresses supplied by all the service providers, but it is
973	   desirable that PI address gets mapped to only one of them (preferably
974	   for a CE router, the interface which will have the shortest path will
975	   be mapped PI address with the PA address associated with that CE
976	   router).

978	   Type=Individual:

980	   This is meant for the individual users opting for services like
981	   telephonic services that need to maintain PI address. With this
982	   option a mobile user may maintain its PI address after changing its
983	   service provider. A map server needs to maintain some networks with a
984	   range of PI addresses in its database. When a query for an address
985	   "X" is received, server needs to get the corresponding record where
986	   "Addr1" will hold the pointer to an open file descriptor (or pointer
987	   to the in memory copy) of a separate data file where there will be
988	   one to one mapping between PI address and its corresponding PA
989	   address of all the assigned PI addresses. These networks and
990	   assignment of individual PI addresses have to be done by the regional
991	   authority.

993	   As with Type=Default, Type=Individual does not make any sense to a
994	   resolver. So, server needs to extract PA address and form a record
995	   with Type=SingleEntry and send it back to the resolver.

997	   As stated above, this solution is based on the approach of DNS. For
998	   the ease of implementation and to make use of the existing source
999	   code related to DNS (e.g. BIND) most of the features have been taken
1000	   from DNS. Wherever differences arise, the approach followed by this
1001	   document has to be accepted.

1003	   IANA has to assign a port (e.g. 53 in case of DNS) for its UDP/TCP
1004	   based implementation.

1006	4.1.1. Record Format

1008	   Each record (the way they will appear in a master file or will be
1009	   used for communication) will have the following format:

1011	   NetAddress/NetMask + Type (8 bit unsigned int) + <TTL> + RDATA (Type
1012	   specific information)

1014	   Record types are primarily the types of records as described above
1015	   along with three other types: SOA (Start of a zone of authority), MPS
1016	   (host with Type=SingleEntry that acts as a Map server for this zone)
1017	   and DFL (Data File). These types are mainly useful in the context of
1018	   processing AXFR/IXFR/NOTIFY/DFAXFR/DFIXFR messages.

1020	   Types are defined as follows:

1022	   Types               values          comments
1023	   -----------------------------------------------------------
1024	   SEN (SingleEntry)      1    same as type A(address) in DNS
1025	   MPS (MapServer)        2    Map server
1026	   DMN (Domain)           3
1027	   DEF (Default)          4
1028	   REF (Referral)         5
1029	   SOA (Start of a zone)  6
1030	   IND (Individual)       7
1031	   DFL (Data File)        8
1032	   -----------------------------------------------------------

1034	   RDATA of different types will appear as follows:

1036	   Type=SOA:
1037	   PI address of server+SERIAL+REFRESH+RETRY+EXPIRE+MINIMUM (meaning and
1038	   values of SERIAL/REFRESH/RETRY/EXPIRE/MINIMUM are same as they were
1039	   defined in section 3.3.13 of RFC 1035[11])

1041	   Type=(SEN/MPS):
1042	   NAddr(Number of addresses) + corresponding PA addresses

1044	   Type=(DMN/DEF):
1045	   NAddr(Number of addresses) + corresponding Net addresses

1047	   Type=REF:
1048	   NAddr(Number of map server) + for each map server (PI address of map
1049	   server + NAddr(Number of addresses of map server) + corresponding PA
1050	   addresses))

1052	   Type=IND:
1053	   NAddr(=1) + full path name of the data file

1055	   Type=DFL:
1056	   Data file name + SERIAL + Number of records in the data file(32 bit
1057	   unsigned int)

1059	   When used in communication, the data file name is encoded as its
1060	   length (8 bit unsigned int) followed by the octets of the string.

1062	   The TTL value of a record has to be set to 0 if it is not relevant,
1063	   or if the record is to take the value associated with the SOA record.
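
	   The record layout above can be sketched in Python as follows. This
	   is only an illustration: the field widths (8-octet addresses, a
	   1-octet prefix length standing in for NetMask, a 4-octet TTL and a
	   1-octet NAddr) and the function names are assumptions, since the
	   text fixes only the 8 bit Type and the 8 bit file name length.

```python
import struct
from enum import IntEnum


class RecordType(IntEnum):
    """Record type codes from the table in section 4.1.1."""
    SEN = 1
    MPS = 2
    DMN = 3
    DEF = 4
    REF = 5
    SOA = 6
    IND = 7
    DFL = 8


def encode_address_record(net_addr, prefix_len, rtype, ttl, addresses):
    """Encode a SEN/MPS/DMN/DEF record.

    Assumed layout (not fixed by the text): 8-octet NetAddress,
    1-octet prefix length, 1-octet Type, 4-octet TTL, 1-octet NAddr,
    then NAddr addresses of 8 octets each.
    """
    if rtype not in (RecordType.SEN, RecordType.MPS,
                     RecordType.DMN, RecordType.DEF):
        raise ValueError("RDATA layout differs for this type")
    head = struct.pack("!QBBIB", net_addr, prefix_len, rtype, ttl,
                       len(addresses))
    return head + b"".join(struct.pack("!Q", a) for a in addresses)


def encode_file_name(name):
    """Data file name on the wire: 8 bit length, then the octets."""
    raw = name.encode("ascii")
    if len(raw) > 255:
        raise ValueError("file name too long for an 8 bit length")
    return bytes([len(raw)]) + raw
```

	   Under these assumptions, a SEN record with two PA addresses occupies
	   15 octets of fixed fields plus 16 octets of addresses.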

1065	4.1.2. Messages

1067	   In order to support most of the features of DNS, the message format
1068	   has been kept almost the same as that of DNS. All the relevant
1069	   fields will be processed exactly as they are in DNS, and all the
1070	   irrelevant fields have to be ignored. The rest of this section
1071	   describes where and how changes have to be made.

1073	   As defined in RFC 1035, the top level format of message is divided
1074	   into 5 sections (some of which are empty in certain cases) shown
1075	   below:

1077	       +---------------------+
1078	       |        Header       |
1079	       +---------------------+
1080	       |       Question      | the question for the name server
1081	       +---------------------+
1082	       |        Answer       | answering part of the question
1083	       +---------------------+
1084	       |      Authority      | authoritative map server
1085	       +---------------------+
1086	       |      Additional     | additional information
1087	       +---------------------+

1089	   The header section has been retained as defined in RFC 5395[12] as
1090	   follows:

1092	        0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
1093	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1094	       |                      ID                       |
1095	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1096	       |QR|   OpCode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
1097	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1098	       |                QDCOUNT/ZOCOUNT                |
1099	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1100	       |                ANCOUNT/PRCOUNT                |
1101	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1102	       |                NSCOUNT/UPCOUNT                |
1103	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1104	       |                    ARCOUNT                    |
1105	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
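
	   As a minimal sketch of the header layout shown above, the following
	   Python packs and unpacks the six 16 bit words, with bit 0 of the
	   diagram taken as the most significant bit. The function names and
	   the dictionary returned are illustrative assumptions.

```python
import struct


def pack_header(msg_id, qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0,
                ad=0, cd=0, rcode=0, qdcount=0, ancount=0,
                nscount=0, arcount=0):
    """Pack the 12-octet header; the Z bit is always sent as zero."""
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
             | (rd << 8) | (ra << 7) | (ad << 5) | (cd << 4) | rcode)
    return struct.pack("!6H", msg_id, flags, qdcount, ancount,
                       nscount, arcount)


def unpack_header(data):
    """Return the header fields of a message as a dictionary."""
    msg_id, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return {
        "id": msg_id,
        "qr": flags >> 15,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,
        "tc": (flags >> 9) & 1,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "ad": (flags >> 5) & 1,
        "cd": (flags >> 4) & 1,
        "rcode": flags & 0xF,
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }
```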

1107	   The question section will have two parts:

1109	   QType(one octet unsigned int)+QData.

1111	   Query types are defined as follows:

1113	   QTypes       values          comments
1114	   -----------------------------------------------------------
1115	   SEN            1    query for mapped PA address
1116	   SOA            6    query information related to SOA
1117	   DFL            8    query information related to data file
1118	   DFXFR          249  data file transfer
1119	   DFIXFR         250  incremental data file transfer
1120	   IXFR           251  incremental authoritative data file xfr
1121	   AXFR           252  authoritative data file transfer
1122	   -----------------------------------------------------------

1124	   QData will hold values based on QType.

1126	   The following paragraphs describe issues related to QType=SEN.
1127	   Issues related to all other QTypes (i.e. related to file transfer)
1128	   will be discussed afterwards.

1130	   For QType=SEN(1): QData=PI address that needs to be resolved.

1132	   The answer section, authority section and additional section will
1133	   have a number of resource records where the number will be specified
1134	   in the header.

1136	   On receiving a query, the map server will return the matching record
1137	   from its database.  If the response is an address, the answer section
1138	   will hold a record of one of these two types: SEN or DMN.

1140	   If Type=DMN, resolver needs to extract the mapped addresses as
1141	   described in section 4.1.

1143	   If Type=DMN, the entire address range will appear in the form of
1144	   NetAddress/NetMask. This is advantageous for caching: a query for
1145	   one particular address returns information about the entire
1146	   address range.

1148	   If the response is a referral, the answer section will be empty and
1149	   the authority section will hold the record with Type=REF.

1151	   If the server supports recursion, then for each iteration in which
1152	   it receives a record with Type=REF, it needs to push the record to
1153	   the additional section of the message that will be sent to the
1154	   resolver. So, the additional section will hold the Type=REF records
1155	   of the chain of the tree through which PA addresses were resolved.
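
	   The distinction between an address response and a referral can be
	   sketched as below. The in-memory shape of a section (a list of
	   (type name, payload) tuples) and the function name are assumptions
	   made only for this illustration.

```python
def classify_response(answer, authority):
    """Return ("address", record), ("referral", record) or ("empty", None).

    Each section is a list of (type_name, payload) tuples; this
    representation is hypothetical and not part of the wire format.
    """
    # An address response carries a SEN or DMN record in the answer section.
    for rtype, payload in answer:
        if rtype in ("SEN", "DMN"):
            return "address", (rtype, payload)
    # A referral has an empty answer section and a REF record in authority.
    if not answer:
        for rtype, payload in authority:
            if rtype == "REF":
                return "referral", (rtype, payload)
    return "empty", None
```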

1157	4.1.3. Master file and data file

1159	   Section 5 of RFC 1035 states:

1161	   "Master files are text files that contain RRs in text form.  Since
1162	   the contents of a zone can be expressed in the form of a list of RRs
1163	   a master file is most often used to define a zone, though it can be
1164	   used to list a cache's contents."

1166	   Section 5.1 of RFC 1035 states:

1168	   "The format of these files is a sequence of entries.  Entries are
1169	   predominantly line-oriented, though parentheses can be used to
1170	   continue a list of items across a line boundary, and text literals
1171	   can contain CRLF within the text.  Any combination of tabs and spaces
1172	   act as a delimiter between the separate items that make up an entry.
1173	   The end of any line in the master file can end with a comment.  The
1174	   comment starts with a ";" (semicolon)."

1176	   Master files follow the same approach and format as in DNS, as
1177	   described in section 5 of RFC 1035, with the necessary differences.

1179	   An example master file may look as follows:

1181	   @ "PI NetAddr"/"Net Mask"  SOA  "PI address of primary server" (
1182	                                    20     ; SERIAL
1183	                                    7200   ; REFRESH
1184	                                    600    ; RETRY
1185	                                    3600000; EXPIRE
1186	                                    60)    ; MINIMUM
1187	   "PI NetAddr"/"Net Mask"    MPS  0  NAddr "PA addresses"
1188	   "PI NetAddr"/"Net Mask"    SEN  0  NAddr "PA addresses"
1189	   "PI NetAddr"/"Net Mask"    DMN  0  NAddr "Net addresses"
1190	   "PI NetAddr"/"Net Mask"    DEF  0  NAddr "Net addresses"
1191	   "PI NetAddr"/"Net Mask"    IND  0  NAddr(=1) "Data file name"

1193	   A data file contains a sequence of entries where each entry appears
1194	   on a separate line. Each entry is a mapping between a PI address and
1195	   its associated PA address(es), separated by space(s). Entries are
1196	   generally sorted by PI address.  As in a master file, a comment
1197	   starts with a ";" (semicolon) and ends at the end of the line.
1198	   Data files are commonly associated with the map servers maintained
1199	   by a regional authority, but they are not generally associated
1200	   with the map servers maintained by individual customer networks.
1201	   A data file entry may appear as follows:

1203	   "PI Address" NAddr "PA Addresses"
1204	   A map server may have a number of data files. These files have to be
1205	   defined in another file (a supporting file, the way boot file
1206	   "named.boot" is used in BIND) that will have information of each of
1207	   them. An entry in that file will follow the same format of a record
1208	   (Type=DFL) and will have the following fields:

1210	   "PI NetAddr"/"NetMask" Type(DFL) TTL "Data File Name" SERIAL "Number
1211	   of records".

1213	   This file will be used to process message with QType=DFL which will
1214	   be used to support data file transfer/incremental data file transfer.

1216	   For QType=DFL(8): QData="PI NetAddr"/"NetMask" of the desired network
1217	   For QType=SOA(6): QData="PI NetAddr"/"NetMask" of the desired zone

1219	   A map server will return a record of Type=DFL on receiving a query
1220	   with QType=DFL, whereas it will return a record of Type=SOA on
1221	   receiving a query with QType=SOA.
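
	   The data file format described in this section (one entry per line,
	   PI address, NAddr and PA addresses separated by spaces, ";"
	   comments) can be read with a few lines of Python. This is a sketch
	   only: addresses are kept as opaque text tokens because their textual
	   form is not defined here, and the function name is an assumption.

```python
def parse_data_file(text):
    """Parse data file text into {pi_address: [pa_address, ...]}."""
    mapping = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Drop the comment part, then split on any run of tabs/spaces.
        entry = line.split(";", 1)[0].split()
        if not entry:
            continue  # blank or comment-only line
        if len(entry) < 3:
            raise ValueError(f"line {line_no}: incomplete entry")
        pi_addr, naddr, pa_addrs = entry[0], int(entry[1]), entry[2:]
        if naddr != len(pa_addrs):
            raise ValueError(f"line {line_no}: NAddr does not match")
        mapping[pi_addr] = pa_addrs
    return mapping
```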

1223	4.1.4. Zone maintenance and transfers

1225	   Section 4.3.5 of RFC 1034 states:

1227	   "The general model of automatic zone transfer or refreshing is that
1228	   one of the name servers is the master or primary for the zone.
1229	   Changes are coordinated at the primary, typically by editing a master
1230	   file for the zone.  After editing, the administrator signals the
1231	   master server to load the new zone.  The other non-master or
1232	   secondary servers for the zone periodically check for changes (at a
1233	   selectable interval) and obtain new zone copies when changes have
1234	   been made.

1236	   To detect changes, secondaries just check the SERIAL field of the SOA
1237	   for the zone.  In addition to whatever other changes are made, the
1238	   SERIAL field in the SOA of the zone is always advanced whenever any
1239	   change is made to the zone."

1241	   Section 1.2 of RFC 5936 states:

1243	   "A DNS implementation is not required to support AXFR, IXFR, and
1244	   NOTIFY, but it should have some means for maintaining name server
1245	   coherency.  A general-purpose DNS implementation will likely support
1246	   AXFR (and in the same vein IXFR and NOTIFY), but turnkey DNS
1247	   implementations may exist without AXFR."

1249	   Zone maintenance and transfer will follow the same approach as DNS
1250	   with a few minor updates. Data files will be updated much more
1251	   frequently than the master file. That is why transfer (and
1252	   incremental transfer) of data files is treated separately from
1253	   transfer (and incremental transfer) of the master file.

1255	   For all messages of QType=AXFR/DFXFR/IXFR/DFIXFR, QData="PI
1256	   NetAddr"/"NetMask" of the desired zone or the desired network. A
1257	   NOTIFY message needs to identify which file has been updated,
1258	   followed by the related information. So, if the master file has
1259	   changed, a NOTIFY message with query type SOA will be sent; if a
1260	   data file has changed, query type DFL will be used.

1262	   Transfer of the master file will be the same as in DNS, followed by
1263	   transfer of all the data files; i.e. AXFR will be processed as in
1264	   DNS, followed by DFXFR for all the data files. To make this
1265	   happen, at the end of transferring the contents of the master file,
1266	   the server (of the AXFR message) needs to send a NOTIFY message for
1267	   each data file belonging to that zone to the client (i.e. the
1268	   secondary server). On processing a NOTIFY for a data file, the
1269	   secondary server needs to send DFIXFR to the primary if the data
1270	   file already exists; otherwise it needs to send DFXFR.
1271	   Incremental update of the master file (IXFR) will be the same as
1272	   IXFR in DNS with a minor update. If the client of IXFR finds that a
1273	   new data file has been introduced, it issues DFXFR for that data
1274	   file. Similarly, if the entry of a data file has been deleted, the
1275	   client deletes the corresponding data file.
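
	   The decisions a secondary server makes in the paragraph above can be
	   sketched as follows. The function names and the use of sets of data
	   file names are assumptions for illustration; message exchange itself
	   is not shown.

```python
def data_file_request(file_name, local_files):
    """Request to send after a NOTIFY for a data file.

    DFIXFR if the secondary already holds the file, otherwise DFXFR.
    """
    return "DFIXFR" if file_name in local_files else "DFXFR"


def after_ixfr(old_entries, new_entries):
    """Compare data file entries before and after an IXFR.

    Both arguments are sets of data file names. Returns a pair:
    files to fetch with DFXFR, and local files to delete.
    """
    to_fetch = sorted(new_entries - old_entries)
    to_delete = sorted(old_entries - new_entries)
    return to_fetch, to_delete
```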

1277	   Processing of DFXFR will follow the same approach as AXFR in DNS.
1278	   Similarly, processing of DFIXFR will follow the same approach as
1279	   IXFR in DNS.  While transferring a data file record, an equivalent
1280	   record of type SEN needs to be sent with the PI address and mapped
1281	   PA address(es) from the data file record. Wherever a record of type
1282	   SOA is sent while processing AXFR/IXFR in DNS, a record of type
1283	   DFL needs to be sent while processing DFXFR/DFIXFR.

1285	   For AXFR, IXFR and NOTIFY in DNS, one needs to follow RFC 5936[13],
1286	   RFC 1995[14] and RFC 1996[15] respectively.

1288	5. Issues related to IP mobility

1290	   An interface of a customer network may have several IP addresses
1291	   (e.g. for a multihomed customer site, each interface will have
1292	   multiple global unicast addresses and may also have private
1293	   addresses). A mobile node that has moved to a customer network
1294	   which gets service from a service provider and maintains private
1295	   IP addresses will have at least three IP addresses: a provider
1296	   assigned unicast address, a private address and its permanent "Home
1297	   Address". The "Home Address" will be aliased with the provider
1298	   assigned address (i.e. the co-located care-of address). So the
1299	   interface structure needs to have an additional field to hold the
1300	   value of care-of address. The PCB structure will have an additional
1301	   field 'inp_lcladdr'.  So 'inp_lcladdr' will have the current provider
1302	   assigned address that a foreign node needs to use for communication.
1303	   The field 'inp_laddr' that is used to hold the value of local address
1304	   will hold the value of "Home Address" of a mobile node. Similarly,
1305	   the PCB needs to introduce another field 'inp_fcladdr' to support
1306	   a mobile destination address.  The existing field 'inp_faddr',
1307	   which is used to hold the foreign address, will hold the value of
1308	   the "Home Address" of the mobile node. For customers with a PI
1309	   address who would like to have mobility support, the mapped address
1310	   will be considered as the "Home Address" of the mobile node.

1312	   An outgoing packet from a mobile node in a foreign site needs to be
1313	   stacked with the associated care-of address. While initiating
1314	   communication, the 'bind' system call needs to go through the
1315	   interface list and fetch the associated structure to check whether
1316	   the source address is aliased or not and needs to fill the value of
1317	   'inp_lcladdr' of PCB accordingly.

1319	   When TCP receives a SYN for connection establishment, it allocates a
1320	   PCB and assigns the values for 'inp_laddr', and related fields.
1321	   During this phase, TCP also needs to check whether the local address
1322	   is aliased or not (based on the fields of interface structure; which
1323	   is applicable for a mobile node at foreign site) and needs to fill
1324	   the values of 'inp_lcladdr' accordingly. Similarly if destination
1325	   address is found to be aliased, based on the stacking type, it needs
1326	   to fill up the field 'inp_fcladdr'.

1328	   IP address stacking can be performed with the approach introduced in
1329	   section 6.4 of RFC6275[9]. RFC6275 talks about the stacking of IP
1330	   addresses for a destination address (let us call it type 0
1331	   stacking). Two more types of stacking need to be introduced; type 1
1332	   stacking where only source address will appear in the stack and type
1333	   2 stacking where both source address and destination address will
1334	   appear in the stack with a particular type of ordering.

1336	   Protocol output routine like 'tcp_output' or 'udp_output' needs to
1337	   fill the IP packet in the following manner.

1339	   If the socket contains a valid 'inp_lcladdr', use 'inp_lcladdr' as
1340	   the source address and 'inp_laddr' will appear in the stack. If the
1341	   socket contains a valid 'inp_fcladdr' use 'inp_fcladdr' as the
1342	   destination address and 'inp_faddr' will appear in the stack. If only
1343	   'inp_fcladdr' contains a valid address whereas 'inp_lcladdr' is
1344	   NULL, use type 0 stacking. If only 'inp_lcladdr' contains a valid
1345	   address whereas 'inp_fcladdr' is NULL, use type 1 stacking.
1346	   If both 'inp_lcladdr' and 'inp_fcladdr' contain valid addresses, use
1347	   type 2 stacking.

1349	   Protocol input routine like 'tcp_input' or 'udp_input' needs to
1350	   process the packet in the reverse order based on the type of
1351	   stacking.  For type 0 stacking, use the address in the stack as the
1352	   destination address; for type 1 stacking, use the address in the
1353	   stack as the source address; for type 2 stacking use both source
1354	   address and destination address from the stack.
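
	   The output and input rules above can be sketched in Python as
	   follows. Addresses are plain values and None stands for NULL; the
	   function names and the order of the two addresses in a type 2 stack
	   are assumptions, as the text only says that a particular ordering is
	   used.

```python
def build_outgoing(inp_laddr, inp_faddr, inp_lcladdr=None, inp_fcladdr=None):
    """Return (source, destination, stacking_type, stack) for a packet.

    stacking_type is None when neither care-of field is set.
    """
    src = inp_lcladdr if inp_lcladdr is not None else inp_laddr
    dst = inp_fcladdr if inp_fcladdr is not None else inp_faddr
    if inp_lcladdr is None and inp_fcladdr is None:
        return src, dst, None, []
    if inp_lcladdr is None:
        return src, dst, 0, [inp_faddr]          # destination only
    if inp_fcladdr is None:
        return src, dst, 1, [inp_laddr]          # source only
    return src, dst, 2, [inp_laddr, inp_faddr]   # assumed ordering


def restore_incoming(src, dst, stacking_type, stack):
    """Undo the stacking: (source, destination) as seen by the transport."""
    if stacking_type == 0:
        return src, stack[0]
    if stacking_type == 1:
        return stack[0], dst
    if stacking_type == 2:
        return stack[0], stack[1]
    return src, dst
```

	   For a mobile sender with home address H and care-of address C that
	   talks to a fixed node F, the packet leaves with source C and H in
	   the stack, and the receiver recovers the pair (H, F).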

1356	5.1. Changes expected with the specifications related to IP mobility

1358	   RFC6275 demands correspondent node binding from mobile nodes for
1359	   route optimization. This binding is required when a connection gets
1360	   established as well as when the mobile node changes its address
1361	   space. There are applications like HTTP which open multiple, very
1362	   short lived connections at run time. If mobile nodes need to send
1363	   binding messages for all the connections, the network will be
1364	   unnecessarily congested. This congestion can be avoided by
1365	   establishing the binding at the time of connection establishment
1366	   itself.  So, if the TCP server happens to be mobile, it will set the
1367	   value of 'inp_lcladdr' in the stack while sending SYN+ACK. The TCP
1368	   client which initiates communication through 'connect' needs to set
1369	   the 'inp_fcladdr' field on receiving SYN+ACK. With this approach
1370	   correspondent node binding messages need to be sent only when a
1371	   mobile node changes its position from one address space to another.

1373	   Route optimization is not applicable to applications which are of
1374	   multicast type.  In these cases packets need to be forwarded with the
1375	   mechanism of reverse tunneling with the approach of "IP Encapsulation
1376	   within IP" as defined in RFC2003.  In order to support packet
1377	   delivery with route optimization method as well as with
1378	   "Encapsulating Delivery Style" based on the application type the
1379	   protocol control block needs to introduce another field
1380	   'inp_hagentaddr' to hold the address of the home agent of the mobile
1381	   node. The interface structure also needs to have the same field. The
1382	   'bind' system call needs to go through the interface list to fetch
1383	   'inp_hagentaddr' to the PCB along with 'inp_lcladdr' as described
1384	   earlier. So, protocol output routines like 'tcp_output', 'udp_output'
1385	   need to fill up the packets based on the application type. In
1386	   "Encapsulating Delivery Style" packets need to be formed in the
1387	   following manner.

1389	   The inner IP header will contain
1390	      Source Address: Home address of the mobile node
1391	      (i.e. 'inp_laddr')
1392	      Destination address: Address of the correspondent node
1393	      (i.e. 'inp_faddr')
1394	   The outer IP header will contain
1395	      Source Address: co-located care-of address of the mobile node
1396	      (i.e. 'inp_lcladdr')
1397	      Destination Address: Address of the home agent of the mobile node
1398	      (i.e. 'inp_hagentaddr')
1399	   Protocol field: IP in IP
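
	   The header contents listed above can be sketched as below. The
	   Header class and the function name are hypothetical. The value 4 is
	   the "IP in IP" protocol number used by RFC 2003; that the same value
	   would be carried over into this architecture is an assumption.

```python
from dataclasses import dataclass

IPPROTO_IPIP = 4  # "IP in IP" protocol number used by RFC 2003


@dataclass
class Header:
    source: str
    destination: str
    protocol: int  # value of the protocol field


def encapsulate(inp_laddr, inp_faddr, inp_lcladdr, inp_hagentaddr,
                payload_protocol):
    """Build (outer, inner) headers for "Encapsulating Delivery Style"."""
    # Inner header: home address to correspondent node.
    inner = Header(inp_laddr, inp_faddr, payload_protocol)
    # Outer header: care-of address to home agent, carrying IP in IP.
    outer = Header(inp_lcladdr, inp_hagentaddr, IPPROTO_IPIP)
    return outer, inner
```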

1401	6. Refinements over existing IPv6 specification

1403	   As IPv6 was envisioned long before some of the newer technologies
1404	   (e.g. MPLS) came into the picture, some refinements can be made to
1405	   the existing specification. These considerations are related to
1406	   bandwidth usage and performance inside switches. Experimental
1407	   results show that smaller packet sizes give better results for the
1408	   processing of RT packets.  So, it is desirable for the IP packet
1409	   header to be as small as possible.

1411	   As described earlier, evaluation of the parameters
1412	   nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes is
1413	   geo-political and has to be decided by IANA. Once these parameters
1414	   are determined by mutual agreement, the values of pA, pB, pC and
1415	   the prefix length of the user id can be determined. With a 64 bit
1416	   address space, the IP header will be reduced by 16 bytes.

1418	   The 'flow label' field of the IPv6 packet header may not be of any
1419	   use when MPLS is in use. ATM used to have 4 priority classes. The
1420	   first specification of IPv6, RFC-1883, used a 4 bit type of service
1421	   field along with a 24 bit flow label field. These two were changed
1422	   to an 8 bit type of service field and a 20 bit flow label field in
1423	   the current spec, RFC-2460.  Too many priority classes may increase
1424	   the complexity of processing inside switches. If the type of service
1425	   field of the IPv6 header is reduced to 4 bits, as it was in
1426	   RFC-1883, and the 'flow label' field is removed, another three
1427	   bytes can be removed from the IPv6 header.
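
	   The savings claimed in the two paragraphs above can be checked with
	   a short calculation, starting from the 40 byte fixed IPv6 header of
	   RFC-2460. The constant names are illustrative only.

```python
IPV6_HEADER_BYTES = 40                      # fixed IPv6 header (RFC-2460)
ADDRESS_BITS_NOW, ADDRESS_BITS_NEW = 128, 64
CLASS_BITS_NOW, CLASS_BITS_NEW = 8, 4       # type of service field
FLOW_LABEL_BITS = 20                        # removed entirely

# Source and destination addresses each shrink from 128 to 64 bits.
address_saving = 2 * (ADDRESS_BITS_NOW - ADDRESS_BITS_NEW) // 8

# Flow label removed plus four bits saved on the type of service field.
field_saving = (FLOW_LABEL_BITS + CLASS_BITS_NOW - CLASS_BITS_NEW) // 8

reduced_header = IPV6_HEADER_BYTES - address_saving - field_saving
print(address_saving, field_saving, reduced_header)  # prints: 16 3 21
```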

1429	   The field 'Hop Limit' has an 8 bit value in the existing spec. The
1430	   role of this field needs to be discussed properly with a large
1431	   address space.

1433	   RFC4862[16] introduces the concept of "Stateless auto configuration"
1434	   with the goal in mind that no manual configuration is required by
1435	   individual machines before connecting them to the network. It
1436	   generates a link local address with a link-local prefix and the link
1437	   address (e.g. Ethernet/E.164 for ISDN) first. This link local address
1438	   is used to configure global unicast address and any other
1439	   configurable parameters based on router advertisement.  Global
1440	   unicast addresses are generated by the prefix supplied by the router
1441	   advertisement and the link specific interface identifier. This
1442	   identifier can be as large as 64 bits. So irrespective of the
1443	   size of the network (it may be 10000 or 100 or even less than that)
1444	   every customer network will consume 64 bits worth of addresses.

1446	   This seems to be a huge blunder. What is expected is that the length
1447	   of the interface identifier is just enough to support the number of
1448	   nodes in that subnet. In order to achieve this, the router itself
1449	   or a server in that subnet needs to maintain a store from which it
1450	   generates interface identifiers on request from individual
1451	   hosts.  It may be desirable that interface identifiers are
1452	   generated by DHCP servers. With the option of generating interface
1453	   identifiers through DHCP, changes in the auto configuration process
1454	   can be looked at as follows:

1456	   From the point of view of a host, it can be considered as a two step
1457	   process. The host sends a Router Solicitation message to discover a
1458	   router. The Router Advertisement message should include an option
1459	   field which indicates whether prefix information should be
1460	   configured through Router Advertisement or through DHCP.  The host
1461	   then sends a request message to get the interface identifier.  If
1462	   both pieces of information need to be obtained from a DHCP server,
1463	   they can be obtained through a single message.

1465	   From the server's point of view, it needs to maintain a database for
1466	   a mapping of the link-layer address and subnet specific interface
1467	   identifier. The lifetime of an interface identifier has to be handled
1468	   in the same way that existing DHCP implementations handle IP
1469	   addresses.
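
	   A server of the kind described above can be sketched as follows. The
	   class name, the sequential numbering of identifiers and the omission
	   of lifetimes are all simplifying assumptions; the point shown is only
	   that the identifier width follows from the size of the subnet.

```python
class InterfaceIdServer:
    """Hands out subnet-sized interface identifiers per link-layer address."""

    def __init__(self, max_nodes):
        # Smallest identifier width that can number max_nodes hosts.
        self.id_bits = max(1, (max_nodes - 1).bit_length())
        self.capacity = max_nodes
        self.by_link_addr = {}  # link-layer address -> interface identifier

    def request(self, link_addr):
        """Return the identifier for link_addr, allocating one if needed."""
        if link_addr in self.by_link_addr:
            return self.by_link_addr[link_addr]
        if len(self.by_link_addr) >= self.capacity:
            raise RuntimeError("subnet is full")
        ident = len(self.by_link_addr)
        self.by_link_addr[link_addr] = ident
        return ident
```

	   Under this sketch a subnet of 100 nodes needs a 7 bit identifier and
	   a subnet of 10000 nodes needs 14 bits.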

1471	   There seems to be another possible danger in obtaining prefix
1472	   information through Router Advertisement. As the Router Advertisement
1473	   comes in the form of ICMP messages, once it is received by the ICMP
1474	   layer, it loses the information about which interface the message
1475	   was received on (this problem arises for hosts that have multiple
1476	   interfaces, not all attached to the same subnet).  So, auto
1477	   configuration of a host has to be performed one interface at a time
1478	   with all other interfaces disabled. Once configuration of all the
1479	   interfaces is done, all of them have to be enabled.

1481	   If it is expected that hosts should reconfigure their addresses
1482	   dynamically based on Router Advertisement message, Router
1483	   Advertisement needs to generate a special message for a certain
1484	   amount of time that needs to include old prefix and the corresponding
1485	   new prefix in the message.

1487	   In order to support multihoming[8], prefix information needs to
1488	   include the fields 'default router' and 'next hop address' to reach
1489	   the default router for each of the prefixes.

1491	   In a 64 bit architecture, link-local address can be formed with a
1492	   link-local prefix and link-layer address in a suitable manner; say it
1493	   can be formed with a 4 bit link-local prefix followed by a 60 bit
1494	   link-layer address. IPv6 supports Modified EUI-64 format for hardware
1495	   that supports 48 bit addressing by inserting a padding of 16 bit (FF
1496	   FE) in between company_id and manufacturer selected extension
1497	   identifier. In order to make things work, this padding has to be
1498	   reduced to 12 bits. Hardware that supports the E.164 format uses a
1499	   15 digit number in BCD format followed by a padding of four bits set
1500	   to 1111. Thus in this case, the link-local address can be formed
1501	   with the link-local prefix followed by the most significant 60 bits
1502	   of the E.164 format.
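
	   The two constructions above can be sketched in Python. The 4 bit
	   link-local prefix value (1111) and the 12 bit padding value (FFE)
	   are placeholders chosen for the example, since the text does not fix
	   them, and the function names are assumptions. No bit of the
	   link-layer address is inverted in this sketch.

```python
def link_local_from_mac(mac, prefix4=0xF, pad12=0xFFE):
    """Form a 64 bit link-local address from a 48 bit link-layer address.

    Layout: 4 bit prefix, 24 bit company_id, 12 bit padding,
    24 bit extension identifier.
    """
    if len(mac) != 6:
        raise ValueError("expected a 6-octet link-layer address")
    if not (0 <= prefix4 < 16 and 0 <= pad12 < 4096):
        raise ValueError("prefix or padding out of range")
    company_id = int.from_bytes(mac[:3], "big")
    extension = int.from_bytes(mac[3:], "big")
    ident60 = (company_id << 36) | (pad12 << 24) | extension
    return (prefix4 << 60) | ident60


def link_local_from_e164(digits, prefix4=0xF):
    """Form a 64 bit link-local address from a 15 digit E.164 number.

    15 BCD digits are exactly 60 bits, so the 1111 padding is dropped.
    """
    if len(digits) != 15 or not digits.isdigit():
        raise ValueError("expected exactly 15 decimal digits")
    # Reading the decimal digits as hex nibbles yields the BCD encoding.
    return (prefix4 << 60) | int(digits, 16)
```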

1504	7. Distributed processing and Multicasting

1506	   With the inherent hierarchy involved in this architecture,
1507	   distributed applications can also be structured in a suitable manner.
1508	   Say, for a commonly used web based application, a master level
1509	   server will be there at every top level node. Any change that
1510	   happens in the application has to be synchronized among these master
1511	   level servers first. There might be servers at the middle layer
1512	   (inside each inter-AS-bottom) inside each top level node. Once the
1513	   changes are reflected at the master node, all the servers at the
1514	   middle layer need to update themselves from their master level node.
1515	   This will reduce network traffic substantially. The inherent
1516	   hierarchy in the architecture will also help in establishing
1517	   multicast trees in a similar manner. Work on these issues can
1518	   progress only after this architecture gets approved.

1520	8. Transition to real IP from private IP

1522	   Both CIDR and mesh structured hierarchy expect a VLSM tree at the
1523	   bottom. In VLSM, in real IP space with provider assigned (PA)
1524	   addresses, assignment of network resources has to be associated with
1525	   the address space to be used with the type of service. Within a
1526	   typical switch supporting multiple types of ports, a line card of
1527	   strength OC48 can be replaced with 4 line cards of strength OC12. An
1528	   OC12 card may also be replaced with 4 OC3 cards. An OC12 card may be
1529	   attached to another switch with DS3 ports and so on. At the customer
1530	   network, the port density of a switch has to be directly
1531	   proportional to the address block that the customer network is
1532	   assigned, i.e. each customer network has to be assigned a block of
1533	   address space (say, 128, 256, 512, 1K, 2K etc). Within the switch
1534	   these ports have to be assigned net address/net mask the way VLSM
1535	   works.

1537	   In the IPv4 environment, providers have offered services in terms of
1538	   the bandwidth of the ports, say 2 Mbps/4 Mbps/1 Gbps lines etc. If
1539	   these ports were assigned addresses based on the number of users of
1540	   the customer network, transition from private IP to real IP is
1541	   simple. Consider a switch that has supplied 2 Mbps lines to a set of
1542	   customers with between 1K and 2K users each; each of them will be
1543	   assigned a block of 2K. But if the number of users is not
1544	   proportional to the bandwidth used, say the same 2 Mbps lines were
1545	   supplied to customers of sizes 1K, 2K, 10K and 16K respectively,
1546	   reorganization will be needed if possible. This rearrangement may be
1547	   possible within the switch itself or by connecting ports of
1548	   appropriate sizes from a different switch; otherwise each of them
1549	   has to be assigned an address block of 16K, or whatever is suitable
1550	   with the way VLSM works. So, address block assignment in the VLSM
1551	   tree has to grow in a bottom up approach.
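
	   The block sizes used in this example follow from rounding the number
	   of users up to a power of two, as VLSM requires. A minimal sketch,
	   assuming a 64 bit address space and ignoring any reserved addresses
	   within a block (the function names are illustrative):

```python
def block_size(users):
    """Smallest power-of-two address block that holds `users` hosts."""
    if users < 1:
        raise ValueError("a customer network needs at least one user")
    return 1 << (users - 1).bit_length()


def prefix_length(users, address_bits=64):
    """Net mask length of that block in an address_bits wide space."""
    if users < 1:
        raise ValueError("a customer network needs at least one user")
    return address_bits - (users - 1).bit_length()
```

	   With these definitions a customer with 1500 users receives a 2K
	   block, and customers with 10K and 16K users both receive a 16K block.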

1553	   Thus, transition of an existing provider network with no (or very
1554	   little) rearrangement to a real IP space with a CIDR based approach
1555	   is apparently not a difficult job. In a CIDR based approach, the
1556	   sizes of the VLSM trees are heterogeneous, which leads to a very
1557	   high number of routing entries. Mesh structured hierarchy helps to
1558	   reduce the routing overhead as well as to distribute network
1559	   resources in a suitable manner in the long run. Converting a CIDR
1560	   based approach to mesh structured hierarchy requires reorganization,
1561	   mainly in the routing domain and by splitting trees of very large
1562	   sizes (>24 bit address space) at the top.

1564	   Section 3.1 shows that routing table of the external world need not
1565	   be passed down to the routers inside VLSM tree. With this approach,
1566	   loads on the routers inside VLSM tree will be reduced substantially.
1567	   Same is applicable for CIDR based architecture as well.

1569	   Section 3.2.1 reveals that in mesh structured hierarchy a 64 bit
1570	   architecture will be good enough for our need in a provider assigned
1571	   (PA) address space; the same is true for CIDR based approach as well.

1573	9. IANA Consideration

1575	   IANA is requested to assign a port number and service name for PI
1576	   address resolution for both TCP and UDP. IANA is also requested to
1577	   assign RSVP parameters (i.e. class number) for the objects
1578	   VPN_ATTRIBUTE and VPN_LABEL and an error sub-code for VPN label
1579	   allocation error under Error Code = 24.

1581	10. Security Consideration

1583	   This document does not include any security related issues.

1585	11. Acknowledgments

1587	   The author would like to thank Professor Amitava Datta of the
1588	   University of Western Australia for his review and constructive
1589	   comments.

1591	12. Normative References

1593	   [1]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for
1594	        IPv6 Hosts and Routers", RFC 4213, October 2005.

1596	   [2]  Fuller V., Li. T., "Classless Inter-Domain Routing (CIDR): The
1597	        Internet Address Assignment and Aggregation Plan", RFC 4632,
1598	        August 2006.

1600	   [3]  Huston, G., "Commentary on Inter-Domain Routing in the
1601	        Internet", RFC 3221, December 2001.

1603	   [4]  Q. Vohra, E. Chen., "BGP Support for Four-octet AS Number
1604	        Space", RFC 4893, May 2007.

1606	   [5]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
1607	        Translator (Traditional NAT)", RFC 3022, January 2001.

1609	   [6]  J. Moy., "OSPF Standardization Report", RFC 2329, April 1998.

1611	   [7]  E. Rosen, Y. Rekhter, "BGP/MPLS IP Virtual Private Networks
1612	        (VPNs)", RFC 4364, February 2006.

1614	   [8]  S. Bandyopadhyay, "Solution for Site Multihoming in a Real IP
1615	        Environment", <draft-shyam-site-multi-41> work in progress.

1617	   [9]  C. Perkins, Ed., D. Johnson, J. Arkko, "Mobility Support in
1618	        IPv6", RFC 6275, July 2011.

1620	   [10] P.V. Mockapetris., "Domain names - concepts and facilities",
1621	        RFC 1034, November 1987.

1623	   [11] P.V. Mockapetris, "Domain names - implementation and
1624	        specification", RFC 1035, November 1987.

1626	   [12] D. Eastlake 3rd, "Domain Name System (DNS) IANA
1627	        Considerations", RFC 5395, November 2008.

1629	   [13] E. Lewis, A. Hoenes, Ed., "DNS Zone Transfer Protocol (AXFR)",
1630	        RFC 5936, June 2010.

1632	   [14] M. Ohta, "Incremental Zone Transfer in DNS", RFC 1995,
1633	        August 1996.

1635	   [15] P. Vixie, "A Mechanism for Prompt Notification of Zone Changes
1636	        (DNS NOTIFY)", RFC 1996, August 1996.

1638	   [16] S. Thomson, T. Narten, T. Jinmei, "IPv6 Stateless Address
1639	        Autoconfiguration", RFC 4862, September 2007.

1641	   [17] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, G. Swallow,
1642	        "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209,
1643	        December 2001.

1645	13. Informative References

1647	   [18] Postel, J., "Internet Protocol", STD 5, RFC 791,
1648	        September 1981.

1650	   [19] Rekhter, Y., and T., Li, "A Border Gateway Protocol 4
1651	        (BGP-4)", RFC 1771, March 1995.

1653	   [20] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
1654	        Specification", RFC 1883, December 1995.

1656	   [21] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

1658	   [22] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
1659	        Specification", RFC 2460, December 1998.

1661	   [23] Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol
1662	        Label Switching Architecture", RFC 3031, January 2001.

1664	14. Author's Address

1666	   Shyamaprasad Bandyopadhyay
1667	   HL No 205/157/7, Kharagpur 721305, India
1668	   Phone: +91 3222 225137
1669	   e-mail: shyamb66@gmail.com