idnits 2.17.1 

draft-shyam-real-ip-framework-53.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 11 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they should
     be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 860 has weird spacing: '...lent to  the a...'

  -- The document date (February 10, 2019) is 1902 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: 'RFC6177' on line 1505

  -- Looks like a reference, but probably isn't: 'RFC4692' on line 1510

  == Unused Reference: '19' is defined on line 1698, but no explicit
     reference was found in the text

  == Unused Reference: '20' is defined on line 1701, but no explicit
     reference was found in the text

  == Unused Reference: '21' is defined on line 1704, but no explicit
     reference was found in the text

  == Unused Reference: '22' is defined on line 1707, but no explicit
     reference was found in the text

  == Unused Reference: '23' is defined on line 1709, but no explicit
     reference was found in the text

  == Unused Reference: '24' is defined on line 1712, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4893 (ref. '4') (Obsoleted by RFC 6793)

  ** Obsolete normative reference: RFC 5395 (ref. '12') (Obsoleted by RFC
     6195)

  -- Obsolete informational reference (is this intentional?): RFC 1771 (ref.
     '20') (Obsoleted by RFC 4271)

  -- Obsolete informational reference (is this intentional?): RFC 1883 (ref.
     '21') (Obsoleted by RFC 2460)

  -- Obsolete informational reference (is this intentional?): RFC 2460 (ref.
     '23') (Obsoleted by RFC 8200)


     Summary: 2 errors (**), 0 flaws (~~), 9 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	INTERNET DRAFT                                          S. Bandyopadhyay
3	draft-shyam-real-ip-framework-53.txt                   February 10, 2019
4	Intended status: Experimental
5	Expires: August 10, 2019

7	    An Architectural Framework of the Internet for the Real IP World
8	                  draft-shyam-real-ip-framework-53.txt

10	Abstract

12	   This document proposes an architectural framework for the Internet
13	   in the real IP world. It describes how a three-tier mesh-structured
14	   hierarchy can be established in a large address space by dividing
15	   it into regions and sub-regions within each region. It shows how to
16	   make the transition from private IP to real IP without significant
17	   changes to the existing network. Building on the work done for
18	   IPv6, it provides the inputs from which a specification of IP with
19	   a 64-bit address space may emerge.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on August 10, 2019.

39	Copyright Notice

41	   Copyright (c) 2019 IETF Trust and the persons identified as the
42	   document authors. All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document. Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.

51	Table of Contents
52	   1. Introduction
53	   2. Background
54	   3. A Three-tier mesh structured hierarchical network
55	      3.1. Route propagation
56	      3.2. Determination of prefix lengths
57	           3.2.1. A pseudo optimal distribution of prefixes in
58	                  a 64 bit architecture
59	           3.2.2. Whether to go for a two-tier or three-tier hierarchy
61	      3.3. Issues related to Satellite communications
62	           3.1.1. Setting default route inside VLSM tree
63	           3.1.2. IP VPN with MPLS inside VLSM tree
64	                  3.1.2.1. Extension to RSVP-TE to support IP
65	                           VPN inside VLSM tree
66	   4. Provider Independent addressing, name services and multihoming
67	      4.1. PI address Resolution
68	           4.1.1. Record Format
69	           4.1.2. Messages
70	           4.1.3. Master file and data file
71	           4.1.4. Zone maintenance and transfers
72	   5. Issues related to IP mobility
73	      5.1. Changes expected with the specifications related
74	           to IP mobility
75	   6. Refinements over existing IPv6 specification
76	   7. Distributed processing and Multicasting
77	   8. Transition to real IP from private IP
78	   9. IANA Consideration
79	   10. Security Consideration
80	   11. Acknowledgments
81	   12. Normative References
82	   13. Informative References
83	   14. Author's Address

85	1. Introduction

87	   The transition from IPv4 to IPv6 is in progress. Work has been done
88	   to upgrade individual nodes (workstations) from IPv4 to IPv6. There
89	   are also established documents describing how routers and switches
90	   can handle IPv4 and IPv6 packets simultaneously to make the
91	   transition possible [1]. The CIDR [2] based hierarchical
92	   architecture of the existing 32-bit system is expected to continue
93	   in IPv6 with a larger address space. Concerns have been documented
94	   that BGP tables are becoming too large in the existing system [3],
95	   and there are proposals to extend the Autonomous System number from
96	   16 bits to 32 bits to meet demand [4]. The challenge lies in making
97	   the transition from IPv4 to a real IP world smooth, with the fewest
98	   changes possible.

100	   The term "real IP environment" refers to an environment where
101	   hosts in a customer network possess globally unique IP addresses
102	   and communicate with the rest of the world without the help of
103	   NAT [5]. This document reflects changes required to the BSD 4.4
104	   source code wherever applicable.

106	2. Background

108	   The existing system operates with an Autonomous System (AS) layer
109	   and an inter-AS layer using CIDR. To meet demand within the 32-bit
110	   address space, Autonomous Systems of various sizes maintain a
111	   CIDR-based hierarchical architecture. With the help of NAT [5], a
112	   stub network can maintain a user-ID space as large as a class A
113	   network and still communicate with the rest of the world using
114	   very few real IP addresses. With CIDR and NAT applied together
115	   across the entire space, most of the 32-bit address space is
116	   effectively used as network ID.

118	   With a traditional CIDR-based hierarchy, a node with a shorter
119	   prefix (a larger block) can be divided into a number of nodes with
120	   longer prefixes. Each of these can be subdivided further, and the
121	   process can continue until no further division is possible. The
122	   point worth noting is that at each step the network designer has
123	   to anticipate the future expansion of the network so that the
124	   resource is never exhausted. This leads the designer to allocate
125	   far more than is needed, which leaves a large amount of address
126	   space unused. The problem gets worse once a resource does get
127	   exhausted. For example, a node with prefix /16 can be divided into
128	   a number of nodes with prefix /24. If any one of the /24 nodes is
129	   exhausted, the resources of the other /24 nodes cannot be used for
130	   it even if they are available.

133	   In the IPv4 environment, service providers rely heavily on NAT to
134	   provide Internet services. For example, a large educational
135	   institute meets its current requirement with 4 real IP addresses:
136	   one for its mail server, one for its web server, one for its ftp
137	   server and one for its proxy server, which provides web-based
138	   services to all of its users. In general, these services are used
139	   by organizations of any size (400 users or even 40000). In the
140	   current scenario, the CIDR-based tree has been built using these
141	   components together. When private IP is replaced with real IP,
142	   each customer network will require IP addresses based on its size
143	   and requirements.

145	   Transitioning from private IP to real IP basically requires the
146	   following components:

148	      o A solution for site multihoming with provider assigned
149	        address space
150	      o A strategy to replace private IP with real IP
151	      o A solution to uniquely identify a host in a real IP environment
152	      o A solution to make individual nodes and routers/switches work
153	        with IPv4 and next generation IP simultaneously.

155	   A solution for site multihoming is provided in a separate document
156	   [8]. Section 8 shows how to make the transition from private IP
157	   space to real IP space using provider-assigned addresses within the
158	   CIDR-based approach itself, without reorganizing the existing
159	   provider network. Section 4 provides a solution for uniquely
160	   identifying a host by a number in a real IP environment. RFC 4213
161	   [1] already describes the transition mechanism from IPv4 to IPv6
162	   for individual nodes and routers.

164	   Transitioning to real IP will eliminate the extra routing entries
165	   associated with multihomed sites and will thus reduce the size of
166	   the BGP table substantially. Assignment of addresses requires an
167	   architectural framework. It may continue with the existing
168	   CIDR-based architecture (provided that transitioning to real IP is
169	   enough to handle all routing-related issues permanently) or may
170	   adopt a different approach. A mesh-structured hierarchy will reduce
171	   the growth of routing entries in a CIDR-based environment and will
172	   also make it easier to distribute network resources suitably in
173	   the long run.

175	   This document also tries to resolve and refine several issues that
176	   were carried along as part of the deployment of IPv6. It shows that
177	   a 64-bit address space is sufficient for all practical purposes.
178	   Building on the work done for IPv6, it provides the inputs from
179	   which a specification of IP with a 64-bit address space may
180	   emerge.

182	3. A Three-tier mesh structured hierarchical network

184	   Because Autonomous Systems of various sizes are supported today,
185	   Autonomous Systems and the nodes inside them can be viewed as lying
186	   on the same plane within the address space. If the network can be
187	   viewed as lying on different planes, routing becomes simpler. If
188	   the network is designed with a fixed prefix length for the
189	   Autonomous System everywhere, routing information for the rest is
190	   confined to the remaining part of the network prefix. This means
191	   that the maximum AS size is assigned to every AS irrespective of
192	   its actual size. A large address space makes this possible when it
193	   is divided into a number of regions of fixed size. The entire
194	   network can then be viewed as a network of inter-AS layer nodes.
195	   Each node in the inter-AS layer can act only as a router in the
196	   inter-AS layer, or as a router in the inter-AS layer with an
197	   Autonomous System attached at a single point of attachment, or as
198	   an Autonomous System with multiple Autonomous System border routers
199	   (ASBRs) appearing like a mesh. Thus a two-tier mesh-structured
200	   hierarchy is established between the AS layer and the inter-AS
201	   layer, with each AS having a fixed prefix length.

203	   By definition, an Autonomous System is a small area within the
204	   entire network that maintains its own independent identity and
205	   communicates with the rest of the world through specific border
206	   routers. Similarly, if a larger area (say, a region or state) is
207	   treated as a network of Autonomous Systems that maintains its own
208	   identity by communicating with the rest of the world through
209	   border routers (say, state border routers), a mesh-structured
210	   hierarchy can be established within the inter-AS layer. The
211	   inter-AS layer is split into inter-AS-top and inter-AS-bottom. To
212	   maintain this hierarchy, each inter-AS-top node needs multiple
213	   regional or state border routers (SBRs) through which it
214	   communicates with the rest of the world, in the same way that an
215	   Autonomous System maintains ASBRs. The entire network then appears
216	   as a network of inter-AS-top nodes. To maintain the hierarchy,
217	   each inter-AS-top node needs a fixed prefix length; that is, each
218	   inter-AS-top node is assigned a fixed maximum number of Autonomous
219	   System nodes.

221	   Thus, with a three-tier mesh-structured hierarchy in the network
222	   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
223	   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
224	   layers respectively, there can be up to 2^pA nodes at the topmost
225	   layer, 2^pB inter-AS-bottom nodes within each top-level node and
226	   2^pC nodes within each AS. The entire space is thus divided into a
227	   fixed number of regions, and each region into a fixed number of
228	   sub-regions. This division is expected to be based on geography,
229	   population density, demand and related factors.

231	   Let nMaxInterASTopNodes be the maximum number of nodes assigned at
232	   the topmost layer, nMaxInterASBottomNodes the maximum at the
233	   inter-AS-bottom layer and nMaxASNodes the maximum at the AS layer,
234	   where nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes <= 2^pB
235	   and nMaxASNodes <= 2^pC.

237	3.1. Route propagation

239	   With the hierarchy established, routing information internal to
240	   one inter-AS-top node does not need to be propagated to another
241	   inter-AS-top node. The entire routing information of the
242	   inter-AS-top layer needs to be propagated to the inter-AS-bottom
243	   layer. So each router of the inter-AS layer has two tables: one
244	   for the inter-AS-top layer and one for the inter-AS-bottom layer
245	   of the inter-AS-top node it belongs to. BGP (with minor
246	   modification) will work well with one rule applied at the SBRs: an
247	   SBR does not propagate inter-AS-bottom routing information of its
248	   own domain to an SBR of a neighboring domain. That is, the SBR of
249	   one top-layer node propagates only inter-AS-top routing
250	   information to the SBR of another top-layer node. Inside an
251	   inter-AS-top node, both inter-AS-top and inter-AS-bottom routing
252	   information is propagated from one ASBR to a neighboring ASBR.
253	   Inside a top-layer node A, routing information for another
254	   top-layer node B has two parts: the list of SBRs through which a
255	   packet travels from top-layer node A to B, and the list of ASBRs
256	   through which the packet travels from one AS to another inside A.
257	   In BGP terms, the AS_PATH attribute is split into two parts, one
258	   for the top layer and one for the bottom layer. Within the same
259	   node A, routing information from one AS to another carries no
260	   top-layer information; the top-layer part is set to NULL.
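
   The export rule at the SBRs and the two-part AS_PATH described
   above can be sketched as follows. This is only an illustration
   under stated assumptions, not a protocol extension defined by this
   document or by BGP: the Route record, its field names and the
   export_to_neighbor_sbr function are hypothetical, and prepending
   the local top-layer node ID on export is borrowed from the way
   BGP handles AS_PATH today.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical route record with the AS_PATH split into two parts."""
    layer: str                            # "top" or "bottom"
    dest: int                             # destination node ID in that layer
    top_path: Optional[Tuple[int, ...]]   # top-layer nodes traversed; None means NULL
    bottom_path: Tuple[int, ...]          # ASes traversed inside the local top-layer node


def export_to_neighbor_sbr(routes: List[Route], local_top_id: int) -> List[Route]:
    """Return what an SBR of local_top_id advertises to an SBR of another top-layer node."""
    advertised = []
    for r in routes:
        if r.layer != "top":
            # Inter-AS-bottom routes stay inside their own top-layer node.
            continue
        # The bottom part only has meaning inside this node, so it is dropped;
        # the local top-layer node ID is prepended to the top part (assumption).
        new_top = (local_top_id,) + (r.top_path or ())
        advertised.append(Route("top", r.dest, new_top, ()))
    return advertised


routes = [
    Route("bottom", 7, None, (4, 7)),     # AS 7 inside the same top-layer node
    Route("top", 9, (5, 9), (2,)),        # top-layer node 9, reached via node 5
]
for r in export_to_neighbor_sbr(routes, local_top_id=3):
    print(r)
```

   Only the route to top-layer node 9 crosses the boundary; the route
   to AS 7, whose top-layer part is NULL, is never advertised to the
   neighboring domain.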

263	   Similarly, each node of the AS layer will have three tables of
264	   routing entries: one for the inter-AS-top layer, one for the
265	   inter-AS-bottom layer and one for routing information inside the
266	   Autonomous System itself.

268	   Introducing hierarchy at the inter-AS layer reduces the size of
269	   the routing table substantially. If hardware resources allow a
270	   flat address space to be maintained at each layer, the problems
271	   related to CIDR can be avoided. With a flat address space, no
272	   hierarchical relationship needs to be established between any two
273	   nodes in the same layer, so all the nodes in each layer can be
274	   used until they are exhausted. With a flat address space (i.e.
275	   without prefix reduction), a BGP table will have at most
276	   nMaxInterASTopNodes + nMaxInterASBottomNodes entries.

278	   An IGP such as OSPF can divide an AS into smaller areas. OSPF
279	   hides the topology of an area from the rest of the Autonomous
280	   System, and this information hiding enables a significant
281	   reduction in routing traffic. With subnetting support, OSPF
282	   attaches an IP address mask to each route to indicate the range of
283	   IP addresses it describes. This reduces routing traffic, because
284	   not every node inside the range has to be described, but it
285	   introduces another level of hierarchy. If subnetting is avoided in
286	   the AS layer (at the cost of additional computation in the SPF
287	   tree), each area can be configured dynamically from a free pool of
288	   addresses based on its requirement. An AS can then be divided into
289	   a number of areas of heterogeneous sizes, with nodes drawn from a
290	   free pool of address space.

292	   Similarly, the concept of an area can be introduced in the
293	   inter-AS-bottom layer in the way it works in OSPF. The area border
294	   routers in the inter-AS-bottom layer have to behave exactly as an
295	   ABR behaves in OSPF; that is, an area border router hides the
296	   topology inside an area from the rest of the world and distributes
297	   the information collected inside the area to the rest. It also
298	   distributes the routing information collected from outside to the
299	   nodes inside. To implement this, the protocol running in the
300	   inter-AS layer (say, BGP) has to introduce a 'cost' factor, which
301	   can be interpreted as the cost of propagating a packet from one AS
302	   to another. The protocols running inside the AS layer (RIP, OSPF,
303	   etc.) have to supply the cost for a packet to travel from one ASBR
304	   to another, and all the protocols must supply this information
305	   consistently. The cost factor is needed when a remote node sends a
306	   packet to a node inside an area and more than one area border
307	   router is equidistant from that remote node. Thus the
308	   inter-AS-bottom layer (i.e. one inter-AS-top node) can be divided
309	   into a number of areas of heterogeneous sizes, with AS nodes drawn
310	   from a free pool of address space. BGP adopts a technique called
311	   route aggregation, which reduces the routing information carried
312	   in a message. In a similar manner, introducing areas inside the
313	   inter-AS-bottom layer will not only reduce the complexity of the
314	   protocol but will also reduce the size of a BGP packet
315	   substantially.

317	   With this architecture, each node (router) inside an AS is
318	   represented as A.B.C. A node may or may not have a network attached
319	   to it; such a network acts as a leaf (i.e. it does not act as a
320	   transit). To use the user-ID space properly and to support customer
321	   networks of heterogeneous sizes, the user-ID space needs to be
322	   divided into subnet-ID and user-ID. In effect, a VLSM (variable
323	   length subnet mask) approach, in the form of a tree, has to be
324	   adopted at each node of an AS. Each node of the AS layer thus acts
325	   as the root of a tree whose leaves are small independent customer
326	   networks acting as stubs. Because routing information of the
327	   inter-AS and AS layers need not be passed into the VLSM tree, each
328	   router inside the tree should maintain a default route for any
329	   address outside its network/domain. With this approach, the load on
330	   each service provider router becomes negligible. Protocols that
331	   support VLSM with MPLS/VPN have to be implemented inside the tree.
332	   Inside the VLSM tree, all the physical ports of a switch have to be
333	   configured with the subnet mask. A lightweight routing protocol can
334	   be developed on top of a static routing table by setting a default
335	   route inside the VLSM tree.
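
   The forwarding decision of a router inside the VLSM tree can be
   sketched as a longest-prefix match over a small static table,
   falling back to a default route. This is a minimal sketch under
   stated assumptions: the table layout, the port names, the "uplink"
   default and the next_hop function are hypothetical, and the 24-bit
   user-id width is the value proposed in Section 3.2.1.

```python
from typing import List, Tuple

USER_ID_BITS = 24  # user-id width proposed in Section 3.2.1

# (subnet prefix, prefix length in bits, outgoing port); all hypothetical
StaticRoute = Tuple[int, int, str]


def in_subnet(user_id: int, prefix: int, length: int) -> bool:
    """True if the top `length` bits of user_id equal those of prefix."""
    shift = USER_ID_BITS - length
    return (user_id >> shift) == (prefix >> shift)


def next_hop(dest_net: Tuple[int, int, int], dest_user: int,
             own_net: Tuple[int, int, int],
             static_routes: List[StaticRoute],
             default: str = "uplink") -> str:
    """Pick the port for a destination A.B.C.D seen from inside the tree."""
    if dest_net != own_net:
        # Anything outside this node's A.B.C follows the default route.
        return default
    best_len, best_port = -1, default
    for prefix, length, port in static_routes:
        if length > best_len and in_subnet(dest_user, prefix, length):
            best_len, best_port = length, port
    return best_port


own = (1, 2, 3)
table = [(0x010000, 8, "port1"), (0x010200, 16, "port2")]
print(next_hop(own, 0x010203, own, table))        # most specific subnet wins
print(next_hop(own, 0x01FF00, own, table))        # only the /8 matches
print(next_hop((1, 2, 4), 0x010203, own, table))  # other network: default route
```

   The router holds only the subnets below it and one default entry,
   which is why no inter-AS or AS layer routing information is needed
   inside the tree.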

337	   The fundamental assumptions on which this architecture rests can
338	   be summarized as follows:

340	   i) Entire network can be viewed as a network of regions or states
341	   where each region or state can have its own identity by communicating
342	   with the rest of the world through some state border routers. Each
343	   region or state is a network of Autonomous Systems. Each region as
344	   well as each Autonomous System inside them will have a fixed
345	   (maximum) length of prefix.

347	   ii) Availability of hardware resources is such that flat address
348	   space can be maintained at the inter-AS layer.

350	   Introduction of mesh-structured hierarchy will have several
351	   advantages:

353	      o  Load at each router will get reduced substantially.
354	      o  Concept of CIDR style approach and complexity related to
355	           prefix reduction can be easily avoided.
356	      o  Mesh structured hierarchy will make traffic evenly distributed.
357	      o  Physical cable connection can be optimized.
358	      o  Administrative issues will become easier.

360	3.2. Determination of prefix lengths

362	   With this architecture, an IP address can be described as A.B.C.D,
363	   where the D part represents the user id. Each router in the
364	   inter-AS layer has two tables, one for the inter-AS-top layer and
365	   one for the inter-AS-bottom layer of the inter-AS-top node it
366	   belongs to, whereas each node of the AS layer has three tables of
367	   routing entries: one for the inter-AS-top layer, one for the
368	   inter-AS-bottom layer and one for routing information inside the
369	   Autonomous System itself. In the worst case, a node inside an AS
370	   needs to maintain nMaxInterASTopNodes + nMaxInterASBottomNodes +
371	   nMaxASNodes entries in its routing table.
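
   As a rough illustration of these bounds, the worst-case table
   sizes can be computed from the prefix lengths. The sketch below
   plugs in pA = 10, pB = 16 and pC = 12, the values that Section
   3.2.1 goes on to propose, and assumes every layer is fully
   populated; the function names are hypothetical.

```python
def max_nodes(prefix_len: int) -> int:
    """Upper bound on the number of nodes in a layer with this prefix length."""
    return 1 << prefix_len


def worst_case_entries(p_a: int, p_b: int, p_c: int) -> dict:
    """Worst-case routing entries, assuming each layer is fully populated."""
    n_top, n_bottom, n_as = max_nodes(p_a), max_nodes(p_b), max_nodes(p_c)
    return {
        "inter_as_router": n_top + n_bottom,      # two tables
        "as_node": n_top + n_bottom + n_as,       # three tables
    }


sizes = worst_case_entries(10, 16, 12)
print(sizes)  # {'inter_as_router': 66560, 'as_node': 70656}
```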

373	   Dynamic allocation of an area from a free pool of address space
374	   happens more frequently at the AS layer than at the
375	   inter-AS-bottom layer. As OSPF supports all the features needed,
376	   it can be considered the default choice in the AS layer. The
377	   existing OSPF (Version 2) supports subnetting, by which an entire
378	   area can be represented as a combination of network address and
379	   subnet mask; this reduces the routing table substantially. With
380	   subnetting removed, every node inside an area has its own entry in
381	   the routing table (as in OSPF Version 1). The determining factor
382	   is therefore the maximum number of nodes inside an AS that OSPF
383	   can support once subnetting is removed, and the prefix length of
384	   the AS layer is determined by this limit.

386	   With hierarchy introduced in the inter-AS layer, the number of
387	   entries in the BGP routing table is reduced substantially. Even if
388	   pA and pB are both chosen as 16, the number of routing entries
389	   stays within the admissible range of the existing BGP protocol.
390	   It is, however, the responsibility of IANA to come up with a
391	   scheme for selecting nMaxInterASTopNodes and
392	   nMaxInterASBottomNodes. Each top-level node will have
393	   nMaxInterASBottomNodes nodes. Assigning one top-level node to each
394	   country would waste address space (e.g. China has a population of
395	   1,306,313,800, whereas Vatican City has only 920, according to a
396	   2006 census). A moderate value of nMaxInterASBottomNodes is
397	   therefore desirable, so that larger countries have a number of
398	   top-level nodes; e.g. each state of the USA can be assigned a
399	   top-level node. With areas introduced in the inter-AS-bottom
400	   layer, each top-level node can be divided into a number of areas
401	   of heterogeneous sizes, so a group of neighboring countries with
402	   small populations can share the address space of a top-level node.
403	   Similarly, the size of the user-id space has to be decided based
404	   on the largest area a VLSM tree should span. All these issues are
405	   entirely geopolitical and have to be decided by IANA.

407	3.2.1. A pseudo optimal distribution of prefixes in a 64 bit
408	   architecture

410	   To make optimal use of cable connections, the length of the VLSM
411	   tree is expected to be as short as possible. Also, any single
412	   organization may prefer to keep its user-id space under the same
413	   network id. A 16-bit user-id may be insufficient for places such
414	   as a large university campus, whereas 32 bits would be too large.
415	   Hence a 24-bit user-id is a moderate choice; it is the size of
416	   the class A address space in IPv4 (also used as a space for
417	   private IP). As published in 1998 [6], OSPF can support an area
418	   with 1600 routers and 30K external LSAs, so 11 bits are needed to
419	   support this space. Assuming that OSPF can support a much larger
420	   address space as hardware technology advances, and to leave room
421	   for future expansion, 12 bits are assigned to the AS layer and 16
422	   bits to the inter-AS-bottom layer. If, on average, a 16-bit
423	   equivalent space is used within the user-id space (i.e. one part
424	   in 256) and an 8-bit equivalent number of nodes is used inside an
425	   AS (256 nodes, 16% of 1600), a top-level node (with a 16-bit
426	   equivalent number of AS nodes) yields 2^40 IP addresses, which
427	   gives 8629 IP addresses per person in Japan (with a population of
428	   127417200; Japan is 10th in the world population list). So, even
429	   if every country with a population less than or equal to Japan's
430	   is assigned a top-level node and every province/state of the
431	   countries with larger populations is assigned a top-level node,
432	   the total number of top-level nodes will come well under 1024. If
433	   neighboring countries with smaller populations share a top-level
434	   node, the total will come down further. This suggests that a
435	   62-bit equivalent space (10 (pA) + 16 (pB) + 12 (pC) + 24
436	   (user-id)) will be sufficient for unicast addresses. This
437	   distribution expects OSPF to support 65K (64K + 1K) external
438	   LSAs.
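
   The arithmetic in the paragraph above can be checked with a few
   lines. The figures are the ones quoted in the text; the variable
   names are chosen here for illustration only.

```python
# Bits needed to number 1600 routers: smallest n with 2**n >= 1600.
bits_for_area = (1600 - 1).bit_length()            # 11

# Average utilisation assumed in the text.
used_user_ids_per_node = 2 ** 16                   # out of 2**24, i.e. 1 in 256
used_nodes_per_as = 2 ** 8                         # 256 of 1600 routers = 16%
as_nodes_per_top_node = 2 ** 16                    # 16-bit inter-AS-bottom layer

addresses_per_top_node = (used_user_ids_per_node
                          * used_nodes_per_as
                          * as_nodes_per_top_node)  # 2**40

japan_population = 127417200
per_person = addresses_per_top_node // japan_population   # 8629

unicast_bits = 10 + 16 + 12 + 24                   # pA + pB + pC + user-id = 62
external_lsas = 2 ** 16 + 2 ** 10                  # 64K + 1K = 66560

print(bits_for_area, addresses_per_top_node == 2 ** 40,
      per_person, unicast_bits, external_lsas)
```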

440	   The distribution of address space will be finalized in
441	   consultation with IANA. Provisionally, it may be as follows:

443	   The 64-bit address space may be divided into two 63-bit blocks:

445	   i. Global unicast addresses with the most significant bit set to 0.
446	   This space is equally divided between provider assigned (PA) address
447	   space and provider independent (PI) address space.

449	   a) Provider assigned address space with prefix 00.

451	   b) Provider independent (PI) address space with prefix 01. Provider
452	   independent address space will be used for customers who would
453	   like to retain their number even after changing providers. As
454	   routing will be based on PA addresses, each PI address will be
455	   associated with at least one PA address. The most significant
456	   aspect of PI addressing is that it is independent of the
457	   architectural framework of the provider network; even if the
458	   architectural framework changes, the same format of PI addressing
459	   can be maintained. Once implemented, the PI address of a node will
460	   be the number generally used by ordinary users. Section 4
461	   describes issues related to PI addressing in detail.

463	   ii. Address space with the MSB set to 1 will be distributed among
464	   the remaining uses, each of which will have a fixed prefix. This
465	   distribution will be based on the requirements and on the work
466	   already done in connection with IPv6:

468	   a) Address space for multicasting with a prefix set to 1111.

470	   b) Address space for link-local address: Link local addresses will
471	   have a prefix 1110.

473	   c) Router address space: This space will be used by the routers and
474	   will have a prefix 1101.

476	   d) Address space for private IP: Each customer network can maintain
477	   private address space to communicate within its users. This space
478	   will be distributed within all the customer sites of a corporate that
479	   can maintain VPN services. A 32 bit address space should be good
480	   enough for private IP. Private address space will have a 32 bit
481	   prefix with the leading 4 bits set to 1100 and the rest set to 1.

483	   Rest of the address space has been kept for future use.
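
	   The prefix plan above can be sketched as a small classifier. This
	   is only an illustration: the enum and function names are
	   hypothetical, and the private block is assumed to mean "upper 32
	   bits equal to 1100 followed by 28 one bits".

```c
#include <stdint.h>

/* Address classes from the prefix plan above (names are illustrative). */
enum addr_class {
    ADDR_PA,         /* prefix 00: provider assigned            */
    ADDR_PI,         /* prefix 01: provider independent         */
    ADDR_MULTICAST,  /* prefix 1111                             */
    ADDR_LINK_LOCAL, /* prefix 1110                             */
    ADDR_ROUTER,     /* prefix 1101                             */
    ADDR_PRIVATE,    /* 32-bit prefix: 1100 then 28 one bits    */
    ADDR_RESERVED    /* everything else, kept for future use    */
};

/* Classify a 64-bit address by its most significant bits. */
enum addr_class classify_addr(uint64_t addr)
{
    unsigned top2 = (unsigned)(addr >> 62);
    unsigned top4 = (unsigned)(addr >> 60);

    if (top2 == 0x0) return ADDR_PA;
    if (top2 == 0x1) return ADDR_PI;
    if (top4 == 0xF) return ADDR_MULTICAST;
    if (top4 == 0xE) return ADDR_LINK_LOCAL;
    if (top4 == 0xD) return ADDR_ROUTER;
    /* Assumption: private space is exactly the prefix 0xCFFFFFFF/32. */
    if ((addr >> 32) == 0xCFFFFFFFu) return ADDR_PRIVATE;
    return ADDR_RESERVED;
}
```

	   Under these assumptions, any address starting with 10, or with
	   1100 but outside the private prefix, falls into the reserved
	   class.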

485	3.2.2. Whether to go for a two-tier or three-tier hierarchy

487	   Establishment of hierarchy in the inter-AS layer reduces the size of
488	   BGP entries to a great extent, but leads to an improper use of
489	   address space due to geo-political reason. If hierarchy in the inter-
490	   AS space gets removed, entire 26 bit (10+16) space will be available
491	   for a single layer and use of inter-AS space will be true to its
492	   sense, but will increase external LSA (and/or number of entries in
493	   the BGP table) dramatically. So, it depends on to what extent OSPF
494	   can support external LSAs. BGP expects the packet length to be
495	   limited to 4096 bytes. BGP manages to make it work with this
496	   limitation with the concept of prefix reduction in the CIDR based
497	   environment. As the number of inter-AS nodes increases, BGP has to
498	   change this limit in order to make it work in flat address space. The
499	   alternate will be to divide the inter-AS space into number of areas
500	   as defined in section 2.1. The area border routers will advertise the
501	   aggregated information to the rest of the world. BGP may have to
502	   incorporate both the options at the same time. As the number of nodes
503	   in the inter-AS layer increases, in order to reduce the number of
504	   entries in the routing table, inter-AS space has to be split into two
505	   separate planes. So, two-tier hierarchy can be considered as an
506	   interim state to go for three-tier hierarchy. If it so happens that
507	   currently available data is good enough to support the present need,
508	   it will be worth examining to what extent it can support future
509	   needs. Assignment of inter-AS nodes in two-tier hierarchy should be
510	   based on the geographical distribution as if it is part of three-tier
511	   hierarchy. Otherwise, introduction of three-tier hierarchy in the
512	   future will become another difficult task to go through. Based on the
513	   report of year 2011, BGP supports ~400,000 entries in the routing
514	   table. With this growing trend, BGP may have to change the limit of
515	   packet length even in a CIDR based environment. With the introduction
516	   of two-tier hierarchy, number of entries in the routing table will
517	   come down drastically and with the three-tier approach, it will come
518	   down further.

520	3.3. Issues related to Satellite communications

522	   Establishment of hierarchy in the inter-AS layer expects the only way
523	   any two autonomous systems in two different top level nodes
524	   communicate is through their SBRs. If two autonomous systems inside
525	   the same top level node communicate through satellite, it will be
526	   considered as a direct link between them. Whenever autonomous system
527	   'ASa' of top level node 'A' communicates with autonomous system 'ASb'
528	   of top level node 'B' through satellite, they have to go through
529	   their state border routers; i.e., a satellite port inside 'A' that
530	   communicates with a satellite port inside 'B' will be considered as
531	   a state border router. If multiple such ports exist inside node 'A',
532	   all of them will be equidistant from any port inside 'B'. This
533	   requires any satellite port inside 'B' to have prior knowledge of the
534	   list of autonomous systems that will be under the purview of any port
535	   inside 'A'. So, all the satellite ports of 'A' have to exchange such
536	   group information with all the satellite ports of 'B' and vice
537	   versa. Each such group of autonomous systems can be considered as a
538	   cluster of autonomous systems inside an area of a top level node. If
539	   number of such ports is small, some heuristics can be applied while
540	   assigning AS numbers in order to reduce the processing time during
541	   the circuit establishment phase.  It will become difficult to
542	   maintain such heuristics once the number of such ports becomes large.
543	   So, in case of satellite communication, the advantage of establishing
544	   hierarchy inside inter-AS layer diminishes as the number of satellite
545	   ports increases. If any private corporate maintains its own satellite
546	   channel to communicate between its offices at distant locations, all
547	   of these offices are going to be considered as under the user-id
548	   space of its network. Service providers that provide satellite
549	   services to the end-site customers, can operate in the usual manner
550	   as they will provide connection to customer networks which will act
551	   as stub.

553	3.4. Setting default route inside VLSM tree

555	   Section 3.1 describes that there is no need to pass down the routing
556	   information of the external world inside VLSM tree that acts as a
557	   stub. Inside a VLSM tree, a node of higher prefix can be divided into
558	   number of nodes with lower prefixes. Each divided node can further be
559	   subdivided with nodes of further lower prefixes. This process can be
560	   continued as long as it is desired or no more division is further
561	   possible.

563	   Following figure shows a typical arrangement of VLSM tree of a
564	   service provider's network with IPv4 address space. Switch SW-A is
565	   connected to the outside world and maintains global routing table. It
566	   acts as the root of a VLSM tree that acts as a stub. It has been
567	   assigned an address block 11.1.16.0/20 which is distributed among its
568	   four children SW-B, SW-C, SW-D and SW-E with the approach of VLSM.
569	   Switch SW-B further divides its address space between switches SW-F
570	   and SW-G. Switch SW-F assigns an address block 11.1.16.0/24 to
571	   customer network CN-A. Switch SW-G assigns address block 11.1.20.0/24
572	   and 11.1.21.0/24 to two customers CN-B and CN-C; whereas switch SW-E
573	   assigns address block 11.1.30.0/24 to customer network CN-D.

575	   Routing inside the tree takes place with the following principle.

577	   Inside the tree, if a node (switch/router) that is assigned a domain
578	   (NetAddr/NetMask) receives a packet which is destined to somewhere
579	   outside of its domain, it needs to forward the packet to its parent
580	   in the hierarchy.

582	                               +--------------+
583	                               |     SW-A     |
584	                               | 11.1.16.0/20 |
585	                               +-+-+------+-+-+
586	                                 | |      | |
587	                 +---------------+ |      | +----------------+
588	                 |                 |      |                  |
589	          +------+-----+ +---------+--+ +-+----------+ +-----+------+
590	          |    SW-B    | |    SW-C    | |    SW-D    | |   SW-E     |
591	          |11.1.16.0/21| |11.1.24.0/22| |11.1.28.0/23| |11.1.30.0/23|
592	          +---+----+---+ +------------+ +------------+ +--+---------+
593	              |    |                                      |
594	              |    +-------+                              |
595	              |            |                           +--+--+
596	      +-------+----+  +----+-------+                   |CN-D |
597	      |   SW-F     |  |    SW-G    |                   +-----+
598	      |11.1.16.0/22|  |11.1.20.0/22|                11.1.30.0/24
599	      +--+---------+  +--+------+--+
600	         |               |      |
601	         |               |      |
602	      +--+--+         +--+--+ +-+---+
603	      |CN-A |         |CN-B | |CN-C |
604	      +-----+         +-----+ +-----+
605	   11.1.16.0/24  11.1.20.0/24 11.1.21.0/24

607	   If a host in CN-A wants to send a packet to an address 11.1.21.116,
608	   CE router of CN-A forwards it to SW-F. SW-F finds the destination
609	   address of the packet to be outside of its domain and forwards the
610	   packet to its parent SW-B. SW-B finds a port that has been
611	   configured with the matching destination address and forwards it to
612	   its child SW-G. Switch SW-G sends the packet to customer network
613	   CN-C, whose address block 11.1.21.0/24 contains the destination.

615	   If a host in CN-B wants to send a packet to 11.1.17.120, CE router of
616	   CN-B forwards the packet to SW-G. SW-G finds the destination address
617	   of the packet to be outside of its domain and forwards the packet to
618	   its parent SW-B. SW-B finds a port that has been configured with
619	   the matching destination address and forwards the packet to its child
620	   SW-F. SW-F finds the destination address to be within its domain, but
621	   no port has been configured with the matching destination address and
622	   generates ICMP UNREACHABLE.

624	   If a host in CN-C wants to send a packet to 16.2.22.116, CE router of
625	   CN-C forwards the packet to SW-G. SW-G finds the destination address
626	   of the packet to be outside its domain and forwards the packet to
627	   SW-B. SW-B forwards the packet to its parent SW-A. SW-A finds the
628	   destination address of the packet to be outside its domain,
629	   consults the global forwarding table and forwards the packet
630	   through the right port.
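
	   The forwarding principle and the three walk-throughs above can be
	   sketched as a single decision function. This is a minimal sketch,
	   not part of the draft: the type and function names are
	   hypothetical, and addresses are held as host-order 32-bit IPv4
	   values.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad components. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* A domain is a NetAddr/NetMask pair, as in the text above. */
struct domain {
    uint32_t addr;
    uint32_t mask;
};

enum fwd_action {
    FWD_TO_PARENT,   /* destination lies outside this node's domain  */
    FWD_TO_CHILD,    /* a configured port matches the destination    */
    FWD_UNREACHABLE  /* inside the domain, but no port matches       */
};

static int in_domain(const struct domain *d, uint32_t dst)
{
    return (dst & d->mask) == (d->addr & d->mask);
}

/*
 * Decide what a node does with a packet for 'dst'. On FWD_TO_CHILD the
 * index of the matching port is stored in *child_idx (if non-NULL).
 * For the root of the tree, FWD_TO_PARENT stands for "consult the
 * global forwarding table".
 */
enum fwd_action vlsm_forward(const struct domain *self,
                             const struct domain *children,
                             size_t nchildren,
                             uint32_t dst, size_t *child_idx)
{
    size_t i;

    if (!in_domain(self, dst))
        return FWD_TO_PARENT;

    for (i = 0; i < nchildren; i++) {
        if (in_domain(&children[i], dst)) {
            if (child_idx != NULL)
                *child_idx = i;
            return FWD_TO_CHILD;
        }
    }
    return FWD_UNREACHABLE;  /* the node would generate ICMP UNREACHABLE */
}
```

	   With SW-F configured as 11.1.16.0/22 and a single port for
	   11.1.16.0/24, a packet for 11.1.21.116 goes to the parent, while a
	   packet for 11.1.17.120 is reported as unreachable.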

632	3.4.1. IP VPN with MPLS inside VLSM tree

634	   Section 3.1 describes that there is no need to pass down the routing
635	   information of the external world inside VLSM tree. This section
636	   describes how to make IP VPN work inside VLSM tree without using BGP.

638	   RFC4364 [7] describes "IP VPN" with BGP/MPLS. To support VPN, PE
639	   routers maintain per-site forwarding table. When a packet arrives
640	   from an associated CE router, PE router consults with this forwarding
641	   table to forward the packet. If the packet is supposed to be
642	   forwarded to another site of VPN through the backbone, it uses two-
643	   level label stack. The upper label is used to forward the packet from
644	   ingress PE router to the egress PE router; whereas the inner label
645	   is used for the egress PE router to identify the associated CE router
646	   where the packet is supposed to be forwarded. BGP is used by the
647	   Service Provider to exchange the routes of a particular VPN among the
648	   PE routers that are attached to that VPN. Configuration takes place
649	   on PE routers of both the sides of LSP. The simplest way to achieve
650	   this is to configure these attributes manually on PE routers. In
651	   order to have dynamic allocation of inner label, MPLS signaling
652	   protocols (in place of BGP) need to be extended. Allocation of inner
653	   label has to be done by the egress PE router. Same message that is
654	   used for the assignment of upper label may be used for the assignment
655	   of inner label. Inside the forwarding table, each entry contains the
656	   forwarding destination address based on a set of destination
657	   addresses (NetAddress/NetMask) of the IP packets received from
658	   ingress CE router. While establishing inner label, ingress PE router
659	   needs to send these attributes with the signaling message and the
660	   egress PE router needs to validate those before assigning label.

662	3.4.1.1. Extension to RSVP-TE to support IP VPN inside VLSM tree

664	   This section describes extension to RSVP-TE[17] to support dynamic
665	   allocation of inner label of two-level label stack used to support
666	   VPN services.

668	   In order to establish LSP using RSVP-TE, ingress PE router sends Path
669	   message to the egress PE router. Path message is augmented with a
670	   LABEL_REQUEST object.  Labels are allocated downstream and
671	   distributed (propagated upstream) by means of RSVP Resv message. For
672	   this purpose, the RSVP Resv message is extended with a special LABEL
673	   object. In order to support VPN to establish the inner label, Path
674	   message is augmented with a VPN_ATTRIBUTE object. Similarly, RSVP Resv
675	   message is extended with a VPN_LABEL object. When an egress PE router
676	   receives a Path message, it checks the presence of VPN_ATTRIBUTE
677	   object. On finding this object, egress PE router checks the viability
678	   of assignment of VPN label with the parameters from the VPN_ATTRIBUTE
679	   object and the attributes that are already configured with the egress
680	   PE router. If the test is positive, it assigns a VPN label and does
681	   the rest of the processing of LSP label assignment and sends the RSVP
682	   Resv message with the extension of VPN_LABEL object towards the
683	   ingress PE router. On receiving Resv message with VPN_LABEL object,
684	   ingress PE router assigns VPN label along with the rest of the
685	   processing of Resv message and completes the operation. VPN_ATTRIBUTE
686	   and VPN_LABEL objects are described below.

688	   VPN_LABEL class=<IANA_TBD1>, C-Type=1
689	    0                   1                   2                   3
690	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
691	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
692	   |                         (inner label)                         |
693	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

695	   VPN_ATTRIBUTE  class=<IANA_TBD2>, C-Type=1
696	    0                   1                   2                   3
697	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
698	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
699	   |         Global Unicast Address of Ingress CE Router           |
700	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
701	   |         Global Unicast Address of Egress CE Router            |
702	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
703	   |             Net Address of Destination IP Packet              |
704	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
705	   |             Net Mask of Destination IP Packet                 |
706	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
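
	   As an illustration of the VPN_ATTRIBUTE layout above, the object
	   body can be serialized as four 32-bit words in network byte order.
	   This is a sketch under assumptions: the struct and function names
	   are hypothetical, each field is taken to be one 32-bit word as
	   drawn, and the RSVP object header (length, class and C-Type) is
	   left out because the class number is not yet assigned.

```c
#include <stddef.h>
#include <stdint.h>

#define VPN_ATTRIBUTE_BODY_LEN 16  /* four 32-bit words, as drawn above */

struct vpn_attribute {
    uint32_t ingress_ce;  /* Global Unicast Address of Ingress CE Router */
    uint32_t egress_ce;   /* Global Unicast Address of Egress CE Router  */
    uint32_t net_addr;    /* Net Address of Destination IP Packet        */
    uint32_t net_mask;    /* Net Mask of Destination IP Packet           */
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
size_t vpn_attribute_encode(const struct vpn_attribute *a,
                            uint8_t *buf, size_t buflen)
{
    if (buflen < VPN_ATTRIBUTE_BODY_LEN)
        return 0;
    put_be32(buf, a->ingress_ce);
    put_be32(buf + 4, a->egress_ce);
    put_be32(buf + 8, a->net_addr);
    put_be32(buf + 12, a->net_mask);
    return VPN_ATTRIBUTE_BODY_LEN;
}

/* Returns 0 on success, -1 if the buffer is too short. */
int vpn_attribute_decode(const uint8_t *buf, size_t buflen,
                         struct vpn_attribute *a)
{
    if (buflen < VPN_ATTRIBUTE_BODY_LEN)
        return -1;
    a->ingress_ce = get_be32(buf);
    a->egress_ce = get_be32(buf + 4);
    a->net_addr = get_be32(buf + 8);
    a->net_mask = get_be32(buf + 12);
    return 0;
}
```

	   An egress PE router would decode the body this way before
	   validating the attributes against its own configuration.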

708	   The format of the Path message is as follows:

710	      <Path Message> ::=       <Common Header> [ <INTEGRITY> ]
711	                               <SESSION> <RSVP_HOP>
712	                               <TIME_VALUES>
713	                               [ <EXPLICIT_ROUTE> ]
714	                               <LABEL_REQUEST>
715	                               [ <VPN_ATTRIBUTE> ]
716	                               [ <SESSION_ATTRIBUTE> ]
717	                               [ <POLICY_DATA> ... ]
718	                               <sender descriptor>

720	      <sender descriptor> ::=  <SENDER_TEMPLATE> <SENDER_TSPEC>
721	                               [ <ADSPEC> ]
722	                               [ <RECORD_ROUTE> ]

724	   The format of the Resv message is as follows:

726	      <Resv Message> ::=       <Common Header> [ <INTEGRITY> ]
727	                               <SESSION>  <RSVP_HOP>
728	                               <TIME_VALUES>
729	                               [ <RESV_CONFIRM> ]  [ <SCOPE> ]
730	                               [ <POLICY_DATA> ... ]
731	                               [ <VPN_LABEL> ]
732	                               <STYLE> <flow descriptor list>

734	      <flow descriptor list> ::= <FF flow descriptor list>
735	                               | <SE flow descriptor>

737	      <FF flow descriptor list> ::= <FLOWSPEC> <FILTER_SPEC> <LABEL>
738	                               [ <RECORD_ROUTE> ]
739	                               | <FF flow descriptor list>
740	                               <FF flow descriptor>

742	      <FF flow descriptor> ::= [ <FLOWSPEC> ] <FILTER_SPEC> <LABEL>
743	                               [ <RECORD_ROUTE> ]

745	      <SE flow descriptor> ::= <FLOWSPEC> <SE filter spec list>

747	      <SE filter spec list> ::= <SE filter spec>
748	                               | <SE filter spec list> <SE filter spec>

750	      <SE filter spec> ::=     <FILTER_SPEC> <LABEL> [ <RECORD_ROUTE> ]

752	   Egress router generates an error with Error Code = 24, sub-code =
753	   <IANA_TBD3> (VPN label allocation error) if the operation fails.

755	4. Provider Independent addressing, name services and multihoming

757	   Provider independent addressing can be conceived as naming a host
758	   with a number. It can be used by customer networks who would like to
759	   retain their number even after changing their service provider; also
760	   it is useful to designate a host uniquely if the customer network is
761	   multihomed. Just like in name services, as address corresponding to a
762	   name needs to be resolved first to initiate communication, the same
763	   is required for PI addressing. Each globally unique PI address will
764	   be associated to at least one global unicast provider assigned
765	   address. For a host with single interface, this number will be same
766	   as the number of service providers the customer network is associated
767	   with.

769	   As either source or destination or both may be multihomed, there
770	   could be multiple paths to communicate between two hosts. This is
771	   required both for name services as well as for PI addressing.

773	   A system call needs to be introduced to get the source address based
774	   on the destination address. If application program needs to use the
775	   destination address directly, it needs to use this system call.

777	   int getcommaddr(int sockfd, struct in_addr *dst, struct addr_pair
778	   *endpts);

780	   'addr_pair' holds the addresses of communication end points as
781	   follows:

783	   struct addr_pair {
784	       struct in_addr src;
785	       struct in_addr dst;
786	   };

788	   'getcommaddr'[8] returns the number of source-destination pairs for
789	   communication; the parameter 'endpts' will hold the array of these
790	   addresses. The array will be in sorted manner based on the best
791	   possible route.  'sockfd' is used to get the 'type of service'
792	   assigned. So, an application program needs to set its type of service
793	   before using this call.

795	   'getcommaddr' needs to call a routine 'getmappedaddr' to resolve the
796	   mapped provider assigned addresses of a provider independent address.

798	   int getmappedaddr(struct in_addr *piaddr, struct in_addr *mpiaddr);

800	   'getmappedaddr' will return number of mapped addresses and 'mpiaddr'
801	   will hold their values.

803	   Users may use name instead of IP address to reach the destination. A
804	   new system call needs to be introduced 'gethostbynamewithsrcaddr',
805	   which is an extension to 'gethostbyname' as follows:

807	   struct hostent *gethostbynamewithsrcaddr(int sockfd,const char *name,
808	                  int *nroutes, struct addr_pair *endpts);

810	   'gethostbynamewithsrcaddr'[8] takes 'name' and 'sockfd' as input
811	   parameters and finds out the best possible route to reach the
812	   destination. It returns the pointer to the 'hostent' structure as
813	   returned by 'gethostbyname' system call.  The parameter 'nroutes'
814	   gets the number of possible routes to be used and the corresponding
815	   source and destination addresses get assigned to 'endpts' in sorted
816	   manner. 'sockfd' is used to get the 'type of service' assigned. So,
817	   an application program needs to set its type of service before using
818	   this call.

820	   An application program needs to use these source addresses from the
821	   top (i.e. the 0th) to establish connection with the destination. It
822	   needs to bind source address 'src' and then connect with the
823	   destination address 'dst'.
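
	   The bind-then-connect procedure above might look as follows with
	   the POSIX socket API. This is a sketch only: 'connect_best_pair'
	   and its 'port' parameter are hypothetical ('addr_pair' carries
	   addresses only), a fresh TCP socket is opened per attempt, and the
	   array is assumed to have been filled by a call such as
	   'getcommaddr'.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* As defined in this section. */
struct addr_pair {
    struct in_addr src;
    struct in_addr dst;
};

/*
 * Try each source/destination pair in order, starting from index 0
 * (the best route). Returns a connected TCP socket, or -1 if no pair
 * could be used.
 */
int connect_best_pair(const struct addr_pair *endpts, int npairs,
                      uint16_t port)
{
    int i;

    for (i = 0; i < npairs; i++) {
        struct sockaddr_in src, dst;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;

        memset(&src, 0, sizeof(src));
        src.sin_family = AF_INET;
        src.sin_addr = endpts[i].src;  /* port 0: kernel picks one */

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_addr = endpts[i].dst;
        dst.sin_port = htons(port);

        /* Bind the chosen source first, then connect to the destination. */
        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) == 0 &&
            connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0)
            return fd;

        close(fd);  /* this pair failed; fall through to the next one */
    }
    return -1;
}
```

	   Falling back to the next pair when an attempt fails is a design
	   choice of this sketch; the text above only fixes the order in
	   which the pairs are tried.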

825	4.1. PI address Resolution

827	   This section tries to come up with a solution for PI address
828	   resolution with the approach of DNS[10] with necessary differences.
829	   Just like name space in DNS, entire address range with prefix 01 will
830	   be the address space used by PI addresses. Servers that will hold the
831	   information of mapping between PI addresses and corresponding PA
832	   addresses will be called as PIMapServers and the programs that will
833	   be used to resolve addresses will be called as PIMapResolvers.

835	   In case of DNS where name is used in hierarchical format to resolve
836	   the addresses, PI address resolution will be based on the prefix of
837	   the PI address used for resolution.  The prefix is determined based
838	   on the architectural model used for the internet. Based on the prefix
839	   information addresses of a list of servers can be found out that will
840	   act as regional servers which will be used to resolve mapped PA
841	   addresses corresponding to that PI address. A prefix will serve a
842	   fixed address space within entire PI address space. Address space
843	   belonging to a prefix will be distributed within customer networks of
844	   heterogeneous sizes. Address space allocation and the mapping of
845	   associated PA address(es) will be assigned by a regional authority.
846	   The regional authority will be fully responsible for the operation of
847	   regional servers in that region.

849	   Like DNS, there are some root servers which will have some fixed
850	   addresses, under which there are some prefixes which will act as top-
851	   level-domains. In case of CIDR based hierarchy, these prefixes may be
852	   of different prefix lengths which are selected based on the
853	   requirements. Each prefix in a top level domain can further be split
854	   into number of prefixes with the approach of CIDR. This tree
855	   structured hierarchy will be kept on growing till we get prefixes
856	   associated with regional servers. Each prefix associated with a
857	   regional server will be distributed amongst customer networks of
858	   various sizes as well as prefixes that will again be associated with
859	   some regional servers with the approach of CIDR. These regional
860	   servers can be considered as equivalent to  the authoritative name
861	   servers of DNS which are associated with zones. As stated earlier,
862	   prefixes starting with "00" will be assigned for provider assigned
863	   addresses and prefix starting with "01" will be assigned for provider
864	   independent addresses, whereas prefix starting with "1" will be
865	   assigned for addresses of all other types.

867	   As inherent hierarchy is involved in "Mesh structured hierarchy",
868	   this hierarchy goes up to two levels. As usual, there will be some
869	   root servers with fixed assigned addresses. Each root server will
870	   have prefixes with "01.A" that will act like top level domain. Under
871	   each top level domain, there will be entries with prefixes "01.A.B".
872	   Within a region "A.B", every global PA address is represented as
873	   "00.A.B.C.user-id". In order to support customer networks of
874	   heterogeneous sizes with the approach of VLSM, the "user-id" portion
875	   is further divided as "subnet-id.user-id". So, the effective network
876	   prefix of a customer network in PA address space is "00.A.B.C.pa-
877	   subnet-id". Within an "A.B", entire PI address space with prefix
878	   "01.A.B" will be distributed within customer networks of
879	   heterogeneous sizes. So, effective network prefix of a customer
880	   network with PI address will be "01.A.B.pi-subnet-id". A particular
881	   prefix "01.A.B.pi-subnet-id" will be mapped to at least one provider
882	   assigned prefix of same prefix length. For a multihomed customer
883	   network within "A.B" that receives services from two service
884	   providers will have prefixes "00.A.B.C1.pa-subnet-id1" and
885	   "00.A.B.C2.pa-subnet-id2". A PI address prefix "01.A.B.pi-subnet-id"
886	   of same length will be mapped to both these prefixes of PA address
887	   space. Every region "A.B" will have regional server and backup
888	   server(s) with a maximum limit (say 4) with net addresses
889	   "00.A.B.server1", "00.A.B.server2", "00.A.B.server3" and
890	   "00.A.B.server4".

892	   Each PIMapServer will have a database of records that will have
893	   information to resolve PI addresses. In memory copy of a region will
894	   have an array of records where each record will have the following
895	   format:

897	   +------------+---------+------+-----+-------+-----------+
898	   | NetAddress | NetMask | Type | TTL | NAddr | Addr(1-4) |
899	   +------------+---------+------+-----+-------+-----------+

901	   First two fields "NetAddress/NetMask" represents the PI address range
902	   of a network. "Type" will be either Domain/Referral/Individual/
903	   SingleEntry/Default based on which a query and rest of the fields of
904	   a record have to be processed. A PI address can have maximum four
905	   mapped PA addresses. "Addr1", "Addr2", "Addr3", "Addr4" will hold the
906	   corresponding PA addresses and "NAddr" will hold the number of such
907	   addresses. The field "TTL" is a 32 bit integer measured in seconds
908	   which will hold same meaning and approach as defined in the
909	   specification of DNS[10]. When a server receives a query for an
910	   address "X", it extracts the record of the network based on
911	   "NetAddress/NetMask" and "X" from its database. If no matching record
912	   is found, a negative response is sent. Based on the "Type" of the
913	   record, the query is processed in the following manner.

915	   Type=Domain:

917	   This is the most common type. A customer network that would not
918	   like to maintain a map server opts for this option. Here there will
919	   be one to one mapping between a PI address and corresponding PA
920	   addresses. The fields "Addr1"/"Addr2"/"Addr3"/"Addr4" will hold the
921	   PA Net Addresses corresponding to the PI address of the network.
922	   Server will send the matching record to the resolver with
923	   Type=Domain. Resolver will extract the user-id portion of "X" and
924	   find the corresponding mapped PA addresses based on
925	   "Addr1"/"Addr2"/...etc.
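
	   The Type=Domain processing on the resolver side can be sketched as
	   below. The struct layout and names are hypothetical; the sketch
	   assumes 64-bit addresses and relies on the PI prefix and its
	   mapped PA prefixes having the same length, so that the same
	   NetMask separates the network part from the user-id.

```c
#include <stdint.h>

#define PIMAP_MAX_ADDR 4  /* a PI address has at most four PA mappings */

/* In-memory record, following the field list given above. */
struct pimap_record {
    uint64_t net_addr;             /* PI NetAddress                   */
    uint64_t net_mask;             /* PI NetMask                      */
    uint8_t  type;                 /* record type                     */
    uint32_t ttl;                  /* seconds, as in DNS              */
    uint8_t  naddr;                /* number of valid entries in addr */
    uint64_t addr[PIMAP_MAX_ADDR]; /* mapped PA net addresses         */
};

/*
 * Map PI address 'x' to its PA addresses using a Type=Domain record.
 * Returns the number of addresses written to 'out', or -1 if 'x' is
 * not covered by the record or the record is malformed.
 */
int pimap_resolve_domain(const struct pimap_record *rec, uint64_t x,
                         uint64_t out[PIMAP_MAX_ADDR])
{
    uint64_t user_id;
    int i;

    if ((x & rec->net_mask) != (rec->net_addr & rec->net_mask))
        return -1;
    if (rec->naddr > PIMAP_MAX_ADDR)
        return -1;

    user_id = x & ~rec->net_mask;  /* user-id portion of X */
    for (i = 0; i < rec->naddr; i++)
        out[i] = (rec->addr[i] & rec->net_mask) | user_id;
    return rec->naddr;
}
```

	   Each result keeps the user-id of "X" and replaces only the network
	   part, which is the one to one mapping described above.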

927	   Theoretically, "A.B" portion of a PI address need not match with the
928	   "A.B" portion of the corresponding PA addresses. Consider a large
929	   corporate that has its corporate office and a branch office within
930	   the same region of a particular "A.B" and some other offices with
931	   different values of "A.B". The corporate can maintain a contiguous
932	   range of PI addresses for the ease of its operation. It needs to
933	   split entire PI address range based on its offices and assign the
934	   corresponding PA addresses. In order to minimize the path of a query
935	   it is desirable that "A.B" of a PI address and its corresponding
936	   mapped PA addresses belong to the same region.

938	   Type=Referral:

940	   This is used when an address within the domain "NetAddress"/"NetMask"
941	   has to be processed by another map server. The map server may itself
942	   be another regional server or a server within a customer network.

944	   When a customer network would like to have a direct control for the
945	   mapping of its addresses it needs to opt for this option.
946	   "Addr1"/"Addr2"/"Addr3"/"Addr4" of the database entry will hold the
947	   pointer to the information associated to each map server. "NAddr"
948	   will hold the number of map servers that can be referred. Information
949	   of each server will hold the following values: PI address of the map
950	   server + Number of PA addresses to reach the map server + PA
951	   addresses of the map server. Any one of these map servers needs to be
952	   queried for further processing. A server may act either in recursive
953	   mode or in iterative mode based on its implementation just like in
954	   DNS. A large corporate may have different offices and each (or some
955	   of them) may maintain a map server based on their policies.

957	   When a server needs to handle a particular address separately, it
958	   needs to set "NetAddress" with that particular address and all the
959	   bits of "NetMask" will be set to "1". The "Type" field has to be set
960	   as "SingleEntry"(which is similar to the Type Address(A) in terms of
961	   DNS). If some of its addresses need to be handled separately but for
962	   the rest common rule may apply (like Type=Domain), records of the
963	   individual entries should be processed first and then for the rest.
964	   In these cases "Type" has to be set as "Default". So, a server of a
965	   customer network may have database entries with Type=Domain/Referral
966	   /SingleEntry/Default. It makes sense for a server (or a master file)
967	   to have entries with Type=Default, but from the point of a resolver,
968	   it does not make any sense. So a server needs to extract the PA
969	   addresses and form a record with Type=SingleEntry and send it back to
970	   the resolver.

972	   For a host having multiple interfaces, each interface may be assigned
973	   PA addresses supplied by all the service providers, but it is
974	   desirable that PI address gets mapped to only one of them (preferably
975	   for a CE router, the interface which will have the shortest path will
976	   be mapped PI address with the PA address associated with that CE
977	   router).

979	   Type=Individual:

981	   This is meant for the individual users opting for services like
982	   telephonic services that need to maintain PI address. With this
983	   option a mobile user may maintain its PI address after changing its
984	   service provider. A map server needs to maintain some networks with a
985	   range of PI addresses in its database. When a query for an address
986	   "X" is received, server needs to get the corresponding record where
987	   "Addr1" will hold the pointer to an open file descriptor (or pointer
988	   to the in memory copy) of a separate data file where there will be
989	   one to one mapping between PI address and its corresponding PA
990	   address of all the assigned PI addresses. These networks and
991	   assignment of individual PI addresses have to be done by the regional
992	   authority.

994	   As with Type=Default, Type=Individual does not make any sense to a
995	   resolver. So, server needs to extract PA address and form a record
996	   with Type=SingleEntry and send it back to the resolver.

998	   As stated above, this solution is based on the approach of DNS. For
999	   the ease of implementation and to make use of the existing source
1000	   code related to DNS (e.g. BIND) most of the features have been taken
1001	   from DNS. DNS supports multiple entry output, but they appear in a
1002	   sequential manner. In order to make processing easier, they are
1003	   arranged in a structured manner in this document.

1005	   IANA has assigned a port <IANA_TBD4> for its UDP/TCP based
1006	   implementation.

1008	4.1.1. Record Format

1010	   Each record (the way they will appear in a master file or will be
1011	   used for communication) will have the following format:

1013	   NetAddress/NetMask + Type (8 bit unsigned int) + <TTL> + RDATA (Type
1014	   specific information)

1016	   Record types are primarily the types of records as described above
1017	   along with three other types: SOA (Start of a zone of authority), MPS
1018	   (host with Type=SingleEntry that acts as a Map server for this zone)
1019	   and DFL (Data File). These types are mainly useful in the context of
1020	   processing AXFR/IXFR/NOTIFY/DFAXFR/DFIXFR messages.

1022	   Types are defined as follows:

1024	   Types               values          comments
1025	   -----------------------------------------------------------
1026	   SEN (SingleEntry)      1    same as type A(address) in DNS
1027	   MPS (MapServer)        2    Map server
1028	   DMN (Domain)           3
1029	   DEF (Default)          4
1030	   REF (Referral)         5
1031	   SOA (Start of a zone)  6
1032	   IND (Individual)       7
1033	   DFL (Data File)        8
1034	   -----------------------------------------------------------

1036	   RDATA of different types will appear as follows:

1038	   Type=SOA:
1039	   PI address of server+SERIAL+REFRESH+RETRY+EXPIRE+MINIMUM (meaning and
1040	   values of SERIAL/REFRESH/RETRY/EXPIRE/MINIMUM are same as they were
1041	   defined in section 3.3.13 of RFC 1035[11])

1043	   Type=(SEN/MPS):
1044	   NAddr(Number of addresses) + corresponding PA addresses

1046	   Type=(DMN/DEF):
1047	   NAddr(Number of addresses) + corresponding Net addresses

1049	   Type=REF:
1050	   NAddr(Number of map servers) + for each map server (PI address of
1051	   map server + NAddr(Number of addresses of map server) + corresponding
1052	   PA addresses)

1054	   Type=IND:
1055	   NAddr(=1) + full path name of the data file

1057	   Type=DFL:
1058	   Data file name + SERIAL + Number of records in the data file(32 bit
1059	   unsigned int)
1060	   When used in communication, the data file name is encoded as its
1061	   length (8 bit unsigned int) followed by the octets of the string.

1063	   The TTL value of a record has to be set to 0 if it is not relevant,
1064	   or if the record is to take the value associated with the SOA record.
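
   The type table and the Type=DFL RDATA layout above can be sketched
   as follows. This is an illustrative sketch only: the names
   RecordType, encode_dfl_rdata and decode_dfl_rdata are hypothetical,
   and network byte order with a 32 bit SERIAL (as in RFC 1035) is an
   assumption, since this section does not state either explicitly.

```python
import struct
from enum import IntEnum


class RecordType(IntEnum):
    """Record type codes taken from the table in this section."""
    SEN = 1  # SingleEntry
    MPS = 2  # MapServer
    DMN = 3  # Domain
    DEF = 4  # Default
    REF = 5  # Referral
    SOA = 6  # Start of a zone of authority
    IND = 7  # Individual
    DFL = 8  # Data File


def encode_dfl_rdata(file_name: str, serial: int, num_records: int) -> bytes:
    """Encode Type=DFL RDATA: name length, name octets, SERIAL, record count."""
    name = file_name.encode("ascii")
    if len(name) > 255:
        raise ValueError("data file name does not fit an 8 bit length")
    # Assumption: network byte order, 32 bit SERIAL, 32 bit record count.
    return struct.pack("!B", len(name)) + name + struct.pack("!II", serial, num_records)


def decode_dfl_rdata(rdata: bytes):
    """Inverse of encode_dfl_rdata; returns (file_name, serial, num_records)."""
    name_len = rdata[0]
    name = rdata[1:1 + name_len].decode("ascii")
    serial, num_records = struct.unpack("!II", rdata[1 + name_len:1 + name_len + 8])
    return name, serial, num_records
```

   For example, encode_dfl_rdata("zone1.dat", 20, 3) yields 18 octets:
   one length octet, nine name octets and two 32 bit integers.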

1066	4.1.2. Messages

1068	   In order to support most of the features of DNS, the message format
1069	   has been kept almost the same as that of DNS. All the relevant fields
1070	   are processed exactly as they are in DNS, and all the irrelevant
1071	   fields have to be ignored. The rest of this section describes where
1072	   and how changes have to be made.

1074	   As defined in RFC 1035, the top level format of message is divided
1075	   into 5 sections (some of which are empty in certain cases) shown
1076	   below:

1078	       +---------------------+
1079	       |        Header       |
1080	       +---------------------+
1081	       |       Question      | the question for the name server
1082	       +---------------------+
1083	       |        Answer       | answering part of the question
1084	       +---------------------+
1085	       |      Authority      | authoritative map server
1086	       +---------------------+
1087	       |      Additional     | additional information
1088	       +---------------------+

1090	   The header section has been retained as defined in RFC 5395[12] as
1091	   follows:

1093	        0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
1094	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1095	       |                      ID                       |
1096	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1097	       |QR|   OpCode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
1098	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1099	       |                QDCOUNT/ZOCOUNT                |
1100	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1101	       |                ANCOUNT/PRCOUNT                |
1102	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1103	       |                NSCOUNT/UPCOUNT                |
1104	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1105	       |                    ARCOUNT                    |
1106	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

1108	   The question section will have two parts:
1109	   QType(one octet unsigned int)+QData.

1111	   Query types are defined as follows:

1113	   QTypes       values          comments
1114	   -----------------------------------------------------------
1115	   SEN            1    query for mapped PA address
1116	   SOA            6    query information related to SOA
1117	   DFL            8    query information related to data file
1118	   DFXFR          249  data file transfer
1119	   DFIXFR         250  incremental data file transfer
1120	   IXFR           251  incremental authoritative data file xfr
1121	   AXFR           252  authoritative data file transfer
1122	   -----------------------------------------------------------

1124	   QData will hold values based on QType.

1126	   The following text describes issues related to QType=SEN.  Issues
1127	   related to all other QTypes (i.e. related to file transfer) will be
1128	   discussed afterwards.

1130	   For QType=SEN(1): QData=PI address that needs to be resolved.
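
   A query with QType=SEN could be assembled roughly as shown below.
   This is a sketch under stated assumptions, not a normative encoding:
   build_sen_query is a hypothetical helper, the PI address is passed in
   as raw octets because this section does not fix its width, and the
   RD bit is assumed to sit where it does in a DNS header.

```python
import struct

QTYPE_SEN = 1      # query for mapped PA address (from the QType table)
FLAG_RD = 0x0100   # "recursion desired" bit, same position as in DNS


def build_sen_query(msg_id: int, pi_address: bytes, recursion_desired: bool = True) -> bytes:
    """Build a message with a 12 octet header and one QType=SEN question."""
    flags = FLAG_RD if recursion_desired else 0  # QR=0 (query), OpCode=0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!6H", msg_id, flags, 1, 0, 0, 0)
    # Question: QType (one octet) followed by QData (the PI address).
    question = struct.pack("!B", QTYPE_SEN) + pi_address
    return header + question
```

   With an 8 octet PI address (an assumption that matches a 64 bit
   address space), the resulting message is 21 octets long.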

1132	   The answer section, authority section and additional section will
1133	   have a number of resource records where the number will be specified
1134	   in the header.

1136	   On receiving a query, the map server will return the matching record
1137	   from its database.  If the response is an address, the answer section
1138	   will hold a record of one of these two types: SEN/DMN.

1140	   If Type=DMN, resolver needs to extract the mapped addresses as
1141	   described in section 4.1.

1143	   If Type=DMN, the entire address range will appear in the form of
1144	   NetAddress/NetMask. This is an advantage for caching: a query for any
1145	   particular address returns the information for the entire address
1146	   range.

1148	   If the response is a referral, the answer section will be empty and
1149	   the authority section will hold the record with Type=REF.

1151	   If the server supports recursion, then at each iteration in which it
1152	   receives a record with Type=REF, it needs to push the record to the
1153	   additional section of the message that is to be sent to the
1154	   resolver. So, the additional section will hold the Type=REF records
1155	   of the chain of the tree through which PA addresses were resolved.

1157	4.1.3. Master file and data file

1159	   Section 5 of RFC 1035 states:

1161	   "Master files are text files that contain RRs in text form.  Since
1162	   the contents of a zone can be expressed in the form of a list of RRs
1163	   a master file is most often used to define a zone, though it can be
1164	   used to list a cache's contents."

1166	   Section 5.1 of RFC 1035 states:

1168	   "The format of these files is a sequence of entries.  Entries are
1169	   predominantly line-oriented, though parentheses can be used to
1170	   continue a list of items across a line boundary, and text literals
1171	   can contain CRLF within the text.  Any combination of tabs and spaces
1172	   act as a delimiter between the separate items that make up an entry.
1173	   The end of any line in the master file can end with a comment.  The
1174	   comment starts with a ";" (semicolon)."

1176	   Master files follow the same approach and format as in DNS, as
1177	   described in section 5 of RFC 1035, with the necessary differences.

1179	   An example master file may look as follows:

1181	   @ "PI NetAddr"/"Net Mask"  SOA  "PI address of primary server" (
1182	                                    20     ; SERIAL
1183	                                    7200   ; REFRESH
1184	                                    600    ; RETRY
1185	                                    3600000; EXPIRE
1186	                                    60)    ; MINIMUM
1187	   "PI NetAddr"/"Net Mask"    MPS  0  NAddr "PA addresses"
1188	   "PI NetAddr"/"Net Mask"    SEN  0  NAddr "PA addresses"
1189	   "PI NetAddr"/"Net Mask"    DMN  0  NAddr "Net addresses"
1190	   "PI NetAddr"/"Net Mask"    DEF  0  NAddr "Net addresses"
1191	   "PI NetAddr"/"Net Mask"    IND  0  NAddr(=1) "Data file name"

1193	   A data file contains a sequence of entries where each entry appears
1194	   on a separate line. Each entry is a mapping between a PI address and
1195	   its associated PA address(es), separated by space(s). Entries are
1196	   generally sorted by PI address.  As in a master file, a comment
1197	   starts with a ";" (semicolon) and ends at the end of the line.  Data
1198	   files are commonly associated with the map servers maintained by a
1199	   regional authority, but they are not generally associated with the
1200	   map servers maintained by individual customer networks. A data file
1201	   entry may appear as follows:

1203	   "PI Address" NAddr "PA Addresses"
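
   A minimal parser for this entry format might look like the sketch
   below. The function name parse_data_file is hypothetical, and the
   addresses are treated as opaque whitespace-free tokens because the
   textual address syntax is not defined in this section.

```python
def parse_data_file(text: str):
    """Return a list of (pi_address, [pa_addresses]) tuples."""
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        # A comment starts with ";" and runs to the end of the line.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected PI address, NAddr, PA address(es)")
        pi_address, naddr = fields[0], int(fields[1])
        pa_addresses = fields[2:]
        # NAddr must agree with the number of PA addresses that follow it.
        if naddr != len(pa_addresses):
            raise ValueError(
                f"line {line_no}: NAddr={naddr} but {len(pa_addresses)} address(es) given"
            )
        entries.append((pi_address, pa_addresses))
    return entries
```

   Blank lines and comment-only lines are skipped; a mismatch between
   NAddr and the number of listed PA addresses is reported as an error.
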
1204	   A map server may have a number of data files. These files have to be
1205	   listed in another file (a supporting file, the way the boot file
1206	   "named.boot" is used in BIND) that holds information on each of
1207	   them. An entry in that file follows the same format as a record
1208	   (Type=DFL) and has the following fields:

1210	   "PI NetAddr"/"NetMask" Type(DFL) TTL "Data File Name" SERIAL "Number
1211	   of records".

1213	   This file is used to process messages with QType=DFL, which support
1214	   data file transfer and incremental data file transfer.

1216	   For QType=DFL(8): QData="PI NetAddr"/"NetMask" of the desired network
1217	   For QType=SOA(6): QData="PI NetAddr"/"NetMask" of the desired zone

1219	   A map server will return a record of Type=DFL on receiving a query
1220	   with QType=DFL, whereas it will return a record of Type=SOA on
1221	   receiving a query with QType=SOA.

1223	4.1.4. Zone maintenance and transfers

1225	   Section 4.3.5 of RFC 1034 states:

1227	   "The general model of automatic zone transfer or refreshing is that
1228	   one of the name servers is the master or primary for the zone.
1229	   Changes are coordinated at the primary, typically by editing a master
1230	   file for the zone.  After editing, the administrator signals the
1231	   master server to load the new zone.  The other non-master or
1232	   secondary servers for the zone periodically check for changes (at a
1233	   selectable interval) and obtain new zone copies when changes have
1234	   been made.

1236	   To detect changes, secondaries just check the SERIAL field of the SOA
1237	   for the zone.  In addition to whatever other changes are made, the
1238	   SERIAL field in the SOA of the zone is always advanced whenever any
1239	   change is made to the zone."

1241	   Section 1.2 of RFC 5936 states:

1243	   "A DNS implementation is not required to support AXFR, IXFR, and
1244	   NOTIFY, but it should have some means for maintaining name server
1245	   coherency.  A general-purpose DNS implementation will likely support
1246	   AXFR (and in the same vein IXFR and NOTIFY), but turnkey DNS
1247	   implementations may exist without AXFR."

1249	   Zone maintenance and transfer will follow the same approach as DNS
1250	   with a few minor updates. Data files will be updated much more
1251	   frequently than the master file. That is why the transfer (or
1252	   incremental transfer) of a data file is treated separately from the
1253	   transfer (or incremental transfer) of the master file.

1255	   For all the messages of QType=AXFR/DFXFR/IXFR/DFIXFR, QData="PI
1256	   NetAddr"/"NetMask" of the desired zone or the desired network. A
1257	   NOTIFY message needs to indicate which file has been updated,
1258	   followed by the related information. So, if the master file has
1259	   changed, a NOTIFY message with query type SOA will be sent, and one
1260	   with query type DFL will be sent if a data file has changed.

1262	   Transfer of the master file will be the same as transfer of a master
1263	   file in DNS, followed by transfer of all the data files, i.e. AXFR
1264	   is processed as in DNS and is followed by DFXFR for all the data
1265	   files. To make this happen, at the end of transferring the contents
1266	   of the master file, the server (of the AXFR message) needs to send a
1267	   NOTIFY message for each of the data files belonging to that zone to
1268	   the client (i.e. the secondary server). On processing the NOTIFY for
1269	   a data file, the secondary server needs to send DFIXFR to the primary
1270	   if the data file already exists; otherwise it needs to send DFXFR.
1271	   Incremental update of the master file (IXFR) will be the same as IXFR
1272	   in DNS with a minor update. If the client of IXFR finds that a new
1273	   data file has been introduced, it issues DFXFR for that data file.
1274	   Similarly, if the entry of a data file has been deleted, the client
1275	   deletes the corresponding data file.
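
   The secondary server's decisions described in this paragraph can be
   summarized in a short sketch. Both function names are hypothetical,
   and data files are identified here simply by name.

```python
def transfer_request_for_notify(data_file_name: str, local_data_files: set) -> str:
    """Pick the query type a secondary sends after a NOTIFY for a data file."""
    # An existing local copy is refreshed incrementally (DFIXFR);
    # a data file not yet held locally is fetched in full (DFXFR).
    return "DFIXFR" if data_file_name in local_data_files else "DFXFR"


def reconcile_after_ixfr(old_files: set, new_files: set):
    """Return (files to fetch with DFXFR, files to delete locally).

    old_files: data files listed in the master file before the IXFR.
    new_files: data files listed in the master file after the IXFR.
    """
    to_fetch = sorted(new_files - old_files)
    to_delete = sorted(old_files - new_files)
    return to_fetch, to_delete
```

   For instance, if the master file previously listed a.dat and b.dat
   and now lists b.dat and c.dat, the client fetches c.dat with DFXFR
   and deletes its copy of a.dat.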

1277	   Processing of DFXFR will follow the same approach as AXFR in DNS.
1278	   Similarly, processing of DFIXFR will follow the same approach as IXFR
1279	   in DNS.  While transferring a data file record, an equivalent record
1280	   of type SEN needs to be sent with the values of the PI address and
1281	   mapped PA address(es) from the data file record. Wherever a record
1282	   of type SOA is sent while processing AXFR/IXFR in DNS, a record of
1283	   type DFL needs to be sent while processing DFXFR/DFIXFR.

1285	   For AXFR, IXFR and NOTIFY in DNS, one needs to follow RFC 5936[13],
1286	   RFC 1995[14] and RFC 1996[15] respectively.

1288	5. Issues related to IP mobility

1290	   An interface of a customer network may have several IP addresses
1291	   (e.g. for a multihomed customer site, each interface will have
1292	   multiple global unicast addresses, and it may also have private
1293	   addresses). A mobile node that has moved to a customer network which
1294	   gets service from a service provider and maintains private IP
1295	   addresses will have at least three IP addresses: a provider assigned
1296	   unicast address, a private address and its permanent "Home
1297	   Address". The "Home Address" will be aliased with the provider
1298	   assigned address (i.e. the co-located care-of address). So the
1299	   interface structure needs to have an additional field to hold the
1300	   value of the care-of address. The PCB structure will have an
1301	   additional field 'inp_lcladdr', which will hold the current provider
1302	   assigned address that a foreign node needs to use for communication.
1303	   The field 'inp_laddr' that is used to hold the value of local address
1304	   will hold the value of "Home Address" of a mobile node. Similarly,
1305	   the PCB needs another field 'inp_fcladdr' to support a mobile
1306	   destination address.  The existing field 'inp_faddr', which is used
1307	   to hold a foreign address, will hold the value of the "Home Address"
1308	   of the mobile node. For customers with PI addresses who would like
1309	   mobility support, the mapped address will be considered as the "Home
1310	   Address" of the mobile node.

1312	   An outgoing packet from a mobile node in a foreign site needs to be
1313	   stacked with the associated care-of address. While initiating
1314	   communication, the 'bind' system call needs to go through the
1315	   interface list and fetch the associated structure to check whether
1316	   the source address is aliased or not and needs to fill the value of
1317	   'inp_lcladdr' of PCB accordingly.

1319	   When TCP receives a SYN for connection establishment, it allocates a
1320	   PCB and assigns the values for 'inp_laddr', and related fields.
1321	   During this phase, TCP also needs to check whether the local address
1322	   is aliased or not (based on the fields of interface structure; which
1323	   is applicable for a mobile node at foreign site) and needs to fill
1324	   the values of 'inp_lcladdr' accordingly. Similarly if destination
1325	   address is found to be aliased, based on the stacking type, it needs
1326	   to fill up the field 'inp_fcladdr'.

1328	   IP address stacking can be performed with the approach introduced in
1329	   section 6.4 of RFC6275[9]. RFC6275 describes the stacking of IP
1330	   addresses for a destination address (let us call it type 0
1331	   stacking). Two more types of stacking need to be introduced: type 1
1332	   stacking, where only the source address appears in the stack, and
1333	   type 2 stacking, where both the source address and the destination
1334	   address appear in the stack in a particular order.

1336	   Protocol output routines like 'tcp_output' or 'udp_output' need to
1337	   fill the IP packet in the following manner.

1339	   If the socket contains a valid 'inp_lcladdr', use 'inp_lcladdr' as
1340	   the source address and 'inp_laddr' will appear in the stack. If the
1341	   socket contains a valid 'inp_fcladdr', use 'inp_fcladdr' as the
1342	   destination address and 'inp_faddr' will appear in the stack. If only
1343	   'inp_fcladdr' contains a valid address whereas 'inp_lcladdr' is
1344	   NULL, use type 0 stacking. If only 'inp_lcladdr' contains a valid
1345	   address whereas 'inp_fcladdr' is NULL, use type 1 stacking.  If both
1346	   'inp_lcladdr' and 'inp_fcladdr' contain valid addresses, use
1347	   type 2 stacking.

1349	   Protocol input routines like 'tcp_input' or 'udp_input' need to
1350	   process the packet in the reverse order based on the type of
1351	   stacking.  For type 0 stacking, use the address in the stack as the
1352	   destination address; for type 1 stacking, use the address in the
1353	   stack as the source address; for type 2 stacking use both source
1354	   address and destination address from the stack.
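
   The output and input rules above can be modelled with a small
   sketch. It is not kernel code: Packet, output_addresses and
   input_addresses are hypothetical names, addresses are plain strings,
   None stands for a NULL field, and the (source, destination) order
   used for the type 2 stack is an assumption, since the ordering is
   not specified here.

```python
from typing import NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src: str                    # source address in the IP header
    dst: str                    # destination address in the IP header
    stack_type: Optional[int]   # 0, 1, 2, or None when nothing is stacked
    stack: Tuple[str, ...]      # stacked (home) addresses


def output_addresses(inp_laddr, inp_faddr, inp_lcladdr=None, inp_fcladdr=None) -> Packet:
    """Choose header addresses and stacking type from the PCB fields."""
    src = inp_lcladdr or inp_laddr
    dst = inp_fcladdr or inp_faddr
    if inp_lcladdr and inp_fcladdr:
        # Assumed order for type 2: source home address, then destination.
        return Packet(src, dst, 2, (inp_laddr, inp_faddr))
    if inp_fcladdr:
        return Packet(src, dst, 0, (inp_faddr,))
    if inp_lcladdr:
        return Packet(src, dst, 1, (inp_laddr,))
    return Packet(src, dst, None, ())


def input_addresses(pkt: Packet):
    """Return the (source, destination) pair the transport layer should see."""
    if pkt.stack_type == 0:
        return pkt.src, pkt.stack[0]
    if pkt.stack_type == 1:
        return pkt.stack[0], pkt.dst
    if pkt.stack_type == 2:
        return pkt.stack[0], pkt.stack[1]
    return pkt.src, pkt.dst
```

   In every case, applying input_addresses to the result of
   output_addresses recovers the pair ('inp_laddr', 'inp_faddr'), so
   the transport layer keeps seeing the home addresses.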

1356	5.1. Changes expected with the specifications related to IP mobility

1358	   RFC6275 demands correspondent node binding from mobile nodes for
1359	   route optimization. This binding is required when a connection gets
1360	   established as well as when the mobile node changes its address
1361	   space. Applications like HTTP open multiple connections at run time
1362	   which are very short lived. If mobile nodes need to send binding
1363	   messages for all the connections, the network will be unnecessarily
1364	   congested. This congestion can be avoided by establishing the
1365	   binding at the time of connection establishment itself.  So, if the
1366	   TCP server happens to be mobile, it will set the value of
1367	   'inp_lcladdr' in the stack while sending SYN+ACK. The TCP client
1368	   which initiates communication through 'connect' needs to set the
1369	   'inp_fcladdr' field on receiving SYN+ACK. With this approach,
1370	   correspondent node binding messages need to be sent only when a
1371	   mobile node changes its position from one address space to another.

1373	   Route optimization is not applicable to applications which are of
1374	   multicast type.  In these cases packets need to be forwarded with the
1375	   mechanism of reverse tunneling with the approach of "IP Encapsulation
1376	   within IP" as defined in RFC2003.  In order to support packet
1377	   delivery with route optimization method as well as with
1378	   "Encapsulating Delivery Style" based on the application type the
1379	   protocol control block needs to introduce another field
1380	   'inp_hagentaddr' to hold the address of the home agent of the mobile
1381	   node. The interface structure also needs to have same field. The
1382	   'bind' system call needs to go through the interface list to fetch
1383	   'inp_hagentaddr' to the PCB along with 'inp_lcladdr' as described
1384	   earlier. So, protocol output routines like 'tcp_output', 'udp_output'
1385	   need to fill up the packets based on the application type. In
1386	   "Encapsulating Delivery Style" packets need to be formed in the
1387	   following manner.

1389	   The inner IP header will contain
1390	      Source Address: Home address of the mobile node
1391	      (i.e. 'inp_laddr')
1392	      Destination address: Address of the correspondent node
1393	      (i.e. 'inp_faddr')
1394	   The outer IP header will contain
1395	      Source Address: co-located care of address of the mobile node
1396	      (i.e. 'inp_lcladdr')
1397	      Destination Address: Address of the home agent of the mobile node
1398	      (i.e. 'inp_hagentaddr')
1399	   Protocol field: IP in IP

1401	6. Refinements over existing IPv6 specification

1403	   As IPv6 was envisioned long before some of the newer technologies
1404	   (e.g. MPLS) came into the picture, some refinements can be made over
1405	   the existing specification. These considerations are related to
1406	   bandwidth usage and performance inside switches. Experimental results
1407	   show that smaller packet sizes give better results for the processing
1408	   of RT packets.  So, it is desirable for the IP packet header to be as
1409	   small as possible.

1411	   As described earlier, evaluation of the parameters
1412	   nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes is
1413	   geo-political and has to be decided by IANA. Once these parameters
1414	   are determined by mutual agreement, the values of pA, pB, pC and the
1415	   prefix length of the user id can be determined. With a 64 bit address
1416	   space, the IP header will be reduced by 16 bytes.

1418	   The 'flow label' field of the IPv6 packet header may not be of any
1419	   use when MPLS is in use. ATM used to have 4 priority classes. The
1420	   first specification of IPv6, RFC-1883, used a 4 bit type of service
1421	   field along with a 24 bit flow label field. These two were changed to
1422	   an 8 bit type of service field and a 20 bit flow label field in the
1423	   current spec, RFC-2460.  Too many priority classes may increase the
1424	   complexity of processing inside switches. If the type of service
1425	   field of the IPv6 header is reduced to 4 bits, as it was in RFC-1883,
1426	   and the 'flow label' field is removed, another three bytes can be
1427	   removed from the IPv6 header.
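
   The header savings claimed in the last two paragraphs can be checked
   with simple arithmetic. The sketch below uses the field widths of
   the RFC-2460 header; the dictionary and function names are
   illustrative only.

```python
# Field widths (in bits) of the fixed IPv6 header as defined in RFC-2460.
IPV6_FIELD_BITS = {
    "version": 4,
    "traffic_class": 8,      # the "type of service" field in this text
    "flow_label": 20,
    "payload_length": 16,
    "next_header": 8,
    "hop_limit": 8,
    "source_address": 128,
    "destination_address": 128,
}


def header_bytes(field_bits: dict) -> int:
    """Total header size in bytes; the fields must fill whole octets."""
    total_bits = sum(field_bits.values())
    if total_bits % 8:
        raise ValueError("header is not a whole number of octets")
    return total_bits // 8


# Proposed header: 64 bit addresses, 4 bit type of service, no flow label.
proposed = dict(IPV6_FIELD_BITS)
proposed["source_address"] = 64
proposed["destination_address"] = 64
proposed["traffic_class"] = 4
del proposed["flow_label"]

print(header_bytes(IPV6_FIELD_BITS))  # 40
print(header_bytes(proposed))         # 21
```

   The difference of 19 bytes is the sum of the 16 bytes saved on the
   two addresses and the 3 bytes saved on the type of service and flow
   label fields.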

1429	   The field 'Hop Limit' has got a 8 bit value in the existing spec. The
1430	   role of this field needs to be discussed properly with a large
1431	   address space.

1433	   RFC4862[16] introduces the concept of "Stateless auto configuration"
1434	   with the goal in mind that no manual configuration is required by
1435	   individual machines before connecting them to the network. It
1436	   generates a link local address with a link-local prefix and the link
1437	   address (e.g. Ethernet/E.164 for ISDN) first. This link local address
1438	   is used to configure global unicast address and any other
1439	   configurable parameters based on router advertisement.  Global
1440	   unicast addresses are generated by the prefix supplied by the router
1441	   advertisement and the link specific interface identifier. This
1442	   identifier can be as large as 64 bits in length. So, irrespective of
1443	   the size of the network (it may be 10000 or 100 or even less than
1444	   that), every subnet of a customer network will consume the equivalent
1445	   of a 64 bit address space. This seems to be a huge blunder. What is
1446	   expected is that the length of the interface identifier is just
1447	   enough to support the number of nodes in that subnet. To achieve
1448	   this, the router itself or a server in that subnet needs to maintain
1449	   a store from which it generates interface identifiers in response to
1450	   requests from individual hosts.  It may be desirable that interface
1451	   identifiers are generated by DHCP servers. With the option of
1452	   generating interface identifiers through DHCP, the changes in the
1453	   auto configuration process can be looked at as follows:

1455	   From the point of view of a host, it can be considered as a two step
1456	   process. The host needs to send a Router Solicitation message to find
1457	   out the presence of a router. The Router Advertisement message should
1458	   include an option field which indicates whether prefix information
1459	   should be configured through Router Advertisement or through DHCP.
1460	   The host then needs to send a request message to get the interface
1461	   identifier.  If both pieces of information need to be obtained from a
1462	   DHCP server, they can be obtained through a single message.

1464	   From the server's point of view, it needs to maintain a database
1465	   mapping link-layer addresses to subnet specific interface
1466	   identifiers. The lifetime of an interface identifier has to be
1467	   processed in the usual manner, the way existing DHCP implementations
1468	   treat IP addresses.

1470	   There seems to be another possible danger in obtaining prefix
1471	   information through Router Advertisement. As the Router Advertisement
1472	   comes in the form of ICMP messages, once it is received by the ICMP
1473	   layer, it loses the information about which interface the message was
1474	   received from (this problem arises for hosts that have multiple
1475	   interfaces, not all of which are attached to the same subnet).  So,
1476	   auto configuration of a host has to be performed one interface at a
1477	   time, with all other interfaces disabled. Once configuration of all
1478	   the interfaces is done, all of them have to be enabled.

1480	   If it is expected that hosts should reconfigure their addresses
1481	   dynamically based on Router Advertisement messages, the router needs
1482	   to send, for a certain amount of time, a special Router Advertisement
1483	   message that includes the old prefix and the corresponding new
1484	   prefix.

1486	   In order to support multihoming[8], prefix information needs to
1487	   include the fields 'default router' and 'next hop address' to reach
1488	   the default router for each of the prefixes.

1490	   In a 64 bit architecture, a link-local address can be formed from a
1491	   link-local prefix and the link-layer address in a suitable manner;
1492	   say, from a 4 bit link-local prefix followed by a 60 bit link-layer
1493	   address. IPv6 supports the Modified EUI-64 format for hardware that
1494	   uses 48 bit addressing by inserting a padding of 16 bits (FF FE)
1495	   between the company_id and the manufacturer selected extension
1496	   identifier. In order to make things work, this padding has to be
1497	   reduced to 12 bits. Hardware that supports the E.164 format uses a 15
1498	   digit number in BCD format followed by a padding of four bits set to
1499	   1111. Thus, in this case, the link-local address can be formed from
1500	   the link-local prefix followed by the most significant 60 bits of the
1501	   E.164 format.

1503	   Section 3.1 of RFC 7421[18] states "It is sometimes suggested that
1504	   assigning a prefix such as /48 or /56 to every user site (including
1505	   the smallest) as recommended by [RFC6177] is wasteful.  In fact, the
1506	   currently released unicast address space, 2000::/3, contains 35
1507	   trillion /48 prefixes ((2**45 = 35,184,372,088,832), of which only a
1508	   small fraction have been allocated.  Allowing for a conservative
1509	   estimate of allocation efficiency, i.e., an HD-ratio of 0.94
1510	   [RFC4692], approximately 5 trillion /48 prefixes can be allocated.
1511	   Even with a relaxed HD-ratio of 0.89, approximately one trillion /48
1512	   prefixes can be allocated.  Furthermore, with only 2000::/3 currently
1513	   committed for unicast addressing, we still have approximately 85% of
1514	   the address space in reserve.  Thus, there is no objective risk of
1515	   prefix depletion by assigning /48 or /56 prefixes even to the
1516	   smallest sites."

1518	   So, each customer network can be assigned a /48 prefix, i.e. an 80
1519	   bit address space.

1521	   In IPv4, class A(24 bits), class B(16 bits) and class C(8 bits)
1522	   networks were classified with the thought in mind that there would
1523	   be very few large networks (class A), a large number of mid sized
1524	   networks (class B) and a very large number of small sized networks
1525	   (class C).  If we go back to the assignment of address space in IPv4,
1526	   before the emergence of CIDR, the class B address space was getting
1527	   exhausted very fast.  Moreover, it was realized that the 16 bit class
1528	   B address space is far too large compared to the requirement of most
1529	   of the mid sized networks [2]. So, if we look at the actual need of
1530	   customer networks, on average a network needs less than a 16 bit
1531	   (say, m bit) address space.

1533	   So, if an 80 bit address space is used for each customer network in
1534	   IPv6, more than 64 bits will remain unused on average. In effect,
1535	   out of 128 bits, less than 64 bits will be of actual use, i.e. if RFC
1536	   7421 justifies a 128 bit address space as good enough for the needs
1537	   of this world, a 64 bit address space will satisfy the needs of this
1538	   world when customer networks are assigned address space based on
1539	   their sizes.

1541	   Wherever one network gets satisfied with an 80 bit address space
1542	   based on RFC 7421, 2^(16-m) networks get satisfied with a 16 bit
1543	   address space if customer networks are assigned address space based
1544	   on their sizes. If a total of M networks with /48 prefixes can be
1545	   satisfied with a 128 bit address space based on RFC 7421, a total of
1546	   M*2^(16-m) networks will be satisfied with a 64 bit address space
1547	   once networks are assigned address space based on their sizes.
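
   The counting argument can be verified numerically. In the sketch
   below (function names are illustrative), M is taken as the number of
   /48 prefixes in a full 128 bit space, 2^48, and m is the average
   number of address bits a customer network actually needs.

```python
def sites_in_64_bit_space(m: int) -> int:
    """Number of m bit customer networks that fit in a 64 bit address space."""
    return 2 ** (64 - m)


def scaled_site_count(M: int, m: int) -> int:
    """The count M * 2^(16-m) given in the text."""
    return M * 2 ** (16 - m)


# Number of /48 prefixes (80 bit networks) in a 128 bit address space.
M_FULL = 2 ** (128 - 80)

# Since 64 - m = 48 + (16 - m), the two counts agree for every m <= 16.
for m in range(1, 17):
    assert sites_in_64_bit_space(m) == scaled_site_count(M_FULL, m)
```

   For example, with m = 10 each /48-equivalent slot accommodates
   2^(16-10) = 64 size-based networks.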

1549	7. Distributed processing and Multicasting

1551	   With the inherent hierarchy involved in this architecture,
1552	   distributed applications can also be structured in a suitable manner.
1553	   Say, for a commonly used web based application, there will be a
1554	   master level server at every top level node. Any change that happens
1555	   in the application has to be synchronized among these master level
1556	   servers first. There might be servers at the middle layer (inside
1557	   each inter-AS-bottom) inside each top level node. Once the changes
1558	   are reflected at the master node, all the servers at the middle layer
1559	   need to update themselves from their master level node. This will
1560	   reduce network traffic substantially. The inherent hierarchy in the
1561	   architecture will also help in establishing multicast trees in a
1562	   similar manner. Work on these issues can progress only after this
1563	   architecture gets approved.

1565	8. Transition to real IP from private IP

1567	   Both CIDR and mesh structured hierarchy expect a VLSM tree at the
1568	   bottom. In VLSM, in real IP space with provider assigned (PA)
1569	   addresses, assignment of network resources has to be associated with
1570	   the address space to be used with the type of service. Within a
1571	   typical switch supporting multiple types of ports, a line card of
1572	   strength OC48 can be replaced with 4 line cards of strength OC12. An
1573	   OC12 card may also be replaced with 4 OC3 cards. An OC12 card may be
1574	   attached to another switch with DS3 ports and so on. When it reaches
1575	   the customer network, the port density of a switch has to be directly
1576	   proportional to the address block that a customer network will be
1577	   assigned, i.e. each customer network has to be assigned a block of
1578	   address space (say, 128, 256, 512, 1K, 2K etc). Within the switch,
1579	   these ports have to be assigned a net address/net mask the way VLSM
1580	   works.

1582	   In the IPv4 environment, providers have provided services in terms of
1583	   the bandwidth of the ports, say 2 Mbps/4 Mbps/1 Gbps lines etc. If
1584	   these ports were assigned addresses based on the number of users of
1585	   the customer network, transition from private IP to real IP is
1586	   simple. Consider a switch that has supplied 2 Mbps lines to a set of
1587	   customers with between 1K and 2K users each; each of them will be
1588	   assigned a block of 2K. But if the number of users is not
1589	   proportional to the bandwidth used, say the same 2 Mbps lines were
1590	   supplied to customers of sizes 1K, 2K, 4K and 16K respectively,
1591	   reorganization will be needed if possible. This rearrangement may be
1592	   possible within the switch itself or by connecting ports of
1593	   appropriate sizes from a different switch; otherwise each of them has
1594	   to be assigned an address block of 16K, or whatever is suitable with
1595	   the way VLSM works. So, address block assignment in the VLSM tree has
1596	   to grow bottom up.
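
   The block sizes used in this example follow from rounding the number
   of users up to the next power of two, as VLSM requires. A minimal
   sketch, with hypothetical function names and no allowance for
   reserved addresses or growth:

```python
def block_size_for(users: int) -> int:
    """Smallest power-of-two address block that holds `users` addresses."""
    if users < 1:
        raise ValueError("a customer network needs at least one address")
    return 1 << (users - 1).bit_length()


def host_bits_for(users: int) -> int:
    """Number of host bits (address bits below the net mask) for that block."""
    return block_size_for(users).bit_length() - 1


print(block_size_for(1500))   # 2048, i.e. a 2K block
print(host_bits_for(1500))    # 11
```

   A customer with 1500 users therefore receives a 2K block, matching
   the example of customers with between 1K and 2K users.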

1598	   Thus, transition of an existing provider network, with little or no
1599	   rearrangement, to a real IP space with a CIDR based approach is
1600	   apparently not a difficult job. In a CIDR based approach, sizes of
1601	   the VLSM trees are heterogeneous, which leads to a very high number
1602	   of routing entries. Mesh structured hierarchy is convenient to
1603	   reduce the routing overhead as well as for distribution of network
1604	   resources in a suitable manner in the long run. Converting a CIDR
1605	   based approach to mesh structured hierarchy requires reorganization,
1606	   mainly in the routing domain, and splitting trees of very large
1607	   sizes (>24 bit address space) at the top.

1609	   Mesh structured hierarchy makes use of a large address space and
1610	   divides the entire space into regions, and into sub regions inside
1611	   each region, maintaining a flat address space in each layer for the
1612	   convenience of routing and distribution. This document shows that a
1613	   64 bit address space is good enough for all practical purposes. If
1614	   address space is assigned based on the actual need of the customer
1615	   networks, there will be a lot of unused address space within the 64
1616	   bit address space. If a CIDR based hierarchy is maintained, the
1617	   unused address space will be much larger.

1619	9. IANA Considerations

1621	   IANA has assigned RSVP class number <IANA_TBD1> for the object
1622	   VPN_LABEL and RSVP class number <IANA_TBD2> for VPN_ATTRIBUTE. IANA
1623	   has also assigned an error sub-code <IANA_TBD3> for VPN label
1624	   allocation error under Error Code = 24. IANA has assigned a port
1625	   number <IANA_TBD4> and service name <IANA_TBD5> for PI address
1626	   resolution for both TCP and UDP.

1628	10. Security Considerations

1630	   This document does not introduce any security related issues.

1632	11. Acknowledgments

1634	   The author would like to thank Professor Amitava Datta of the
1635	   University of Western Australia for his review and constructive
1636	   comments.

1638	12. Normative References

1640	   [1]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for
1641	        IPv6 Hosts and Routers", RFC 4213, October 2005.

1643	   [2]  Fuller, V. and T. Li, "Classless Inter-Domain Routing (CIDR):
1644	        The Internet Address Assignment and Aggregation Plan", RFC 4632,
1645	        August 2006.

1647	   [3]  Huston, G., "Commentary on Inter-Domain Routing in the
1648	        Internet", RFC 3221, December 2001.

1650	   [4]  Q. Vohra, E. Chen, "BGP Support for Four-octet AS Number
1651	        Space", RFC 4893, May 2007.

1653	   [5]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
1654	        Translator (Traditional NAT)", RFC 3022, January 2001.

1656	   [6]  J. Moy, "OSPF Standardization Report", RFC 2329, April 1998.

1658	   [7]  E. Rosen, Y. Rekhter, "BGP/MPLS IP Virtual Private Networks
1659	        (VPNs)", RFC 4364, February 2006.

1661	   [8]  S. Bandyopadhyay, "Solution for Site Multihoming in a Real IP
1662	        Environment", <draft-shyam-site-multi-44> work in progress.

1664	   [9]  C. Perkins, Ed., D. Johnson, J. Arkko, "Mobility Support in
1665	        IPv6", RFC 6275, July 2011.

1667	   [10] P.V. Mockapetris, "Domain names - concepts and facilities",
1668	        RFC 1034, November 1987.

1670	   [11] P.V. Mockapetris, "Domain names - implementation and
1671	        specification", RFC 1035, November 1987.

1673	   [12] D. Eastlake 3rd, "Domain Name System (DNS) IANA
1674	        Considerations", RFC 5395, November 2008.

1676	   [13] E. Lewis, A. Hoenes, Ed., "DNS Zone Transfer Protocol (AXFR)",
1677	        RFC 5936, June 2010.

1679	   [14] M. Ohta, "Incremental Zone Transfer in DNS", RFC 1995,
1680	        August 1996.

1682	   [15] P. Vixie, "A Mechanism for Prompt Notification of Zone Changes
1683	        (DNS NOTIFY)", RFC 1996, August 1996.

1685	   [16] S. Thomson, T. Narten, T. Jinmei, "IPv6 Stateless Address
1686	        Autoconfiguration", RFC 4862, September 2007.

1688	   [17] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, G. Swallow,
1689	        "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209,
1690	        December 2001.

1692	   [18] B. Carpenter, Ed., T. Chown, F. Gont, S. Jiang, A. Petrescu,
1693	        A. Yourtchenko, "Analysis of the 64-bit Boundary in IPv6
1694	        Addressing", RFC 7421, January 2015.

1696	13. Informative References

1698	   [19] Postel, J., "Internet Protocol", STD 5, RFC 791,
1699	        September 1981.

1701	   [20] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
1702	        RFC 1771, March 1995.

1704	   [21] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
1705	        Specification", RFC 1883, December 1995.

1707	   [22] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

1709	   [23] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
1710	        Specification", RFC 2460, December 1998.

1712	   [24] Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol
1713	        Label Switching Architecture", RFC 3031, January 2001.

1715	14. Author's Address

1717	   Shyamaprasad Bandyopadhyay
1718	   HL No 205/157/7, Kharagpur 721305, India
1719	   Phone: +91 3222 225137
1720	   e-mail: shyamb66@gmail.com