INTERNET DRAFT                                          S. Bandyopadhyay
draft-shyam-real-ip-framework-39.txt                       July 23, 2017
Intended status: Experimental
Expires: January 23, 2018

    An Architectural Framework of the Internet for the Real IP World
                  draft-shyam-real-ip-framework-39.txt

Abstract

   This document proposes an architectural framework for the Internet
   in a real IP world. It describes how a three-tier mesh structured
   hierarchy can be established in a large address space by dividing
   that space into regions and each region into sub-regions. It
   addresses issues relevant to this architecture in the context of
   IPv6, and it shows how to make the transition from private IP to
   real IP without significant changes to the existing network.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 23, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents
   1. Introduction
   2. Background
   3. A Three-tier mesh structured hierarchical network
      3.1. Route propagation
      3.2. Determination of prefix lengths
           3.2.1. A pseudo optimal distribution of prefixes in
                  a 64bit architecture
           3.2.2. Whether to go for a two-tier or three-tier hierarchy
      3.3. Issues related to Satellite communications
   4. Provider Independent addressing, name services and multihoming
      4.1. PI address Resolution
           4.1.1. Record Format
           4.1.2. Messages
           4.1.3. Master file and data file
           4.1.4. Zone maintenance and transfers
   5. Issues related to IP mobility
      5.1. Changes expected with the specifications related
           to IP mobility
   6. Refinements over existing IPv6 specification
   7. Distributed processing and Multicasting
   8. Transition to real IP from private IP
   9. IANA Consideration
   10. Security Consideration
   11. Acknowledgments
   12. Normative References
   13. Informative References
   14. Author's Address

1. Introduction

   The transition from IPv4 to IPv6 is in progress. Work has been done
   to upgrade individual nodes (workstations) from IPv4 to IPv6, and
   there are established documents that enable routers and switches to
   support IPv4 and IPv6 packets simultaneously in order to make the
   transition possible [1]. The CIDR [2] based hierarchical
   architecture of the existing 32-bit system is expected to continue
   in IPv6, only with a much larger address space. There are documented
   concerns that BGP tables are becoming too large in the existing
   system [3], and there are proposals to extend the Autonomous System
   number from 16 bits to 32 bits to meet demand [4]. The challenge
   lies in making the transition from IPv4 to a real IP world as smooth
   as possible, with the fewest possible changes.

   The term "real IP environment" refers to an environment where hosts
   in a customer network possess globally unique IP addresses and
   communicate with the rest of the world without the help of NAT [5].
   This document reflects the changes required in the BSD 4.4 source
   code wherever applicable.

2. Background

   The existing system works with an Autonomous System (AS) layer and
   an inter-AS layer using CIDR. To fit within the 32-bit address
   space, Autonomous Systems of various sizes maintain a CIDR based
   hierarchical architecture. With the help of NAT [5], a stub network
   can maintain a user-ID space as large as a class A network and still
   meet its need to communicate with the rest of the world using very
   few real IP addresses. With CIDR and NAT combined across the entire
   space, most of the 32-bit address space is effectively used as
   network ID. If the same approach is continued with a larger network
   ID, the load on the switches will become too high.

   With a traditional CIDR based hierarchy, a node with a shorter prefix
   (a larger address block) can be divided into a number of nodes with
   longer prefixes, and each of those can be subdivided further, until
   no further division is possible. At each step, the network designer
   has to anticipate the future expansion of the network, on the
   understanding that resources must never be exhausted. This leads the
   designer to allocate far more address space than is needed, which
   leaves address space unused and brings the H-D (host-density) ratio
   into play. The problem gets worse if a resource is exhausted anyway.
   For example, a /16 can be divided into a number of /24s; if one of
   the /24s is exhausted, the free addresses in the other /24s cannot
   be used for it, even though they are available.

   In the IPv4 environment, service providers go to great lengths to
   provide Internet services with the help of NAT. For example, a large
   educational institute may meet its current requirement with four
   real IP addresses: one each for its mail server, web server, FTP
   server and the proxy server that provides web access to all of its
   users. These four services are used by organizations of any size
   (400 users or even 40,000). In the current provider network, these
   organizations are served with four IP addresses each, and the CIDR
   based tree has been built from these components. When private IP is
   replaced with real IP, each customer network will require IP
   addresses based on its size and requirements. Moving to a real IP
   space with provider assigned addresses under the CIDR based approach,
   without reorganizing the existing provider network, may not be
   difficult in itself, but it would carry over all the existing
   problems of routing and of address distribution. A mesh structured
   hierarchy is better suited to reducing the routing overhead and to
   distributing network resources suitably in the long run.

3. A Three-tier mesh structured hierarchical network

   Because Autonomous Systems of various sizes are supported,
   Autonomous Systems and the nodes inside them can be viewed as lying
   on the same plane within the address space. If the network can
   instead be viewed as lying on different planes, routing becomes
   simpler. If the network is designed with a fixed prefix length for
   the Autonomous System everywhere, routing information for the rest
   of the network is confined to the remaining part of the network
   prefix. That is, every AS is assigned the maximum AS size,
   irrespective of its actual size. This is made possible by using a
   large address space and dividing it into a number of regions of
   fixed size. The entire network can then be viewed as a network of
   inter-AS layer nodes. Each node in the inter-AS layer can act as one
   of the following: only a router in the inter-AS layer; a router in
   the inter-AS layer with an Autonomous System attached to it at a
   single point of attachment; or an Autonomous System with multiple
   Autonomous System border routers (ASBRs), appearing like a mesh.
   Thus a two-tier mesh structured hierarchy is established between the
   AS layer and the inter-AS layer, with each AS having a fixed prefix
   length.

   By definition, an Autonomous System is a small area within the
   entire network that maintains its own independent identity and
   communicates with the rest of the world through specific border
   routers. In the same way, if a larger area (say, a region or state)
   is considered as a network of Autonomous Systems that maintains its
   own identity by communicating with the rest of the world through
   some border routers (say, state border routers), a mesh structured
   hierarchy can be established within the inter-AS layer. The inter-AS
   layer is then split into inter-AS-top and inter-AS-bottom. To
   maintain this hierarchy, each node of the inter-AS-top layer needs
   multiple regional or state border routers (SBRs), through which it
   communicates with the rest of the world, in the same way that an
   Autonomous System maintains ASBRs. The entire network then appears
   as a network of inter-AS-top nodes. To maintain the hierarchy, each
   inter-AS-top node needs a fixed prefix length, i.e. each
   inter-AS-top node is assigned a maximum (fixed) number of Autonomous
   System nodes.

   Thus, with a three-tier mesh structured hierarchy in the network
   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
   layers respectively, there will be 2^pA nodes at the topmost layer,
   2^pB inter-AS-bottom nodes within each top layer node and 2^pC nodes
   within each AS. The entire space is thus divided into a fixed number
   of regions, and each region into a fixed number of sub-regions. This
   division is expected to be based on geography, population density,
   demand and related factors.

   Let nMaxInterASTopNodes be the maximum number of nodes assigned at
   the topmost layer, nMaxInterASBottomNodes the maximum at the
   inter-AS-bottom layer and nMaxASNodes the maximum at the AS layer,
   where nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes <= 2^pB
   and nMaxASNodes <= 2^pC.
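   The fixed-width A.B.C layout can be illustrated with a minimal
   Python sketch that packs and unpacks a network ID for given prefix
   lengths pA, pB and pC. The names NetworkId, pack_network_id,
   unpack_network_id and max_nodes are hypothetical, chosen for this
   illustration only. The example lengths 10, 16 and 12 are the ones
   proposed in Section 3.2.1.

```python
from typing import NamedTuple, Tuple


class NetworkId(NamedTuple):
    """Hypothetical A.B.C network ID: inter-AS-top, inter-AS-bottom, AS-layer node."""
    a: int
    b: int
    c: int


def pack_network_id(nid: NetworkId, pA: int, pB: int, pC: int) -> int:
    """Concatenate A, B and C into one integer of pA + pB + pC bits."""
    for value, width in ((nid.a, pA), (nid.b, pB), (nid.c, pC)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    return (nid.a << (pB + pC)) | (nid.b << pC) | nid.c


def unpack_network_id(value: int, pA: int, pB: int, pC: int) -> NetworkId:
    """Split an integer of pA + pB + pC bits back into its A, B and C fields."""
    c = value & ((1 << pC) - 1)
    b = (value >> pC) & ((1 << pB) - 1)
    a = (value >> (pB + pC)) & ((1 << pA) - 1)
    return NetworkId(a, b, c)


def max_nodes(pA: int, pB: int, pC: int) -> Tuple[int, int, int]:
    """Upper bounds for nMaxInterASTopNodes, nMaxInterASBottomNodes, nMaxASNodes."""
    return (1 << pA, 1 << pB, 1 << pC)


packed = pack_network_id(NetworkId(5, 1234, 77), 10, 16, 12)
print(unpack_network_id(packed, 10, 16, 12), max_nodes(10, 16, 12))
```

   Because every field has a fixed width, a router can tell which layer
   a destination belongs to by comparing whole fields instead of
   matching variable-length prefixes.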

3.1. Route propagation

   Once the hierarchy is established, routing information established
   inside an inter-AS-top node does not need to be propagated to any
   other inter-AS-top node. The complete routing information of the
   inter-AS-top layer, however, needs to be propagated to the
   inter-AS-bottom layer. So each router of the inter-AS layer will
   have two tables: one for the inter-AS-top layer and another for the
   inter-AS-bottom layer of the inter-AS-top node it belongs to. BGP,
   with minor modification, can support this if one rule is applied at
   the SBRs: an SBR does not propagate the inter-AS-bottom routing
   information of its own domain to an SBR of a neighboring domain,
   i.e. the SBR of one top layer node propagates only inter-AS-top
   routing information to the SBR of another top layer node. Inside an
   inter-AS-top node, routing information of both the inter-AS-top and
   the inter-AS-bottom layers needs to be propagated from one ASBR to
   its neighboring ASBRs. Inside a top layer node A, the routing
   information for another top layer node B has two parts: the list of
   SBRs through which a packet traverses from top layer node A to B,
   and the list of ASBRs through which the packet traverses from one AS
   to another inside A. In terms of BGP, the AS_PATH attribute will be
   split into two parts, one for the top layer and another for the
   bottom layer. Within the same node A, routing information from one
   AS to another carries no top layer information, i.e. the top layer
   part is set to NULL.
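   The SBR rule and the split AS_PATH described above might be sketched
   as follows. This is a hypothetical in-memory model, not a BGP wire
   format. The Route fields, the sbr_export function and the prepending
   of the local top level node number (by analogy with AS_PATH
   prepending in BGP) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical route entry in the three-tier scheme (not a BGP wire format)."""
    dest_top: int                    # A part of the destination
    dest_bottom: Optional[int]       # B part; None for a route to a whole top level node
    top_path: Optional[List[int]]    # top layer (SBR) part of the split AS_PATH; None = NULL
    bottom_path: List[int] = field(default_factory=list)  # AS-level part inside one top node


def sbr_export(routes: List[Route], own_top: int) -> List[Route]:
    """Routes that an SBR of top level node `own_top` sends to an SBR of another node.

    Inter-AS-bottom routes are never exported; only inter-AS-top information
    crosses the boundary. The local node number is prepended to the top layer
    path, and the bottom layer path is cleared because it is only meaningful
    inside `own_top`.
    """
    exported = []
    for route in routes:
        if route.dest_bottom is not None:
            continue  # keep inter-AS-bottom detail inside this top level node
        top_path = [own_top] + (route.top_path or [])
        exported.append(Route(route.dest_top, None, top_path, []))
    return exported


# Routing information held inside top level node 7.
table = [
    Route(7, None, None),             # node 7 itself, advertised as a whole
    Route(7, 42, None, [12, 42]),     # AS 42 inside node 7: top layer part is NULL
    Route(9, None, [9], [12, 30]),    # top level node 9, learned via another SBR
]
advertised = sbr_export(table, own_top=7)
```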

   Similarly, each node of the AS layer will have three routing tables:
   one for the inter-AS-top layer, one for the inter-AS-bottom layer
   and one for routing inside the Autonomous System itself.

   Introducing hierarchy at the inter-AS layer reduces the size of the
   routing table substantially. Given sufficient hardware resources, if
   a flat address space is maintained at each layer, the problems
   related to CIDR can be avoided. With a flat address space, no
   hierarchical relationship needs to be established between any two
   nodes in the same layer, so all the nodes in each layer can be used
   until they are exhausted. With a flat address space (i.e. without
   prefix reduction), BGP tables will have at most nMaxInterASTopNodes
   + nMaxInterASBottomNodes entries.
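   With one flat table per layer, forwarding at an AS-layer node
   reduces to comparing whole fields of the destination network ID. The
   following Python sketch shows one way this could look. The class
   name, the dictionary-based tables and the string next hops are
   assumptions for illustration, not part of any existing
   implementation. Because each table is flat, its size is bounded by
   nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes
   respectively.

```python
from typing import Dict, Optional, Tuple

NextHop = str  # hypothetical next-hop identifier (e.g. an SBR or ASBR name)


class AsLayerRouter:
    """Hypothetical forwarding state of an AS-layer node with network ID A.B.C."""

    def __init__(self, a: int, b: int, c: int) -> None:
        self.a, self.b, self.c = a, b, c
        self.top: Dict[int, NextHop] = {}       # keyed by inter-AS-top node (A)
        self.bottom: Dict[int, NextHop] = {}    # keyed by AS (B) within own A
        self.intra_as: Dict[int, NextHop] = {}  # keyed by node (C) within own AS

    def lookup(self, dest: Tuple[int, int, int]) -> Optional[NextHop]:
        a, b, c = dest
        if a != self.a:
            return self.top.get(a)       # other top level node: only A matters
        if b != self.b:
            return self.bottom.get(b)    # same top level node, other AS: only B
        return self.intra_as.get(c)      # same AS: exact node C

    def table_size(self) -> int:
        return len(self.top) + len(self.bottom) + len(self.intra_as)


router = AsLayerRouter(1, 2, 3)
router.top[5] = "sbr-x"
router.bottom[9] = "asbr-y"
router.intra_as[4] = "r4"
```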

   An IGP such as OSPF can divide an AS into smaller areas. OSPF hides
   the topology of an area from the rest of the Autonomous System, and
   this information hiding enables a significant reduction in routing
   traffic. With subnetting support, OSPF attaches an IP address mask
   to indicate the range of IP addresses described by a particular
   route. This reduces routing traffic, since the nodes inside the
   range need not be described individually, but it introduces another
   level of hierarchy. If subnetting can be avoided in the AS layer (at
   the cost of additional computation in the SPF tree), each area can
   be configured dynamically from a free pool of addresses based on its
   requirements. An AS can then be divided into a number of areas of
   heterogeneous sizes with nodes taken from a free pool of address
   space.
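   To contrast this with prefix-based allocation, the following Python
   sketch models a flat pool of node IDs from which areas take whatever
   IDs are free. The class and method names are hypothetical, and the
   first-free allocation order is an arbitrary assumption.

```python
from typing import Dict, List


class FreePool:
    """Hypothetical flat pool of node IDs; areas are not described by subnet masks."""

    def __init__(self, bits: int) -> None:
        self.free: List[int] = list(range(1 << bits))
        self.areas: Dict[str, List[int]] = {}

    def allocate(self, area: str, count: int) -> List[int]:
        """Give `count` free IDs to `area`; they need not be contiguous."""
        if count > len(self.free):
            raise RuntimeError("address pool exhausted")
        ids, self.free = self.free[:count], self.free[count:]
        self.areas.setdefault(area, []).extend(ids)
        return ids

    def release(self, area: str) -> None:
        """Return all IDs of `area` to the pool for reuse by any other area."""
        self.free.extend(self.areas.pop(area, []))


pool = FreePool(bits=4)          # 16 node IDs
pool.allocate("area1", 5)        # IDs 0-4
pool.allocate("area2", 3)        # IDs 5-7
pool.release("area1")            # IDs 0-4 become free again
pool.allocate("area3", 10)       # IDs 8-15 plus 0-1: not a single prefix
```

   With prefix-based allocation, ten IDs would need an aligned block of
   16, which is not available here. The flat pool simply hands out
   whatever is free until the space is exhausted.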

   Similarly, the concept of an area can be introduced in the
   inter-AS-bottom layer the way it works in OSPF. The area border
   routers in the inter-AS-bottom layer have to behave exactly the way
   an ABR behaves in OSPF, i.e. an area border router hides the
   topology inside its area from the rest of the world, distributes the
   routing information collected inside the area to the rest, and
   distributes routing information collected from outside to the nodes
   inside. To implement this, the protocol running in the inter-AS
   layer (say, BGP) will have to introduce a 'cost' factor, interpreted
   as the cost of propagating a packet from one AS to another. The
   protocols running inside the AS layer (RIP, OSPF, etc.) will have to
   supply the cost for a packet to travel from one ASBR to another, and
   all of these protocols must behave in unison when supplying this
   information. The cost factor is needed by a remote node sending a
   packet to a node inside an area when more than one area border
   router is equidistant from that remote node. Thus the
   inter-AS-bottom layer (i.e. one inter-AS-top node) can be divided
   into a number of areas of heterogeneous sizes with AS nodes taken
   from a free pool of address space. BGP uses route aggregation to
   reduce the routing information carried within a message. In a
   similar manner, introducing areas in the inter-AS-bottom layer will
   not only reduce the complexity of the protocol but will also reduce
   the size of BGP packets substantially.
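   The role of the cost factor can be sketched as a simple selection
   among area border routers. In this hypothetical Python sketch, the
   remote node adds its own cost to each ABR to the cost each ABR
   reports for reaching the destination AS inside its area. It is
   assumed that this reported cost is derived from the ASBR-to-ASBR
   costs supplied by the AS-layer protocols. The function name and the
   tie-breaking by ABR name are illustrative choices.

```python
from typing import Dict, Tuple


def choose_abr(cost_to_abr: Dict[str, int],
               cost_abr_to_dest: Dict[str, int]) -> Tuple[str, int]:
    """Pick the area border router with the lowest total cost to the destination AS."""
    candidates = {abr: cost_to_abr[abr] + cost_abr_to_dest[abr]
                  for abr in cost_to_abr if abr in cost_abr_to_dest}
    if not candidates:
        raise ValueError("no area border router reaches the destination")
    # Ties on total cost are broken by ABR name so the choice is deterministic.
    best = min(candidates, key=lambda abr: (candidates[abr], abr))
    return best, candidates[best]


# Two ABRs are equidistant (cost 3) from the remote node; the cost
# supplied from inside the area breaks the tie.
choice = choose_abr({"abr1": 3, "abr2": 3}, {"abr1": 10, "abr2": 4})
```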

   With this architecture, each node (router) inside an AS is
   represented as A.B.C. Each node may or may not have a network
   attached to it that acts as a leaf node (i.e. the network does not
   act as a transit). To make proper use of the user-ID space and to
   support customer networks of heterogeneous sizes, the user-ID space
   needs to be divided into subnet-ID and user-ID. In effect, a VLSM
   (variable length subnet mask) type of approach has to be adopted at
   each node of an AS. Each node of the AS layer then acts as the root
   of a tree whose leaves are independent small customer networks
   acting as stubs. As the routing information of the inter-AS layer
   and of the AS layer need not be passed inside any node of the VLSM
   tree, each router inside the tree should maintain a default route
   for any address outside its network. With this approach, the load on
   each service provider router becomes negligible. Protocols that
   support VLSM with MPLS/VPN have to be implemented inside the tree.
   (Inside the VLSM tree, every physical port of a switch has to be
   configured with its subnet mask, so MPLS on top of a static routing
   table should be sufficient.)
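   A router inside the VLSM tree needs only its static subnet entries
   and a default route. The following Python sketch models that. It
   assumes the 24-bit user-ID field proposed in Section 3.2.1 and uses
   longest-prefix match among the static entries. The class name, port
   names and the representation of subnets as (value, length) pairs are
   hypothetical.

```python
from typing import List, Optional, Tuple

USER_ID_BITS = 24  # assumed width of the D (user-ID) field, as proposed in Section 3.2.1


def _mask(prefix_len: int) -> int:
    """Mask selecting the leading `prefix_len` bits of a USER_ID_BITS-wide user ID."""
    return ((1 << prefix_len) - 1) << (USER_ID_BITS - prefix_len)


class VlsmRouter:
    """Hypothetical router inside the VLSM tree rooted at AS-layer node `network_id`."""

    def __init__(self, network_id: int, parent_port: str) -> None:
        self.network_id = network_id
        self.parent_port = parent_port
        self.routes: List[Tuple[int, int, str]] = []  # (subnet, prefix length, port)

    def add_subnet(self, subnet: int, prefix_len: int, port: str) -> None:
        self.routes.append((subnet & _mask(prefix_len), prefix_len, port))

    def next_port(self, dest_network: int, user_id: int) -> str:
        if dest_network != self.network_id:
            return self.parent_port  # outside this network: default route
        best: Optional[Tuple[int, str]] = None
        for subnet, prefix_len, port in self.routes:
            if (user_id & _mask(prefix_len)) == subnet:
                if best is None or prefix_len > best[0]:
                    best = (prefix_len, port)  # longest matching subnet wins
        return best[1] if best else self.parent_port


router = VlsmRouter(network_id=0xABC, parent_port="uplink")
router.add_subnet(0x120000, 8, "port1")    # customer with a 16-bit user space
router.add_subnet(0x123400, 16, "port2")   # more specific subnet on another port
```

   No inter-AS or AS-layer route ever enters this table. Everything
   that does not match a local subnet leaves through the default route.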

   The fundamental assumptions on which this architecture rests can be
   summarized as follows:

   i) The entire network can be viewed as a network of regions or
   states, where each region or state has its own identity and
   communicates with the rest of the world through some state border
   routers. Each region or state is a network of Autonomous Systems.
   Each region, as well as each Autonomous System inside it, has a
   fixed (maximum) prefix length.

   ii) Hardware resources are sufficient to maintain a flat address
   space at the inter-AS layer.

   Introducing a mesh structured hierarchy has several advantages:

      o  The load at each router is reduced substantially.
      o  The CIDR style approach and the complexity of prefix reduction
         can be easily avoided.
      o  A mesh structured hierarchy distributes traffic evenly.
      o  Physical cable connections can be optimized.
      o  Administrative issues become easier.

3.2. Determination of prefix lengths

   With this architecture, an IP address can be described as A.B.C.D,
   where the D part represents the user ID. Each router in the inter-AS
   layer has two tables, one for the inter-AS-top layer and another for
   the inter-AS-bottom layer of the inter-AS-top node it belongs to,
   whereas each node of the AS layer has three routing tables: one for
   the inter-AS-top layer, one for the inter-AS-bottom layer and one
   for routing inside the Autonomous System itself. In the worst case,
   a node inside an AS needs to maintain nMaxInterASTopNodes +
   nMaxInterASBottomNodes + nMaxASNodes entries in its routing table.

   Areas are allocated dynamically from a free pool of address space
   more frequently at the AS layer than at the inter-AS-bottom layer.
   As OSPF supports all the features needed, it can be considered the
   default choice for the AS layer. The existing implementation of OSPF
   (Version 2) supports subnetting, by which an entire area can be
   represented as a combination of network address and subnet mask,
   which reduces the routing table substantially. Once subnetting is
   removed, every node inside an area has an entry in the routing table
   (OSPF Version 1). The determining factor is therefore the maximum
   number of nodes inside an AS that OSPF can support once subnetting
   support is removed, and the prefix length of the AS layer is
   determined by this limit of OSPF.
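   The rule above amounts to choosing the smallest prefix length whose
   address count covers the largest area that OSPF can handle without
   subnetting. Below is a minimal Python sketch. The 1600-router figure
   comes from [6], as cited in Section 3.2.1. The single reserved
   headroom bit is an assumption that mirrors the choice of 12 bits
   made there. Both function names are hypothetical.

```python
def bits_needed(max_nodes: int) -> int:
    """Smallest prefix length p with 2**p >= max_nodes (integer arithmetic only)."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    return (max_nodes - 1).bit_length()


def as_layer_prefix(max_ospf_nodes: int, headroom_bits: int = 1) -> int:
    """pC: bits for the largest area OSPF supports, plus reserved headroom bits."""
    return bits_needed(max_ospf_nodes) + headroom_bits


print(bits_needed(1600), as_layer_prefix(1600))  # prints: 11 12
```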

   With the introduction of hierarchy in the inter-AS layer, the number
   of entries in the BGP routing table is reduced substantially. Even
   if pA and pB are both selected as 16, the number of routing entries
   stays within the admissible range of the existing BGP protocol.
   However, it is the responsibility of IANA to devise a scheme for
   selecting nMaxInterASTopNodes and nMaxInterASBottomNodes. Each top
   level node will have nMaxInterASBottomNodes nodes. Assigning one top
   level node to each country would waste address space (e.g. China has
   a population of 1,306,313,800, whereas Vatican City has only 920,
   according to a census of 2006). So a moderate value of
   nMaxInterASBottomNodes is desirable, with which larger countries
   will have a number of top level nodes; e.g. each state of the USA
   can be assigned a top level node. With areas introduced in the
   inter-AS-bottom layer, each top level node can be divided into a
   number of areas of heterogeneous sizes, so a group of neighboring
   countries with smaller populations can share the address space of a
   top level node. Similarly, the size of the user-ID space has to be
   decided based on the largest area a VLSM tree is expected to span.
   All these issues are geopolitical and have to be decided by IANA.

3.2.1. A pseudo optimal distribution of prefixes in a 64bit architecture

   For optimal use of cable connections, the VLSM tree is expected to
   be as short as possible. Also, a single organization may prefer to
   have its whole user-ID space under the same network ID. A 16-bit
   user ID may therefore be insufficient for places like a large
   university campus, whereas 32 bits would be too large. Hence a
   24-bit user ID is a moderate choice; it equals the host space of a
   class A network in IPv4 (also used as the private IP space). As
   published in 1998 [6], OSPF can support an area with 1600 routers
   and 30K external LSAs, so 11 bits are needed to support this space.
   Assuming that OSPF will support a much larger space as hardware
   technology advances, and to keep room for future expansion, 12 bits
   are assigned to the AS layer and 16 bits to the inter-AS-bottom
   layer. Suppose that, on average, a 16-bit equivalent space is used
   within the user-ID space (i.e. one out of 256 addresses) and an
   8-bit equivalent number of nodes is used inside an AS (256 nodes,
   i.e. 16% of 1600). A top level node (with a 16-bit equivalent number
   of AS nodes) then yields 2^(16+8+16) = 2^40 IP addresses, which
   gives 8629 IP addresses per person in Japan (population 127,417,200;
   Japan is 10th from the top in the world population list). So even if
   every country with a population less than or equal to Japan's is
   assigned a top level node, and every province or state of the more
   populous countries is assigned a top level node of its own, the
   total number of top level nodes will stay well under 1024 (10 bits).
   If neighboring countries with smaller populations share a top level
   node, the total will come down further. This suggests that a 62-bit
   equivalent space (10 (pA) + 16 (pB) + 12 (pC) + 24 (user ID)) will
   be good enough for unicast addresses. Since a node inside an AS has
   to carry the inter-AS-top and inter-AS-bottom routes as external
   LSAs, this distribution expects OSPF to support 65K (64K + 1K)
   external LSAs.
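   The arithmetic in the paragraph above can be checked with a few
   lines of Python. All figures are taken from the text. The variable
   names are illustrative only.

```python
# Figures from the paragraph above; variable names are illustrative only.
P_A, P_B, P_C, USER_ID_BITS = 10, 16, 12, 24

used_user_bits = 16      # one out of 256 user IDs in use: 2**16 of 2**24
used_as_bits = 8         # 256 nodes per AS, i.e. 16% of 1600
as_nodes_bits = P_B      # one top level node holds 2**16 AS nodes

addresses_per_top_node = 2 ** (as_nodes_bits + used_as_bits + used_user_bits)
japan_population = 127_417_200
addresses_per_person = addresses_per_top_node // japan_population

unicast_bits = P_A + P_B + P_C + USER_ID_BITS
external_lsas = 2 ** P_B + 2 ** P_A   # inter-AS-bottom plus inter-AS-top routes

print(addresses_per_top_node == 2 ** 40, addresses_per_person,
      unicast_bits, external_lsas)   # prints: True 8629 62 66560
```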

   The 64-bit address space may be divided into two 63-bit blocks as
   follows:

   i. Global unicast addresses, with the most significant bit set to
   0. This space is divided equally into provider assigned (PA) address
   space with prefix 00 and provider independent (PI) address space
   with prefix 01. The provider independent address space will be used
   for customers who would like to retain their numbers even after
   changing providers. As routing will be based on PA addresses, each
   PI address will be associated with at least one PA address. Section
   4 describes issues related to PI addressing in detail.

   ii. The address space with the MSB set to 1 will be distributed
   among the remaining uses. Each of them will have a fixed prefix, to
   be determined in consultation with IANA. This distribution will be
   based on the requirements and the work that has already been done in
   connection with IPv6, along with the following requirements:

   a) Router address space: any node in the router address space will
   be designated with a prefix followed by A.B.C.router-id.

   b) Address space for multicasting:

   c) Address space for private IP: a 32-bit address space should be
   good enough for private IP.
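   The bit layout implied by Section 3.2.1 and the list above can be
   sketched in Python: a 2-bit 00 (PA) or 01 (PI) prefix, followed by
   the 62-bit A.B.C.D unicast part. The field order, the function names
   and the labels returned by classify are assumptions for
   illustration. The MSB = 1 prefixes are left to be determined with
   IANA, so the sketch does not subdivide that half.

```python
from typing import NamedTuple

# Field widths from Section 3.2.1 (assumed): 2-bit space prefix + 62-bit unicast part.
P_A, P_B, P_C, P_D = 10, 16, 12, 24
PA_PREFIX, PI_PREFIX = 0b00, 0b01


class PaAddress(NamedTuple):
    a: int  # inter-AS-top node
    b: int  # inter-AS-bottom node (AS)
    c: int  # node inside the AS
    d: int  # user ID


def encode_pa(addr: PaAddress) -> int:
    """Build a 64-bit PA address: 00 | A | B | C | D."""
    value = PA_PREFIX
    for field_value, width in zip(addr, (P_A, P_B, P_C, P_D)):
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{field_value} does not fit in {width} bits")
        value = (value << width) | field_value
    return value


def decode_pa(addr64: int) -> PaAddress:
    """Split a 64-bit PA address back into A, B, C and D."""
    if addr64 >> 62 != PA_PREFIX:
        raise ValueError("not a provider assigned address")
    fields = []
    for width in (P_D, P_C, P_B, P_A):
        fields.append(addr64 & ((1 << width) - 1))
        addr64 >>= width
    d, c, b, a = fields
    return PaAddress(a, b, c, d)


def classify(addr64: int) -> str:
    """Classify a 64-bit address by its two most significant bits."""
    top_two = addr64 >> 62
    if top_two == PA_PREFIX:
        return "global unicast, provider assigned"
    if top_two == PI_PREFIX:
        return "global unicast, provider independent"
    return "non-unicast (MSB = 1)"


addr = encode_pa(PaAddress(a=3, b=500, c=7, d=0xABCDEF))
```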

3.2.2. Whether to go for a two-tier or three-tier hierarchy

   Establishing hierarchy in the inter-AS layer reduces the size of the
   BGP tables to a great extent, but leads to inefficient use of
   address space for geopolitical reasons. If the hierarchy in the
   inter-AS space is removed, the entire 26-bit (10+16) space becomes
   available to a single layer and the inter-AS space can be used
   fully, but the number of external LSAs (and/or the number of entries
   in the BGP table) increases dramatically. So the choice depends on
   how many external LSAs OSPF can support. BGP limits the message
   length to 4096 bytes, and it manages to work within this limit
   through prefix reduction in the CIDR based environment. As the
   number of inter-AS nodes increases, BGP will have to change this
   limit in order to work in a flat address space. The alternative is
   to divide the inter-AS space into a number of areas as described in
   Section 3.1, with the area border routers advertising aggregated
   information to the rest of the world. BGP may have to incorporate
   both options at the same time. As the number of nodes in the
   inter-AS layer increases, the inter-AS space has to be split into
   two separate planes to reduce the number of entries in the routing
   table. So a two-tier hierarchy can be considered an interim state on
   the way to a three-tier hierarchy. Even if the currently available
   capacity is good enough for present needs, it is worth examining how
   far it can support future growth. Assignment of inter-AS nodes in a
   two-tier hierarchy should be based on geographical distribution, as
   if it were part of a three-tier hierarchy; otherwise, introducing a
   three-tier hierarchy in the future will become another difficult
   task. According to a 2011 report, BGP routing tables hold ~400,000
   entries. With this growing trend, BGP may have to change the packet
   length limit even in a CIDR based environment. With a two-tier
   hierarchy, the number of entries in the routing table will come down
   drastically, and with the three-tier approach it will come down
   further.
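   Using the prefix lengths of Section 3.2.1, the worst-case number of
   inter-AS entries that a single router must hold can be compared for
   the two options. The following Python sketch assumes flat tables
   with no aggregation and no areas. The function name is hypothetical.

```python
def worst_case_inter_as_entries(p_top: int, p_bottom: int, three_tier: bool) -> int:
    """Upper bound on inter-AS routing entries held by one router (flat tables)."""
    if three_tier:
        # One entry per inter-AS-top node plus one per inter-AS-bottom node
        # of the router's own top level node.
        return 2 ** p_top + 2 ** p_bottom
    # Two-tier: a single flat inter-AS layer spanning all p_top + p_bottom bits.
    return 2 ** (p_top + p_bottom)


for p_top, p_bottom in ((10, 16), (16, 16)):
    print(p_top, p_bottom,
          worst_case_inter_as_entries(p_top, p_bottom, three_tier=False),
          worst_case_inter_as_entries(p_top, p_bottom, three_tier=True))
# prints:
# 10 16 67108864 66560
# 16 16 4294967296 131072
```

   The second row corresponds to the case in Section 3.2 where pA and
   pB are both 16.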

3.3. Issues related to Satellite communications

   Establishing hierarchy in the inter-AS layer assumes that the only
   way two autonomous systems in two different top level nodes
   communicate is through their SBRs. If two autonomous systems inside
   the same top level node communicate through satellite, the link is
   considered a direct link between them. Whenever autonomous system
   'ASa' of top level node 'A' communicates with autonomous system
   'ASb' of top level node 'B' through satellite, the traffic has to go
   through their state border routers, i.e. a satellite port inside 'A'
   that communicates with a satellite port inside 'B' is considered a
   state border router. If multiple such ports exist inside node 'A',
   all of them will be equidistant from any port inside 'B'. This
   requires every satellite port inside 'B' to know in advance the list
   of autonomous systems under the purview of each port inside 'A'. So
   all the satellite ports of 'A' have to exchange this group
   information with all the satellite ports of 'B', and vice versa.
   Each such group can be considered a cluster of autonomous systems
   inside an area of a top level node. If the number of such ports is
   small, some heuristics can be applied while assigning AS numbers in
   order to reduce the processing time during the circuit establishment
   phase. Such heuristics become difficult to maintain once the number
   of ports becomes large. So, in the case of satellite communication,
   the advantage of establishing hierarchy inside the inter-AS layer
   diminishes as the number of satellite ports increases. If a private
   corporation maintains its own satellite channel to communicate
   between offices at distant locations, all of these offices are
   considered to be under the user-ID space of its network. Service
   providers that provide satellite services to end-site customers can
   operate in the usual manner, since they provide connections to
   customer networks that act as stubs.

499	4. Provider Independent addressing, name services and multihoming

501	   Provider independent addressing can be conceived as naming a host
502	   with a number. It can be used by customer networks that would like to
503	   retain their addresses even after changing their service provider; it
504	   is also useful for designating a host uniquely if the customer network
505	   is multihomed. Just as in name services, where the address
506	   corresponding to a name needs to be resolved before communication can
507	   be initiated, a PI address needs to be resolved first. Each globally
508	   unique PI address will be associated with at least one global unicast
509	   provider assigned address. For a host with a single interface, the
510	   number of such addresses will be the same as the number of service
511	   providers the customer network is associated with.

513	   As the source, the destination, or both may be multihomed, there
514	   could be multiple paths between two hosts. Selecting among these
515	   paths is required both for name services and for PI addressing.

517	   A system call needs to be introduced to get the source address based
518	   on the destination address. If an application program needs to use
519	   the destination address directly, it needs to use this system call.

521	   int getcommaddr(int sockfd, struct in_addr *dst, struct addr_pair
522	   *endpts);

524	   'addr_pair' holds the addresses of communication end points as
525	   follows:

527	   struct addr_pair {
528	       struct in_addr src;
529	       struct in_addr dst;

531	   };

533	   'getcommaddr'[8] returns the number of source-destination pairs for
534	   communication; the parameter 'endpts' will hold the array of these
535	   address pairs, sorted so that the pair with the best possible route
536	   comes first.  'sockfd' is used to get the 'type of service'
537	   assigned, so an application program needs to set its type of service
538	   before using this call.
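   The ordering of 'endpts' can be sketched in C as follows. This is a
   minimal illustration only. It assumes a hypothetical 'struct
   ranked_pair' that attaches a numeric route cost to each candidate
   pair, and a hypothetical helper 'order_endpoints'. Neither is defined
   by this document, and how the cost would be derived (route length,
   type of service) is left open.

```c
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Same layout as 'struct addr_pair' defined above. */
struct addr_pair {
    struct in_addr src;
    struct in_addr dst;
};

/* Hypothetical candidate: an address pair plus a route metric where a
 * lower value means a better route. The metric itself is an assumption. */
struct ranked_pair {
    struct addr_pair pair;
    unsigned int cost;
};

static int cmp_ranked(const void *a, const void *b)
{
    const struct ranked_pair *x = a;
    const struct ranked_pair *y = b;

    if (x->cost < y->cost)
        return -1;
    if (x->cost > y->cost)
        return 1;
    return 0;
}

/* Sort the candidates best first and copy the pairs into 'endpts'
 * (which must have room for 'n' entries); returns the number of pairs,
 * mirroring the return value described for 'getcommaddr'. */
int order_endpoints(struct ranked_pair *cand, int n, struct addr_pair *endpts)
{
    qsort(cand, (size_t)n, sizeof *cand, cmp_ranked);
    for (int i = 0; i < n; i++)
        endpts[i] = cand[i].pair;
    return n;
}
```

   A real 'getcommaddr' would still have to build the candidate list
   from 'getmappedaddr' and the routing information. That part is not
   shown here.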

540	   'getcommaddr' needs to call a routine 'getmappedaddr' to resolve the
541	   provider assigned addresses mapped to a provider independent address.

543	   int getmappedaddr(struct in_addr *piaddr, struct in_addr *mpiaddr);

545	   'getmappedaddr' will return the number of mapped addresses, and
546	   'mpiaddr' will hold their values.

548	   Users may use a name instead of an IP address to reach the
549	   destination.  A new system call, 'gethostbynamewithsrcaddr', needs
550	   to be introduced as an extension of 'gethostbyname', as follows:

552	   struct hostent *gethostbynamewithsrcaddr(int sockfd,const char *name,
553	                  int *nroutes, struct addr_pair *endpts);

555	   'gethostbynamewithsrcaddr'[8] takes 'name' and 'sockfd' as input
556	   parameters and finds the best possible route to reach the
557	   destination. It returns a pointer to a 'hostent' structure, as
558	   returned by the 'gethostbyname' system call.  The parameter 'nroutes'
559	   receives the number of possible routes that can be used, and the
560	   corresponding source and destination addresses are assigned to
561	   'endpts', sorted best first. 'sockfd' is used to get the 'type of
562	   service' assigned, so an application program needs to set its type
563	   of service before using this call.

565	   An application program needs to use these address pairs in order,
566	   starting from the top (i.e., the 0th entry), to establish a
567	   connection with the destination: for each pair, it binds the source
568	   address 'src' and then connects to the destination address 'dst'.

570	4.1. PI address Resolution

572	   This section proposes a solution for PI address resolution based on
573	   the approach of DNS[7], with the necessary differences.  Just as DNS
574	   has a name space, the entire address range with prefix 01 will be
575	   the address space used by PI addresses. Servers that will hold the
576	   mapping between PI addresses and the corresponding PA
577	   addresses will be called PIMapServers, and the programs that will
578	   be used to resolve addresses will be called PIMapResolvers.

580	   Whereas DNS resolves names that are in hierarchical format, PI
581	   address resolution will be based on the prefix of the PI address
582	   being resolved.  The prefix is determined based on the architectural
583	   model used for the Internet.  From the prefix information, the
584	   addresses of a list of regional servers can be found; these servers
585	   will be used to resolve the mapped PA addresses corresponding to that
586	   PI address. A prefix will serve a fixed address space within the
587	   entire PI address space. The address space belonging to a prefix will
588	   be distributed among customer networks of heterogeneous sizes.
589	   Address space allocation and the mapping of the associated PA
590	   address(es) will be assigned by a regional authority. The regional
591	   authority will be fully responsible for the operation of the regional
592	   servers in that region.

594	   As in DNS, there are some root servers with fixed addresses, under
595	   which there are some prefixes that will act as top-level domains.
596	   In the case of a CIDR based hierarchy, these prefixes may be
597	   of different prefix lengths, which are selected based on the
598	   requirements. Each prefix in a top level domain can further be split
599	   into a number of prefixes with the approach of CIDR. This tree
600	   structured hierarchy will keep growing until we get prefixes
601	   associated with regional servers. Each prefix associated with a
602	   regional server will be distributed among customer networks of
603	   various sizes as well as prefixes that will again be associated with
604	   some regional servers with the approach of CIDR. These regional
605	   servers can be considered equivalent to the authoritative name
606	   servers of DNS, which are associated with zones. As stated earlier,
607	   prefixes starting with "00" will be assigned to provider assigned
608	   addresses and prefixes starting with "01" will be assigned to provider
609	   independent addresses, whereas prefixes starting with "1" will be
610	   assigned to addresses of all other types.
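   The prefix rule above can be illustrated with the following C sketch.
   Under that rule, "00" marks PA, "01" marks PI and "1" marks all other
   types. The sketch assumes 32-bit addresses carried in 'struct
   in_addr', as in the prototypes of section 4, with the prefix in the
   most significant bits. 'classify_addr' and 'enum addr_class' are
   hypothetical names.

```c
#include <stdint.h>
#include <netinet/in.h>
#include <arpa/inet.h>

enum addr_class { ADDR_PA, ADDR_PI, ADDR_OTHER };

/* Classify an address by its leading bits; s_addr is in network byte
 * order, so convert before testing the most significant bits. */
enum addr_class classify_addr(const struct in_addr *a)
{
    uint32_t h = ntohl(a->s_addr);

    if (h & 0x80000000u)        /* leading bit "1": all other types */
        return ADDR_OTHER;
    if (h & 0x40000000u)        /* leading bits "01": provider independent */
        return ADDR_PI;
    return ADDR_PA;             /* leading bits "00": provider assigned */
}
```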

612	   As an inherent hierarchy is involved in the "Mesh Structured
613	   Hierarchy", this hierarchy goes up to two levels. As usual, there will
614	   be some root servers with fixed assigned addresses. Each root server
615	   will have prefixes "01.A" that will act as top level domains. Under
616	   each top level domain, there will be entries with prefixes "01.A.B".
617	   Within a region "A.B", every global PA address is represented as
618	   "00.A.B.C.user-id". In order to support customer networks of
619	   heterogeneous sizes with the approach of VLSM, the "user-id" portion
620	   is further divided as "subnet-id.user-id". So, the effective network
621	   prefix of a customer network in PA address space is
622	   "00.A.B.C.pa-subnet-id". Within an "A.B", the entire PI address space
623	   with prefix "01.A.B" will be distributed among customer networks of
624	   heterogeneous sizes. So, the effective network prefix of a customer
625	   network with a PI address will be "01.A.B.pi-subnet-id". A particular
626	   prefix "01.A.B.pi-subnet-id" will be mapped to at least one provider
627	   assigned prefix of the same prefix length.  A multihomed customer
628	   network within "A.B" that receives services from two service
629	   providers will have prefixes "00.A.B.C1.pa-subnet-id1" and
630	   "00.A.B.C2.pa-subnet-id2". A PI address prefix "01.A.B.pi-subnet-id"
631	   of the same length will be mapped to both of these PA prefixes.
632	   Every region "A.B" will have a regional server and backup server(s),
633	   up to a maximum number of servers (say 4), with net addresses
634	   "00.A.B.server1", "00.A.B.server2", "00.A.B.server3" and
635	   "00.A.B.server4".

637	   Each PIMapServer will have a database of records that hold the
638	   information needed to resolve PI addresses. The in-memory copy of a
639	   region will hold an array of records, where each record has the
640	   following format:

642	   +------------+---------+------+-----+-------+-----------+
643	   | NetAddress | NetMask | Type | TTL | NAddr | Addr(1-4) |
644	   +------------+---------+------+-----+-------+-----------+

646	   The first two fields, "NetAddress/NetMask", represent the PI address
647	   range of a network. "Type" is one of Domain/Referral/Individual/
648	   SingleEntry/Default and determines how a query and the rest of the
649	   fields of a record are processed. A PI address can have at most four
650	   mapped PA addresses. "Addr1", "Addr2", "Addr3" and "Addr4" will hold
651	   the corresponding PA addresses, and "NAddr" will hold the number of
652	   such addresses. The field "TTL" is a 32-bit integer measured in
653	   seconds, with the same meaning and use as defined in the
654	   specification of DNS[7]. When a server receives a query for an
655	   address "X", it looks up in its database the record of the network
656	   whose "NetAddress/NetMask" covers "X". If no matching record
657	   is found, a negative response is sent. Based on the "Type" of the
658	   record, the query is processed in the following manner.
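   A possible in-memory form of this record and of the lookup step is
   sketched below in C. The structure 'pimap_record' and the function
   'find_record' are hypothetical. Picking the matching record with the
   longest mask is an assumption. It is chosen because it handles
   SingleEntry records (all-ones mask) before the broader range, as
   required for Type=Default below.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory record with the fields shown above. Addresses and masks are
 * host-byte-order 32-bit integers (an assumption for simplicity);
 * 'type' uses the numeric codes of section 4.1.1. */
struct pimap_record {
    uint32_t net_address;
    uint32_t net_mask;
    uint8_t  type;
    uint32_t ttl;       /* seconds, as in DNS */
    uint8_t  naddr;     /* number of valid entries in addr[], at most 4 */
    uint32_t addr[4];
};

/* Return the record whose NetAddress/NetMask covers 'x', preferring the
 * longest mask; NULL means a negative response has to be sent. */
const struct pimap_record *
find_record(const struct pimap_record *db, size_t n, uint32_t x)
{
    const struct pimap_record *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((x & db[i].net_mask) != (db[i].net_address & db[i].net_mask))
            continue;
        /* Contiguous masks: a longer prefix is numerically larger. */
        if (best == NULL || db[i].net_mask > best->net_mask)
            best = &db[i];
    }
    return best;
}
```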

660	   Type=Domain:

662	   This is the most common type. A customer network that does not want
663	   to maintain a map server opts for this option. In this case there is
664	   a one-to-one mapping between a PI address and the corresponding PA
665	   addresses. The fields "Addr1"/"Addr2"/"Addr3"/"Addr4" will hold the
666	   PA net addresses corresponding to the PI address of the network.
667	   The server will send the matching record to the resolver with
668	   Type=Domain. The resolver will extract the user-id portion of "X" and
669	   find the corresponding mapped PA addresses based on
670	   "Addr1"/"Addr2"/...etc.
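   The resolver step for Type=Domain can be sketched as below. The
   sketch assumes, as in the "Mesh Structured Hierarchy" above, that
   each mapped PA prefix has the same length as the PI prefix, so the
   bits of "X" outside "NetMask" form the user-id portion.
   'map_domain' is a hypothetical helper, and host-byte-order 32-bit
   addresses are an assumption.

```c
#include <stdint.h>

/* Keep the user-id bits of 'x' (those outside 'net_mask') and splice them
 * onto each mapped PA net address; returns the number of PA addresses
 * written to 'out'. */
int map_domain(uint32_t x, uint32_t net_mask,
               const uint32_t *pa_nets, int naddr, uint32_t *out)
{
    uint32_t user_id = x & ~net_mask;

    for (int i = 0; i < naddr; i++)
        out[i] = (pa_nets[i] & net_mask) | user_id;
    return naddr;
}
```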

672	   In principle, the "A.B" portion of a PI address need not match the
673	   "A.B" portion of the corresponding PA addresses. Consider a large
674	   corporation that has its corporate office and a branch office within
675	   the same region "A.B" and some other offices with different values
676	   of "A.B". The corporation can maintain a contiguous range of PI
677	   addresses for ease of operation. It then needs to split its entire
678	   PI address range among its offices and assign the corresponding PA
679	   addresses. In order to minimize the path of a query, it is
680	   desirable that the "A.B" of a PI address and its corresponding
681	   mapped PA addresses belong to the same region.

683	   Type=Referral:

685	   This is used when an address within the domain "NetAddress"/"NetMask"
686	   has to be processed by another map server. The map server may itself
687	   be another regional server or a server within a customer network.

689	   When a customer network would like to have direct control over the
690	   mapping of its addresses, it needs to opt for this option.
691	   "Addr1"/"Addr2"/"Addr3"/"Addr4" of the database entry will hold
692	   pointers to the information associated with each map server. "NAddr"
693	   will hold the number of map servers that can be referred to. The
694	   information for each server consists of: PI address of the map
695	   server + number of PA addresses to reach the map server + PA
696	   addresses of the map server. Any one of these map servers needs to be
697	   queried for further processing. A server may act either in
698	   recursive or in iterative mode, depending on its implementation,
699	   just as in DNS. A large corporation may have different offices, and
700	   each (or some) of them may run a map server based on its policies.

702	   When a server needs to handle a particular address separately, it
703	   needs to set "NetAddress" to that particular address and all the
704	   bits of "NetMask" to "1". The "Type" field has to be set to
705	   "SingleEntry" (which is similar to the Type Address (A) of DNS). If
706	   some of its addresses need to be handled separately but a common rule
707	   (like Type=Domain) applies to the rest, the records of the individual
708	   entries should be processed first and then the rest. In these
709	   cases, "Type" has to be set to "Default". So, a server of a customer
710	   network may have database entries with Type=Domain/Referral/
711	   SingleEntry/Default.  Entries with Type=Default make sense for a
712	   server (or a master file), but not from the point of view of a
713	   resolver. So, the server needs to extract the PA addresses, form a
714	   record with Type=SingleEntry and send it back to the resolver.

717	   For a host having multiple interfaces, each interface may be assigned
718	   PA addresses supplied by all the service providers, but it is
719	   desirable that the PI address gets mapped to only one of them
720	   (preferably the PA address associated with the CE router to which
721	   the interface has the shortest path).

724	   Type=Individual:

726	   This type is meant for individual users of services, such as
727	   telephony, that need to retain a PI address. With this option a
728	   mobile user may keep its PI address after changing its service
729	   provider. A map server needs to maintain some networks with a range
730	   of PI addresses in its database. When a query for an address "X" is
731	   received, the server needs to get the corresponding record, where
732	   "Addr1" will hold a pointer to an open file descriptor (or a pointer
733	   to the in-memory copy) of a separate data file that holds a
734	   one-to-one mapping between each assigned PI address and its
735	   corresponding PA address. These networks and the
736	   assignment of individual PI addresses have to be done by the regional
737	   authority.

739	   As with Type=Default, Type=Individual does not make sense to a
740	   resolver. So, the server needs to extract the PA address, form a
741	   record with Type=SingleEntry and send it back to the resolver.

743	   As stated above, this solution is based on the approach of DNS. For
744	   ease of implementation, and to make use of existing DNS source code
745	   (e.g., BIND), most of the features have been taken from DNS. Wherever
746	   differences arise, the approach followed by this document takes
747	   precedence.

749	   IANA will have to assign a port (as port 53 is assigned to DNS) for
750	   UDP/TCP based implementations.

752	4.1.1. Record Format

754	   Each record (as it appears in a master file or is used in
755	   communication) will have the following format:

757	   NetAddress/NetMask + Type (8 bit unsigned int) + <TTL> + RDATA (Type
758	   specific information)

760	   Record types are primarily the types of records described above,
761	   along with three other types: SOA (Start of a zone of authority), MPS
762	   (a host with Type=SingleEntry that acts as a map server for this
763	   zone) and DFL (Data File). These types are mainly useful in the
764	   context of processing AXFR/IXFR/NOTIFY/DFXFR/DFIXFR messages.

766	   Types are defined as follows:

768	   Types               values          comments
769	   -----------------------------------------------------------
770	   SEN (SingleEntry)      1    same as type A(address) in DNS
771	   MPS (MapServer)        2    Map server
772	   DMN (Domain)           3
773	   DEF (Default)          4
774	   REF (Referral)         5
775	   SOA (Start of a zone)  6
776	   IND (Individual)       7
777	   DFL (Data File)        8
778	   -----------------------------------------------------------
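   For illustration, the type codes and their master file mnemonics can
   be kept in a single table, as in the C sketch below. The identifiers
   ('pimap_rr_type', 'pimap_type_name', 'pimap_type_from_name') are
   hypothetical and not part of this specification.

```c
#include <string.h>

/* Record type codes from the table above (8-bit unsigned on the wire). */
enum pimap_rr_type {
    PIMAP_SEN = 1, PIMAP_MPS = 2, PIMAP_DMN = 3, PIMAP_DEF = 4,
    PIMAP_REF = 5, PIMAP_SOA = 6, PIMAP_IND = 7, PIMAP_DFL = 8
};

/* Index 0 is unused so that each code indexes its own mnemonic. */
static const char *const pimap_names[] = {
    NULL, "SEN", "MPS", "DMN", "DEF", "REF", "SOA", "IND", "DFL"
};

/* Code -> mnemonic, or NULL for an unknown code. */
const char *pimap_type_name(unsigned int t)
{
    return (t >= 1 && t <= 8) ? pimap_names[t] : NULL;
}

/* Mnemonic (as written in a master file) -> code, or -1 if unknown. */
int pimap_type_from_name(const char *s)
{
    for (int t = 1; t <= 8; t++)
        if (strcmp(pimap_names[t], s) == 0)
            return t;
    return -1;
}
```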

780	   RDATA of different types will appear as follows:

782	   Type=SOA:
783	   PI address of server+SERIAL+REFRESH+RETRY+EXPIRE+MINIMUM (meaning and
784	   values of SERIAL/REFRESH/RETRY/EXPIRE/MINIMUM are the same as
785	   defined in section 3.3.13 of RFC 1035[11])

787	   Type=(SEN/MPS):
788	   NAddr(Number of addresses) + corresponding PA addresses

790	   Type=(DMN/DEF):
791	   NAddr(Number of addresses) + corresponding Net addresses

793	   Type=REF:
794	   NAddr(Number of map servers) + for each map server (PI address of map
795	   server + NAddr(Number of addresses of map server) + corresponding PA
796	   addresses)

798	   Type=IND:
799	   NAddr(=1) + full path name of the data file

801	   Type=DFL:
802	   Data file name + SERIAL + Number of records in the data file(32 bit
803	   unsigned int)

805	   When used in communication, the data file name is encoded as its
806	   length (8 bit unsigned int) followed by the octets of the string.
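   The length-prefixed encoding of a data file name could look like the
   following C sketch. 'encode_dfl_name' and 'decode_dfl_name' are
   hypothetical helpers. Because the length is an 8-bit unsigned
   integer, a name longer than 255 octets cannot be encoded and is
   rejected here.

```c
#include <stddef.h>
#include <string.h>

/* Encode 'name' as a 1-octet length followed by its octets.
 * Returns the number of octets written, or -1 on error. */
int encode_dfl_name(const char *name, unsigned char *buf, size_t buflen)
{
    size_t len = strlen(name);

    if (len > 255 || buflen < len + 1)
        return -1;
    buf[0] = (unsigned char)len;
    memcpy(buf + 1, name, len);
    return (int)(len + 1);
}

/* Decode into a NUL-terminated string ('outlen' of 256 always suffices).
 * Returns the number of octets consumed, or -1 on truncated input. */
int decode_dfl_name(const unsigned char *buf, size_t buflen,
                    char *out, size_t outlen)
{
    if (buflen < 1)
        return -1;
    size_t len = buf[0];
    if (buflen < len + 1 || outlen < len + 1)
        return -1;
    memcpy(out, buf + 1, len);
    out[len] = '\0';
    return (int)(len + 1);
}
```

   For example, a hypothetical name "region1.df" (10 octets) is sent as
   the octet 10 followed by its 10 characters.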

808	   The TTL value of a record has to be set to 0 if it is not relevant or
809	   if the value associated with the SOA record is to be used.

811	4.1.2. Messages

813	   In order to support most of the features of DNS, the message format
814	   has been kept almost the same as that of DNS. So, all the relevant
815	   fields will be processed exactly as they are in DNS, and all the
816	   irrelevant fields have to be ignored. The rest of this section
817	   describes where and how changes have to be made.

819	   As defined in RFC 1035, the top level format of a message is divided
820	   into five sections (some of which are empty in certain cases), as
821	   shown below:

823	       +---------------------+
824	       |        Header       |
825	       +---------------------+
826	       |       Question      | the question for the name server
827	       +---------------------+
828	       |        Answer       | answering part of the question
829	       +---------------------+
830	       |      Authority      | authoritative map server
831	       +---------------------+
832	       |      Additional     | additional information
833	       +---------------------+

835	   The header section has been retained as defined in RFC 5395[12] as
836	   follows:

838	        0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
839	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
840	       |                      ID                       |
841	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
842	       |QR|   OpCode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
843	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
844	       |                QDCOUNT/ZOCOUNT                |
845	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
846	       |                ANCOUNT/PRCOUNT                |
847	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
848	       |                NSCOUNT/UPCOUNT                |
849	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
850	       |                    ARCOUNT                    |
851	       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
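   The header keeps the RFC 5395 layout, so its second 16-bit word is
   packed as in DNS, with bit 0 in the diagram as the most significant
   bit. The C sketch below shows this packing and also writes the six
   header words in network byte order. The names 'pimap_flags',
   'pack_flags', 'unpack_flags' and 'write_header' are hypothetical.

```c
#include <stdint.h>

/* One member per flag field in the diagram above. */
struct pimap_flags {
    unsigned qr, opcode, aa, tc, rd, ra, z, ad, cd, rcode;
};

/* Bit 0 of the diagram is the most significant bit of the word. */
uint16_t pack_flags(const struct pimap_flags *f)
{
    return (uint16_t)(((f->qr & 1u) << 15) | ((f->opcode & 0xFu) << 11) |
                      ((f->aa & 1u) << 10) | ((f->tc & 1u) << 9) |
                      ((f->rd & 1u) << 8)  | ((f->ra & 1u) << 7) |
                      ((f->z & 1u) << 6)   | ((f->ad & 1u) << 5) |
                      ((f->cd & 1u) << 4)  | (f->rcode & 0xFu));
}

void unpack_flags(uint16_t w, struct pimap_flags *f)
{
    f->qr = (w >> 15) & 1u;
    f->opcode = (w >> 11) & 0xFu;
    f->aa = (w >> 10) & 1u;
    f->tc = (w >> 9) & 1u;
    f->rd = (w >> 8) & 1u;
    f->ra = (w >> 7) & 1u;
    f->z = (w >> 6) & 1u;
    f->ad = (w >> 5) & 1u;
    f->cd = (w >> 4) & 1u;
    f->rcode = w & 0xFu;
}

/* Write ID, flags and the four counts as six big-endian 16-bit words. */
void write_header(uint16_t id, uint16_t flags, const uint16_t counts[4],
                  unsigned char buf[12])
{
    const uint16_t words[6] = { id, flags, counts[0], counts[1],
                                counts[2], counts[3] };

    for (int i = 0; i < 6; i++) {
        buf[2 * i] = (unsigned char)(words[i] >> 8);
        buf[2 * i + 1] = (unsigned char)(words[i] & 0xFFu);
    }
}
```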

853	   The question section will have two parts:
854	   QType(one octet unsigned int)+QData.

856	   Query types are defined as follows:

858	   QTypes       values          comments
859	   -----------------------------------------------------------
860	   SEN            1    query for mapped PA address
861	   SOA            6    query information related to SOA
862	   DFL            8    query information related to data file
863	   DFXFR          249  data file transfer
864	   DFIXFR         250  incremental data file transfer
865	   IXFR           251  incremental zone (master file) transfer
866	   AXFR           252  zone (master file) transfer
867	   -----------------------------------------------------------
868	   QData will hold values based on QType.

870	   The following paragraphs describe issues related to QType=SEN.
871	   Issues related to all other QTypes (i.e., related to file transfer)
872	   will be discussed afterwards.

874	   For QType=SEN(1): QData=PI address that needs to be resolved.
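   A question for QType=SEN could be serialized as in the C sketch
   below. This document does not fix the wire encoding of QData. The
   sketch therefore assumes that QData is a 32-bit PI address in network
   byte order. It also assumes that "PI NetAddr"/"NetMask", used by the
   file transfer QTypes discussed later, is sent as two such words. The
   function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* QType codes from the table above (one octet each). */
enum pimap_qtype {
    Q_SEN = 1, Q_SOA = 6, Q_DFL = 8,
    Q_DFXFR = 249, Q_DFIXFR = 250, Q_IXFR = 251, Q_AXFR = 252
};

/* Store a 32-bit value in big-endian (network) order. */
static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* QType=SEN: QData is the PI address to resolve. Returns octets
 * written, or 0 if 'buf' is too small. */
size_t encode_question_sen(uint32_t pi_addr, unsigned char *buf,
                           size_t buflen)
{
    if (buflen < 5)
        return 0;
    buf[0] = Q_SEN;
    put32(buf + 1, pi_addr);
    return 5;
}

/* QTypes whose QData is "PI NetAddr"/"NetMask" (SOA, DFL, transfers). */
size_t encode_question_net(uint8_t qtype, uint32_t net, uint32_t mask,
                           unsigned char *buf, size_t buflen)
{
    if (buflen < 9)
        return 0;
    buf[0] = qtype;
    put32(buf + 1, net);
    put32(buf + 5, mask);
    return 9;
}
```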

876	   The answer section, authority section and additional section will
877	   have a number of resource records where the number will be specified
878	   in the header.

880	   On receiving a query, a map server will return the matching record
881	   from its database.  If the response is an address, the answer section
882	   will hold a record of one of these two types: SEN/DMN.

884	   If Type=DMN, the resolver needs to extract the mapped addresses as
885	   described in section 4.1.

887	   If Type=DMN, the entire address range will appear in the form of
888	   NetAddress/NetMask. This is advantageous for caching: a query for
889	   any particular address returns the information for the entire
890	   address range.

892	   If the response is a referral, the answer section will be empty and
893	   the authority section will hold the record with Type=REF.

895	   If the server supports recursion, each time its iterative resolution
896	   receives a record with Type=REF, it needs to append that record to
897	   the additional section of the message that will be sent to the
898	   resolver. So, the additional section will hold the Type=REF records
899	   of the chain in the tree through which PA addresses were resolved.

901	4.1.3. Master file and data file

903	   Section 5 of RFC 1035 states:

905	   "Master files are text files that contain RRs in text form.  Since
906	   the contents of a zone can be expressed in the form of a list of RRs
907	   a master file is most often used to define a zone, though it can be
908	   used to list a cache's contents."

910	   Section 5.1 of RFC 1035 states:

912	   "The format of these files is a sequence of entries.  Entries are
913	   predominantly line-oriented, though parentheses can be used to
914	   continue a list of items across a line boundary, and text literals
915	   can contain CRLF within the text.  Any combination of tabs and spaces
916	   act as a delimiter between the separate items that make up an entry.
917	   The end of any line in the master file can end with a comment.  The
918	   comment starts with a ";" (semicolon)."

920	   Master files follow the approach and format of DNS master files
921	   described in section 5 of RFC 1035, with the necessary differences.

923	   An example master file may look as follows:

925	   @ "PI NetAddr"/"Net Mask"  SOA  "PI address of primary server" (
926	                                    20     ; SERIAL
927	                                    7200   ; REFRESH
928	                                    600    ; RETRY
929	                                    3600000; EXPIRE
930	                                    60)    ; MINIMUM
931	   "PI NetAddr"/"Net Mask"    MPS  0  NAddr "PA addresses"
932	   "PI NetAddr"/"Net Mask"    SEN  0  NAddr "PA addresses"
933	   "PI NetAddr"/"Net Mask"    DMN  0  NAddr "Net addresses"
934	   "PI NetAddr"/"Net Mask"    DEF  0  NAddr "Net addresses"
935	   "PI NetAddr"/"Net Mask"    IND  0  NAddr(=1) "Data file name"

937	   A data file contains a sequence of entries, where each entry appears
938	   on a separate line. Each entry is a mapping between a PI address and
939	   its associated PA address(es), separated by space(s). Entries are
940	   generally sorted by PI address.  As in a master file, a comment
941	   starts with a ";" (semicolon) and extends to the end of the
942	   line.  Data files are commonly associated with the map servers
943	   maintained by regional authorities, but not generally with the map
944	   servers maintained by individual customer networks. A data file
945	   entry looks as follows:

947	   "PI Address" NAddr "PA Addresses"
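   Reading a data file entry can be sketched in C as follows. The sketch
   assumes the addresses are written in dotted-decimal IPv4 form and
   that at most four PA addresses follow NAddr, the limit stated in
   section 4.1. 'struct df_entry' and 'parse_df_line' are hypothetical
   names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define DF_MAX_PA 4   /* a PI address has at most four mapped PA addresses */

struct df_entry {
    struct in_addr pi;
    int naddr;
    struct in_addr pa[DF_MAX_PA];
};

/* Parse one line: PI-address NAddr PA-address... [; comment]
 * Returns 1 for an entry, 0 for a blank or comment-only line, -1 on error. */
int parse_df_line(const char *line, struct df_entry *e)
{
    char buf[256], tok[64];
    int off = 0, n;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    char *semi = strchr(buf, ';');          /* comment runs to end of line */
    if (semi != NULL)
        *semi = '\0';

    if (sscanf(buf, "%63s%n", tok, &n) != 1)
        return 0;
    off += n;
    if (inet_pton(AF_INET, tok, &e->pi) != 1)
        return -1;

    if (sscanf(buf + off, "%d%n", &e->naddr, &n) != 1 ||
        e->naddr < 1 || e->naddr > DF_MAX_PA)
        return -1;
    off += n;

    for (int i = 0; i < e->naddr; i++) {
        if (sscanf(buf + off, "%63s%n", tok, &n) != 1 ||
            inet_pton(AF_INET, tok, &e->pa[i]) != 1)
            return -1;
        off += n;
    }
    return 1;
}
```

   For example, the hypothetical line
   "64.1.10.5 2 1.2.3.5 10.11.12.5 ; branch office" yields one PI
   address mapped to two PA addresses.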

949	   A map server may have a number of data files. These files have to be
950	   listed in another file (a supporting file, analogous to the boot file
951	   "named.boot" used in BIND) that holds information about each of
952	   them. An entry in that file follows the same format as a record
953	   (Type=DFL) and has the following fields:

955	   "PI NetAddr"/"NetMask" Type(DFL) TTL "Data File Name" SERIAL "Number
956	   of records".

958	   This file will be used to process messages with QType=DFL, which
959	   support data file transfer and incremental data file transfer.

961	   For QType=DFL(8): QData="PI NetAddr"/"NetMask" of the desired network
962	   For QType=SOA(6): QData="PI NetAddr"/"NetMask" of the desired zone

963	   A map server will return a record of Type=DFL on receiving a query
964	   with QType=DFL, whereas it will return a record of Type=SOA on
965	   receiving a query with QType=SOA.

967	4.1.4. Zone maintenance and transfers

969	   Section 4.3.5 of RFC 1034 states:

971	   "The general model of automatic zone transfer or refreshing is that
972	   one of the name servers is the master or primary for the zone.
973	   Changes are coordinated at the primary, typically by editing a master
974	   file for the zone.  After editing, the administrator signals the
975	   master server to load the new zone.  The other non-master or
976	   secondary servers for the zone periodically check for changes (at a
977	   selectable interval) and obtain new zone copies when changes have
978	   been made.

980	   To detect changes, secondaries just check the SERIAL field of the SOA
981	   for the zone.  In addition to whatever other changes are made, the
982	   SERIAL field in the SOA of the zone is always advanced whenever any
983	   change is made to the zone."

985	   Section 1.2 of RFC 5936 states:

987	   "A DNS implementation is not required to support AXFR, IXFR, and
988	   NOTIFY, but it should have some means for maintaining name server
989	   coherency.  A general-purpose DNS implementation will likely support
990	   AXFR (and in the same vein IXFR and NOTIFY), but turnkey DNS
991	   implementations may exist without AXFR."

993	   Zone maintenance and transfer will follow the same approach as DNS
994	   with a few minor updates. Data files will be updated more frequently
995	   than the master file. That is why transfer (and incremental
996	   transfer) of data files has been treated separately from transfer
997	   (and incremental transfer) of the master file.

999	   For all messages with QType=AXFR/DFXFR/IXFR/DFIXFR, QData="PI
1000	   NetAddr"/"NetMask" of the desired zone or the desired network. A
1001	   NOTIFY message needs to indicate which file has been updated,
1002	   followed by the related information. So, if the master file has
1003	   changed, a NOTIFY message with query type SOA will be sent, and one
1004	   with query type DFL will be sent if a data file has changed.

1006	   Transfer of a master file will be the same as in DNS, followed by
1007	   transfer of all the data files; i.e., AXFR will be processed as in
1008	   DNS, followed by DFXFR for each of the data files. To make this
1009	   happen, at the end of transferring the contents of the master file,
1010	   the server (of the AXFR message) needs to send a NOTIFY message for
1011	   each of the data files belonging to that zone to the client (i.e.,
1012	   the secondary server). On processing the NOTIFY for a data file, the
1013	   secondary server needs to send DFIXFR to the primary if the data file
1014	   already exists; otherwise, it needs to send DFXFR. Incremental update
1015	   of the master file (IXFR) will be the same as IXFR in DNS with one
1016	   minor update: if the client of IXFR finds that a new data file has
1017	   been introduced, it issues DFXFR for that data file. Similarly, if
1018	   the entry for a data file gets deleted, the client deletes the
1019	   corresponding data file.
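   The choice a secondary server makes on receiving NOTIFY for a data
   file can be sketched in C as below. The sketch compares the SERIAL of
   the local copy with that of the primary's DFL record before issuing
   DFIXFR. That check is an assumption borrowed from DNS practice, not a
   rule stated here. The comparison uses RFC 1982 serial number
   arithmetic so that SERIAL wrap-around is handled. The names are
   hypothetical.

```c
#include <stdint.h>

enum df_action { DF_NONE, DF_DFXFR, DF_DFIXFR };

/* RFC 1982 serial arithmetic for 32-bit SERIAL values:
 * nonzero if s1 is "less than" s2. */
static int serial_lt(uint32_t s1, uint32_t s2)
{
    return s1 != s2 &&
           ((s1 < s2 && s2 - s1 < 0x80000000u) ||
            (s1 > s2 && s1 - s2 > 0x80000000u));
}

/* Full transfer if no local copy exists; incremental transfer if the
 * primary's SERIAL is newer; otherwise nothing to do. */
enum df_action on_df_notify(int have_file, uint32_t local_serial,
                            uint32_t primary_serial)
{
    if (!have_file)
        return DF_DFXFR;
    return serial_lt(local_serial, primary_serial) ? DF_DFIXFR : DF_NONE;
}
```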

1021	   Processing of DFXFR will follow the same approach as AXFR in DNS.
1022	   Similarly, processing of DFIXFR will follow the same approach as IXFR
1023	   in DNS.  While transferring a data file record, an equivalent record
1024	   of type SEN needs to be sent with the PI address and mapped PA
1025	   address(es) from the data file record. Wherever a record of type
1026	   SOA is sent while processing AXFR/IXFR in DNS, a record of type
1027	   DFL needs to be sent while processing DFXFR/DFIXFR.

1029	   For AXFR, IXFR and NOTIFY in DNS, one needs to follow RFC 5936[13],
1030	   RFC 1995[14] and RFC 1996[15] respectively.

1032	5. Issues related to IP mobility

1034	   An interface of a customer network may have several IP addresses
1035	   (e.g., at a multihomed customer site, each interface will have
1036	   multiple global unicast addresses and may also have private
1037	   addresses). A mobile node that has moved to a customer network that
1038	   gets service from a service provider and uses private IP addresses
1039	   will have at least three IP addresses: a provider assigned unicast
1040	   address, a private address and its permanent "Home Address". The
1041	   "Home Address" will be aliased with the provider assigned address
1042	   (i.e., the co-located care-of address). So, the interface structure
1043	   needs an additional field to hold the value of the care-of address.
1044	   The PCB structure will have an additional field 'inp_lcladdr', which
1045	   will hold the current provider assigned address that a foreign node
1046	   needs to use for communication. The field 'inp_laddr', which is used
1047	   to hold the local address, will hold the "Home Address" of a mobile
1048	   node. Similarly, the PCB needs another field, 'inp_fcladdr', to
1049	   support a mobile destination.  The existing field 'inp_faddr', which
1050	   is used to hold the foreign address, will hold the "Home Address" of
1051	   the mobile destination node. For customers with PI addresses who
1052	   would like to have mobility support, the mapped address will be
1053	   considered the "Home Address" of the mobile node.

1056	   An outgoing packet from a mobile node at a foreign site needs to be
1057	   stacked with the associated care-of address. While initiating
1058	   communication, the 'bind' system call needs to go through the
1059	   interface list and fetch the associated structure to check whether
1060	   the source address is aliased, and fill in the value of
1061	   'inp_lcladdr' of the PCB accordingly.

   When TCP receives a SYN for connection establishment, it allocates a
   PCB and assigns the values of 'inp_laddr' and related fields. During
   this phase, TCP also needs to check whether the local address is
   aliased (based on the fields of the interface structure, which
   applies to a mobile node at a foreign site) and fill in
   'inp_lcladdr' accordingly. Similarly, if the destination address is
   found to be aliased, based on the stacking type, it needs to fill in
   'inp_fcladdr'.
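   A sketch of how a PCB might be filled in from a received SYN, using
   the stacking types 0, 1 and 2 described in this section. The Packet
   record, the 'aliases' map (Home Address to care-of address, as read
   from the interface list) and the function 'pcb_from_syn' are
   hypothetical, and the sketch assumes the stacked addresses have
   already been parsed out of the packet.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Packet:
    src: str
    dst: str
    stack_type: Optional[int] = None   # None (no stack), 0, 1 or 2
    stacked_src: Optional[str] = None  # sender's Home Address (types 1 and 2)
    stacked_dst: Optional[str] = None  # receiver's Home Address (types 0 and 2)


@dataclass
class PCB:
    inp_laddr: Optional[str] = None
    inp_lcladdr: Optional[str] = None
    inp_faddr: Optional[str] = None
    inp_fcladdr: Optional[str] = None


def pcb_from_syn(syn: Packet, aliases: Dict[str, str]) -> PCB:
    """Fill a new PCB from a received SYN (hypothetical helper)."""
    pcb = PCB()
    # Local side: with type 0 or 2 stacking our Home Address is in the stack.
    if syn.stack_type in (0, 2):
        pcb.inp_laddr = syn.stacked_dst
    else:
        pcb.inp_laddr = syn.dst
    pcb.inp_lcladdr = aliases.get(pcb.inp_laddr)  # None if not aliased
    # Remote side: type 1 or 2 stacking means the peer is mobile.
    if syn.stack_type in (1, 2):
        pcb.inp_faddr = syn.stacked_src
        pcb.inp_fcladdr = syn.src
    else:
        pcb.inp_faddr = syn.src
    return pcb


# A mobile client reaches a server whose Home Address is also aliased.
aliases = {"2001:db8:5::1": "2001:db8:6::1"}
syn = Packet(src="2001:db8:2::20", dst="2001:db8:5::1",
             stack_type=1, stacked_src="2001:db8:1::10")
pcb = pcb_from_syn(syn, aliases)
```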

   IP address stacking can be performed with the approach introduced in
   Section 6.4 of RFC 6275 [9], which describes the stacking of an IP
   address for the destination address (let us call it type 0
   stacking). Two more types of stacking need to be introduced: type 1
   stacking, where only the source address appears in the stack, and
   type 2 stacking, where both the source address and the destination
   address appear in the stack in a particular order.

   Protocol output routines such as 'tcp_output' or 'udp_output' need
   to fill in the IP packet as follows.

   If the socket contains a valid 'inp_lcladdr', use 'inp_lcladdr' as
   the source address and place 'inp_laddr' in the stack. If the socket
   contains a valid 'inp_fcladdr', use 'inp_fcladdr' as the destination
   address and place 'inp_faddr' in the stack. If only 'inp_fcladdr'
   contains a valid address while 'inp_lcladdr' is NULL, use type 0
   stacking. If only 'inp_lcladdr' contains a valid address while
   'inp_fcladdr' is NULL, use type 1 stacking. If both 'inp_lcladdr'
   and 'inp_fcladdr' contain valid addresses, use type 2 stacking.
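   The output rules above reduce to a small decision on the two care-of
   fields. The following sketch is illustrative only: the OutHeader
   record and 'fill_output' are hypothetical names, and because the
   ordering of the two stacked addresses in type 2 stacking is not yet
   specified, the sketch keeps them in separate named fields instead of
   choosing an order.

```python
from typing import NamedTuple, Optional


class OutHeader(NamedTuple):
    src: str                    # address placed in the IP source field
    dst: str                    # address placed in the IP destination field
    stack_type: Optional[int]   # None means no stacking
    stacked_src: Optional[str]  # own Home Address (types 1 and 2)
    stacked_dst: Optional[str]  # peer's Home Address (types 0 and 2)


def fill_output(inp_laddr: str, inp_faddr: str,
                inp_lcladdr: Optional[str] = None,
                inp_fcladdr: Optional[str] = None) -> OutHeader:
    """Choose source, destination and stacking type from the PCB fields."""
    if inp_lcladdr and inp_fcladdr:   # both ends aliased: type 2
        return OutHeader(inp_lcladdr, inp_fcladdr, 2, inp_laddr, inp_faddr)
    if inp_fcladdr:                   # only the peer is aliased: type 0
        return OutHeader(inp_laddr, inp_fcladdr, 0, None, inp_faddr)
    if inp_lcladdr:                   # only this node is aliased: type 1
        return OutHeader(inp_lcladdr, inp_faddr, 1, inp_laddr, None)
    return OutHeader(inp_laddr, inp_faddr, None, None, None)


hdr = fill_output("2001:db8:1::10", "2001:db8:9::1",
                  inp_lcladdr="2001:db8:2::20")
print(hdr.stack_type, hdr.src, hdr.stacked_src)  # 1 2001:db8:2::20 2001:db8:1::10
```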

   Protocol input routines such as 'tcp_input' or 'udp_input' need to
   process the packet in the reverse order based on the type of
   stacking: for type 0 stacking, use the address in the stack as the
   destination address; for type 1 stacking, use the address in the
   stack as the source address; for type 2 stacking, take both the
   source address and the destination address from the stack.
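   The reverse processing on input can be sketched the same way. The
   function 'restore_addresses' is a hypothetical helper that returns
   the (source, destination) pair an input routine would match against
   'inp_faddr' and 'inp_laddr'; it assumes the stacked addresses have
   already been parsed from the packet.

```python
from typing import Optional, Tuple


def restore_addresses(src: str, dst: str, stack_type: Optional[int],
                      stacked_src: Optional[str] = None,
                      stacked_dst: Optional[str] = None
                      ) -> Tuple[Optional[str], Optional[str]]:
    """Recover the Home Addresses used for the PCB lookup."""
    if stack_type == 0:          # destination Home Address was stacked
        return src, stacked_dst
    if stack_type == 1:          # source Home Address was stacked
        return stacked_src, dst
    if stack_type == 2:          # both Home Addresses were stacked
        return stacked_src, stacked_dst
    return src, dst              # no stacking


print(restore_addresses("2001:db8:2::20", "2001:db8:9::1", 1,
                        stacked_src="2001:db8:1::10"))
# ('2001:db8:1::10', '2001:db8:9::1')
```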

5.1. Changes expected in the specifications related to IP mobility

   RFC 6275 [9] requires mobile nodes to establish bindings with
   correspondent nodes for route optimization. This binding is required
   when a connection is established as well as when the mobile node
   changes its address space. Applications such as HTTP open multiple
   very short-lived connections at run time. If mobile nodes need to
   send binding messages for all of these connections, the network will
   be unnecessarily congested. This congestion can be avoided by
   establishing the binding at connection establishment time itself.
   So, if a TCP server happens to be mobile, it will send its SYN+ACK
   with 'inp_lcladdr' as the source address and its Home Address
   ('inp_laddr') in the stack. A TCP client that initiates
   communication through 'connect' needs to set its 'inp_fcladdr' field
   on receiving the SYN+ACK. With this approach, correspondent node
   binding messages need to be sent only when a mobile node moves from
   one address space to another.

   Route optimization is not applicable to multicast applications. In
   these cases, packets need to be forwarded by reverse tunneling, using
   "IP Encapsulation within IP" as defined in RFC 2003. In order to
   support packet delivery both with route optimization and with the
   "Encapsulating Delivery Style", depending on the application type,
   the protocol control block needs another field, 'inp_hagentaddr', to
   hold the address of the home agent of the mobile node. The interface
   structure also needs the same field. The 'bind' system call needs to
   walk the interface list to copy 'inp_hagentaddr' into the PCB along
   with 'inp_lcladdr', as described earlier. Protocol output routines
   such as 'tcp_output' and 'udp_output' then need to fill in the
   packets based on the application type. In the "Encapsulating
   Delivery Style", packets need to be formed as follows.

   The inner IP header will contain:
      Source Address: Home Address of the mobile node ('inp_laddr')
      Destination Address: address of the correspondent node
      ('inp_faddr')
   The outer IP header will contain:
      Source Address: co-located care-of address of the mobile node
      ('inp_lcladdr')
      Destination Address: address of the home agent of the mobile
      node ('inp_hagentaddr')
      Protocol field: IP in IP
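   A minimal sketch of the header pair for the "Encapsulating Delivery
   Style", built from the PCB fields listed above. Headers are modelled
   as plain records rather than wire-format packets; the names
   'IPHeader' and 'encapsulate' are hypothetical, and the addresses are
   from the IPv4 documentation ranges. Protocol number 4 identifies an
   encapsulated IPv4 packet (RFC 2003); an encapsulated IPv6 packet
   would be identified by 41 instead.

```python
from typing import NamedTuple, Tuple

IPPROTO_IPIP = 4  # "IP in IP": the payload is an IPv4 packet (RFC 2003)


class IPHeader(NamedTuple):
    src: str
    dst: str
    proto: int


def encapsulate(inp_laddr: str, inp_faddr: str, inp_lcladdr: str,
                inp_hagentaddr: str, inner_proto: int
                ) -> Tuple[IPHeader, IPHeader]:
    """Return (outer, inner) headers for reverse tunneling via the home agent."""
    inner = IPHeader(src=inp_laddr, dst=inp_faddr, proto=inner_proto)
    outer = IPHeader(src=inp_lcladdr, dst=inp_hagentaddr, proto=IPPROTO_IPIP)
    return outer, inner


outer, inner = encapsulate("192.0.2.10", "203.0.113.5", "198.51.100.20",
                           "192.0.2.1", inner_proto=17)  # 17 = UDP payload
print(outer)  # IPHeader(src='198.51.100.20', dst='192.0.2.1', proto=4)
```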

6. Refinements to the existing IPv6 specification

   As IPv6 was envisioned long before some newer technologies such as
   MPLS [21] came into the picture, some refinements can be made to the
   existing specification. These considerations relate to bandwidth
   usage and to performance inside switches. Experimental results show
   that smaller packet sizes give better results for the processing of
   real-time (RT) packets, so it is desirable to keep the IP packet
   header as small as possible.

   As described earlier, the evaluation of the parameters
   nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes is
   geo-political and has to be decided by IANA. Once these parameters
   are determined by mutual agreement, the values of pA, pB, pC and the
   prefix length of the user id can be determined. With a 64bit address
   space, the IP header will be reduced by 16 bytes, since each of the
   two addresses shrinks from 16 bytes to 8 bytes.
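   The 16 byte figure can be checked by summing the field widths of the
   IPv6 fixed header from RFC 2460 [20] with 128bit and with 64bit
   addresses. The function name is illustrative only.

```python
def ipv6_fixed_header_bytes(addr_bits: int) -> int:
    """Size of the RFC 2460 fixed header when each address is addr_bits wide."""
    field_bits = {
        "version": 4,
        "traffic_class": 8,
        "flow_label": 20,
        "payload_length": 16,
        "next_header": 8,
        "hop_limit": 8,
        "source_address": addr_bits,
        "destination_address": addr_bits,
    }
    return sum(field_bits.values()) // 8


print(ipv6_fixed_header_bytes(128))  # 40
print(ipv6_fixed_header_bytes(64))   # 24, i.e. 16 bytes smaller
```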

   The 'flow label' field of the IPv6 packet header may not be of any
   use when MPLS [21] is in use. ATM used to have 4 priority classes.
   The first specification of IPv6, RFC 1883 [18], used a 4bit Priority
   field along with a 24bit Flow Label field. These were changed to an
   8bit Traffic Class field and a 20bit Flow Label field in RFC 2460
   [20]. Too many priority classes may increase processing complexity
   inside switches. If the Traffic Class field of the IPv6 header is
   reduced to 4 bits, as in RFC 1883, and the 'flow label' field is
   removed, another three bytes (24 bits) can be removed from the IPv6
   header.
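   The three byte saving can be checked the same way. The sketch below
   assumes 64bit addresses and otherwise unchanged fields, and compares
   the priority and flow label fields of RFC 1883, RFC 2460 and the
   reduced 4bit variant suggested above; the dictionary names are
   illustrative.

```python
# Bit widths of the fields shared by all variants, with 64bit addresses.
COMMON_BITS = {"version": 4, "payload_length": 16, "next_header": 8,
               "hop_limit": 8, "source_address": 64,
               "destination_address": 64}

QOS_VARIANTS = {
    "RFC 1883": {"priority": 4, "flow_label": 24},
    "RFC 2460": {"traffic_class": 8, "flow_label": 20},
    "proposed": {"priority": 4},  # 4bit field, flow label removed
}


def header_bytes(qos_fields: dict) -> int:
    bits = sum(COMMON_BITS.values()) + sum(qos_fields.values())
    if bits % 8:
        raise ValueError("header is not a whole number of bytes")
    return bits // 8


for name, fields in QOS_VARIANTS.items():
    print(name, header_bytes(fields))
# prints, one per line: RFC 1883 24 / RFC 2460 24 / proposed 21
```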

   The 'Hop Limit' field has an 8bit value in the existing
   specification. The role of this field needs further discussion in
   the context of a large address space.

   RFC 4862 [10] introduces the concept of "stateless
   autoconfiguration", with the goal that no manual configuration of
   individual machines is required before they are connected to the
   network. A host first generates a link-local address from a
   link-local prefix and its link-layer address (e.g. an Ethernet
   address, or an E.164 address for ISDN). This link-local address is
   used to configure global unicast addresses and any other
   configurable parameters based on Router Advertisements. Global
   unicast addresses are generated from the prefix supplied by the
   Router Advertisement and the link-specific interface identifier.
   This identifier can be as long as 64 bits. So, irrespective of the
   size of the network (it may have 10000 nodes, or 100, or even
   fewer), every customer network will consume the equivalent of a
   64bit address space, which is highly wasteful. Ideally, the
   interface identifier should be just long enough to number the nodes
   supported by that subnet. To achieve this, the router itself or a
   server in that subnet needs to maintain a store that generates
   interface identifiers on request from individual hosts. It may be
   desirable for interface identifiers to be generated by DHCP servers.
   With the option of generating interface identifiers through DHCP,
   the changes to the autoconfiguration process can be viewed as
   follows:

   From the host's point of view, this can be considered a two-step
   process. The host sends a Router Solicitation message to discover
   the presence of a router. The Router Advertisement message should
   include an option field that indicates whether prefix information is
   to be configured through the Router Advertisement or through DHCP.
   The host then sends a request message to obtain its interface
   identifier. If both pieces of information are to be obtained from a
   DHCP server, they can be obtained through a single message.

   From the server's point of view, it needs to maintain a database
   mapping link-layer addresses to subnet-specific interface
   identifiers. The lifetime of an interface identifier is handled in
   the same way that existing DHCP implementations handle the lifetimes
   of leased IP addresses.
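   The host and server roles described above can be sketched together
   as follows. Everything here is hypothetical: 'interface_id_bits'
   sizes the identifier to the number of nodes in the subnet (keeping
   the value 0 unused is an assumption), 'InterfaceIdServer' stands in
   for the DHCP-style database mapping link-layer addresses to
   identifiers with lease lifetimes, and 'form_address' combines a
   prefix (from the Router Advertisement or from DHCP) with the
   assigned identifier. IPv6 notation is used only for convenience.

```python
import ipaddress
import time
from typing import Callable, Dict, Tuple


def interface_id_bits(n_nodes: int) -> int:
    """Smallest identifier width (bits) that gives each of n_nodes hosts a
    distinct non-zero value; keeping 0 unused is an assumption."""
    if n_nodes < 1:
        raise ValueError("a subnet needs at least one node")
    return n_nodes.bit_length()


class InterfaceIdServer:
    """Hypothetical DHCP-style store: link-layer address -> interface
    identifier, with lease lifetimes handled like DHCP address leases."""

    def __init__(self, n_nodes: int, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = 1 << interface_id_bits(n_nodes)
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.leases: Dict[str, Tuple[int, float]] = {}  # lladdr -> (iid, expiry)

    def request(self, lladdr: str) -> int:
        """Allocate a new identifier for 'lladdr' or renew its lease."""
        now = self.clock()
        expired = [k for k, (_, exp) in self.leases.items() if exp <= now]
        for k in expired:                            # drop expired leases
            del self.leases[k]
        if lladdr in self.leases:                    # renewal keeps the identifier
            iid = self.leases[lladdr][0]
        else:
            used = {i for i, _ in self.leases.values()}
            iid = next((i for i in range(1, self.capacity) if i not in used), 0)
            if iid == 0:
                raise RuntimeError("no free interface identifiers")
        self.leases[lladdr] = (iid, now + self.lease_seconds)
        return iid


def form_address(prefix: str, iid: int) -> ipaddress.IPv6Address:
    """Host side: append the assigned identifier to the prefix."""
    net = ipaddress.IPv6Network(prefix)
    if not 0 < iid < net.num_addresses:
        raise ValueError("identifier does not fit below the prefix")
    return net.network_address + iid


print(interface_id_bits(10000), interface_id_bits(100))  # 14 7
server = InterfaceIdServer(n_nodes=100, lease_seconds=3600)
iid = server.request("00:00:5e:00:53:01")
print(form_address("2001:db8:1::/121", iid))  # 2001:db8:1::1
```

   With 100 nodes the identifier needs only 7 bits rather than 64,
   which is the saving argued for above.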

   There seems to be another possible danger in obtaining prefix
   information through Router Advertisements. As a Router Advertisement
   arrives as an ICMP message, once it is received by the ICMP layer,
   the information about the interface on which it was received is
   lost. (This problem arises for hosts that have multiple interfaces,
   not all of which are attached to the same subnet.) So, the
   autoconfiguration of such a host has to be performed one interface
   at a time, with all other interfaces disabled. Once all the
   interfaces have been configured, all of them have to be enabled.

   If hosts are expected to reconfigure their addresses dynamically
   based on Router Advertisement messages, the router needs to send, for
   a certain period of time, a special Router Advertisement that
   includes both the old prefix and the corresponding new prefix.
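   For a host, applying such a special Router Advertisement amounts to
   replacing the old prefix bits while keeping the interface
   identifier. A minimal sketch with Python's 'ipaddress' module,
   assuming the old and new prefixes have the same length; the function
   name 'renumber' is hypothetical.

```python
import ipaddress


def renumber(addr: str, old_prefix: str, new_prefix: str) -> ipaddress.IPv6Address:
    """Move 'addr' from old_prefix to new_prefix, keeping its interface
    identifier (the bits below the prefix length)."""
    old = ipaddress.IPv6Network(old_prefix)
    new = ipaddress.IPv6Network(new_prefix)
    a = ipaddress.IPv6Address(addr)
    if a not in old:
        raise ValueError(f"{addr} is not inside {old_prefix}")
    if old.prefixlen != new.prefixlen:
        raise ValueError("this sketch assumes equal prefix lengths")
    host_bits = int(a) & int(old.hostmask)
    return new.network_address + host_bits


print(renumber("2001:db8:1::5", "2001:db8:1::/64", "2001:db8:9::/64"))
# 2001:db8:9::5
```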

   In order to support multihoming [8], the prefix information needs to
   include the fields 'default router' and 'next hop address' (to reach
   that default router) for each of the prefixes.
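   One possible shape for the extended prefix information is sketched
   below. The record and field names are hypothetical, and selecting
   the exit by matching the source address against each prefix is an
   assumption of this sketch rather than something specified above.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrefixInfo:
    prefix: ipaddress.IPv6Network
    default_router: ipaddress.IPv6Address  # proposed 'default router' field
    next_hop: ipaddress.IPv6Address        # proposed 'next hop address' field


def exit_for_source(src: str, table: List[PrefixInfo]) -> PrefixInfo:
    """Pick the entry whose prefix covers the chosen source address."""
    s = ipaddress.IPv6Address(src)
    for entry in table:
        if s in entry.prefix:
            return entry
    raise LookupError(f"no prefix information covers {src}")


net, addr = ipaddress.IPv6Network, ipaddress.IPv6Address
table = [
    PrefixInfo(net("2001:db8:a::/48"), addr("2001:db8:a::1"), addr("fe80::1")),
    PrefixInfo(net("2001:db8:b::/48"), addr("2001:db8:b::1"), addr("fe80::2")),
]
print(exit_for_source("2001:db8:b::42", table).default_router)  # 2001:db8:b::1
```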

   In a 64bit architecture, a link-local address can be formed from a
   link-local prefix and the link-layer address in a suitable manner;
   for example, it can be formed from a 16bit link-local prefix
   followed by a 48bit link-layer address. For hardware whose
   link-layer addresses are longer than 48 bits (say E.164), the least
   significant 48 bits may be used to generate the link-local address.
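   The 16bit plus 48bit layout can be expressed with a few bit
   operations. The prefix value 0xFE80 is a hypothetical choice (no
   value is fixed above), and the E.164 case is modelled simply as an
   integer wider than 48 bits whose low 48 bits are kept.

```python
LINK_LOCAL_PREFIX = 0xFE80  # hypothetical 16bit link-local prefix value
MASK_48 = (1 << 48) - 1


def link_local_64(link_addr: int) -> int:
    """16bit link-local prefix followed by the least significant 48 bits
    of the link-layer address (all 48 bits for Ethernet)."""
    return (LINK_LOCAL_PREFIX << 48) | (link_addr & MASK_48)


def mac_to_int(mac: str) -> int:
    return int(mac.replace(":", ""), 16)


print(f"{link_local_64(mac_to_int('00:00:5e:00:53:01')):016x}")
# fe8000005e005301
```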

7. Distributed processing and Multicasting

   With the inherent hierarchy of this architecture, distributed
   applications can also be structured in a suitable manner. For
   example, a commonly used web-based application can have a
   master-level server at every top-level node. Any change made to the
   application has to be synchronized among these master-level servers
   first. There may also be servers at the middle layer (inside each
   inter-AS-bottom) within each top-level node. Once a change is
   reflected at the master node, all the servers at the middle layer
   need to update themselves from their master-level node. This will
   reduce network traffic substantially. The inherent hierarchy of the
   architecture will also help in establishing multicast trees in a
   similar manner. Work on these issues can progress only after this
   architecture is approved.
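   The two-level update order can be sketched as follows. The Server
   record and 'propagate' are hypothetical, and the sketch assumes the
   change enters at one master server and is copied to the other
   masters directly before each middle-layer server pulls from its own
   master.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Server:
    name: str
    version: int = 0
    middle: List["Server"] = field(default_factory=list)  # middle-layer servers


def propagate(masters: List[Server], version: int) -> List[Tuple[str, str]]:
    """Masters synchronize first, then each middle-layer server updates
    from its own master. Returns the (from, to) updates in order."""
    updates = []
    origin = masters[0]          # assume the change enters at the first master
    origin.version = version
    for m in masters[1:]:        # phase 1: master level
        m.version = version
        updates.append((origin.name, m.name))
    for m in masters:            # phase 2: middle layer, per top-level node
        for s in m.middle:
            s.version = m.version
            updates.append((m.name, s.name))
    return updates


masters = [Server("T1", middle=[Server("T1-a"), Server("T1-b")]),
           Server("T2", middle=[Server("T2-a")])]
print(propagate(masters, 7))
# [('T1', 'T2'), ('T1', 'T1-a'), ('T1', 'T1-b'), ('T2', 'T2-a')]
```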

8. Transition to real IP from private IP

   Both the CIDR based hierarchy and the Mesh structured hierarchy
   expect a VLSM tree at the bottom. In VLSM, in a real IP space with
   provider assigned (PA) addresses, the assignment of network
   resources has to be tied to the address space used for each type of
   service. Within a typical switch supporting multiple types of ports,
   an OC48 line card can be replaced with 4 OC12 line cards, an OC12
   card may in turn be replaced with 4 OC3 cards, an OC12 card may be
   attached to another switch with DS3 ports, and so on. At the
   customer network level, the port density of a switch has to be
   directly proportional to the address block that a customer network
   is assigned, i.e. each customer network has to be assigned a block
   of address space (say, 128, 256, 512, 1K or 2K addresses). Within the
   switch, these ports have to be assigned a network address and
   netmask the way VLSM works.
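   The VLSM-style assignment of customer ports can be sketched with
   Python's 'ipaddress' module. Rounding each customer up to a
   power-of-two block of at least 128 addresses follows the block sizes
   listed above; the parent prefix 10.0.0.0/16, the customer names and
   the function names are placeholders chosen for illustration.

```python
import ipaddress
from typing import Dict


def block_size(users: int, minimum: int = 128) -> int:
    """Round a customer's user count up to a power-of-two block."""
    size = minimum
    while size < users:
        size *= 2
    return size


def vlsm_assign(parent: str,
                customers: Dict[str, int]) -> Dict[str, ipaddress.IPv4Network]:
    """Give each customer port a net address/netmask inside 'parent',
    allocating the largest blocks first so every block stays aligned."""
    net = ipaddress.IPv4Network(parent)
    next_addr = int(net.network_address)
    result = {}
    for name, users in sorted(customers.items(),
                              key=lambda kv: block_size(kv[1]), reverse=True):
        size = block_size(users)
        prefixlen = 32 - (size.bit_length() - 1)
        block = ipaddress.IPv4Network((next_addr, prefixlen))
        if not block.subnet_of(net):
            raise ValueError("parent block exhausted")
        result[name] = block
        next_addr += size
    return result


for port, block in vlsm_assign("10.0.0.0/16",
                               {"A": 1500, "B": 300, "C": 2000}).items():
    print(port, block)
# prints, one per line: A 10.0.0.0/21 / C 10.0.8.0/21 / B 10.0.16.0/23
```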

   In the IPv4 environment, providers have offered services in terms of
   port bandwidth, say 2 Mbps, 4 Mbps or 1 Gbps lines. If these ports
   had been assigned addresses based on the number of users of the
   customer network, the transition from private IP to real IP would be
   simple. Consider a switch that supplies 2 Mbps lines to a set of
   customers, each with between 1K and 2K users; each of them will be
   assigned a block of 2K addresses. But if the number of users is not
   proportional to the bandwidth used, say the same 2 Mbps line is
   supplied to customers of sizes 1K, 2K, 10K and 16K respectively,
   reorganization will be needed where possible. This rearrangement may
   be possible within the switch itself or by connecting ports of
   appropriate sizes from different switches; otherwise each of them
   has to be assigned an address block of 16K, or a block sized the way
   VLSM works, whichever is suitable. So, address block assignment in
   the VLSM tree has to grow in a bottom-up manner.
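   The cost of the fallback can be quantified with the customer sizes
   from the example above, reading 1K as 1024 users (an assumption).
   Giving every customer a 16K block consumes 64K addresses, while
   sizing each block individually the way VLSM works consumes 35K.

```python
def pow2_block(users: int) -> int:
    """Smallest power-of-two block that holds 'users' addresses."""
    size = 1
    while size < users:
        size *= 2
    return size


# Customer sizes from the example above, reading 1K as 1024 users.
sizes = [1 * 1024, 2 * 1024, 10 * 1024, 16 * 1024]

uniform = len(sizes) * pow2_block(max(sizes))     # every customer gets 16K
per_customer = sum(pow2_block(s) for s in sizes)  # 1K + 2K + 16K + 16K

print(uniform, per_customer)  # 65536 35840
```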

   Thus, transitioning an existing provider network to a real IP space
   with the CIDR based approach, with no (or very little)
   rearrangement, is apparently not a difficult job. In a CIDR based
   approach, however, the sizes of the VLSM trees are heterogeneous,
   which leads to a very high number of routing entries. The Mesh
   structured hierarchy is convenient for reducing routing overhead as
   well as for distributing network resources in a suitable manner in
   the long run. Converting the CIDR based approach to the Mesh
   structured hierarchy requires reorganization, mainly in the routing
   domain, and splitting very large trees (more than 24 bits of address
   space) at the top.

   Section 3.2.1 shows that in the Mesh structured hierarchy a 64bit
   architecture will be good enough for our needs in a provider
   assigned (PA) address space; the same is true for the CIDR based
   approach as well.

9. IANA Considerations

   This is a first-level draft for a proposed standard. Hence, IANA
   actions, if needed, should come into play at a later stage.

10. Security Considerations

   This document does not include any security-related issues.

11. Acknowledgments

   The author would like to thank Professor Amitava Datta of the
   University of Western Australia for his review and constructive
   comments.

12. Normative References

   [1]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for
        IPv6 Hosts and Routers", RFC 4213, October 2005.

   [2]  Fuller, V. and T. Li, "Classless Inter-Domain Routing (CIDR):
        The Internet Address Assignment and Aggregation Plan",
        RFC 4632, August 2006.

   [3]  Huston, G., "Commentary on Inter-Domain Routing in the
        Internet", RFC 3221, December 2001.

   [4]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS Number
        Space", RFC 4893, May 2007.

   [5]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
        Translator (Traditional NAT)", RFC 3022, January 2001.

   [6]  Moy, J., "OSPF Standardization Report", RFC 2329, April 1998.

   [7]  Mockapetris, P., "Domain names - concepts and facilities",
        RFC 1034, November 1987.

   [8]  Bandyopadhyay, S., "Solution for Site Multihoming in a Real IP
        Environment", draft-shyam-site-multi-41, work in progress.

   [9]  Perkins, C., Ed., Johnson, D., and J. Arkko, "Mobility Support
        in IPv6", RFC 6275, July 2011.

   [10] Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless Address
        Autoconfiguration", RFC 4862, September 2007.

   [11] Mockapetris, P., "Domain names - implementation and
        specification", RFC 1035, November 1987.

   [12] Eastlake 3rd, D., "Domain Name System (DNS) IANA
        Considerations", RFC 5395, November 2008.

   [13] Lewis, E. and A. Hoenes, Ed., "DNS Zone Transfer Protocol
        (AXFR)", RFC 5936, June 2010.

   [14] Ohta, M., "Incremental Zone Transfer in DNS", RFC 1995,
        August 1996.

   [15] Vixie, P., "A Mechanism for Prompt Notification of Zone Changes
        (DNS NOTIFY)", RFC 1996, August 1996.

13. Informative References

   [16] Postel, J., "Internet Protocol", STD 5, RFC 791,
        September 1981.

   [17] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
        RFC 1771, March 1995.

   [18] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
        Specification", RFC 1883, December 1995.

   [19] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [20] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
        Specification", RFC 2460, December 1998.

   [21] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol Label
        Switching Architecture", RFC 3031, January 2001.

14. Author's Address

   Shyamaprasad Bandyopadhyay
   HL No 205/157/7, Kharagpur 721305, India
   Phone: +91 3222 225137
   e-mail: shyamb66@gmail.com