Transport Area Working Group                                H. Dai, Ed.
Internet-Draft                                                    B. Fu
Intended status: Informational                                   K. Tan
Expires: 14 January 2021                                         Huawei
                                                            13 July 2020

                  PFC-Free Low Delay Control Protocol
            draft-dai-tsvwg-pfc-free-congestion-control-00

Abstract

Today, low-latency transport protocols such as RDMA over Converged
Ethernet (RoCE) can provide good delay and throughput performance in
small and lightly loaded high-speed datacenter networks, thanks to a
lossless transport based on priority-based flow control (PFC).
However, PFC suffers from various issues, ranging from performance
degradation to unreliability (e.g., deadlock), which limit the
deployment of RoCE to small-scale clusters (on the order of 1000
machines).

This document presents LDCP, a new transport that scales loss-
sensitive transports, e.g., RDMA, to entire data centers containing
tens of thousands of machines, without depending on PFC for
losslessness, i.e., PFC-free.  LDCP introduces a novel end-to-end
congestion control scheme that achieves very low queue occupancy even
under high network utilization or large traffic churn, resulting in
almost no packet loss.  Meanwhile, LDCP allows a new flow to jump-
start at full speed from the very beginning and therefore minimizes
the latency of short RPC-style transactions.  LDCP relies only on WRED
and ECN, two features widely supported on switches, so it can be
easily deployed in existing network infrastructures.  Finally, LDCP is
simple by design and thus can be easily implemented by programmable or
ASIC NICs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.
It is inappropriate to use Internet-Drafts as reference material or to
cite them other than as "work in progress."

This Internet-Draft will expire on 14 January 2021.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as described
in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
2. LDCP algorithm
   2.1. ECN
   2.2. Stable stage algorithm
   2.3. Zero-RTT bandwidth acquisition
3. Reference Implementation
4. IANA Considerations
5. Security Considerations
6. References
   6.1. Normative References
   6.2. Informative References
Authors' Addresses

1. Introduction

Modern cloud applications, such as web search, social networking,
real-time communication, and retail recommendation, require high-
throughput, low-latency networks to meet the increasing demands of
customers.  Meanwhile, new trends in data centers, such as resource
disaggregation, heterogeneous computing, and block storage over NVMe,
continue to drive the need for high-speed networks.  Recently, high-
speed networks with 40 Gbps to 100 Gbps link speed have been deployed
in many large data centers.

Conventional software TCP/IP stacks incur high latency and substantial
CPU overhead, preventing applications from fully utilizing the
physical network capacity.  RDMA over Converged Ethernet (RoCE), in
contrast, has shown very good delay and throughput performance in
small and lightly loaded networks, owing to OS bypass and a lossless
transport that performs hop-by-hop flow control, i.e., PFC.
Nevertheless, in a large data-center network (with tens of thousands
of machines) carrying bursty traffic, PFC backpressure leads to
cascading queue buildups and collateral damage to victim flows,
resulting in neither low latency nor high throughput [Guo2016rdma].
Therefore, high-speed networks still face fundamental challenges in
delivering the aforementioned goals.

This document describes LDCP, a scalable end-to-end congestion control
scheme that achieves low latency even under high network utilization.
The key insight behind LDCP is to use ACKs to grant credits to, or
revoke credits from, senders, in order to mimic receiver-driven
pulling.
LDCP requires data receivers to return ACKs as quickly as possible,
preferably one ACK for each data packet received (per-packet ACK).
The congestion window is adjusted on a per-ACK basis using a
parameterized AIMD algorithm.  This algorithm smooths out traffic
burstiness and stabilizes the queue size at an ultra-low level,
preventing queue buildups while preserving high link utilization.  A
first-RTT bandwidth acquisition algorithm is also proposed to allow
new flows to start sending at a high rate; excess packets are actively
dropped by WRED if they would overwhelm the network, in order to
protect ongoing flows.  When heavy congestion occurs because a large
number of concurrent flows contend for the bottleneck link, e.g.,
large-scale incast, LDCP allows the congestion window to fall below
one packet, so the number of flows that LDCP can tolerate increases
remarkably compared with TCP or DCTCP.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

2. LDCP algorithm

LDCP consists primarily of two algorithms: a fast start algorithm that
is used in the first RTT, and a stable stage algorithm that governs
the rest of a flow's lifespan.  Each algorithm works with its own ECN
setting.  Because we want to use as few priority classes as possible,
we leverage the common WRED/ECN [CiscoGuide2012] [RFC3168] feature in
commodity switches to support multiple ECN marking policies within one
priority class.

2.1. ECN

LDCP employs WRED/ECN at intermediate switches to mark packets when
congestion happens [Floyd1993random].  Instead of using the average
queue size for marking as in the original RED proposal, LDCP uses
instant-queue-based ECN to give more precise congestion information to
end hosts [Alizadeh2010data] [Kuzmanovic2005power].  The switch is
configured with four parameters, K_min, K_max, P_max, and buf_max, and
it marks a packet with a probability given by the following function:

   if q < K_min,          p = 0

   if K_min <= q < K_max, p = (q - K_min) / (K_max - K_min) * P_max

   if q >= K_max,         p = 1

If q is larger than the maximum buffer of the port (buf_max), the
packet is dropped.  This general ECN model works for both algorithms
developed in LDCP, but with different sets of parameters, as explained
below.
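As a non-normative illustration, the instant-queue marking policy
above can be sketched as follows (Python; the function name, return
convention, and random-draw realization are illustrative only):

   import random

   def wred_ecn_mark(q, k_min, k_max, p_max, buf_max):
       """Illustrative instant-queue marking decision (Section 2.1).

       q is the instantaneous queue length, in the same unit as the
       thresholds (e.g., cells or KB).  Returns "drop" if the packet
       would overflow the port buffer, "mark" if it should be
       CE-marked, and "forward" otherwise.
       """
       if q > buf_max:
           return "drop"            # queue exceeds the port buffer
       if q < k_min:
           return "forward"         # p = 0
       if q < k_max:
           p = (q - k_min) / (k_max - k_min) * p_max
       else:
           p = 1.0                  # q >= K_max
       return "mark" if random.random() < p else "forward"

For example, with K_min = 20, K_max = 80, and P_max = 0.2, a packet
arriving at an instantaneous queue of 50 is CE-marked with probability
0.1.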
2.2. Stable stage algorithm

In the stable stage, i.e., the rounds after the fast start
(Section 2.3), the flow is in the congestion avoidance state, and LDCP
works as follows.

The sender maintains a congestion window (cw) to control the sending
rate of data packets.  The receiver returns ACK packets to confirm the
delivery of these data packets.  Meanwhile, the CE (Congestion
Experienced) flag in data packets is echoed back by the ECN-Echo (ECE)
flag in the ACKs.  An ACK that does not carry an ECE flag (ECE=0)
informs the sender that the network is not congested, while an ACK
that carries an ECE flag (ECE=1) informs the sender that the network
is congested.

There are two possible receiver behaviors regarding the number of ACKs
generated.  The simplest is for the receiver to generate an ACK for
every received data packet (i.e., per-packet ACK) and set the ECE flag
if the corresponding packet carries a CE mark.  Alternatively, if the
receiver is busy, it can employ delayed ACKs and generate one ACK for
at most m data packets if none of them is marked, but it generates an
ACK with the ECE flag set immediately once a CE-marked packet is
received.  The goal of this receiver behavior is to ensure that the
sender has precise information about CE marking.  A similar design
appears in [Alizadeh2010data].

An LDCP sender updates cw upon each ACK arrival according to the ECE
marks, namely per-ACK window adjustment (PAWA).  An ACK with ECE=0
increases cw, while an ACK with ECE=1 decreases cw.  When per-packet
ACK is used by the receiver, the update rule is as follows:

   if ECN-Echo = 0, cw = cw + alpha/cw

   if ECN-Echo = 1, cw = cw - beta                            --(1)

where alpha and beta are constants (0 < alpha, beta <= 1) and cw >= 1.

Eq. (1) shows that if an incoming ACK does not carry an ECE flag
(ECE=0), it grants the sender credits, and cw is increased by
alpha/cw; if the ACK carries an ECE flag (ECE=1), it revokes credits
from the sender, and cw is decreased by beta.

In essence, Eq. (1) implements an additive-increase multiplicative-
decrease (AIMD) policy similar to previous work, e.g., DCTCP
[Alizadeh2010data].  But PAWA, together with per-packet ACK, has the
following benefits.  First, PAWA reacts to each received ECE mark (or
absence of a mark) immediately, rather than employing an
RTT-granularity averaging process and reacting only once per RTT (as
DCTCP does), so it responds to congestion more quickly and accurately.
Second, along with WRED/ECN, PAWA is able to de-synchronize flows.
Instead of cutting a large portion of cw immediately upon the first
ECE-marked ACK (as ECN-enabled TCP does), LDCP distributes the window
reduction over one round.  Such de-synchronization is effective in
reducing window fluctuation and stabilizing a low queue at the
switches.  Moreover, per-packet ACK allows ACK clocking to pace out
packets better: since each ACK confirms the delivery of one packet, an
ACK arrival also clocks out one new packet, so packets are almost
equally spaced.  Finally, PAWA has a tiny state footprint, i.e., a
single state variable cw, and is easier to implement in hardware than
DCTCP.

Per-packet ACK and PAWA follow a principle of discrete control
systems: increase the controller's action rate but take a small
control step per action.  This is effective in improving control
stability and accuracy.

If delayed ACK is used on the receiver side, an ACK can confirm the
delivery of multiple packets (denoted by n), and Eq. (1) becomes:

   if ECN-Echo = 0, cw = cw + n * alpha/cw

   if ECN-Echo = 1, cw = cw - n * beta                         --(2)

In extremely congested cases where a large number of flows contend for
the bottleneck link, e.g., heavy incast with thousands of senders,
large queues would still build up even if each flow maintained a
window of merely one packet.  To handle these situations, LDCP allows
cw to drop below one packet.  A flow with cw < 1 is clocked by a
timer whose timeout is set to RTT/cw.  Accordingly, the cw update rule
becomes:

   if ECN-Echo = 0, cw = cw + gamma

   if ECN-Echo = 1, cw = max{gamma, eta * cw}

where cw < 1.  We choose eta = 1/2.  gamma is the increase step when
an ACK is not ECE-marked, and is also the minimum window size (typical
values of gamma include 1/4, 1/8, and 1/16).
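As a non-normative illustration, the sketch below combines Eq. (2)
with the sub-packet rule in a single update function.  The handoff
between the two regimes and the default parameter values (alpha = 1,
beta = 1/128) are assumptions made for the example, not values
mandated by this document:

   def pawa_update(cw, ece, n=1, alpha=1.0, beta=1/128, gamma=1/16,
                   eta=0.5):
       """Illustrative per-ACK window adjustment (PAWA), Section 2.2.

       cw     current congestion window in packets (may be below 1)
       ece    True if the ACK carries the ECN-Echo flag
       n      number of data packets acknowledged by this ACK
       alpha, beta  AIMD constants in (0, 1]
       gamma        increase step and window floor when cw < 1
       eta          multiplicative decrease factor when cw < 1
       """
       if cw >= 1:
           # Eq. (2): per-ACK additive increase / small decrease.
           cw = cw + n * alpha / cw if not ece else cw - n * beta
       else:
           # Sub-packet regime: the flow is clocked by a timer of
           # RTT/cw rather than by ACK arrivals.
           cw = cw + gamma if not ece else eta * cw
       return max(cw, gamma)   # gamma is also the minimum window size

For example, with cw = 10 an unmarked per-packet ACK (n = 1,
alpha = 1) grows the window to 10.1, while a marked ACK with
beta = 1/128 shaves off only 1/128 of a packet, spreading the window
reduction over the whole round.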
2.3. Zero-RTT bandwidth acquisition

Setting an initial rate at the very beginning of a flow is
challenging.  Since the sender has not yet had a chance to probe the
network, it faces a difficult dilemma: if it picks too large an
initial window (IW), it may cause congestion inside the network,
resulting in large queue buildup or even packet drops; on the other
hand, if it chooses too conservative an IW, it may lose transmission
opportunities in the first RTT and greatly hurt the performance of
short flows, which could otherwise have finished within one round.
LDCP resolves this dilemma with a zero-RTT bandwidth acquisition
algorithm, which allows the sender to start fast opportunistically
without adverse impact on ongoing flows in the stable stage.  In what
follows, the design of the fast start algorithm is described first;
afterwards, an implementation using existing techniques is provided.

Specifically, when a flow starts, the sender chooses a large enough
initial window (e.g., the BDP) and sends out as many packets as
possible in the first RTT.  (For brevity, packets transmitted by a
sender in the first RTT are denoted first-RTT-packets, and packets
transmitted in the congestion avoidance state (Section 2.2) are
referred to as stable-stage-packets.)  By intention, first-RTT-packets
are marked as having lower priority, while stable-stage-packets are
marked as having high priority.  The two priority classes are
controlled by two separate AQM policies.

The first-RTT-packets are controlled by an AQM policy that simply
drops packets if they are sent too aggressively, i.e., if the queue
exceeds a configured threshold K.  A network switch receives packets
transmitted by the senders and puts them into a queue.  The queue
distinguishes first-RTT-packets from stable-stage-packets according to
the marks in the packets.  Because first-RTT-packets have low
priority, they are dropped if the receiving queue size exceeds the
configured threshold, while stable-stage-packets are enqueued as long
as the queue size is below the queue capacity.  Stable-stage-packets
are dropped only when the queue is full.

Senders and switches must cooperate.  The sender adds one mark to
first-RTT-packets, and the switches identify first-RTT-packets by this
mark; the sender adds another mark to stable-stage-packets, and the
switches recognize packets sent beyond the first RTT by that mark.

In summary, first-RTT-packets are sent at a high rate and controlled
by a separate AQM, to quickly acquire free bandwidth if there is any;
their low priority protects ongoing long flows if there is not.

The above design can be implemented by leveraging a common feature of
modern switches.  On a commodity switch, the WRED/ECN feature on an
ECN-enabled queue works as follows.  ECN-capable packets (the two-bit
ECN field in the IP header set to '01' or '10') are subject to ECN
marking, while ECN-incapable packets (the two-bit ECN field set to
'00') are subject to WRED dropping, i.e., ECN-incapable packets are
dropped if the queue size exceeds a configured threshold K, as in
Eq. (3):

   if q < K,  D(q) = no drop

   if q >= K, D(q) = drop                                      --(3)
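As a non-normative illustration, the sketch below combines Eq. (3)
with the Section 2.1 marking curve to show how a single ECN-enabled
queue treats the two packet classes; the function name and the
mark_fn callback are illustrative and do not correspond to any
particular switch configuration interface:

   def ldcp_queue_treatment(ecn_capable, q, k, buf_max, mark_fn):
       """Illustrative per-packet treatment in the shared LDCP queue.

       ecn_capable  True for stable-stage packets, False for
                    first-RTT packets
       q            instantaneous queue length
       k            WRED drop threshold K from Eq. (3)
       buf_max      port buffer limit
       mark_fn      callable q -> bool realizing the Section 2.1 curve
       Returns "drop", "mark", or "forward".
       """
       if q >= buf_max:
           return "drop"                 # queue full: nothing enqueued
       if not ecn_capable:
           # ECN-incapable (first-RTT) packets follow WRED dropping,
           # Eq. (3).
           return "drop" if q >= k else "forward"
       # ECN-capable (stable-stage) packets are enqueued and possibly
       # CE-marked according to the Section 2.1 curve.
       return "mark" if mark_fn(q) else "forward"

For instance, with K = 32, an ECN-incapable packet arriving at a queue
of 40 is dropped, while an ECN-capable packet arriving at the same
queue is enqueued and CE-marked according to the Section 2.1 curve.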
The fast start algorithm makes use of this WRED/ECN feature to
distinguish first-RTT and stable-stage packets: the sender sets the
low-priority first-RTT-packets to ECN-incapable, and sets the
high-priority stable-stage-packets to ECN-capable.  All packets carry
the same DSCP value and are mapped to the same priority queue on
switches.  This queue is used exclusively by LDCP flows.  First-RTT-
packets are either dropped or pass the switch successfully.  After the
first RTT, the sender counts how many in-order packets have been
acknowledged, takes this count as a good estimate of cw, and enters
the stable stage (Section 2.2).

At first glance, the above design might look counterintuitive: if we
want to improve the performance of short flows, why should we drop
their packets instead of queuing them, even at a higher priority?  The
answer lies in the fact that if we allowed blind bursts in the first
RTT, these first-RTT-packets could build excessively large queues,
e.g., in a heavy incast scenario, and eventually these packets might
still get dropped.  Therefore, an AQM policy is necessary to keep the
queue of first-RTT packets low.  An additional benefit of this
strategy is that it also protects flows in the stable stage.  Those
stable-stage flows will experience little packet loss and constant
performance even in the face of rather dynamic churn of short flows.
Finally, we note that while the first-RTT traffic could be put into a
separate high-priority queue, we believe this is not strictly
necessary.  The reason is that with LDCP's stable stage algorithm the
queue at the switch is already small, so the benefit of a separate
priority queue would be limited.  Given the limited number of priority
queues in Ethernet, it is a fair choice to map both classes into one
priority queue while applying different WRED/ECN policies to control
their behavior.
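As a non-normative illustration, the sender side of the fast start can
be summarized by the following sketch; the class and field names are
illustrative, and IW = BDP follows the suggestion above:

   ECN_NOT_ECT = 0b00  # ECN-incapable: WRED-dropped (first RTT)
   ECN_ECT0    = 0b10  # ECN-capable:   CE-marked (stable stage)

   class LdcpFastStartSender:
       """Illustrative zero-RTT bandwidth acquisition (Section 2.3)."""

       def __init__(self, bdp_packets):
           self.iw = bdp_packets        # initial window of roughly one BDP
           self.in_fast_start = True
           self.acked_in_order = 0      # in-order packets acked so far
           self.cw = None               # set when the stable stage begins

       def ecn_codepoint(self):
           # First-RTT packets are sent ECN-incapable so that switches
           # may drop them under Eq. (3); later packets are ECN-capable.
           return ECN_NOT_ECT if self.in_fast_start else ECN_ECT0

       def on_ack(self, newly_acked_in_order):
           self.acked_in_order += newly_acked_in_order

       def end_first_rtt(self):
           # The number of in-order packets acknowledged is taken as a
           # good estimate of cw, and the flow enters the stable stage
           # (Section 2.2).
           self.in_fast_start = False
           self.cw = max(1, self.acked_in_order)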
3. Reference Implementation

LDCP has been implemented with RoCEv2 on a programmable many-core NIC
(referred to as uNIC).  uNIC has hardware enhancements for RoCEv2
packet (IB/UDP/IP stack) encapsulation and decapsulation.  The RoCEv2
stack, as well as the congestion control algorithm, is implemented in
microcode software on the uNIC.

We first add the congestion window cw to RoCEv2.  RoCEv2 uses the
Packet Sequence Number (PSN) to ensure in-order delivery, but the PSN
can jump if SEND/WRITE requests are interleaved with READ requests,
and packets can have different sizes.  Therefore, it is difficult to
compute the data size covered by cw from the PSN.  We add a new byte
sequence number to packets, the LDCP Sequence Number (LSN).  Packets
belonging to READ and SEND/WRITE requests share the same LSN space,
while packets of READ Responses have a separate LSN space, coded in a
customized header.  The LDCP sliding window is based on the LSN.

In the stable stage of LDCP, cw is updated in the PAWA manner, and we
program the uNIC to reply with an ACK for each data packet it receives
(the uNIC is able to automatically coalesce ACKs if all cores are
busy); the ACK echoes back the CE mark if the data packet is marked.
Since there are no ACK packets for Read Responses in the RDMA
protocol, we also program the uNIC to reply with ACKs for Read
Responses in order to slide the window.  Because out-of-order delivery
of Read Responses can be detected by the requester, which will issue a
repeated read request, it is not necessary to add a NAK protocol for
Read Responses to ensure reliability.  The CE-Echo bits are coded in a
customized header encapsulated in the ACK.

As mentioned, packets in the fast-start stage and the stable stage are
distinguished by ECN capability.  If a new flow does not finish within
the fast-start stage, it transitions to the stable stage.  There are
two transition conditions: 1) Packet loss is detected in the
fast-start stage, which indicates that the network is overloaded.  cw
in the stable stage is set to the number of packets that were
cumulatively acknowledged before the packet loss.  The lost packets
are retransmitted using go-back-N.  2) A full IW of packets has been
acknowledged.  (IW is set to the BDP as suggested in Section 2.3.)
This condition covers flows that are larger than the BDP and finish
the fast-start stage without packet loss.  Since all packets sent
during the fast-start stage have been confirmed, the stable stage
algorithm now takes over and cw is set to the BDP.  Note that
acknowledging a BDP worth of data takes two RTTs (the ACK for the
IW-th packet returns at the end of the second RTT), but sending
BDP-sized data requires only one RTT.  After the end of the first RTT,
the flow does not stop sending (because the ACK of the first packet
returns and frees up cw), but it sets its packets to ECN-capable from
then on.

There is a practical issue to consider: if all the packets sent out
during the fast-start stage are dropped due to overload, how can the
sender quickly detect the packet loss and avoid a retransmission
timeout?  We solve this problem by setting the IW-th packet to
ECN-capable during the fast-start stage.  For messages smaller than IW
packets, the last packet is set to ECN-capable.  These ECN-capable
packets are not dropped even if the queue size exceeds K (unless the
queue buffer overflows), since they are subject to ECN marking.  They
pass through the switches and arrive at the receiver, allowing the
receiver to detect whether packet loss has happened.

All these implementation details are transparent to user applications.
LDCP supports all RDMA transport operations (READ, WRITE, and SEND,
with or without immediate data, and ATOMIC), and thus fully supports
IB verbs.
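As a non-normative illustration, the two transition conditions and the
loss-detection trick above can be sketched as follows; the helper
names and return conventions are illustrative only:

   def ecn_capable_in_fast_start(pkt_idx, msg_packets, iw):
       """Illustrative rule from Section 3: during the fast-start stage
       only the IW-th packet (or the last packet of a message shorter
       than IW) is sent ECN-capable, so it survives WRED dropping and
       lets the receiver detect that earlier first-RTT packets were
       lost."""
       last_idx = min(iw, msg_packets) - 1      # 0-based index
       return pkt_idx == last_idx

   def fast_start_exit_cw(acked_before_loss, loss_detected,
                          iw_fully_acked, iw):
       """Illustrative transition conditions from Section 3.  Returns
       the initial cw for the stable stage, or None to remain in the
       fast-start stage."""
       if loss_detected:
           # Condition 1: loss during fast start; go-back-N
           # retransmission is triggered elsewhere, and cw seeds from
           # the cumulatively acknowledged packets.
           return max(1, acked_before_loss)
       if iw_fully_acked:
           # Condition 2: a full IW (about one BDP) acknowledged
           # without loss.
           return iw
       return None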
4. IANA Considerations

This document makes no request of IANA.

5. Security Considerations

To be added.

6. References

6.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119,
           March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
           Explicit Congestion Notification (ECN) to IP", RFC 3168,
           September 2001, <https://www.rfc-editor.org/info/rfc3168>.

6.2. Informative References

[Alizadeh2010data]
           Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
           P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data
           Center TCP (DCTCP)", ACM SIGCOMM, pp. 63-74, 2010.

[CiscoGuide2012]
           "Cisco IOS Quality of Service Solutions Configuration
           Guide", 2012.

[Floyd1993random]
           Floyd, S. and V. Jacobson, "Random Early Detection Gateways
           for Congestion Avoidance", IEEE/ACM Transactions on
           Networking 1, 4, pp. 397-413, 1993.

[Guo2016rdma]
           Guo, C., Wu, H., Deng, Z., Soni, G., Ye, J., Padhye, J., and
           M. Lipshteyn, "RDMA over Commodity Ethernet at Scale", ACM
           SIGCOMM, pp. 202-215, 2016.

[Kuzmanovic2005power]
           Kuzmanovic, A., "The Power of Explicit Congestion
           Notification", ACM SIGCOMM, pp. 61-72, 2005.

Authors' Addresses

Huichen Dai (editor)
Huawei
Huawei Mansion, No.3, Xinxi Road, Haidian District
Beijing
China

Email: daihuichen@huawei.com

Binzhang Fu
Huawei
Huawei Mansion, No.3, Xinxi Road, Haidian District
Beijing
China

Email: fubinzhang@huawei.com

Kun Tan
Huawei
Huawei Mansion, No.3, Xinxi Road, Haidian District
Beijing
China

Email: kun.tan@huawei.com