Re: [Dclc] planning for ietf90

"Fred Baker (fred)" <fred@cisco.com> Wed, 23 April 2014 22:02 UTC

From: "Fred Baker (fred)" <fred@cisco.com>
To: 邓灵莉 <lingli.deng@139.com>
Date: Wed, 23 Apr 2014 22:02:05 +0000
Message-ID: <D67EB0A8-A93C-467B-BF72-BF491E40DE88@cisco.com>
References: <00c601cf5d26$bf6c7410$3e455c30$@com> <2b095354b337869-0000e.Richmail.00014356897851338097@139.com>
In-Reply-To: <2b095354b337869-0000e.Richmail.00014356897851338097@139.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/dclc/yGWF9AUfJyGsz72BZrtG3VhU9M0
Cc: dclc <dclc@irtf.org>
Subject: Re: [Dclc] planning for ietf90

I received your first note, and sent a note to ietf-action to see if we can get SpamAssassin tuned to appreciate you a little more…

Other comments inline.

On Apr 20, 2014, at 11:11 PM, 邓灵莉 <lingli.deng@139.com> wrote:

> Hi all,
>  
> It seemed that my last email did not get through, so I am resending it via another email account. Sorry if you get duplicate copies of the same content.
>  
> Looking forward to your comments and contribution.
>  
> Cheers,
> Lingli
> 
> 邓灵莉
> Title:	Researcher
> Company:	China Mobile Research Institute
> Address:	32 Xuanwumenxi Ave, Beijing
> Email:	lingli.deng@139.com
> Mobile:	13810597148
> Postal code:	100053
> Date:	Monday, April 21, 2014
>  
> ------------------ Original message ------------------
> From: "邓灵莉/Lingli Deng" <denglingli@chinamobile.com>;
> Sent: 2014-04-21 13:58:41
> To: "lingli.deng" <lingli.deng@139.com>;
> Cc: (none);
> Subject: Fwd: planning for ietf90
>  
>  
> From: 邓灵莉/Lingli Deng [mailto:denglingli@chinamobile.com]
> Sent: April 19, 2014 10:34
> To: 'dclc@irtf.org'
> Subject: planning for ietf90
>  
> Hi all,
>  
> My impression is that people showed interest in the following topics, and I would like to invite further discussion as we start planning for IETF 90.
>  
> 1, Production data sharing: 
> It seems generally agreed that getting real data from production DCs would be both highly desirable and hard.
> I suspect that security personnel would naturally tend to say “NO” if asked to share raw data without knowing the risk it bears. It may therefore help to give them a concrete list of aggregated metrics/parameters, intended to outline the “vague big picture” rather than to capture “every sensitive detail”.
> Hence, I would suggest that we start working on a more concrete “specification” of what specific data would be helpful, based on the research community's experience of working on a general problem.
> Take the incast problem, for instance: the distribution of flow duration/volume traversing a given bottleneck link may be of interest. What do you think?

I agree that a specification of “what constitutes incast” might be useful. Part of that will involve, for example, the worst-case queue depth that occurs during an incast event, and how many communications of what kind are involved. If we ask for a literal traffic trace, as you say, the risk is high and the value is low. However, if we were to provide some sort of filter that a set of traffic traces could be fed through, we might come up with a usable summary of the information. Suppose, for example, that we were able to place Wireshark on the links into the bottleneck (distribution or top-of-rack) switch leading to a requesting host, and on the link from the TOR to the host:

                     Rack
  +------------+
  |Distribution|    +----+
  |  Switch    +----+TOR |
  +------------+    +----+
                    |    |
                    |    |
                    +----+
                    |Host|
                    +----+
                    |    |
                    |    |
                    |    |
                    +----+

and capture the seconds from before the event to after it. We should be able to report that at the time of the event traffic was arriving into the rack at <some byte/packet rate>, and that we saw <description of stream of requests> followed by <description of stream of responses>, followed by a return to ambient traffic. What I would expect to observe is that the traffic beforehand used some percentage of the link in a manner common to LANs; that there was a crunch of data followed by some number of aftershocks as TCP did timeout retransmissions; and that when the system returned to ambient behavior, the competing traffic had largely been bludgeoned off the link and took a few seconds to recover. We should be able to describe that using anonymized addresses and without reporting potentially private data.
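
As a rough illustration of the kind of filter described above, here is a minimal Python sketch. It assumes the capture has already been decoded into simple records. The Packet record, the anonymize and summarize functions, the key, and the 1 ms interval are all hypothetical choices made for illustration, not part of any agreed specification. It reports only per-interval packet and byte counts plus keyed-hash tokens in place of sender addresses.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    """One decoded packet record (hypothetical schema, not a pcap parser)."""
    timestamp_us: int  # capture time in integer microseconds
    src: str           # source address as text
    dst: str           # destination address as text
    length: int        # bytes on the wire


def anonymize(address: str, key: bytes) -> str:
    """Map an address to a stable token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, address.encode("utf-8"), hashlib.sha256).hexdigest()
    return "host-" + digest[:8]


def summarize(packets: Iterable[Packet], key: bytes,
              interval_us: int = 1000) -> List[dict]:
    """Report packet count, byte count, and anonymized senders per interval."""
    bins: Dict[int, dict] = defaultdict(
        lambda: {"packets": 0, "bytes": 0, "senders": set()})
    for p in packets:
        slot = bins[p.timestamp_us // interval_us]
        slot["packets"] += 1
        slot["bytes"] += p.length
        slot["senders"].add(anonymize(p.src, key))
    # Emit intervals in time order; raw addresses never leave this function.
    return [
        {
            "start_us": index * interval_us,
            "packets": slot["packets"],
            "bytes": slot["bytes"],
            "senders": sorted(slot["senders"]),
        }
        for index, slot in sorted(bins.items())
    ]


if __name__ == "__main__":
    # Made-up records purely to show the call; not real measurements.
    trace = [
        Packet(0, "10.0.0.1", "10.0.0.9", 100),
        Packet(500, "10.0.0.2", "10.0.0.9", 200),
        Packet(1500, "10.0.0.1", "10.0.0.9", 1500),
    ]
    for row in summarize(trace, key=b"site-secret"):
        print(row)
```

Dividing each interval's byte count by the interval length gives the arrival rate, and the interval with the largest byte count marks the crunch. Because the same key always yields the same token, a reader can still see whether the same senders reappear in the aftershocks without learning who they are.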

I would suggest that the Wireshark capture points be time-synchronized, using IEEE 1588a if possible.

Would it make sense to post an internet draft describing this kind of thing so that anyone could comment and perhaps execute it?

> 2, Problem statement/analysis
> I believe it would be of great value to further explore and better understand the potential problems, at least in the early phase of DCLC.
> It is essential to merge the understanding of DC operators (use cases/expectations), of general research (e.g. exploration of the factors contributing to a given problem and how they would affect the expectations), and of device manufacturers (e.g. device features that trigger the contributing factors and affect operators’ expectations).
> At a very high level, three types of use cases (i.e. delay-sensitive distributed applications, virtualization, and multi-tenancy) and two types of problems (i.e. incast and bufferbloat) have been mentioned in our previous discussion. I would like to invite more input and concrete work in this direction.

Agreed

> 3, Solutions, of course
> Original ideas, ongoing work, and experience applying existing technologies are all welcome. Comparison of, or general reasoning about, different solutions would also be appreciated.

Agreed

> 4, research/experimental tools
> Original ideas, ongoing work, and experience with general simulation platforms and testing guidelines (such as testing methodology and benchmarks) for DCLC-relevant scenarios are welcome.
> (We have not been discussing this on the list, but as I am planning testing myself, I find it quite desirable and believe it would be a common need.)

Agreed

> These are the ideas from my side. Any other thoughts or suggestions?
>  
> Looking forward to your feedback and contribution.
>  
> Cheers,
> Lingli