Re: [Idnet] Summary 20170814 & IDN dedicated session call for case
Stenio Fernandes <sflf@cin.ufpe.br> Fri, 18 August 2017 17:15 UTC
From: Stenio Fernandes <sflf@cin.ufpe.br>
Date: Fri, 18 Aug 2017 13:14:17 -0400
To: yanshen <yanshen@huawei.com>
Cc: Haoyu song <haoyu.song@huawei.com>, "idnet@ietf.org" <idnet@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idnet/gFwj-fY08SgshlEKwtEaWhKcgzo>
Hi Yansen et al.

Here is my take on the anomaly detection use case (2.6). I know that I was supposed to write just 'rough thoughts and requirements', but I decided to write ahead, with the upcoming tasks (Sep/Oct) in mind.

======================================================================

Definitions: First, it is important to understand that the scientific literature uses several names for anomaly detection, such as outlier detection, novelty detection, noise detection, deviation detection, and exception mining. This loose naming also brings a number of definitions of what counts as an outlier or anomaly. A general definition of an outlier is an observation that is inconsistent with the remainder of the set of observations. In the specific case of a network anomaly, such an observation may cause network failures and performance problems such as congestion and denial of service (DoS).

Anomalies are generally classified into three categories, namely point, contextual, and collective. A point anomaly relates to a single observation, whereas a collective anomaly relates to a series of observations. For instance, a SYN flood attack can be considered a collective anomaly, since a single TCP SYN segment is valid and is not, by itself, a point anomaly. A contextual anomaly concerns whether a point or collective anomaly is, in fact, an anomaly given its context.

Applications: Abnormal behavior from packets or streams of packets (flows) requires precise and, in some cases, quick detection so that the network can react and take appropriate measures to mitigate its short- and long-term effects. Ideally, the network must be intelligent enough to learn automatically what normal traffic is, so that it can adapt to any abnormal traffic pattern, including zero-day attacks. Of course, not all anomalies come from intentional attacks.
Abnormal behavior could also come from misconfiguration or malfunctions in the network.

Data / Features: Most techniques for anomaly detection use features extracted from transport or network layer data, mainly due to the widespread adoption of IPFIX/NetFlow on routers and switches. It is possible to create new variables from the raw data (a.k.a. feature engineering in data mining/machine learning lingo), such as packet or flow inter-arrival times. Depending on the measurement process, one could also include data from other Internet layers to make the detection more accurate. One must be aware that scalability is always a concern when dealing with massive amounts of data. The types and characteristics of the input data often limit the choice of techniques that can be used for anomaly detection. Some techniques require labeled data (i.e., using prior knowledge to identify an observation as normal or abnormal), and producing those labels in most cases requires enormous processing effort.

Techniques: Methods for anomaly detection come from several fields and their subdisciplines, such as Statistics, Machine Learning, Data Mining, Information Theory, Spectral Theory, and the like. Therefore, there are a number of techniques to handle (i.e., detect and/or remove) abnormal observations. As the main scope and interest of IDNET are techniques that can learn from the incoming network traffic and events, the behavioral-based anomaly detection ones seem the best fit. Behavioral-based anomaly detection methods are usually classified as supervised, semi-supervised, or unsupervised, depending on the availability of labeled data for the training phase. Supervised and semi-supervised learning require labeled data, whereas unsupervised learning is able to work with unlabeled data.
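As a rough illustration of the feature-engineering step mentioned under Data / Features, combined with a very simple detector that needs no labels, here is a minimal Python sketch. It derives inter-arrival times from timestamps and learns a one-dimensional "normal" interval (mean plus or minus k standard deviations) from unlabeled training data. The function names, the parameter k, and the sample timestamps are all illustrative assumptions, not part of any IDNET proposal; a real system would use richer IPFIX/NetFlow features and a proper clustering method.

```python
from statistics import mean, pstdev


def inter_arrival_times(timestamps):
    """Derive inter-arrival times from packet or flow timestamps (seconds)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fit_normal_region(training_gaps, k=3.0):
    """Learn a 1-D 'normal' interval from unlabeled gaps: mean +/- k * std.

    Hypothetical stand-in for a cluster boundary; if all training gaps are
    identical the interval collapses to a single point.
    """
    mu = mean(training_gaps)
    sigma = pstdev(training_gaps)
    return (mu - k * sigma, mu + k * sigma)


def flag_anomalies(gaps, region):
    """Return True for every gap that falls outside the learned interval."""
    low, high = region
    return [not (low <= gap <= high) for gap in gaps]


# Made-up timestamps: roughly one arrival per second during training.
training = inter_arrival_times([0.0, 1.0, 2.0, 3.1, 4.0, 5.0])
region = fit_normal_region(training)

# A burst (two arrivals 1 ms apart) shows up as an out-of-region gap.
observed = inter_arrival_times([10.0, 11.0, 11.001, 12.05])
print(flag_anomalies(observed, region))  # [False, True, False]
```

The output here is a label per observation; returning the distance from the interval instead would give a score.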
Unsupervised anomaly detection techniques initially create a region (e.g., a cluster in an n-dimensional hyperspace) that represents the limits of normal behavior, so that any observation beyond those bounds is considered an anomaly. They can easily (automatically) adapt to changes in the incoming network traffic/events. As far as recent anomaly detection systems are concerned, there is a clear trend toward hybrid techniques (i.e., the combination of two or more techniques) to overcome well-known limitations of each individual class of techniques, such as low precision/recall, high processing overhead, and the like. In general, this means building a system with two or more phases that combines supervised, semi-supervised, and unsupervised learning in sequence.

The outputs of anomaly detection techniques can be scores and/or labels. Of course, the objectives of the classification problem (i.e., either single class or multiclass) define their outputs. Therefore, given the type of classification problem, a number of methods can be applied, such as the ones based on unsupervised clustering.

Challenges: Current challenges for network anomaly detection include i) dealing with high-dimensional data, class imbalance, and noise, ii) performing fast and accurate feature engineering, iii) ensuring cluster homogeneity, iv) lowering the false alarm rate, and v) handling sequential, spatial, and graph data simultaneously.

======================================================================

Cheers,
Stenio

On Mon, Aug 14, 2017 at 10:37 PM, yanshen <yanshen@huawei.com> wrote:
> Hi Haoyu,
>
> Agree. These two crucial cases in Network Management are what we are
> focusing on now. Since we plan to organize a dedicated session in NMRG, all
> the discussion will converge to the area of Network Management before Nov.
>
> Just expect Stenio's output few days later : )
>
> Yansen
>
> > -----Original Message-----
> > From: steniofernandes@gmail.com [mailto:steniofernandes@gmail.com] On
> > Behalf Of Stenio Fernandes
> > Sent: Tuesday, August 15, 2017 2:09 AM
> > To: Haoyu song <haoyu.song@huawei.com>
> > Cc: yanshen <yanshen@huawei.com>; idnet@ietf.org
> > Subject: Re: [Idnet] Summary 20170814 & IDN dedicated session call for case
> >
> > Haoyu,
> >
> > I'm working on the anomaly detection use case and will send it to the
> > list this week.
> >
> > Stenio
> >
> > On Mon, Aug 14, 2017 at 12:47 PM, Haoyu song <haoyu.song@huawei.com>
> > wrote:
> > > Yansen,
> > >
> > > I see two key use cases are missing in the current list: root cause
> > > analysis and anomaly detection. Those two are likely to use ML-based
> > > solutions and the first one has already received a lot of research.
> > >
> > > Haoyu
> > >
> > > From: IDNET [mailto:idnet-bounces@ietf.org] On Behalf Of yanshen
> > > Sent: Sunday, August 13, 2017 8:36 PM
> > > To: idnet@ietf.org
> > > Subject: [Idnet] Summary 20170814 & IDN dedicated session call for
> > > case
> > >
> > > Dear all,
> > >
> > > Here is a summary and some index (2017.08.14). Till now, whatever the
> > > case is supported or not, I tried to organize all the content and keep
> > > the core part. It is still welcome to contribute and discuss.
> > >
> > > If I miss something important, please let me know. Apologized in
> > > advance.
> > >
> > > Yansen
> > >
> > > --------- Roadmap ---------
> > >
> > > ***Aug. : Collecting the use cases (related with NM). Rough thoughts
> > > and requirements
> > >
> > > Sep. : Refining the cases and abstract the common elements
> > >
> > > Oct. : Deeply analysis. Especially on Data Format, control flow, or
> > > other key points
> > >
> > > Nov.: F2F discussions on IETF100
> > >
> > > --------- Roadmap End ---------
> > >
> > > 1. Gap and Requirement Analysis
> > >
> > >   1.1 Network Management requirement
> > >
> > >   1.2 TBD
> > >
> > > 2. Use Cases
> > >
> > >   2.1 Traffic Prediction
> > >
> > >     Proposed by: yanshen@huawei.com
> > >
> > >     Track:
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00131.html
> > >
> > >     Abstract: Collect the history traffic data and external data
> > >     which may influence the traffic. Predict the traffic in
> > >     short/long/specific term. Avoid the congestion or risk in
> > >     previously.
> > >
> > >   2.2 QoS Management
> > >
> > >     Proposed by: yanshen@huawei.com
> > >
> > >     Track:
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00131.html
> > >
> > >     Abstract: Use multiple paths to distribute the traffic flows.
> > >     Adjust the percentages. Avoid congestion and ensure QoS.
> > >
> > >   2.3 Application (and/or DDoS) detection
> > >
> > >     Proposed by: aydinulas@gmx.net
> > >
> > >     Track:
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00133.html
> > >
> > >     Abstract: Detect the application (or attack) from network packets
> > >     (HTTPS or plain) Collect the history traffic data and identify a
> > >     service or attack (ex: Skype, Viber, DDoS attack etc.)
> > >
> > >   2.4 QoE Management
> > >
> > >     Proposed by: albert.cabellos@gmail.com
> > >
> > >     Track:
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00137.html
> > >
> > >     Abstract: Collect low-level metrics (SNR, latency, jitter, losses,
> > >     etc) and measure QoE. Then use ML to understand what is the
> > >     relation between satisfactory QoE and the low-level metrics. As
> > >     an example learn that when delay>N then QoE is degraded, but when
> > >     M<delay<N then QoE is satisfactory for the customers (please note
> > >     that QoE cannot be measured directly over your network). This is
> > >     useful to understand how the network must be operated to provide
> > >     satisfactory QoE.
> > >
> > >   2.5 (Encrypted) Traffic Classification
> > >
> > >     Proposed by: jerome.francois@inria.fr; mskim16@etri.re.kr
> > >
> > >     Track: [Jerome]
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00141.html ;
> > >     [Min-Suk Kim]
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00153.html
> > >
> > >     Abstract:
> > >
> > >     [Jerome] collect flow-level traffic metrics such as protocol
> > >     information but also meta metrics such as distribution of packet
> > >     sizes, inter-arrival times... Then use such information to label
> > >     the traffic with the underlying application assuming that the
> > >     granularity of classification may vary (type of application,
> > >     exact application name, version...)
> > >
> > >     [Min-Suk Kim] continuously collect packet data, then applying
> > >     learning process for traffic classification with generating
> > >     application using deep learning models such as CNN (convolutional
> > >     neural network) and RNN (recurrent neural network). Data-set to
> > >     apply into the models are generated by processing with features
> > >     of information from flow in packet data.
> > >
> > >   2.6 TBD
> > >
> > > 3. Data Focus
> > >
> > >   3.1 Data attribute
> > >
> > >   3.2 Data format
> > >
> > >   3.3 TBD
> > >
> > > 4. Support Technologies
> > >
> > >   4.1 Benchmarking Framework
> > >
> > >     Proposed by: pedro@nict.go.jp
> > >
> > >     Track:
> > >     https://www.ietf.org/mail-archive/web/idnet/current/msg00146.html
> > >
> > >     Abstract: A proper benchmarking framework comprises a set of
> > >     reference procedures, methods, and models that can (or better
> > >     *must*) be followed to assess the quality of an AI mechanism
> > >     proposed to be applied to the network management/control area.
> > >     Moreover, and much more specific to the IDNET topics, is the
> > >     inclusion, dependency, or just the general relation of a standard
> > >     format enforced to the data that is used (input) and produced
> > >     (output) by the framework, so a kind of "data market" can arise
> > >     without requiring to transform the data. The initial scope of
> > >     input/output data would be the datasets, but also the new
> > >     knowledge items that are stated as a result of applying the
> > >     benchmarking procedures defined by the framework, which can be
> > >     collected together to build a database of benchmark results, or
> > >     just contrasted with other existing entries in the database to
> > >     know the position of the solution just evaluated. This increases
> > >     the usefulness of IDNET.
> > >
> > >   4.2 TBD
> > >
> > > _______________________________________________
> > > IDNET mailing list
> > > IDNET@ietf.org
> > > https://www.ietf.org/mailman/listinfo/idnet
> >
> > --
> > Prof. Stenio Fernandes
> > CIn/UFPE
> > http://www.steniofernandes.com
>

--
Prof. Stenio Fernandes
CIn/UFPE
http://www.steniofernandes.com