I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document:  draft-ietf-i2nsf-nsf-monitoring-data-model-12
Reviewer:  Dale R. Worley
Review Date:  2021-11-28
IETF LC End Date:  2021-12-01
IESG Telechat date:  not known

Summary:

    This draft is on the right track but has open issues, described in
    the review.  It is clear that all of these issues can be fixed
    appropriately, but they need to be fixed before publication.

Major issues:

This document presents a data model for data being passed between
various I2NSF entities.  It appears that the author has a thorough
understanding of the I2NSF architecture and so has made various
references to it in the document.  But since the data model definition
does not depend on the overall architecture, the document should be
revised to either (1) remove unnecessary references to the overall
architecture, (2) segregate them in ways that show they are not needed
to understand the data model, or (3) carefully referenced back to the
documents that define them.

There are also a few points where there seems to be technical issues
regarding the definitions of specific data items.

Details:

1.  Introduction

Why do we have both "administrative entities" and "Security
Controller" here, and "NSF data collector" in section 3?  Naively, I
would expect that in regard to the definition of the data model
presented by the "server side", all "client side" processes would be
considered an amorphous group covered by one generic term.

2.  Terminology

   This document uses the terminology described in [RFC8329].

Given that RFC 8329 doesn't define the terminology, it would be better
to expand on this to "This document uses the terminology described in
[RFC8329], much of which is defined in the I2NSF terminology document
[I2NSF-TERMS]."  Indeed, since I2NSF-TERMS is
draft-ietf-i2nsf-terminology-05, presumably part of the same effort as
this document, why is RFC 8329 being mentioned?

--

There seems to be trouble with terms used in this document.  Some of
them are mentioned in section 2.2 of RFC 8329, which simply refers to
I2NSF-TERMS.  Others (e.g. "I2NSF Record") seem like they should be
listed in RFC 8329, but aren't, and seem to be entirely undefined.
Some of those terms appear in text that may as well be omitted from
this document.  Ideally, the specialized vocabulary in this document
should be listed in this section and proper definitions or references
provided for them.

3.  Use Cases for NSF Monitoring Data

   *  The security administrator with I2NSF User can configure a policy

"I2NSF User" is not listed in RFC 8329.  Also, the placement of "with
I2NSF User" suggests that that phrase is some aspect of "security
administrator", and you might want to say "The I2NSF User that is the
security administrator ...".  OTOH, if "with I2NSF User" is some
aspect of "can configure", it should probably be placed after "can
configure".  (Is an I2NSF User a type of "user", as that word is
normally used?)

4.  Classification of NSF Monitoring Data

   This enables security administrators to assess the state of
   the networks and in a timely fashion.

Likely should delete "and".

   In essence, these types of monitoring data can be leveraged to

Probably can be simplified to "This monitoring data ...".

   As with I2NSF components, every generic system entity can include a
   set of capabilities that creates information about some context with
   monitoring data (i.e., monitoring information), composition,
   configuration, state or behavior of that system entity.

I am sure this could be clarified if it was simplified.  I think the
meaning is "Every system entity creates information about some context
with defined I2NSF monitoring data, and so every entity can be an
I2NSF component."

   This
   information is intended to be provided to other consumers of
   information and in the scope of this document, which deals with NSF
   monitoring data in an automated fashion.

I think this means "This information is can be consumed by other I2NSF
components."

4.1.  Retention and Emission

   I2NSF Event:  I2NSF Event is defined as an important occurrence over
      time,

This should be "an important occurrence at a particular time,".  "over
time" means that there is an extended period of time over which the
event occurs, but I'm sure that I2NSF Events specify only a single
instant for "when it happened".

      Records can be
      continuously processed by a system entity as an I2NSF Producer and

Up until this point, the description of "record" could apply to any
database system.  But I suspect that the intended semantics are that
Records are generated at particular instants (and are unchanging
afterward), and thus a set of records has an ordering in time based on
when they are generated.  This is the fundamental characteristic of a
"log file".  In particular, a database of users does not have this
property but a database of user activities does.  If Record is
intended to be constrained to this situation, that should be stated
explicitly.

   I2NSF Counter:  [...] When an NSF data collector asks for the value
      of a counter to it, a system entity emits

Note this sentence is incomplete in the draft.

It might be valuable to note that an I2NSF Counter can be an integer
approximation of a value that is actually continuous.  (All of the
examples that are given are values that are intrinsically integers.)
Perhaps add as the 3rd sentence "Other examples are integer
approximations to continuous values, such as a processor temperature
measured in tenths of a degree or the percentage of a disk that is
used."

Indeed, the first sentence of this paragraph says "continuous value
changes", despite that all of the examples are integer values that
cannot change continuously.  Perhaps a better phrasing is "a specific
representation of an information element whose value changes very
frequently."

   The retention of I2NSF monitoring information listed in Section 9 may

It seems like "in Section 9" could/should be omitted.

4.2.  Notifications, Events, and Records

   In consequence, an I2NSF Event is specified to
   trigger an I2NSF Policy Rule.  Such an I2NSF Event is defined as any
   important occurrence over time in the system being managed, and/or in
   the environment of the system being managed,

This text provides two definitions of "I2NSF Event" which aren't quite
the same.  One is "anything that triggers an I2NSF Policy Rule", a
purely technical face.  The other is "any important occurrence over
time", which is a human fact.  The two definitions coincide only if
the policy rules exactly cover everything that is "important".  This
needs to be tracked back to the source definition of "I2NSF Event" and
these sentences revised to match it.

   which aligns well with
   the generic definition of Event from [RFC3877].

Strictly, this clause says that "an I2NSF Event" "aligns well with the
generic definition of Event", but I think you mean that the *concept*
of an I2NSF Event aligns etc.

4.3.  Unsolicited Poll and Solicited Push

   Ideally, an I2NSF User is accessing every relevant
   information about the I2NSF Component and is emitting I2NSF Events to
   an NSF data collector (e.g., Security Controller) in a timely manner.

OK, what *is* the model of operations?  In this sentence, it seems
that an "I2NSF User" is a process that accesses (by some method)
information about (in?) an I2NSF Component, and then emits (via I2NSF
Events) that data to an NSF data collector.  But none of that is laid
out in the preceding sections.  Indeed "I2NSF User" is not defined,
though here it doesn't sound like the usual definition of "user".

   The actual mechanism
   implemented by an I2NSF Component is out of the scope of this
   document.

In this sentence, it sounds like the Component is the thing that sends
the data, whereas just above, it is the User.

   In some
   cases, the collection of information has to be conducted via a login
   mechanism provided by a system entity.

What is the use of the terms solicited, unsolicited, poll, and push
here?  Usually, the data source is considered a "server", and the
consumer is a "client".  If the client makes a request to the server,
that is called "solicited" "pulling", and if it happens periodically,
it is called "polling".  Whereas if the server initiates an
interaction to send data, that is called "unsolicited" "pushing".
Terminology in this draft doesn't seem to use those conventions, but
it doesn't tell what conventions it does use.

5.  Basic Information Model for Monitoring Data

   *  vendor-name: The name of the NSF vendor.

Generally, the minimum information needed to identify how to interact
with a device is (1) vendor name, (2) device model name/number, (3)
software version identifier.  Vendor name alone isn't particularly
useful.

6.  Extended Information Model for Monitoring Data

   This section covers the additional information associated with the
   system messages.

What is "system messages"?  The term has not been defined or mentioned
previously.  Is it a special class of "messages"?  Indeed, the term
seems to not be used elsewhere.

   The extended information model is only for the
   structured data such as events, record, and counters.  Any
   unstructured data is specified with the basic information model only.

There has been no previous discussion of "structured"
vs. "unstructured" data.

--

The final sentence of this section suggests that the dampening type
can be set by the user of the monitoring system.  But all occurrences
of dampening-type in the below descriptions say either
"dampening-type: on-repetition" or "dampening-type: none", which
implies that for each type of alarm, only a particular value of
dampening-type is allowed.

Also "dampening-type: none" is invalid (as it is undefined in the
model in section 9) and "dampening-type: no-dampening" is probably
intended.

6.1.1.  Memory Alarm

   *  severity: The severity of the alarm such as critical, high,
      medium, and low.

"such as" implies that there may be other values, whereas section 5
states that there are exactly 4 severities and section 9 agrees.  You
need to decide what the rule is and align all descriptions of
"security" data to that rule.

6.2.2.  Configuration Change

Should there be components of the event that describe what change was
made to the configuration?  The examples for "message" only
distinguish creating a new configuration vs. modifying an existing
configuration, but that information seems to me to be inadequate for
any significant security monitoring.

6.2.3.  Session Table Event

   The following information should be included in a Session
   Table Event:

Is "session table event" a known term of art?

6.2.4.  Traffic Flows

   *  arrival-rate: Arrival rate of packets of the traffic flow.

Most data for "packets per second" have a twin datum for "bytes per
second".  Should there be an "arrival-speed" datum for traffic flows?

6.3.1.  DDoS Detection

   *  attack-type: Any one of SYN flood, ACK flood, SYN-ACK flood, FIN/
      RST flood, TCP Connection flood, UDP flood, ICMP flood, HTTPS
      flood, HTTP flood, DNS query flood, DNS reply flood, SIP flood,
      SSL flood, and NTP amplification flood.

The module definition gives a fixed set of attack-types, but given
that there are 14 described types, it seems likely that additional
types will be defined.  Some extension mechanism needs to be used,
either a catch-all extension type or recogition that users will define
additional types.

   *  end-time: The time stamp indicating when the attack ended.  If the
      attack is still undergoing when sending out the alarm, this field
      can be empty.

The Yang definition seems to make this field mandatory and provide no
null value.  Perhaps making it optional in the model is the best way
of modeling the desired semantics.

6.3.2.  Virus Event

It's not clear whether this event is for when a virus is found within
a packet flow or for when it is found within a host system.  Are there
two different types of virus events for these?  Or does each type use
a subset of the fields of one common event schema?

   *  virus: Type of the virus. e.g., trojan, worm, macro virus type.

It seems like this datum should be named "virus-type".  Also, it seems
unlikely that virus types form a definitive taxonomy, so this field
should be considered less important than "virus-name" (which is likely
to be a key into a database of known viruses).

6.3.3.  Intrusion Event

   *  event-name: The name of the event. e.g., detection-intrusion.

Why is there not a single, definitive event-name value for intrusion
events?  Or was "i.e." meant rather than "e.g."?

   *  raw-info: The information describing the flow triggering the
      event.

Given there are 8 defined fields that describe the flow, what
additional information can raw-info contain?  Also, the semantics of
this raw-info is different from that in 6.3.2.

Indeed, there is some disalignment in the description of this field:

In section 6, raw-info is listed for:
    6.3.2 Virus Event
    6.3.3 Intrustion event

In section 8, raw-info is listed for:
    i2nsf-nsf-detection-ddos
    i2nsf-nsf-detection-virus
    i2nsf-nsf-detection-intrusion
    i2nsf-nsf-detection-web-attack
    i2nsf-nsf-detection-voip-volte

In the model in section 9, raw-info is listed as a component of:
    i2nsf-nsf-detection-ddos
    grouping i2nsf-nsf-event-type-content, which is used in
        i2nsf-nsf-detection-virus
        i2nsf-nsf-detection-intrusion
        i2nsf-nsf-detection-web-attack
	i2nsf-nsf-detection-voip-volte

Only in section is raw-info described as "describing the flow
triggering the event".

6.3.4.  Web Attack Event

   *  event-name: The name of event. e.g., detection-web-attack.

Why is there not a single, definitive event-name value for intrusion
events?  Or was "i.e." meant rather than "e.g."?

   *  cookies: The HTTP Set-Cookie header field of the response.

I would think the cookies header in the request would be of more
interest than the cookies header in the response.

6.3.5.  VoIP/VoLTE Event

This event type has no event-type field.  Is that correct?

6.4.3.  User Activity Log

6.4.1 and 6.4.3 are only weakly aligned with each other, despite that
they describe login and activities of two types of users (administrators
and ordinary users).  Should these types be unified, or at least their
fields compared to better align them?

6.7.2.  Policy Hit Counter

   *  hit-times: The hit times that the security policy matches the
      specified traffic.

Given the Yang definition, I think the wording you want here is "The
number of times that the security policy ...".

7.  NSF Monitoring Management in I2NSF

It's not clear to me that any of this is needed for the definition of
the data model.  It seems more to be a higher-level description of the
entire I2NSF system, but the details of the data model aren't directly
relevant to the higher-level description (as long as the data model
provides the required fields) and that the data model isn't directly
affected by the higher-level I2NSF system.

9.  YANG Data Model

        The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL',
        'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED',
        'NOT RECOMMENDED', 'MAY', and 'OPTIONAL' in this
        document are to be interpreted as described in BCP 14
        (RFC 2119) (RFC 8174) when, and only when, they appear
        in all capitals, as shown here.

This module contains the full RFC 8174 text, but the only use of it is
the instance of MUST in nsf-name.

10.  I2NSF Event Stream

   The following example

It seems like this should start a new paragraph.  The preceding text
is an overview of the event stream, but the following text is a single
example.  The example is not what I would expect; it is not an event.
I think this text would better state the purpose:

   The following example XML shows the capabilities of the event
   streams generated by an NSF (e.g., "NETCONF" and "I2NSF-Monitoring"
   event streams) for the subscription of an NSF data collector.  The
   XML examples in this document ...

--
             <replayLogCreationTime>
               2021-04-29T09:37:39+00:00
             </replayLogCreationTime>

It's not clear to me "2021-04-29T09:37:39+00:00" is a value that is
applicable to all NSF event streams.

15.  Contributors

   Chaehong Chung Department of Electronic, Electrical and Computer
   Engineering Sungkyunkwan University 2066 Seo-ro Jangan-gu Suwon,
   Gyeonggi-do 16419 Republic of Korea EMail: darkhong@skku.edu

   [and others]

For clarity, between the name and the affiliation should be a comma or
dash.

[END]