idnits 2.17.1 

draft-daveor-cgn-logging-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 11, 2018) is 2207 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC5905' is defined on line 721, but no explicit
     reference was found in the text


     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                              D. O'Reilly
3	Internet-Draft                                            April 11, 2018
4	Intended status: Informational
5	Expires: October 13, 2018

7	   Approaches to Address the Availability of Information in Criminal
8	  Investigations Involving Large-Scale IP Address Sharing Technologies
9	                      draft-daveor-cgn-logging-04

11	Abstract

13	   The use of large-scale IP address sharing technologies (commonly
14	   known as "Carrier-Grade NAT" and "A+P") presents a challenge for law
15	   enforcement agencies due to the fact that incoming source port
16	   information is not routinely logged by Internet-facing servers.  The
17	   absence of this information means that it is becoming increasingly
18	   difficult for law enforcement agencies to identify suspects in
19	   criminal activity online.  This document considers the reasons why
20	   source port information is not routinely logged by Internet-facing
21	   servers and makes recommendations to help improve the situation.  A
22	   deployment maturity model has been developed and a study of the
23	   support for logging incoming source port information in common server
24	   software is also presented.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on October 13, 2018.

43	Copyright Notice

45	   Copyright (c) 2018 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (https://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	   2.  Scope
62	   3.  Centralised Connection Logging
63	   4.  Challenges to Capturing Source Port
64	     4.1.  Lack of Awareness
65	     4.2.  Lack of Support for Logging Source Port
66	     4.3.  Additional Storage Requirements
67	     4.4.  Default Log Formats
68	     4.5.  Breaking Existing Tooling
69	     4.6.  Accuracy of Recorded Time
70	     4.7.  Translation of Source Port by Endpoint Infrastructure
71	   5.  Comparison Model
72	   6.  Support for Logging Source Port
73	   7.  Recommendations
74	     7.1.  Raise Awareness of the Importance of Logging Source Port
75	     7.2.  Increase Support for Logging Source Port
76	     7.3.  Update Default Log Formats
77	     7.4.  Adequate Timestamp Accuracy in Logs
78	     7.5.  Source Port Translation in Endpoint Infrastructure
79	   8.  IANA Considerations
80	   9.  Security Considerations
81	   10. Acknowledgements
82	   11. References
83	     11.1.  Informative References
84	     11.2.  Normative References
85	   Appendix A.  Support for Source Port Logging in Various Server
86	                Software
87	   Author's Address

89	1.  Introduction

91	   Large-scale IP address sharing technologies (such as "Carrier-Grade
92	   NAT", [RFC6888]) are a helpful tool for extending the life of IPv4
93	   addresses by allowing multiple endpoints to share a small number of
94	   IPv4 addresses.  A related category of technologies, known as
95	   "Address plus Port", or "A+P" [RFC6346], is also used for large-
96	   scale IP address sharing, achieved in these cases by using some of
97	   the port number bits for addressing purposes.  A number of such
98	   technologies have been discussed and deployed, such as Dual-Stack
99	   Lite [RFC6333], NAT64 [RFC6146], NAT444 [I-D.shirasaki-nat444],
100	   Lightweight 4over6 [RFC7596], MAP-E [RFC7597] and MAP-T [RFC7599].

102	   All of these technologies involve extending the space of available
103	   IPv4 addresses by mapping communication from multiple endpoints to a
104	   single, or small number of shared addresses, through the use of port
105	   numbers.  The detail of how this is achieved in each technology
106	   varies, but the principle remains the same in all cases.

108	   From the perspective of a server on the Internet, endpoint traffic
109	   that has passed through IP address sharing infrastructure appears to
110	   be originating from the IP address of the address sharing appliance.
111	   Common practice at the present time is for servers to log the
112	   connection time and source IP address of incoming connections.
113	   However, the IP address of the address sharing appliance is not
114	   sufficient to identify the true source of the traffic because
115	   potentially hundreds or thousands of individual endpoints were using
116	   that IP address at the same time.  If the need arises during a
117	   criminal investigation to identify the source of a specific
118	   connection, the source port and exact connection time will also be
119	   required.  Without this additional information it is highly unlikely
120	   that it will be possible for law enforcement authorities to progress
121	   their investigations.
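
As an illustrative sketch only, the following Python fragment shows the three items discussed above (connection time, source IP address and source port) being recorded together for an incoming connection. The function name connection_record and the layout of the log line are assumptions made for this example; the peer argument is an (address, port) pair of the kind Python's socket.accept() returns alongside the connected socket for IPv4.

```python
from datetime import datetime, timezone


def connection_record(peer, when=None):
    """Build one log line holding timestamp, source IP and source port.

    `peer` is an (ip, port) pair; `when` defaults to the current UTC time.
    """
    ip, port = peer[0], peer[1]
    if when is None:
        when = datetime.now(timezone.utc)
    # UTC with microsecond resolution; this layout is an assumption.
    stamp = when.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return f"{stamp} src={ip} sport={port}"


if __name__ == "__main__":
    example = datetime(2018, 4, 11, 12, 0, 0, tzinfo=timezone.utc)
    # 192.0.2.10 is a documentation address; 51514 is an arbitrary port.
    print(connection_record(("192.0.2.10", 51514), example))
```

In a real server the pair would come from the accepted socket rather than being supplied by hand.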

123	   Information is required from at least two sources to establish the
124	   link from the logs of an Internet-facing server to a specific
125	   subscriber endpoint:

127	   1.  The administrator of the Internet-facing server must have logged
128	       enough information to enable the operator of the IP address
129	       sharing infrastructure to isolate a specific subscriber endpoint.

131	   2.  The operator of the IP address sharing infrastructure must have
132	       logged sufficient information (for a sufficient length of time)
133	       to be able, when provided with adequate data by a law enforcement
134	       agency, to isolate the relevant subscriber endpoint.

136	   The operators of large-scale IP address sharing infrastructure,
137	   typically Internet Service Providers, are usually required by law to
138	   maintain records of which endpoint was using a particular IP address
139	   and port at a particular time.  The period of time for which these
140	   records must be retained is defined by national legislation.
141	   Irrespective of whether (and for how long) these records are
142	   available, a starting point is needed to indicate to an investigating
143	   law enforcement agency that a particular endpoint was involved in a
144	   suspected criminal activity under investigation.  Without such a
145	   starting point, it would be very difficult to progress the
146	   investigation even as far as engagement with the operator of the
147	   address sharing infrastructure.  The records of Internet-facing
148	   servers are often a crucial source of this type of evidence.

150	   It has been recognised for some time that IP address sharing presents
151	   a challenge to the ability to trace network use and abuse [RFC7620].
152	   Further, it has also been recognised that this challenge is likely to
153	   become more severe and widespread with the increased use of large-
154	   scale address sharing [RFC6269].  More recently, Europol has
155	   highlighted the issue of large-scale IP address sharing as a threat
156	   to Internet governance [EUROPOL_IOCTA].  It is reported that the
157	   problem of crime attribution related to the use of carrier-grade NAT
158	   technologies is regularly encountered by 90% of respondents to a
159	   survey on the topic.

161	   Address sharing, including large-scale address sharing, is required
162	   as long as the use of IPv4 continues.  Full deployment of IPv6 has
163	   the potential to ultimately eliminate the current attribution issues
164	   arising from the use of large-scale address sharing technologies,
165	   although presumably new attribution challenges will arise in that
166	   scenario.  Since it is impossible to anticipate if or when full
167	   migration to IPv6 will take place, it is prudent to consider the
168	   implications of the transitionary technologies until the need for
169	   them has been eliminated.

171	2.  Scope

173	   Previous work has already suggested as best practice the logging by
174	   Internet-facing servers of source IP address, source port and exact
175	   connection time [RFC6302].  However, this continues to be
176	   exceptional, rather than routine, logging practice.  The purpose of
177	   this document is to consider in more detail how it might be possible
178	   to bring about routine logging by Internet-facing servers of the
179	   information needed to re-establish the ability to trace network abuse
180	   for criminal investigative purposes.  This document specifically does
181	   not address or consider the logging requirements of operators of
182	   large-scale address sharing infrastructure.  Instead, the focus is on
183	   the logging considerations of operators of Internet-facing servers.
184	   The main contributions of this document are:

186	   1.  To consider the reasons why source port logging is not routinely
187	       carried out.

189	   2.  To identify some possible solutions and workarounds for the
190	       reasons that source port logging is not routinely carried out.

192	   3.  To examine the feasibility of source port logging from the
193	       perspective of software support for this feature.

195	   Clearly no single solution will address the problem of crime
196	   attribution on the Internet.  Load balancers, proxies and other
197	   network infrastructure may also, intentionally or as a side-effect,
198	   obfuscate the true source of Internet traffic and these problems will
199	   continue to exist with or without the presence of large-scale address
200	   sharing technologies (like Carrier-Grade NAT and A+P).  Nevertheless,
201	   at the time of writing large-scale address sharing technologies
202	   present a significant challenge to crime attribution, as highlighted
203	   by Europol [EUROPOL_IOCTA], and this document attempts
204	   to consider the challenges specifically presented by that category of
205	   technologies.

207	   The discussion begins by considering whether centralised connection
208	   logging is a viable solution to the problem of subscriber
209	   identification in criminal investigations.  This is followed by an
210	   examination of the reasons why source port logging is not currently
211	   routinely carried out.  A model has been developed for the comparison
212	   of the maturity of various server deployments to log source port and
213	   a study of common server software has been performed to assess the
214	   status of support for this functionality.  Many, but not all,
215	   enterprise server solutions that were examined made the logging of
216	   source port either "Possible" or "Feasible", as defined in the
217	   maturity model.  Only one type of server software examined made the
218	   logging of source port "Default".

220	3.  Centralised Connection Logging

222	   When large-scale IP address sharing technologies are used, source IP
223	   address is no longer a sufficient identifier of an individual
224	   subscriber.  At a minimum, source port and accurate timestamp
225	   information are also required to distinguish between the potentially
226	   large number of individual users of a specific IP address at a
227	   particular time.  [RFC6269] points out that there are two solutions
228	   to the question of how adequate information can be recorded to
229	   identify the parties to a particular connection.  They are:

231	   1.  Operators of IP address sharing infrastructure log mappings
232	       between (source IP address, source port) combinations and their
233	       subscribers.  Server operators log the IP address and source port
234	       of incoming connections.  This is referred to as source port
235	       logging.

237	   2.  Instead of relying on server operators to log the source port of
238	       incoming connections, operators of IP address sharing
239	       infrastructure log all combinations of (external IP address,
240	       external port, destination IP address) for outgoing connections.
241	       This is referred to as connection logging.  Server operators log
242	       the IP address and timestamp of incoming connections, which is
243	       the common current practice.

245	   Two challenges to the use of connection logging by operators of IP
246	   address sharing infrastructure are also presented in [RFC6269].
247	   Briefly:

249	   o  The volumes of data involved make centralised recording of
250	      destination IP addresses infeasible.

252	   o  Many individuals using the same IP address to access a popular
253	      destination (e.g. a popular website) might mean that it is not
254	      possible to distinguish between the activity of one subscriber and
255	      another, even if connection records are kept by the operator of
256	      the address sharing infrastructure.

258	   The first issue raised is that the volumes of data involved make
259	   centralised recording of destination IP addresses infeasible.
260	   Whether destination IP addresses are recorded or not, the volume of
261	   logs generated by a large-scale IP address sharing infrastructure
262	   will be substantial, and some approaches have been proposed to
263	   address this hurdle and make central connection logging more
264	   feasible, such as deterministic allocation of ports
265	   [RFC6269], [RFC7422] or allocation of port ranges [RFC7768],
266	   [RFC6346].  While arguments of infeasibility are not arguments in
267	   principle why such logging cannot be done, the volumes of data
268	   involved in recording every single outgoing connection in a large
269	   Internet service provider represent legitimate technical, commercial
270	   and operational arguments for why it cannot work in practice.  Some
271	   representative figures for the scales of data involved can be found
272	   in [RFC7422], wherein it is estimated that the logging overhead would
273	   be of the order of 150 MB per subscriber, per month.  For a service
274	   provider with one million subscribers, this would produce a volume of
275	   logs (uncompressed) of the order of 150 terabytes per month.  Aside
276	   from the technical overhead of storing such a volume of data,
277	   searching and locating relevant records over an extended, legally
278	   mandated retention period would also present a significant technical
279	   challenge.
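
The arithmetic behind the figure quoted above can be reproduced with a short sketch. The per-subscriber figure of 150 MB per month is taken from the text; decimal units are assumed, and the 12-month retention period in the second call is a hypothetical value chosen only to show how the total scales.

```python
MB = 10**6   # decimal megabyte, in bytes (assumed unit)
TB = 10**12  # decimal terabyte, in bytes (assumed unit)


def connection_log_volume(subscribers, months=1, per_subscriber_month=150 * MB):
    """Return the uncompressed connection log volume in bytes."""
    return subscribers * months * per_subscriber_month


if __name__ == "__main__":
    # One million subscribers, one month: 150 TB, as stated in the text.
    print(connection_log_volume(1_000_000) / TB)
    # Same provider over a hypothetical 12-month retention period.
    print(connection_log_volume(1_000_000, months=12) / TB)
```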

281	   The second point raised in [RFC6269] against connection logging by
282	   operators of IP address sharing infrastructure suggests that even if
283	   connection logs store all combinations of (timestamp, source IP,
284	   source port, destination IP), if this information is queried in the
285	   absence of source port because source port has not been recorded by
286	   the destination IP, this would not be sufficient to distinguish the
287	   activity of one individual from another in cases where the
288	   destination IP is a popular one.  This problem is further exacerbated
289	   in the case of protocols that make multiple connections per session
290	   (e.g.  HTTP/HTTPS).  The implication of this point is that connection
291	   logging, despite potential significant technical and operational
292	   overhead, cannot guarantee that the information retained is
293	   sufficient to identify an individual suspect, even when all required
294	   records are available.
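
The ambiguity described above can be illustrated with a small sketch. The record layout, the function name candidates, the one-second matching window and the sample data are all assumptions made for this example, and the addresses are taken from the documentation ranges. Three subscribers share one external address and reach the same popular destination within the same second.

```python
from collections import namedtuple

# One row of a hypothetical connection log kept by the address sharing operator.
Conn = namedtuple("Conn", "ts ext_ip ext_port dst_ip subscriber")


def candidates(log, ts, ext_ip, dst_ip, ext_port=None, window=1.0):
    """Return the set of subscribers matching a query.

    `ext_port=None` models a server that did not record the source port.
    """
    return {
        c.subscriber
        for c in log
        if c.ext_ip == ext_ip
        and c.dst_ip == dst_ip
        and abs(c.ts - ts) <= window
        and (ext_port is None or c.ext_port == ext_port)
    }


LOG = [
    Conn(1000.2, "198.51.100.7", 40001, "203.0.113.80", "subscriber-A"),
    Conn(1000.4, "198.51.100.7", 40002, "203.0.113.80", "subscriber-B"),
    Conn(1000.9, "198.51.100.7", 40003, "203.0.113.80", "subscriber-C"),
]

if __name__ == "__main__":
    # Without the source port, all three subscribers remain candidates.
    print(sorted(candidates(LOG, 1000.5, "198.51.100.7", "203.0.113.80")))
    # With the source port, the query isolates a single subscriber.
    print(sorted(candidates(LOG, 1000.5, "198.51.100.7", "203.0.113.80", 40002)))
```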

296	   Finally, the privacy concerns arising from connection logging in this
297	   scenario have been repeatedly raised in [RFC6888] and
298	   [I-D.ietf-behave-ipfix-nat-logging].

300	   In summary, it is certainly clear that operators of address sharing
301	   infrastructure need to retain records to enable the identification of
302	   suspects, and such records must consist of, at least, sufficient
303	   information to identify an individual subscriber when provided with a
304	   timestamp, source IP, source port and destination IP.  However, there
305	   is no centralised solution available that removes the need for server
306	   operators to retain source port information.

308	4.  Challenges to Capturing Source Port

310	   It is relatively easy to articulate the reason why the operator of an
311	   Internet-facing server would wish to retain source port information
312	   for incoming connections.  If server operators (or the users that
313	   they serve) find themselves the victims of a crime, it is preferable
314	   that all information that could be needed by the server operator to
315	   facilitate a criminal investigation is available.  On the other hand,
316	   there are reasons why a server operator might not have the required
317	   source port information.  This section enumerates the factors that
318	   could negatively influence both the ability and the inclination of
319	   server operators to capture and record source port information.

321	4.1.  Lack of Awareness

323	   Server operators are principally focussed on delivering the services
324	   for which they are operating their infrastructure.  One of the main
325	   problems with the increasing use of IP address sharing technologies
326	   is the lack of awareness on the part of server operators that there
327	   are direct implications for them in case they should become the
328	   victim of a crime.

330	   At the time of writing, a minimal amount of material is available
331	   online concerning this issue, even for those actively seeking to find
332	   out about source port logging.  Where specific guidance or
333	   information has been provided by vendors in relation to the
334	   configuration of source port logging, no explanation is provided for
335	   why server operators might consider this desirable (see, for example,
336	   [MSDN_IIS_LOG]).

338	   There is, therefore, a considerable awareness gap between the
339	   importance of this issue for the purpose of investigating criminal
340	   activity online and the awareness of those who need to act in advance
341	   of any criminality taking place to ensure that the information needed
342	   to facilitate a future investigation is available.

344	4.2.  Lack of Support for Logging Source Port

346	   Before a server operator can decide to log source port information,
347	   the server software must support logging of the source port of
348	   incoming connections.  Many, but not all, major software distributions
349	   support the logging of the source port of incoming connections.
350	   Clearly, lack of support in server software is a technical obstacle
351	   to a server operator logging source port at the endpoint.  It may
352	   still be possible to log source port at some location before the
353	   server endpoint (e.g. at a reverse proxy) but absence of support in
354	   server software will mean that endpoint logging will not be possible.

356	4.3.  Additional Storage Requirements

358	   In cases where it is possible to simply add source port to the list
359	   of fields recorded in log entries, the additional storage required to
360	   preserve source port data is minimal; in the region of six bytes per
361	   log entry (maximum of five ASCII digits for the source port plus an
362	   additional delimiter).
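
The six-byte estimate above follows from the largest port number, 65535, which needs five ASCII digits, plus one delimiter. A minimal sketch of the calculation is given below; the figure of ten million log entries per day is a hypothetical workload chosen for illustration.

```python
MAX_PORT = 65535  # largest TCP/UDP port number


def port_field_bytes(delimiter=" "):
    """Worst-case bytes added per entry: ASCII digits plus one delimiter."""
    return len(str(MAX_PORT)) + len(delimiter.encode("ascii"))


def extra_storage_bytes(entries):
    """Worst-case additional storage for `entries` log lines."""
    return entries * port_field_bytes()


if __name__ == "__main__":
    print(port_field_bytes())               # bytes per entry
    print(extra_storage_bytes(10_000_000))  # bytes per day, hypothetical load
```

Under that assumed load, the worst-case additional storage is 60 megabytes per day before compression.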

364	   However, in some cases where software supports logging source port of
365	   incoming connections, it has been noted that this can only be
366	   achieved by enabling verbose or debug logging in the software.  This
367	   would substantially (and unnecessarily) increase the size of logs
368	   produced by the server and would also, in all probability, reduce the
369	   production performance of the server.  These factors would
370	   undoubtedly negatively influence the decision by a server operator to
371	   log incoming source port.

373	4.4.  Default Log Formats

375	   Many major software distributions provide default log formats in
376	   their configuration files.  A review of the default log format of
377	   some common server software has been carried out and in only one case
378	   was it found that the source port of incoming connections is logged
379	   by any of the default log formats.

381	4.5.  Breaking Existing Tooling

383	   Much commercial and free log analysis software, by default, expects
384	   logs to be in a particular format.  Consider, for example, the
385	   ubiquity of the Apache Common and Extended Log Formats.  The software
386	   can usually be configured to parse arbitrary log formats, but this is
387	   additional configuration work for a server operator.  For example:
388	   [ANALOG_LOG_CONFIG],[AWSTATS_LOG_CONFIG].  Without migration
389	   planning, a change to default log formats would most likely cause
390	   substantial disruption to a considerable amount of downstream
391	   processing of server log files.  In addition to commercially
392	   available software, many administrators have developed or downloaded
393	   scripts that expect logs to be in a standard log format.

395	   Therefore, log processing software, and in particular custom scripts,
396	   may break if default log formats change unexpectedly.  At a minimum,
397	   the tooling may need to be updated to correctly process the
398	   additional fields newly present in log files.
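
One way that custom tooling could be made tolerant of such a change is sketched below. It parses lines laid out like the Common Log Format (host, ident, user, bracketed time, quoted request, status, size) and accepts an optional trailing source port field. Placing the port at the end of the line is an assumption made for this example, as are the function and field names.

```python
import re

# Common Log Format layout with an optional trailing source port (assumed position).
LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: (?P<sport>\d{1,5}))?$'
)


def parse_access_line(line):
    """Parse one log line; `sport` is None when the field is absent."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised log line")
    record = match.groupdict()
    if record["sport"] is not None:
        record["sport"] = int(record["sport"])
    return record


if __name__ == "__main__":
    old = '192.0.2.10 - - [11/Apr/2018:12:00:00 +0000] "GET / HTTP/1.1" 200 512'
    print(parse_access_line(old)["sport"])             # field absent
    print(parse_access_line(old + " 51514")["sport"])  # field present
```

A parser written this way continues to accept logs produced before the format change.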

400	4.6.  Accuracy of Recorded Time

402	   As well as recording the IP address and source port of the
403	   connection, it is important to record the exact time of the
404	   connection.  It has been suggested that there is a need for keeping
405	   the exact time against some sort of global standard (e.g.  NTP)
406	   [RFC6302]; however, this may not be possible for practical, security
407	   or legacy reasons.  In practice, it is usually not necessary to keep
408	   time against a global standard, as long as time is recorded
409	   consistently.  The reason for this is that any time offset between
410	   the server and the time recorded in another organisation's records
411	   (running address sharing infrastructure) can be calculated and
412	   compensated for manually.  Time offsets of this nature are commonly
413	   encountered and well understood in the digital forensics world.
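
The manual compensation described above amounts to a simple subtraction and addition, sketched below. The sketch assumes that the server clock and a reference clock can be read at the same moment and that the offset stays constant between that reading and the logged event (clock drift is ignored). The function names are hypothetical.

```python
from datetime import datetime, timedelta


def clock_offset(server_clock, reference_clock):
    """Offset to add to server timestamps, from two simultaneous readings."""
    return reference_clock - server_clock


def to_reference_time(logged, offset):
    """Convert a timestamp from the server log to reference time."""
    return logged + offset


if __name__ == "__main__":
    # The server clock reads 12:00:00 when the reference clock reads 12:02:30.
    offset = clock_offset(datetime(2018, 4, 11, 12, 0, 0),
                          datetime(2018, 4, 11, 12, 2, 30))
    print(offset == timedelta(seconds=150))
    # A connection logged at 09:15:10 server time, expressed in reference time.
    print(to_reference_time(datetime(2018, 4, 11, 9, 15, 10), offset))
```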

415	4.7.  Translation of Source Port by Endpoint Infrastructure

417	   It is common for an incoming connection to terminate somewhere other
418	   than the actual server that is ultimately handling the connection.
419	   Load balancers, proxies or denial of service countermeasures may be
420	   present to improve the efficiency or availability of the platform,
421	   any one of which could potentially terminate the incoming connection.
422	   The operation of these types of endpoint infrastructure can cause
423	   translation of the incoming connection parameters, including source
424	   port, before the connection is established to the actual server
425	   endpoint.

427	   In such cases the source port logged at the server endpoint is a
428	   source port that only has meaning within the endpoint infrastructure
429	   and in most cases will not carry any information about the source
430	   port in use at the connection origin, in this case the connection
431	   origin being the large-scale address sharing infrastructure.  In the
432	   worst case scenario (from a crime attribution point of view), the
433	   endpoint infrastructure may obfuscate the true source connection
434	   information in a way that is unrecoverable.

436	5.  Comparison Model

438	   A model has been developed to assist with comparison of the maturity
439	   of server software deployments to store and retrieve source port
440	   information for incoming connections.  The model is depicted in
441	   Figure 1.

443	   +-------------------------------------------------------------+
444	   | Possible -> Feasible -> Default -> Manageable -> Accessible |
445	   +-------------------------------------------------------------+

447	                                 Figure 1

449	   o  "Possible": Means that the server software supports, in any way,
450	      the ability to record source ports for incoming connections.

452	   o  "Feasible": Means that there are no significant performance or
453	      storage implications for enabling the storage of source ports.

455	   o  "Default": Means that, at a minimum, at least one of the default
456	      log formats provided with the software distribution enables the
457	      storage of source ports.

459	   o  "Manageable": Means that tooling is, or has been, built or adapted
460	      to support the storage of source ports.

462	   o  "Accessible": Means that it is possible to identify and retrieve
463	      relevant records in the stored log data.

465	6.  Support for Logging Source Port

467	   Open-source research has been conducted to assess the status of
468	   support for logging of source port information in common server
469	   software.

471	   The assessment criteria were as follows:

473	   o  Server software is categorised as "Possible" if there was any way
474	      identified to cause the logging of source port.

476	   o  Server software is categorised as "Feasible" if the logging of
477	      source port does not require increasing the log level to cause the
478	      logging of source port to be possible.  In other words, if a
479	      server requires enabling verbose, debug or audit logging in order
480	      to be able to record source port then logging is "Possible" but
481	      not "Feasible".

483	   o  Server software is categorised as "Default" if at least one of the
484	      available default log formats enables logging of the incoming
485	      source port, or if source port is logged by default.

487	   o  The "Manageable" and "Accessible" aspects of the comparison model
488	      relate to specific deployments and are therefore not considered in
489	      the assessment of server software support.
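
The assessment criteria above can be expressed as a small decision function. This is a sketch under the assumption that the levels are cumulative, as the arrows in Figure 1 suggest; the names Maturity and assess, and the three boolean inputs, are hypothetical. The "Manageable" and "Accessible" levels are omitted because they relate to specific deployments.

```python
from enum import IntEnum


class Maturity(IntEnum):
    NONE = 0      # no way identified to log source port
    POSSIBLE = 1  # can be logged, but only via verbose/debug/audit logging
    FEASIBLE = 2  # can be logged without raising the log level
    DEFAULT = 3   # logged by, or offered in, a default log format


def assess(can_log_port, needs_raised_log_level, in_a_default_format):
    """Classify server software according to the criteria in this section."""
    if not can_log_port:
        return Maturity.NONE
    if needs_raised_log_level:
        return Maturity.POSSIBLE
    if in_a_default_format:
        return Maturity.DEFAULT
    return Maturity.FEASIBLE


if __name__ == "__main__":
    print(assess(True, True, False).name)
    print(assess(True, False, False).name)
    print(assess(True, False, True).name)
```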

491	   The latest versions of 16 common server software packages have been
492	   examined, and online documentation has been researched to identify if
493	   and how source port logging can be enabled.  The findings are
494	   described in Appendix A.  The results are summarised in the following
495	   table:

498	        +----------+----------+---------+------------+------------+
499	        | Possible | Feasible | Default | Manageable | Accessible |
500	        +----------+----------+---------+------------+------------+
501	        |    13    |    11    |    1    |    N/A     |    N/A     |
502	        +----------+----------+---------+------------+------------+

504	                          Table 1: Support Table

506	   It was noted that only one of the server software packages examined
507	   (OpenSSH version 7.5) enables the logging of incoming source port by
508	   default.  This conclusion has been reached despite using the most
509	   generous possible interpretation of "Default", whereby meeting the
510	   criteria for "Default" is achieved when logging of source port is
511	   offered as a possible default, rather than requiring that logging of
512	   source port is enabled by default.  In due course, as awareness of
513	   this issue increases, it is envisioned that a stricter interpretation
514	   of "Default" would be more appropriate, requiring that the logging of
515	   source port be enabled by default.

517	7.  Recommendations

519	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
520	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
521	   document are to be interpreted as described in [RFC2119].

523	   The recommendations presented below are courses of action that have
524	   been identified based on the current state of source port logging and
525	   the challenges described above.

527	7.1.  Raise Awareness of the Importance of Logging Source Port

529	   Publishers of both free and commercial software SHOULD release
530	   deployment guidance or best practice that describes why server
531	   administrators need to record source port information, with
532	   instructions for how this can be done.  This will help to address the
533	   lack of awareness of the importance of this issue.

535	   Considering also the awareness of those who are building software
536	   applications, or otherwise involved with coding of Internet-facing
537	   applications, secure coding guidance SHOULD be updated to include
538	   reference to source port information, particularly where such
539	   guidance already touches on the issue of logging.  For example, the
540	   OWASP Secure Coding Practices specifies a list of important log event
541	   data [OWASP_SCP].  However, the "important log event data" list does
542	   not, at the time of writing, include source port.

544	7.2.  Increase Support for Logging Source Port

546	   Many software packages support logging of source port information,
547	   but only eleven out of the sixteen examined support logging in a way
548	   that would not significantly negatively impact the operation of the
549	   server software.  Software publishers therefore need to consider
550	   their level of support of logging source port.  In particular,
551	   software SHOULD support the logging of source port and SHOULD do so
552	   in a way that does not substantially impact on production
553	   performance.

555	7.3.  Update Default Log Formats

557	   In cases where a software package has support for logging of incoming
558	   source port, the configuration SHOULD incorporate one or more
559	   optional log formats that include incoming source port as a field
560	   logged by default.  Obviously this will not have any impact on
561	   deployments of the software that are already in place but for future
562	   deployments, the incorporation of source port into "out of the box"
563	   log formats will mean that those administrators using unaltered
564	   default log formats will automatically store the needed information.
565	   Software vendors SHOULD provide a default log format that includes
566	   logging of source port, as described in this document.
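
	   As an illustration only, the following Python sketch shows what a
	   log line that records source port alongside source address and a
	   UTC timestamp might look like.  The function name and the "src="
	   and "sport=" field labels are assumptions made for this example and
	   are not taken from any of the software packages examined.

```python
from datetime import datetime, timezone


def format_access_line(peer, request, when=None):
    """Build one log line holding UTC time, source IP and source port.

    peer is an address tuple such as the one returned by
    socket.accept(); its first two items are the address and the port.
    """
    ip, port = peer[0], peer[1]
    when = when or datetime.now(timezone.utc)
    # Second-level granularity, expressed in UTC.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Hypothetical field labels; a real format would follow the product.
    return f"{stamp} src={ip} sport={port} {request}"


line = format_access_line(
    ("192.0.2.10", 51234),
    "GET /index.html",
    datetime(2018, 4, 11, 12, 0, 0, tzinfo=timezone.utc),
)
```

	   A format of this kind gives an administrator who never edits the
	   configuration the three items needed for a later query: address,
	   port and time.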

568	   An alternative approach, taking into account the fact that changes to
569	   log formats might break downstream tooling, would be to configure
570	   parallel logging of connection information to a separate log stream.

572	   This would also be a possible solution that could be used by those
573	   server software types that log via syslog.  In this case, software
574	   publishers SHOULD produce guidance on how to configure syslog to log
575	   connection information parallel to the main log files.  Such a
576	   solution would help to ease the transition to an alternate log format
577	   since current log formats would not need to be changed because the
578	   required source port information is stored separately, but can still
579	   be correlated with the main log files if needed.
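
	   The parallel-stream approach can be sketched as follows, assuming
	   an application that uses Python's standard "logging" module.  The
	   logger names and the helper functions are hypothetical.  The main
	   log line keeps its existing shape, while a second stream records
	   address and port with the same timestamp format so that the two
	   can be correlated on time and address.

```python
import io
import logging


def build_loggers(main_stream, conn_stream):
    """Return (main, conn) loggers writing to two independent streams."""
    fmt = logging.Formatter("%(asctime)s %(message)s", "%Y-%m-%dT%H:%M:%S")
    main = logging.getLogger("example.main")
    conn = logging.getLogger("example.conn")
    for logger, stream in ((main, main_stream), (conn, conn_stream)):
        handler = logging.StreamHandler(stream)
        handler.setFormatter(fmt)
        logger.handlers.clear()
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep the two streams separate
    return main, conn


def log_request(main, conn, peer, request):
    """Write the unchanged main entry plus a parallel connection entry."""
    main.info("%s %s", peer[0], request)   # existing format, no port
    conn.info("%s %s", peer[0], peer[1])   # address and source port


main_out, conn_out = io.StringIO(), io.StringIO()
main_log, conn_log = build_loggers(main_out, conn_out)
log_request(main_log, conn_log, ("192.0.2.10", 51234), "GET /index.html")
```

	   In a syslog-based deployment the second handler could instead be a
	   logging.handlers.SysLogHandler configured with its own facility,
	   which is again an assumption about the deployment rather than a
	   description of any particular product.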

581	7.4.  Adequate Timestamp Accuracy in Logs

583	   In order to query their records, operators of large-scale address
584	   sharing infrastructure will usually need connection times specified
585	   with at least the granularity of a second.  Consideration should be
586	   given by server operators to making sure that the times recorded in
587	   their log files have sufficient accuracy to allow identification of
588	   the required records.  Server software SHOULD be able to log time
589	   with at least the granularity of a second.

591	   There are many reasons why it may not be possible for servers to
592	   record logs with reference to a global time source.  This could
593	   include scenarios such as security sensitive networks, or internal
594	   production networks.  As long as times are recorded consistently, it
595	   should be possible to measure the offset from a traceable global time
596	   source (if required) for the purposes of querying records at another
597	   source.  If the entity controlling the server is aware that there is
598	   an offset required to synchronise with a global time source, it is
599	   expected that the offset would be indicated by the entity while the
600	   logs were being collected.
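
	   The offset correction described above amounts to simple arithmetic,
	   sketched here in Python.  The function names, the sign convention
	   (server clock minus reference clock) and the one-second tolerance
	   are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_reference_time(logged, offset_seconds):
    """Convert a logged time to the reference (global) time.

    offset_seconds is (server clock - reference clock), as measured
    when the logs were collected.
    """
    return logged - timedelta(seconds=offset_seconds)


def query_window(logged, offset_seconds, tolerance_seconds=1):
    """Return (start, end) of the interval to query at another source."""
    center = to_reference_time(logged, offset_seconds)
    tol = timedelta(seconds=tolerance_seconds)
    return center - tol, center + tol


# A server whose clock runs 37 seconds ahead of the reference clock.
logged = datetime(2018, 4, 11, 12, 0, 37, tzinfo=timezone.utc)
start, end = query_window(logged, offset_seconds=37)
```

	   The resulting window, rather than the raw logged time, is what
	   would be supplied to the operator of the address sharing
	   infrastructure.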

602	   Adequate timestamp accuracy also needs to be considered by software
603	   developers when they are producing software.  Although the recording
604	   of time is mentioned in the OWASP Secure Coding Practices, the
605	   required accuracy/granularity of the recorded time is not discussed
606	   [OWASP_SCP].  Development guidance SHOULD include clarifying that
607	   times need to be recorded with at least the granularity of a second.

609	7.5.  Source Port Translation in Endpoint Infrastructure

611	   In cases where endpoint infrastructure terminates incoming
612	   connections (proxies, load balancers, etc.), and the infrastructure
613	   translates incoming source port information, there is a risk that the
614	   important crime attribution information may be lost.  One possibility
615	   is to log source port information at the endpoint infrastructure and
616	   this may be an appropriate solution in some cases.  However, this may
617	   lead to an excessive volume of logging, depending on the particular
618	   scenario.  For example if the intermediate infrastructure is being
619	   used to mitigate DDoS attacks, logging all incoming traffic would
620	   potentially lead to logging of all incoming DDoS connections.  This
621	   would clearly be an undesirable outcome.

623	   An alternative solution is to pass information about the original
624	   connection (before mapping/translation of connection information
625	   takes place) to the actual endpoint.  Solutions to achieve this
626	   already exist for certain application layer protocols.  The Forwarded
627	   HTTP Extension [RFC7239], for example, supports (as an optional
628	   feature) the transfer of source port information in the "for"
629	   parameter of the "Forwarded" header, and this technique can also
630	   support multiple layers of proxying without loss of attribution.
631	   Therefore, endpoint infrastructure that translates source ports
632	   SHOULD pass the original connection information through to the
633	   Internet-facing server for logging purposes.
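
	   As a rough sketch of how an Internet-facing server might recover
	   the original address and port for logging, the Python function
	   below extracts the "for" parameters from a "Forwarded" header value
	   [RFC7239].  It is deliberately simplified: it assumes that no comma
	   or semicolon appears inside a quoted string, and it treats an
	   obfuscated port as absent.  The function name is hypothetical.

```python
def parse_forwarded_for(header_value):
    """Return a list of (node, port) tuples, one per "for" parameter.

    port is an int, or None when no numeric port was supplied.
    Entries appear in header order, i.e. closest to the client first.
    """
    results = []
    for element in header_value.split(","):
        for pair in element.split(";"):
            name, sep, value = pair.strip().partition("=")
            if not sep or name.strip().lower() != "for":
                continue
            value = value.strip().strip('"')
            node, port = value, None
            if value.startswith("["):
                # Bracketed IPv6 literal, optionally followed by :port
                end = value.find("]")
                if end != -1:
                    node = value[1:end]
                    rest = value[end + 1:]
                    if rest.startswith(":") and rest[1:].isdigit():
                        port = int(rest[1:])
            elif value.count(":") == 1:
                # IPv4 address or name followed by :port
                host, _, maybe_port = value.partition(":")
                if maybe_port.isdigit():
                    node, port = host, int(maybe_port)
            results.append((node, port))
    return results


hops = parse_forwarded_for(
    'for="192.0.2.43:47011";proto=http, '
    'for="[2001:db8:cafe::17]:4711", for=198.51.100.17'
)
```

	   Each tuple can then be written to the server log in place of, or
	   in addition to, the address and port of the proxy connection.
	   Because the header is supplied by upstream infrastructure, a
	   server would normally only trust it when the connection comes from
	   a known proxy.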

635	8.  IANA Considerations

637	   This memo includes no request to IANA.

639	9.  Security Considerations

641	   Clearly a balance needs to be struck between individual right to
642	   privacy and law enforcement access to data during criminal
643	   investigations.  On the one hand, the routine logging of any
644	   additional information has the potential to introduce risks related
645	   to privacy and human rights.  On the other hand, there is a societal,
646	   crime prevention requirement to address the information gap created
647	   by large-scale address sharing technologies.  Across the world there
648	   is also a broad spectrum of legislative regimes and human rights
649	   challenges, whose interpretation relates directly to this question.

651	   IP addresses are routinely logged today and this information can be
652	   used for identification of people online in some cases.  The cases in
653	   which an IP address does not identify an individual directly are
654	   not necessarily apparent to the person performing the logging (who
655	   cannot tell, for example, if the true source of the traffic is behind
656	   a NAT or other form of proxy) and the same is true even if source
657	   port is logged.  It is not apparent that there is any additional risk
658	   to individual privacy between the case when a single piece of
659	   endpoint identifying information (source IP address) is logged versus
660	   the case when two pieces of endpoint identifying information (source
661	   IP address and source port) are logged.  Balancing this against the
662	   significant advantages from the crime attribution point of view
663	   suggests that this may be a worthwhile approach.

665	10.  Acknowledgements

667	   Several members of the v6ops mailing list provided valuable feedback
668	   and discussion on early drafts of this document.  In particular, Tom
669	   Herbert, Ca By, Ole Troan, Lee Howard, Erik Nygren, Fred Baker,
670	   Fernando Gont, Gert Doering, Mark Smith, Jordi Palet Martinez, DY
671	   Kim, Mark Andrews and T. Petch.  Special acknowledgement also goes
672	   to Mohamed Boucadair who has provided ongoing feedback throughout the
673	   document development process.

675	11.  References

677	11.1.  Informative References

679	   [I-D.ietf-behave-ipfix-nat-logging]
680	              Sivakumar, S. and R. Penno, "IPFIX Information Elements
681	              for logging NAT Events", draft-ietf-behave-ipfix-nat-
682	              logging-13 (work in progress), January 2017.

684	   [I-D.shirasaki-nat444]
685	              Yamagata, I., Shirasaki, Y., Nakagawa, A., Yamaguchi, J.,
686	              and H. Ashida, "NAT444", draft-shirasaki-nat444-06 (work
687	              in progress), July 2012.

689	11.2.  Normative References

691	   [ANALOG_LOG_CONFIG]
692	              Analog, "Analog 6.0: Log formats", 2017,
693	              <http://mirror.reverse.net/pub/analog/docs/logfmt.html>.

695	   [AWSTATS_LOG_CONFIG]
696	              AWStats, "AWStats Installation, Configuration and
697	              Reporting (for version 7.6)", 2017,
698	              <https://awstats.sourceforge.io/docs/awstats_setup.html>.

700	   [EUROPOL_IOCTA]
701	              Europol, "The Internet Organised Crime Threat Assessment",
702	              2016, <https://www.europol.europa.eu/activities-services/
703	              main-reports/
704	              internet-organised-crime-threat-assessment-iocta-2016>.

706	   [MSDN_IIS_LOG]
707	              Microsoft, "IIS 8.5 - How to log client port number",
708	              2015, <https://blogs.msdn.microsoft.com/amb/2015/11/12/
709	              iis-8-5-how-to-log-client-port-number/>.

711	   [OWASP_SCP]
712	              OWASP, "OWASP Secure Coding Practices Quick Reference
713	              Guide", 2010, <https://www.owasp.org/images/0/08/
714	              OWASP_SCP_Quick_Reference_Guide_v2.pdf>.

716	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
717	              Requirement Levels", BCP 14, RFC 2119,
718	              DOI 10.17487/RFC2119, March 1997,
719	              <https://www.rfc-editor.org/info/rfc2119>.

721	   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
722	              "Network Time Protocol Version 4: Protocol and Algorithms
723	              Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
724	              <https://www.rfc-editor.org/info/rfc5905>.

726	   [RFC6146]  Bagnulo, M., Matthews, P., and I. van Beijnum, "Stateful
727	              NAT64: Network Address and Protocol Translation from IPv6
728	              Clients to IPv4 Servers", RFC 6146, DOI 10.17487/RFC6146,
729	              April 2011, <https://www.rfc-editor.org/info/rfc6146>.

731	   [RFC6269]  Ford, M., Ed., Boucadair, M., Durand, A., Levis, P., and
732	              P. Roberts, "Issues with IP Address Sharing", RFC 6269,
733	              DOI 10.17487/RFC6269, June 2011,
734	              <https://www.rfc-editor.org/info/rfc6269>.

736	   [RFC6302]  Durand, A., Gashinsky, I., Lee, D., and S. Sheppard,
737	              "Logging Recommendations for Internet-Facing Servers",
738	              BCP 162, RFC 6302, DOI 10.17487/RFC6302, June 2011,
739	              <https://www.rfc-editor.org/info/rfc6302>.

741	   [RFC6333]  Durand, A., Droms, R., Woodyatt, J., and Y. Lee, "Dual-
742	              Stack Lite Broadband Deployments Following IPv4
743	              Exhaustion", RFC 6333, DOI 10.17487/RFC6333, August 2011,
744	              <https://www.rfc-editor.org/info/rfc6333>.

746	   [RFC6346]  Bush, R., Ed., "The Address plus Port (A+P) Approach to
747	              the IPv4 Address Shortage", RFC 6346,
748	              DOI 10.17487/RFC6346, August 2011,
749	              <https://www.rfc-editor.org/info/rfc6346>.

751	   [RFC6888]  Perreault, S., Ed., Yamagata, I., Miyakawa, S., Nakagawa,
752	              A., and H. Ashida, "Common Requirements for Carrier-Grade
753	              NATs (CGNs)", BCP 127, RFC 6888, DOI 10.17487/RFC6888,
754	              April 2013, <https://www.rfc-editor.org/info/rfc6888>.

756	   [RFC7239]  Petersson, A. and M. Nilsson, "Forwarded HTTP Extension",
757	              RFC 7239, DOI 10.17487/RFC7239, June 2014,
758	              <https://www.rfc-editor.org/info/rfc7239>.

760	   [RFC7422]  Donley, C., Grundemann, C., Sarawat, V., Sundaresan, K.,
761	              and O. Vautrin, "Deterministic Address Mapping to Reduce
762	              Logging in Carrier-Grade NAT Deployments", RFC 7422,
763	              DOI 10.17487/RFC7422, December 2014,
764	              <https://www.rfc-editor.org/info/rfc7422>.

766	   [RFC7596]  Cui, Y., Sun, Q., Boucadair, M., Tsou, T., Lee, Y., and I.
767	              Farrer, "Lightweight 4over6: An Extension to the Dual-
768	              Stack Lite Architecture", RFC 7596, DOI 10.17487/RFC7596,
769	              July 2015, <https://www.rfc-editor.org/info/rfc7596>.

771	   [RFC7597]  Troan, O., Ed., Dec, W., Li, X., Bao, C., Matsushima, S.,
772	              Murakami, T., and T. Taylor, Ed., "Mapping of Address and
773	              Port with Encapsulation (MAP-E)", RFC 7597,
774	              DOI 10.17487/RFC7597, July 2015,
775	              <https://www.rfc-editor.org/info/rfc7597>.

777	   [RFC7599]  Li, X., Bao, C., Dec, W., Ed., Troan, O., Matsushima, S.,
778	              and T. Murakami, "Mapping of Address and Port using
779	              Translation (MAP-T)", RFC 7599, DOI 10.17487/RFC7599, July
780	              2015, <https://www.rfc-editor.org/info/rfc7599>.

782	   [RFC7620]  Boucadair, M., Ed., Chatras, B., Reddy, T., Williams, B.,
783	              and B. Sarikaya, "Scenarios with Host Identification
784	              Complications", RFC 7620, DOI 10.17487/RFC7620, August
785	              2015, <https://www.rfc-editor.org/info/rfc7620>.

787	   [RFC7768]  Tsou, T., Li, W., Taylor, T., and J. Huang, "Port
788	              Management to Reduce Logging in Large-Scale NATs",
789	              RFC 7768, DOI 10.17487/RFC7768, January 2016,
790	              <https://www.rfc-editor.org/info/rfc7768>.

792	Appendix A.  Support for Source Port Logging in Various Server Software

794	   The table below enumerates the findings of best-effort, open-source
795	   review of documentation of the various products.  Where it has been
796	   indicated that it is not possible to log source port then either (a)
797	   no reference has been identified in online documentation to indicate
798	   how source port logging can be enabled, or (b) a reference positively
799	   indicating that logging of source port is not possible has been
800	   found.

802	   +---------+------------+------------+----------+----------+---------+
803	   | Categor |   Server   |  Version   | Possible | Feasible | Default |
804	   |    y    |            |            |          |          |         |
805	   +---------+------------+------------+----------+----------+---------+
806	   |   HTTP  |   Apache   |   2.4.25   |   Yes    |   Yes    |    No   |
807	   |         |   HTTPD    |            |          |          |         |
808	   |   HTTP  |    IIS     |     10     |   Yes    |   Yes    |    No   |
809	   |   HTTP  |   Tomcat   |   8.5.15   |   Yes    |   Yes    |    No   |
810	   |   HTTP  |   Squid    |   3.5.25   |   Yes    |   Yes    |    No   |
811	   |   HTTP  |   nginx    |   1.12.0   |   Yes    |   Yes    |    No   |
812	   |   Mail  |  sendmail  |   8.15.2   |   Yes    |   Yes    |    No   |
813	   |   Mail  | Microsoft  |    2016    |   Yes    |    No    |    No   |
814	   |         |  Exchange  |            |          |          |         |
815	   |         |   Server   |            |          |          |         |
816	   |   Mail  |  Postfix   |   2.10.0   |   Yes    |   Yes    |    No   |
817	   |   Mail  |    Exim    |    4.89    |   Yes    |   Yes    |    No   |
818	   |   Mail  |  Dovecot   |  2.2.30.1  |   Yes    |   Yes    |    No   |
819	   |   Mail  |  UW IMAP   | imap-2007f |    No    |    No    |    No   |
820	   |  DBase  |   Oracle   |  12.2.0.1  |    No    |    No    |    No   |
821	   |  DBase  |   MySQL    |   5.7.18   |    No    |    No    |    No   |
822	   |  DBase  | Microsoft  |    2016    |   Yes    |    No    |    No   |
823	   |         | SQL Server |            |          |          |         |
824	   |  DBase  | PostgreSQL |   9.6.3    |   Yes    |   Yes    |    No   |
825	   |   SSH   |  OpenSSHD  |    7.5     |   Yes    |   Yes    |   Yes   |
826	   +---------+------------+------------+----------+----------+---------+

828	             Table 2: Support for Logging Incoming Source Port

830	Author's Address

832	   David O'Reilly
833	   Ireland

835	   Email: rfc@daveor.com