The IETF uses the Matomo On-Premise analytics package for www.ietf.org. While Matomo provides a broad range of functionality, only a limited subset is used. Table 1 summarizes the data collected and the motivation for collecting each field. This approach was implemented after consulting the IETF community on a proposal, which provides additional detail and background.
Website analytics is implemented to:
- limit collection and retention of data to what is needed to serve specific identified purposes;
- not require the use of web cookies; and
|IP address||IP address of the request|
|Timestamp||Approximate date and time a site resource was requested|
|Page Title||The title of the requested web page (from the HTML title tag)|
|Page URL||URL of the requested resource|
|Referer URL||URL of the page that linked to the requested resource|
|File Downloads||Identifies which non-HTML resources were downloaded from the current page.|
|Outside Link Clicks||Identifies which links to sites outside www.ietf.org were clicked on the current page.|
|Page Speed||Track the time it takes for web pages to be generated by the webserver and then downloaded by the requestor|
|Browser Language||The preferred language of the requestor’s browser (derived from the HTTP Accept-Language header)|
|User Agent||The user agent string of the browser making the request (derived from the HTTP User-Agent header).|
Matomo collects the raw visitor data defined in Table 1, and then computes aggregate data (reports) summarizing this raw data.
Access to the raw visitor data is restricted to only those users required to operate the system.
The collection and reporting of website usage metrics entails the handling of IP addresses, which in certain environments might enable user identification.
Therefore, the product will be configured to apply the “Matomo level 2” anonymization scheme:
- IPv4 – mask the lower 16 bits of the address
- IPv6 – mask the lower 80 bits of the address
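The masking described above can be sketched in a few lines. This is a minimal illustration using Python's standard `ipaddress` module, not Matomo's own implementation; the function name `anonymize_ip` is hypothetical, and "mask" is assumed here to mean setting the low-order bits to zero.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the low-order bits of an address: 16 bits for IPv4, 80 for IPv6.

    Hypothetical helper for illustration; not part of Matomo.
    """
    ip = ipaddress.ip_address(address)
    masked_bits = 16 if ip.version == 4 else 80
    # All-ones mask of the full address width, with the low bits cleared.
    keep_mask = ((1 << ip.max_prefixlen) - 1) ^ ((1 << masked_bits) - 1)
    # Rebuild with the same address class so IPv6 results stay IPv6.
    return str(type(ip)(int(ip) & keep_mask))


print(anonymize_ip("192.0.2.123"))                              # 192.0.0.0
print(anonymize_ip("2001:db8:1234:5678:9abc:def0:1234:5678"))   # 2001:db8:1234::
```

After masking, an IPv4 address retains only its upper 16 bits and an IPv6 address only its upper 48 bits, so many distinct hosts map to the same stored value.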
IP addresses are not logged in un-anonymized form by the analytics system, and the system is configured to minimize the long-term re-identification of users across visits. Specifically, this entails disabling tracking cookies and not using the Matomo User ID feature in the Tracking API which allows for persistent user identification (even across networks).
Returning visitor statistics (i.e., the linking of multiple page requests) are enabled based on a dynamically calculated fingerprint that uses the “operating system, browser, browser plugins, [anonymized] IP address and browser language”. The lifetime of this fingerprint is 30 minutes. There is residual risk that could lead to the identification of users:
- Geolocation of these IP addresses (in concert with the Browser Language) is an expected analysis. For countries with a small number of IETF participants, one might be able to infer their usage.
- With holistic access to the raw visitor data (likely through SQL-level access to the underlying Matomo database as this is not a product feature), novel de-anonymization approaches could be possible. This risk is mitigated by restricting access to the database (and raw visitor information) as noted above.
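To make the short-lived fingerprint described earlier more concrete, the following sketch shows how page requests might be linked only when the listed attributes match and the requests fall within the 30-minute lifetime. It is an assumption-laden illustration, not Matomo's actual algorithm: the function names, the choice of SHA-256, and the field separator are all hypothetical.

```python
import hashlib
from datetime import datetime, timedelta

# Lifetime of the fingerprint as stated in the text.
FINGERPRINT_LIFETIME = timedelta(minutes=30)


def fingerprint(os_name: str, browser: str, plugins: list[str],
                anonymized_ip: str, language: str) -> str:
    """Hash the listed attributes into an opaque identifier (hypothetical scheme)."""
    parts = [os_name, browser, ",".join(sorted(plugins)), anonymized_ip, language]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def same_visit(fp_a: str, time_a: datetime, fp_b: str, time_b: datetime) -> bool:
    """Link two requests only if fingerprints match within the lifetime."""
    return fp_a == fp_b and abs(time_b - time_a) <= FINGERPRINT_LIFETIME


fp = fingerprint("Linux", "Firefox", ["pdf"], "192.0.0.0", "en-US")
t0 = datetime(2024, 1, 1, 12, 0)
print(same_visit(fp, t0, fp, t0 + timedelta(minutes=10)))  # True
print(same_visit(fp, t0, fp, t0 + timedelta(minutes=45)))  # False
```

Because the input includes only the anonymized IP address and no cookie or User ID, requests more than 30 minutes apart are not linked in this sketch.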
Matomo is configured with the data retention periods defined in Table 2. Data older than these periods is purged.
|Raw Visitor Information||5 days|
|Aggregate Data||12 months|
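The retention periods in Table 2 amount to a simple age check per data category. The sketch below is illustrative only and does not reflect how Matomo schedules its purges; the names `RETENTION` and `should_purge` are hypothetical, and 12 months is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Retention periods from Table 2.
RETENTION = {
    "raw_visitor": timedelta(days=5),
    "aggregate": timedelta(days=365),  # assumption: 12 months taken as 365 days
}


def should_purge(kind: str, recorded_at: datetime, now: datetime) -> bool:
    """Return True when a record is older than its category's retention period."""
    return now - recorded_at > RETENTION[kind]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(should_purge("raw_visitor", datetime(2024, 1, 4, tzinfo=timezone.utc), now))  # True
print(should_purge("aggregate", datetime(2023, 6, 1, tzinfo=timezone.utc), now))    # False
```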
The configuration of website analytics is subject to review for GDPR compliance by IETF LLC Counsel, and for compliance with the IETF Privacy Statement.
Monthly overview reports for www.ietf.org web analytics