This document describes the current approach to web analytics for content on www.ietf.org that is maintained in the Wagtail CMS.
The IETF uses the Matomo On-Premise analytics package for www.ietf.org. While Matomo provides a broad range of functionality, only a limited subset is used. Table 1 summarizes the data collected and the motivation for collecting each field. This approach was implemented after consulting the IETF community on a proposal, which provides additional detail and background.
Website analytics is implemented to:
Item | Description |
---|---|
IP Address | IP address from which the request was made |
Timestamp | Approximate date and time a site resource was requested |
Page Title | The title of the requested web page (from the HTML title tag) |
Page URL | URL of the requested resource |
Referer URL | URL of the page that linked to the requested resource |
File Downloads | Identifies which non-HTML resources were downloaded from the current page. |
Outside Link Clicks | Identifies which links to sites outside www.ietf.org were clicked on the current page. |
Page Speed | Time taken for the web server to generate a page and for the requestor to download it |
Browser Language | The preferred language of the requestor’s browser (derived from the HTTP Accept-Language header) |
User Agent | The user agent string of the browser making the request (derived from the HTTP User-Agent header). |
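To illustrate how the Browser Language field can be derived from the HTTP Accept-Language header, the following Python sketch picks the language tag with the highest quality (`q`) weight. It is a simplified illustration, not Matomo's own parsing code: the function name `preferred_language` is hypothetical, tags are lower-cased because language tags are case-insensitive, and tag validation is skipped.

```python
from typing import Optional


def preferred_language(accept_language: str) -> Optional[str]:
    """Return the highest-weighted language tag in an Accept-Language value.

    Each entry may carry a "q" weight (default 1.0); on ties the earlier entry wins.
    Wildcards ("*") and entries with q=0 are ignored. Returns None if nothing usable remains.
    """
    best_tag: Optional[str] = None
    best_q = 0.0
    for entry in accept_language.split(","):
        parts = [part.strip() for part in entry.split(";")]
        tag = parts[0]
        if not tag or tag == "*":
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        # Strict comparison keeps the earlier entry on ties and never selects q=0.
        if q > best_q:
            best_tag, best_q = tag.lower(), q
    return best_tag
```

For example, a browser sending `de-CH, de;q=0.9, en;q=0.8` would be recorded with the language `de-ch` in this sketch.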
Matomo collects the raw visitor data defined in Table 1, and then computes aggregate data (reports) summarizing this raw data.
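The step from raw visitor data to aggregate reports can be pictured with a small Python sketch that counts page views and distinct visits per page URL. The `RawVisit` record, its `visit_id` field, and `aggregate_by_page` are hypothetical names chosen for illustration; Matomo's actual archiving process computes many more reports and stores them differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RawVisit:
    """One raw page request (hypothetical schema using a subset of the Table 1 fields)."""
    visit_id: str        # hypothetical identifier linking requests of the same visit
    timestamp: datetime  # approximate date and time of the request
    page_url: str        # URL of the requested resource


def aggregate_by_page(records):
    """Return {page_url: (page_views, distinct_visits)} for a batch of raw records."""
    views = defaultdict(int)
    visits = defaultdict(set)
    for record in records:
        views[record.page_url] += 1
        visits[record.page_url].add(record.visit_id)
    return {url: (views[url], len(visits[url])) for url in views}
```

Once such aggregates exist, the raw records they were computed from are no longer needed for reporting, which is what allows raw data to be kept for a much shorter period than reports.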
The aggregated data is made available to IETF LLC staff, to contractors whose role requires it, and to the Internet Engineering Steering Group.
Access to the raw visitor data is restricted to the users who need it to operate the system.
The analytics configuration uses only client-side JavaScript to collect all metrics. The Matomo Image Tracker feature, which allows limited metric collection without JavaScript, is disabled.
A visitor can prevent all web analytics functionality by disabling JavaScript for www.ietf.org in their browser.
The collection and reporting of website usage metrics entails handling IP addresses, which in certain environments might enable user identification.
Therefore, Matomo is configured to apply the “Matomo level 2” anonymization scheme:
IP addresses are not logged in un-anonymized form by the analytics system, and the system is configured to minimize long-term re-identification of users across visits. Specifically, tracking cookies are disabled, and the Matomo User ID feature of the Tracking API, which allows persistent user identification (even across networks), is not used.
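As a minimal sketch of the IP masking step, assuming that “level 2” corresponds to zeroing the last two bytes of an IPv4 address (as in Matomo's 2-byte IP anonymization option), the following Python function truncates an address before it would be stored. The function name is hypothetical, and IPv6 handling is deliberately left out because the mask that level 2 applies to IPv6 addresses is not stated here.

```python
import ipaddress


def anonymize_ipv4(address: str, masked_bytes: int = 2) -> str:
    """Zero the last `masked_bytes` bytes of an IPv4 address (2 by default, an assumption)."""
    if not 0 <= masked_bytes <= 4:
        raise ValueError("masked_bytes must be between 0 and 4")
    prefix_len = 32 - 8 * masked_bytes
    # strict=False accepts an address with host bits set; network_address has them cleared.
    network = ipaddress.IPv4Network(f"{address}/{prefix_len}", strict=False)
    return str(network.network_address)
```

For example, `anonymize_ipv4("192.168.100.42")` yields `"192.168.0.0"`, so every request from the same /16 network is stored under one shared address.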
Returning visitor statistics (i.e., the linking of multiple page requests) are enabled based on a dynamically calculated fingerprint that uses the “operating system, browser, browser plugins, [anonymized] IP address and browser language”. The lifetime of this fingerprint is 30 minutes. There is a residual risk that could lead to the identification of users:
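The fingerprint-based linking of page requests can be sketched in Python as follows. The inputs mirror the attributes listed above, but the hashing scheme (SHA-256 over a `|`-joined string, truncated to 16 hex characters) and the names `fingerprint` and `assign_visits` are assumptions for illustration, not Matomo's internal algorithm. The sketch also assumes the 30-minute lifetime acts as an inactivity window: a request joins an earlier visit only if it arrives within 30 minutes of the previous request with the same fingerprint.

```python
import hashlib
from datetime import datetime, timedelta

# Fingerprint lifetime from the text, treated here as an inactivity window (assumption).
VISIT_WINDOW = timedelta(minutes=30)


def fingerprint(os_name: str, browser: str, plugins: list[str],
                anonymized_ip: str, language: str) -> str:
    """Derive an opaque identifier from request attributes (hypothetical hashing scheme)."""
    material = "|".join([os_name, browser, ",".join(sorted(plugins)),
                         anonymized_ip, language])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


def assign_visits(requests: list[tuple[datetime, str]]) -> list[tuple[datetime, str, int]]:
    """Label (timestamp, fingerprint) requests with visit numbers, in time order.

    A request continues the earlier visit for its fingerprint only if it arrives within
    VISIT_WINDOW of that fingerprint's previous request; otherwise a new visit begins.
    """
    last_seen: dict[str, tuple[int, datetime]] = {}  # fingerprint -> (visit, last request time)
    next_visit = 0
    labelled = []
    for ts, fp in sorted(requests):
        previous = last_seen.get(fp)
        if previous is not None and ts - previous[1] <= VISIT_WINDOW:
            visit = previous[0]
        else:
            visit = next_visit
            next_visit += 1
        last_seen[fp] = (visit, ts)
        labelled.append((ts, fp, visit))
    return labelled
```

In this sketch, nothing persists beyond the window: a visitor who returns after more than 30 minutes of inactivity starts a new, unlinked visit, and two visitors with identical browser settings behind the same anonymized address within the window are counted as one.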
Matomo is configured with data retention periods defined in Table 2. Data beyond this period is purged.
Data Set | Retention |
---|---|
Raw Visitor Information | 5 days |
Aggregate Data | 12 months |
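The retention rule in Table 2 amounts to a cutoff date per data set. A minimal Python sketch follows; it assumes each record carries a `timestamp`, approximates 12 months as 365 days, and uses the hypothetical names `RETENTION` and `purge`. In practice the purge is carried out by Matomo's own data-retention settings; this only illustrates the rule.

```python
from datetime import datetime, timedelta

# Retention periods from Table 2; "12 months" is approximated as 365 days (assumption).
RETENTION = {
    "raw": timedelta(days=5),
    "aggregate": timedelta(days=365),
}


def purge(records: list[dict], data_set: str, now: datetime) -> list[dict]:
    """Return only the records of `data_set` that are still inside its retention period.

    Each record is assumed to be a dict with a datetime under the "timestamp" key.
    """
    cutoff = now - RETENTION[data_set]
    return [record for record in records if record["timestamp"] >= cutoff]
```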
The website analytics configuration is subject to review by IETF LLC Counsel for GDPR compliance and for compliance with the IETF Privacy Statement.