Noise or Music?

The Impact of AnonymizeIP (aip)

Categories: GDPR & Privacy, Setup Accuracy

A key recommendation for GDPR compliance is to use the Google Analytics feature anonymizeIP, also known as “aip”. As the name suggests, this simple switch drops the last octet (up to 3 digits) from your visitor’s IP address. For example, if a visitor has a public IP of 215.113.40.184, then Google will obfuscate this to 215.113.40.0 (and likewise for all other visitor IP addresses) when it processes your Google Analytics data. No full IP address is stored. That’s good for GDPR compliance because IP data is considered PII in Europe. Note the difference between PI and PII in my last post: GDPR – Should You Request Consent Before Tracking?
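
As a rough illustration of what the switch does to an IPv4 address, here is a minimal sketch. It assumes Google's documented behaviour of setting the last octet to zero; the helper name anonymizeIPv4 is hypothetical and is not part of any Google Analytics library (the real masking happens on Google's servers, not in your code).

```javascript
// Illustrative only: mimics IPv4 anonymization by zeroing the last octet.
// anonymizeIPv4 is a hypothetical helper, not a Google Analytics function.
function anonymizeIPv4(ip) {
  const octets = ip.split('.');
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) {
    throw new Error(`Not a valid IPv4 address: ${ip}`);
  }
  octets[3] = '0'; // the last octet is replaced, the first three are kept
  return octets.join('.');
}

console.log(anonymizeIPv4('215.113.40.184')); // 215.113.40.0
```

Every visitor in the same block of 256 addresses ends up with the same stored value, which is why geolocation can become less precise.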

Note that IP address anonymization is not a panacea switch that makes you GDPR compliant; rather, it is a useful feature to have for extra assurance.

Anonymize IP Analytics Impact Assessment

Within Google Analytics, IP addresses are used to determine the geolocation of a visitor, i.e. what continent, country and city did they connect from? That’s useful information for assessing your global reach and content requirements, and for marketing purposes such as measuring the impact of specific geolocation campaigns. Therefore, not having a full IP address stored for your visitors is a big deal.

So if the last octet of an IP address is removed, does that impact geolocation data?

The answer is of course yes, but how much of an impact, and should you worry about it?

To establish that, we conducted a detailed study: one where a website with a global reach sent data to two identically set up Google Analytics web properties, one with the standard full IP address (aip=off) and one where anonymizeIP is set (aip=on). We then compared the datasets in BigQuery to measure the geolocation difference.

In fact, this was originally done on a small scale (2,267 sessions) by Huiyan Wan of Conversion Works, based on a UK-centric website audience. I therefore wanted to work with Huiyan to replicate and verify her results using a much larger dataset of 1 million sessions, and one with a much broader reach of international visitors.

The full study, conducted for visitors during the month of May 2018, is detailed at: Anonymize IP Analytics Impact Assessment (full web page report).

Summary of Results

There is no noticeable difference in accuracy at country level. However, if you are reliant on 80%+ accuracy at city level, then the anonymizeIP method may impact your performance.

[Image: map_ip_test_eu_2]
  • Overall Discrepancy
    • The difference at continent and country level is negligible, with aip=on showing 99.5%+ the same results as when the full IP is used. Hence, the remainder of this summary is based only on city differences.
    • Differences become significant at city level, with an average of 76.7% accuracy. However, allowing for a 50 km margin, the accuracy improves to 87.0%. Allowing for a 200 km margin, the accuracy further increases to 93.0%.
  • Discrepancy by Device (mobile v desktop)
    • The same trend is observed for mobile and desktop traffic, with mobile devices showing a 5.9% increase in city accuracy. This difference reduces in magnitude as the margin is increased to 50 km and 200 km.
  • Discrepancy by Sub-Continent
    • On the whole, the less economically developed sub-continents of the world show better city level accuracy than the more advanced economies. For example, from the chart below, eastern Europe is darker (more accurate) than other parts of Europe. The same story is observed for other regions of the world.
    • Although at first surprising, I attribute this to early adopter countries (that is, early internet access adopters such as the US, UK and EU) giving little attention to the geographic governance of IP location information in the early days of internet growth, whereas secondary adopters learnt from the mistakes of the pioneering countries and implemented better record keeping of IP geolocations. In addition, less developed economies are likely to have a higher reliance on mobile devices for internet access, which are more geo accurate.
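
To make the "margin" idea in the summary concrete, here is a minimal sketch of how city accuracy within a distance margin could be calculated. It is not the study's actual code (the comparison was done in BigQuery): the function names, the input shape and the sample coordinates are all assumptions for illustration, and the great-circle distance uses the standard haversine formula.

```javascript
// Hypothetical sketch: share of sessions whose aip=on city lies within
// a given distance of the city reported with the full IP.
const EARTH_RADIUS_KM = 6371;

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two {lat, lon} points (haversine formula).
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// pairs: [{ fullIp: {lat, lon}, aip: {lat, lon} }, ...], one entry per session.
function accuracyWithinMargin(pairs, marginKm) {
  if (pairs.length === 0) return NaN;
  const hits = pairs.filter((p) => haversineKm(p.fullIp, p.aip) <= marginKm).length;
  return hits / pairs.length;
}

// Made-up sample data (approximate city coordinates), not from the study.
const london = { lat: 51.5074, lon: -0.1278 };
const samplePairs = [
  { fullIp: london, aip: london },                            // same city
  { fullIp: london, aip: { lat: 51.3762, lon: -0.0982 } },    // Croydon, roughly 15 km away
  { fullIp: london, aip: { lat: 51.4543, lon: -0.9781 } },    // Reading, roughly 60 km away
  { fullIp: london, aip: { lat: 53.4808, lon: -2.2426 } },    // Manchester, roughly 260 km away
];

for (const margin of [0, 50, 200]) {
  console.log(`${margin} km margin:`, accuracyWithinMargin(samplePairs, margin));
}
```

With this made-up sample the accuracy rises from 0.25 to 0.5 to 0.75 as the margin widens, the same direction of travel as the 76.7%, 87.0% and 93.0% figures reported above.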

Methodology Notes

Website: Collected data is from a European-headquartered global brand, categorised in the “sports” vertical, receiving approximately 3M sessions per month during the period of this study, May/June 2018.

Session Data: Collected in duplicate via GTM. That is, one data hit sent to a standard (full-ip) Google Analytics property, and an identical hit sent to an anonymizedIP (aip=1) property of the same GA account.

Google Analytics Configuration: All property and view configuration was identical, in particular: “Enable Users Metric in Reporting” (property level); Filters applied and their order (view level); Robots excluded via Google’s auto-blocking method (view level).

Anomalies:

  • Loss of Users and Sessions. With anonymizedIP, there was an unexplained small loss of users (-0.10%) and a small rise in session count (+0.07%). This may be attributable to a “rogue” tag in the full-ip property that was not configured in the aip=1 property (the GA account contained 63 GTM tags at the time of the study).
  • Bounce Rate. There was a significant difference in the reported bounce rate between the full-ip and aip=1 properties, 22.84% versus 35.07% respectively. I attribute this to the impact of the setting “Exclude all hits from known bots and spiders” for each view. That is, although the method employed by Google for achieving this is unknown, it is likely to be a machine learning method where historical data has a significant impact when determining if a session is from a robot or spider. At the time of this research, the full-ip property was 7 years old, while the aip=1 property was less than 1 month old. Hence my hypothesis that the property age is the cause of this. This was verified by creating a new full-ip-cc (carbon copy) property and comparing it with the older full-ip property, where the same discrepancy, and direction, was observed.

Why is anonymizeIP not ON by default?

A good question. I would prefer it if Google Analytics had done this by default from the beginning. However, to anonymise you must activate the feature, which can be done in one of the following ways:

  1. If you deploy your GA tracking via GTM, set aip within your GTM tag
  2. If you deploy tracking within your pages:
    • for analytics.js
      ga('set', 'anonymizeIp', true);
    • for gtag.js
      gtag('config', '<GA_TRACKING_ID>', { 'anonymize_ip': true });
  3. If you are using the Measurement Protocol i.e. sending server side data hits, add to the URL:
    aip=1
    Note that any value for aip will set this parameter i.e. just its presence in the data hit will activate the feature. Therefore ensure aip is not present in your URL if you do not wish to use this feature.
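
For the Measurement Protocol case, the following is a minimal sketch of building a pageview hit URL with and without aip. The helper name buildPageviewHit and the placeholder tracking ID and client ID are assumptions for illustration; the endpoint and the v, tid, cid, t, dp and aip parameters are those of the Universal Analytics Measurement Protocol. The sketch only builds the URL and does not send anything.

```javascript
// Hypothetical helper: builds a Measurement Protocol pageview hit URL.
// The aip parameter is added only when anonymization is wanted, because
// its mere presence (with any value) activates the feature.
function buildPageviewHit({ trackingId, clientId, page, anonymizeIp }) {
  const params = new URLSearchParams({
    v: '1',          // protocol version
    tid: trackingId, // GA property ID
    cid: clientId,   // anonymous client ID
    t: 'pageview',   // hit type
    dp: page,        // document path
  });
  if (anonymizeIp) {
    params.set('aip', '1');
  }
  return `https://www.google-analytics.com/collect?${params.toString()}`;
}

// Placeholder IDs, for illustration only.
const anonymizedHit = buildPageviewHit({
  trackingId: 'UA-XXXXX-Y',
  clientId: '555',
  page: '/home',
  anonymizeIp: true,
});
const fullIpHit = buildPageviewHit({
  trackingId: 'UA-XXXXX-Y',
  clientId: '555',
  page: '/home',
  anonymizeIp: false,
});

console.log(anonymizedHit);
console.log(fullIpHit);
```

Omitting the parameter entirely, rather than sending something like aip=0, is the design choice that matters here.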

IMPORTANT: When you activate anonymizeIP you will also need to edit any View filters that use IP addresses, as the partial IP will no longer match. Simply remove the last octet (the final group of up to 3 digits) of your IP filter, and always test your changes before deploying on a production setup!
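
The effect on an IP filter can be sketched as follows. This is only an illustration of the matching logic, assuming a "begins with" style filter; the helper names and the example address (taken from a documentation-reserved range) are hypothetical, and the real filter is configured in the Google Analytics View settings rather than in code.

```javascript
// Hypothetical helper: turn a full IPv4 filter value into the prefix that
// still matches once the last octet has been anonymized.
function anonymizedFilterPrefix(ip) {
  const octets = ip.split('.');
  if (octets.length !== 4) {
    throw new Error(`Expected a full IPv4 address, got: ${ip}`);
  }
  return octets.slice(0, 3).join('.') + '.';
}

// Mimics a "begins with" filter applied to the address as reported.
function matchesPrefix(reportedIp, prefix) {
  return reportedIp.startsWith(prefix);
}

const officeIp = '203.0.113.42';  // example address to exclude
const reportedIp = '203.0.113.0'; // what the same visitor looks like with aip on

console.log(reportedIp === officeIp);                                   // false
console.log(matchesPrefix(reportedIp, anonymizedFilterPrefix(officeIp))); // true
```

Note that the shortened filter now matches every address sharing the first three octets, so it may exclude more traffic than the original filter did.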

Credit where credit is due

This research was a collaboration with Huiyan Wan of Conversion Works, who did the heavy lifting for this study and produced the full report.

BTW, if you are interested in what I am building in this space – an automated GA data auditing tool – visit verified-data.com.

© Brian Clifton 2018