Google Analytics Anonymize IP – An Impact Study

Anonymize IP is the Google Analytics feature used to mask IP addresses of collected data. Universal Analytics collects and stores visitor IP addresses by default. These are used to provide geolocation information about your visitors – such as which city, country, continent they connect from. In the new GA4 however, no IP addresses are stored – in fact it is no longer possible to do so. That’s a big change in approach. In this post, I investigate what impact that will have on your data.

Universal versus GA4 IP addresses

If you are using Universal Analytics, a key requirement for GDPR compliance is to use the Google Analytics feature “anonymize IP”, also known as “aip”. As the name suggests, this simple switch masks the last octet (up to 3 digits) of your visitor’s IP address by setting it to zero. For example, if a visitor has a public IP of 215.113.42.184, then Google will obfuscate this to 215.113.42.0 (and likewise for all other visitor IP addresses) when it processes your Google Analytics data. Hence the full IP address is not stored. That’s good for GDPR compliance because IP data is considered PII in Europe.
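
To make the masking concrete, here is a minimal sketch of zeroing the last octet of an IPv4 address. It only illustrates the effect described above; it is not Google's implementation, and the function name anonymizeIpv4 is my own.

```javascript
// Illustrative only: zero the last octet of an IPv4 address,
// mirroring the effect of anonymize IP (aip) described in the text.
// anonymizeIpv4 is a hypothetical helper, not part of any Google library.
function anonymizeIpv4(ip) {
  const octets = ip.split('.');
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) {
    throw new Error(`Not a valid IPv4 address: ${ip}`);
  }
  octets[3] = '0'; // the last octet is replaced, not deleted
  return octets.join('.');
}

console.log(anonymizeIpv4('215.113.42.184')); // 215.113.42.0
console.log(anonymizeIpv4('215.113.42.7'));   // 215.113.42.0
```

Note that both example visitors end up with the same stored address, which is why up to 256 neighbouring addresses become indistinguishable for geolocation purposes.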

On the other hand, Google Analytics 4 does not store IP addresses. It drops any IP addresses that it collects from EU users before logging that data via EU domains and servers. GA4 does still provide coarse geolocation data – essentially using the same anonymise IP method. Reference: Google Help Centre Doc.

Note that IP address anonymization is not a panacea switch to make your website or app compliant with EU privacy laws. Rather, it’s a useful feature of Google Analytics to have for extra assurance (in a different post I explain What Is PII versus Personal Data).

Anonymize IP – An Impact Assessment

Within Google Analytics, ip addresses are used to determine the geolocation of a visitor i.e. what continent, country, city they connected from. That’s useful information in terms of assessing your global reach, content requirements, and for marketing purposes – for example, measuring the impact of specific geolocation campaigns. Therefore not having an ip address stored for your visitors is a big deal.

So if the last octet of an IP address is zeroed out, how does that impact geolocation data?

There is an impact of course, but how much, and should you worry about it? To establish that, we conducted a detailed study: one where a website with a global reach sent data to two identically set up Universal Analytics web properties:

  • One with the standard full ip address (aip=off), and
  • One where anonymizeIP is set (aip=on).

We then compared the datasets in BigQuery to measure the geolocation difference. This was originally done on a small scale (2,267 sessions) by Huiyan Wan of Media Monks, based on a UK centric website audience. I therefore wanted to work with Huiyan to replicate and verify her results using a much larger dataset of 3 million sessions and one with a much broader reach of international visitors.
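
As a rough illustration of this kind of comparison (the study itself was done in BigQuery, and the exact join is not described here), the sketch below pairs sessions from the two properties and reports the share that agree at a given geographic level. The record shape and the sessionKey field are assumptions made for the example.

```javascript
// Hypothetical record shape: { sessionKey, continent, country, city }.
// sessionKey stands in for whatever identifier links the duplicate hits;
// the study's actual join key is not stated, so this is an assumption.
function matchRate(fullIpSessions, aipSessions, level) {
  const fullByKey = new Map(fullIpSessions.map((s) => [s.sessionKey, s]));
  let compared = 0;
  let matched = 0;
  for (const s of aipSessions) {
    const full = fullByKey.get(s.sessionKey);
    if (!full) continue; // session missing from the full-ip property
    compared += 1;
    if (full[level] === s[level]) matched += 1;
  }
  return compared === 0 ? NaN : matched / compared;
}

// Made-up sample data, for illustration only.
const fullIp = [
  { sessionKey: 'a', country: 'UK', city: 'London' },
  { sessionKey: 'b', country: 'UK', city: 'Reading' },
  { sessionKey: 'c', country: 'SE', city: 'Stockholm' },
  { sessionKey: 'd', country: 'SE', city: 'Uppsala' },
];
const aip = [
  { sessionKey: 'a', country: 'UK', city: 'London' },
  { sessionKey: 'b', country: 'UK', city: 'London' },
  { sessionKey: 'c', country: 'SE', city: 'Stockholm' },
  { sessionKey: 'd', country: 'SE', city: 'Uppsala' },
];

console.log(matchRate(fullIp, aip, 'country')); // 1
console.log(matchRate(fullIp, aip, 'city'));    // 0.75
```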

Summary of Results

The full study has a lot of detail and you can read it at Anonymize IP Analytics Impact Assessment. It was conducted during the month of May 2018. Below is my summary of the findings.

At country level, there is no noticeable difference in accuracy. However, if you are reliant on 80%+ accuracy at city level, then the anonymize IP method may impact your performance.

[Chart: map of Europe shaded by city-level accuracy with anonymize IP; darker means more accurate]

Overall Discrepancy

    • The difference at continent and country level is negligible, with aip=on showing the same results as the full IP in 99.5%+ of cases. Hence, the rest of this summary is based only on city differences.
    • Differences become significant at city level, with an average of 76.7% accuracy. However, allowing for a 50 km margin, the accuracy improves to 87.0%. Allowing for a 200 km margin, the accuracy further increases to 93.0%.
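
One way to compute accuracy "within a margin" is to measure the great-circle distance between the city reported with the full IP and the city reported with aip=on, and count a match when the distance is inside the margin. The sketch below uses the haversine formula for this; it is an assumption about how such a figure could be derived, not a description of the method used in the full report, and the coordinates are approximate sample values.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two { lat, lon } points (degrees), in km.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Share of session pairs whose two reported cities lie within marginKm.
function accuracyWithinMargin(pairs, marginKm) {
  const hits = pairs.filter((p) => haversineKm(p.full, p.aip) <= marginKm);
  return hits.length / pairs.length;
}

// Approximate city coordinates, for illustration only.
const london = { lat: 51.5074, lon: -0.1278 };
const croydon = { lat: 51.3762, lon: -0.0982 };
const reading = { lat: 51.4543, lon: -0.9781 };
const manchester = { lat: 53.4808, lon: -2.2426 };

const cityPairs = [
  { full: london, aip: london },     // same city
  { full: croydon, aip: london },    // roughly 15 km apart
  { full: reading, aip: london },    // roughly 60 km apart
  { full: manchester, aip: london }, // roughly 260 km apart
];

console.log(accuracyWithinMargin(cityPairs, 0));   // 0.25
console.log(accuracyWithinMargin(cityPairs, 50));  // 0.5
console.log(accuracyWithinMargin(cityPairs, 200)); // 0.75
```

The pattern in the sample output matches the trend in the findings: the wider the margin, the more of the "wrong" cities turn out to be near neighbours of the right one.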

Discrepancy by Device (mobile v desktop)

    • The same trend is observed for mobile and desktop traffic, with mobile devices showing a 5.9% increase in city accuracy. This difference reduces in magnitude as the margin is increased to 50 and 200 km.

Discrepancy by Sub-Continent

  • On the whole, the less economically developed sub-continents of the world show better city level accuracy than the more advanced economies. For example, in the chart above, eastern Europe is darker (more accurate) than other parts of Europe. The same story is observed for other regions of the world.
  • Although at first surprising, I attribute this to early adopter countries – that is, early internet access adopters such as the US, UK, EU – giving little attention to the geographic governance of IP location information in the early days of internet growth. Whereas, secondary adopters learnt from the mistakes of the pioneering countries and implemented better record keeping of IP geolocations. In addition, less developed economies are likely to have a higher reliance on mobile devices for internet access, which are more geo accurate.

Methodology Notes

Website: Collected data is from a European-headquartered global brand, categorised in the “sports” vertical. It received approximately 3M sessions per month during the period of this study, May/June 2018.

Session Data: Collected in duplicate via GTM. That is, one data hit sent to a standard (full-ip) Google Analytics property, and an identical hit sent to an anonymizedIP (aip=1) property of the same GA account.

Google Analytics Configuration: All property and view configuration was identical, in particular: “Enable Users Metric in Reporting” (property level); Filters applied and their order (view level); Robots excluded via Google’s auto-blocking method (view level).

Anomalies:

  • Loss of Users and Sessions. With anonymize IP, there was an unexplained small loss of users (-0.10%) and a small rise in session count (+0.07%). This may be attributable to a “rogue” tag in the full-ip property that was not configured in the aip=1 property (the GA account contained 63 GTM tags at the time of the study).
  • Bounce Rate. There was a significant difference in the reported bounce rate between the full-ip and aip=1 properties, 22.84% versus 35.07% respectively. I attribute this to the impact of the “Exclude all hits from known bots and spiders” setting for each view. That is, although the method employed by Google for achieving this is unknown, it is likely to be a machine learning method where historical data has a significant impact when determining if a session is from a robot or spider. At the time of this research, the full-ip property was 7 years old, while the aip=1 property was less than 1 month old. Hence my hypothesis that the property age is the cause of this. This view was supported by creating a new full-ip-cc (carbon copy) property and comparing it with the older full-ip property. The same bounce rate discrepancy, and direction, was observed.

Why is anonymize IP not ON by default?

A good question. I would have preferred it if Google Analytics had done this by default from the beginning. However, it appears Google has finally got the message about the importance of end-user privacy, and has made this the default and only option for GA4.

However, if you are using Universal Analytics you must activate anonymise IP to be privacy compliant with GDPR. This can be done in one of the following ways:

  1. If you deploy your GA tracking via GTM, set aip within your GTM tag
  2. If you deploy tracking within your pages:
    • for analytics.js ga('set', 'anonymizeIp', true);
    • for gtag.js gtag('config', '', { 'anonymize_ip': true });
  3. If you are using the Measurement Protocol, i.e. sending server-side data hits, add aip=1 to the URL. Note that any value for aip will set this parameter, i.e. just its presence in the data hit will activate the feature. Therefore, ensure aip is not present in your URL if you do not wish to use this feature.
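
For the Measurement Protocol option, the sketch below shows one way to build a Universal Analytics pageview hit so that aip is only present when anonymization is wanted. The helper name buildPageviewHit, the placeholder tracking ID UA-XXXXX-Y and the sample client ID are assumptions for illustration; adapt the parameters to your own hits.

```javascript
// Build a Universal Analytics Measurement Protocol (v1) pageview hit URL.
// buildPageviewHit is a hypothetical helper written for this example.
function buildPageviewHit({ trackingId, clientId, pagePath, anonymizeIp }) {
  const params = new URLSearchParams({
    v: '1',          // protocol version
    tid: trackingId, // property ID
    cid: clientId,   // anonymous client ID
    t: 'pageview',   // hit type
    dp: pagePath,    // document path
  });
  // Presence alone activates the feature, so add aip only when wanted
  // and leave it out entirely otherwise (never send aip=0).
  if (anonymizeIp) {
    params.set('aip', '1');
  }
  return `https://www.google-analytics.com/collect?${params.toString()}`;
}

const anonymizedHit = buildPageviewHit({
  trackingId: 'UA-XXXXX-Y', // placeholder, replace with your own property ID
  clientId: '555',
  pagePath: '/home',
  anonymizeIp: true,
});
const fullIpHit = buildPageviewHit({
  trackingId: 'UA-XXXXX-Y',
  clientId: '555',
  pagePath: '/home',
  anonymizeIp: false,
});

console.log(anonymizedHit);
console.log(fullIpHit);
```

Leaving the parameter out, rather than sending a "false" value, is the design choice that matters here, since any value for aip switches the feature on.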

IMPORTANT: When you activate anonymize IP you will also need to edit any View filters that use IP addresses, as a filter on the full IP will no longer match the partial IP. Simply remove the last octet (up to 3 digits) from your IP filter so that it matches on the first three octets only (and always test your changes before deploying on a production setup!)
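
As a small sketch of that filter change, the helper below turns a full IPv4 address into a regular expression that matches on the first three octets only. The function name prefixFilterPattern and the sample address (from a documentation-only range) are my own; check the resulting pattern against your own filter setup before relying on it.

```javascript
// Turn a full IPv4 address into a regex that matches on the first three
// octets only. prefixFilterPattern is a hypothetical helper name.
function prefixFilterPattern(ip) {
  const octets = ip.split('.');
  if (octets.length !== 4) {
    throw new Error(`Expected a dotted IPv4 address: ${ip}`);
  }
  // Escape the dots so they match literally, and keep the trailing dot
  // so that 203.0.113. does not also match 203.0.1130.x style strings.
  return `^${octets.slice(0, 3).join('\\.')}\\.`;
}

const officeFilter = prefixFilterPattern('203.0.113.57');
console.log(officeFilter); // ^203\.0\.113\.

const officeRegex = new RegExp(officeFilter);
console.log(officeRegex.test('203.0.113.0'));  // true (the anonymized form)
console.log(officeRegex.test('203.0.114.0'));  // false
```

Bear in mind that such a filter now matches every address sharing those three octets, not just the single original address.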

Credit where credit is due

This research was a collaboration with Huiyan Wan of Media Monks who did the heavy lifting for this study and produced the full report.

BTW, if you are interested in what I am building in this space – a forensic GA data auditing tool with an emphasis on GDPR compliance – visit verified-data.com.

Looking for a keynote speaker, or wish to hire Brian…?

If you are an organisation wishing to hire me and my team, please view the Contact page. I am based in Sweden and advise organisations in Europe as well as North America.


4 Comments

  1. Julian Erbsloeh

    Hi Brian,
    There was a great but sadly un-answered question in the comments of Huiyan’s post – does anonymizing IPs via GTM in this way impact ISP data and therefore ISP level view filters? I notice an abnormally high number of (not set) ISPs in properties where IP anonymisation is enabled but have not properly looked into this.
    Do you see a higher % of ISP (not set) in your anonymised test property for example?
    Regards,
    Julian

    • Brian Clifton

      Hello Julian – I have not seen any change in the Service Provider report of clients I have implemented this on. I would expect this type of thing (i.e. reverse IP lookup), to be done while GA is processing the data stream. That is, not impacted by anonymize IP which happens after processing, but before storage.

      One thing to note is that Google does occasionally/rarely have to reprocess its stored data – to fix a bug for example. If that happens, then I would expect a spike for “not set” for the reprocessed period, as the full IP address was not stored and hence the lookup will not work.

  2. Rémi

    I’ve read both your articles, this one and the one linked in the first paragraph, but I still have a question regarding anonymization.

    You say IP anonymization brings “extra assurance”. But do I need that assurance if I got explicit consent through a popup notice stating the analytics nature of the processing?

    What would be the impact on my GDPR compliance if I keep the IP on ?

    • Brian Clifton

      Hello Remi – if you are certain *all* visits have explicit consent then I agree IP anonymize is not required. But how are you monitoring this?

      Also, do you really need the geographic info to be measured at such a high fidelity?

      I simply suggest a cautionary approach. Better to be safe than sorry with GDPR…

