Anonymize IP GA Impact Assessment 2.7

Summary

Additional summary analysis by Brian Clifton can be read here.

Overall Discrepancy

Continent Level Accuracy | Country Level Accuracy | City Level Accuracy | Avg. Distance (km) | Distance within 50 km (%) | Distance within 200 km (%)
99.88% | 99.60% | 75.66% | 139 | 86.95% | 93.03%

Almost all sessions were attributed to the same continent (99.88%) and country (99.60%) after switching on IP anonymisation. City-level accuracy is lower: 76% of the IP-anonymised visits were attributed to the same city, and a further 11% were attributed to an adjacent city within 50 km (about the distance from London to Luton). Therefore, although the average discrepancy is 139 km, 87% of the visits have a discrepancy within 50 km (or no discrepancy at all).

The detailed distribution of the distance discrepancy is plotted in the section Distribution of Discrepancy in Distance.
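As a rough sketch of how the summary metrics can be derived, assume each session is available as a pair of geo records, one from the full-IP property and one from the aip=1 property. The `Location` record, its fields, the city-centroid coordinates and the use of the haversine (great-circle) formula are all assumptions for illustration; the assessment does not state how distances were computed.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Location:
    """Hypothetical geo record for one session as reported by one GA property."""
    continent: str
    country: str
    city: str
    lat: float  # city centroid latitude (assumed to be available)
    lon: float  # city centroid longitude


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance between two city centroids, in km."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def summarize(pairs: list[tuple[Location, Location]]) -> dict[str, float]:
    """Accuracy and distance metrics over (full_ip, anonymised_ip) pairs."""
    n = len(pairs)
    distances = [haversine_km(full, anon) for full, anon in pairs]
    return {
        "continent_accuracy": sum(f.continent == a.continent for f, a in pairs) / n,
        "country_accuracy": sum(f.country == a.country for f, a in pairs) / n,
        # Compare (country, city) because city names repeat across countries.
        "city_accuracy": sum((f.country, f.city) == (a.country, a.city) for f, a in pairs) / n,
        "avg_distance_km": sum(distances) / n,
        "within_50km": sum(d <= 50 for d in distances) / n,
        "within_200km": sum(d <= 200 for d in distances) / n,
    }


# Toy example: one session stays in London, the other is moved to Luton.
london = Location("Europe", "United Kingdom", "London", 51.5074, -0.1278)
luton = Location("Europe", "United Kingdom", "Luton", 51.8787, -0.4200)
stats = summarize([(london, london), (london, luton)])
print(stats)
```

Note that "within 50 km" includes exact city matches (distance 0), which is why it can never be lower than city-level accuracy.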


Discrepancy by Device

Device | Continent Level Accuracy | Country Level Accuracy | City Level Accuracy | Avg. Distance (km) | Distance within 50 km (%) | Distance within 200 km (%)
Desktop/Tablet | 99.78% | 99.36% | 72.31% | 160 | 85.54% | 92.67%
Mobile | 99.95% | 99.79% | 78.21% | 122 | 88.03% | 93.30%

Mobile visits are located more accurately than desktop/tablet visits at every geographic level, most noticeably at the city level (78.21% versus 72.31%).
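A per-segment breakdown like the device table is a group-by over the same paired data. The sketch below computes city-level accuracy per segment; the record layout (`device`, `continent`, `full_city`, `anon_city`) and the sample records are hypothetical, not taken from the study.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable


def city_accuracy_by(
    sessions: Iterable[dict], key: Callable[[dict], Hashable]
) -> dict[Hashable, float]:
    """City-level accuracy (share of exact city matches) per segment."""
    totals: dict[Hashable, int] = defaultdict(int)
    matches: dict[Hashable, int] = defaultdict(int)
    for s in sessions:
        segment = key(s)
        totals[segment] += 1
        matches[segment] += int(s["full_city"] == s["anon_city"])
    return {seg: matches[seg] / totals[seg] for seg in totals}


# Hypothetical records, one per session.
sessions = [
    {"device": "mobile", "continent": "Europe", "full_city": "Leeds", "anon_city": "Leeds"},
    {"device": "mobile", "continent": "Europe", "full_city": "York", "anon_city": "York"},
    {"device": "desktop/tablet", "continent": "Europe", "full_city": "Leeds", "anon_city": "Bradford"},
    {"device": "desktop/tablet", "continent": "Europe", "full_city": "York", "anon_city": "York"},
]
by_device = city_accuracy_by(sessions, key=lambda s: s["device"])
print(by_device)
```

Passing `key=lambda s: (s["device"], s["continent"])` gives a device-by-continent breakdown with the same function.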


Discrepancy by Continent

Continent | Continent Level Accuracy | Country Level Accuracy | City Level Accuracy | Avg. Distance (km) | Distance within 50 km (%) | Distance within 200 km (%)
Africa | 99.63% | 99.63% | 76.92% | 114 | 91.95% | 94.08%
Americas* | 99.93% | 99.39% | 86.66% | 264 | 92.67% | 95.57%
Asia | 99.50% | 99.12% | 75.54% | 222 | 89.92% | 92.93%
Eastern Europe | 99.93% | 99.55% | 84.54% | 88 | 89.82% | 94.73%
Europe* | 99.88% | 99.51% | 73.78% | 124 | 84.58% | 92.67%
Northern America | 99.94% | 99.83% | 75.47% | 147 | 88.56% | 92.97%
Oceania | 99.90% | 99.85% | 86.85% | 192 | 88.01% | 93.39%

Americas (excluding Northern America), Eastern Europe and Oceania show a smaller discrepancy in identifying cities after IP anonymisation. Their city-level accuracy (84.54% to 86.85%) is roughly 10 percentage points higher than that of the other continents (73.78% to 76.92%).

Note:

Americas* excludes Northern America.

Europe* excludes Eastern Europe.
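The gap claimed above can be checked against the continent table. The sketch below uses the table's city-level accuracy values and unweighted means, so each continent counts once (the continent table gives no session counts); the grouping mirrors the paragraph above.

```python
from statistics import mean

# City-level accuracy (%) per continent, copied from the continent table.
city_accuracy = {
    "Africa": 76.92,
    "Americas*": 86.66,
    "Asia": 75.54,
    "Eastern Europe": 84.54,
    "Europe*": 73.78,
    "Northern America": 75.47,
    "Oceania": 86.85,
}
higher = ["Americas*", "Eastern Europe", "Oceania"]
others = [c for c in city_accuracy if c not in higher]

high_vals = [city_accuracy[c] for c in higher]
other_vals = [city_accuracy[c] for c in others]

# Unweighted means: every continent counts once, whatever its session volume.
gap_of_means = mean(high_vals) - mean(other_vals)
# Narrowest and widest gaps between any continent in one group and any in the other.
min_gap = min(high_vals) - max(other_vals)
max_gap = max(high_vals) - min(other_vals)
print(f"gap of means: {gap_of_means:.2f} pp (pairwise range {min_gap:.2f} to {max_gap:.2f} pp)")
```

The unweighted gap comes out at about 10.6 points, with individual continent pairs ranging from about 7.6 to 13.1 points.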

The following section further breaks down the continents into sub-continents.


Discrepancy by Sub Continent

Continent | Sub Continent | Sub Continent Level Accuracy | Country Level Accuracy | City Level Accuracy | Avg. Distance (km) | Distance within 50 km (%) | Distance within 200 km (%) | Number of Sessions
Africa | Eastern Africa | 98.34% | 98.34% | 90.03% | 324 | 97.91% | 98.26% | 301
Africa | Middle Africa | 86.67% | 86.67% | 70.00% | 1221 | 91.30% | 91.30% | 30
Africa | Northern Africa | 96.20% | 96.20% | 85.17% | 291 | 93.95% | 96.37% | 263
Africa | Southern Africa | 99.91% | 99.91% | 75.94% | 93 | 91.61% | 93.83% | 7610
Africa | Western Africa | 95.65% | 95.65% | 90.43% | 262 | 95.41% | 95.41% | 115
Americas | Caribbean | 99.69% | 98.13% | 81.31% | 342 | 91.56% | 99.35% | 321
Americas | Central America | 99.52% | 99.43% | 80.65% | 551 | 89.83% | 94.09% | 6828
Americas | Northern America | 99.92% | 99.83% | 75.47% | 147 | 88.56% | 92.97% | 335661
Americas | South America | 99.44% | 99.40% | 89.95% | 112 | 94.12% | 96.22% | 13010
Asia | Central Asia | 98.52% | 98.52% | 95.83% | 224 | 97.00% | 97.14% | 743
Asia | Eastern Asia | 99.78% | 99.44% | 70.39% | 213 | 88.45% | 91.73% | 31200
Asia | Southeast Asia | 99.44% | 99.36% | 86.69% | 167 | 94.50% | 95.72% | 7474
Asia | Southern Asia | 97.38% | 97.38% | 86.76% | 375 | 90.18% | 93.21% | 2138
Asia | Western Asia | 98.16% | 97.30% | 84.68% | 302 | 91.33% | 96.03% | 4178
Europe | Eastern Europe | 99.61% | 99.55% | 84.54% | 88 | 89.82% | 94.73% | 67576
Europe | Northern Europe | 99.66% | 99.45% | 73.57% | 137 | 83.43% | 92.45% | 131890
Europe | Southern Europe | 99.78% | 99.70% | 71.12% | 208 | 84.55% | 91.99% | 58816
Europe | Western Europe | 99.68% | 99.50% | 74.53% | 97 | 85.20% | 92.95% | 245376
Oceania | Australasia | 99.90% | 99.84% | 86.48% | 190 | 87.55% | 93.14% | 5065
Oceania | Melanesia | 100.00% | 100.00% | 100.00% | 0 | 100.00% | 100.00% | 121
Oceania | Micronesian Region | 100.00% | 100.00% | 86.84% | 419 | 100.00% | 100.00% | 38
Oceania | Polynesia | 100.00% | 100.00% | 94.87% | 829 | 100.00% | 100.00% | 39

As shown, Northern Europe, Southern Europe, Western Europe, Eastern Asia, Northern America and Southern Africa have a relatively high discrepancy (city-level accuracy below 80%) after switching on IP anonymisation.

Please note that some sub-continents, such as Middle Africa (70.00% city-level accuracy from only 30 sessions), don’t have a sufficient number of visits for a confident assessment of accuracy. Take the number of sessions in the last column into account and view those results with caution.
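To make "view the results with caution" concrete, a confidence interval around each accuracy rate shows how far a small sample can move. The study does not report intervals; the sketch below uses the Wilson score interval, one standard choice for a binomial proportion, with match counts back-calculated from the table's percentages and session counts (so they are approximate).

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin


# Middle Africa: 70.00% of 30 sessions -> 21 city matches.
low, high = wilson_interval(21, 30)
print(f"Middle Africa:  {low:.1%} to {high:.1%}")

# Western Europe: 74.53% of 245,376 sessions (match count rounded).
we_matches = round(0.7453 * 245376)
low_we, high_we = wilson_interval(we_matches, 245376)
print(f"Western Europe: {low_we:.2%} to {high_we:.2%}")
```

With only 30 sessions, Middle Africa's interval spans roughly 52% to 83% and therefore overlaps the 80% threshold, whereas Western Europe's interval is well under one percentage point wide.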


Distribution of Discrepancy in Distance


Although the maximum distance can be as large as 18,330 km, the long tails of the distance distribution are tiny (the y-axis is distorted to make them more visible): the majority of visits are located in the same city, or in a city within 50 km.
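One common way to keep a thin tail visible is to bucket the distances and scale the counts logarithmically. The assessment does not say how its plot's y-axis was distorted, so the log scale, the bucket edges and the toy distances below are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical bucket edges in km; distances above the last edge share one bucket.
EDGES = [0, 50, 200, 1000, 5000]
LABELS = (
    ["same city"]
    + [f"{lo}-{hi} km" for lo, hi in zip(EDGES, EDGES[1:])]
    + [f">{EDGES[-1]} km"]
)


def bucket(distance_km: float) -> str:
    """Map a discrepancy distance to a bucket label (0 km is treated as same city)."""
    if distance_km == 0:
        return "same city"
    for lo, hi in zip(EDGES, EDGES[1:]):
        if distance_km <= hi:  # buckets are (lo, hi]
            return f"{lo}-{hi} km"
    return f">{EDGES[-1]} km"


def log_histogram(distances: list[float]) -> Counter:
    """Print a text histogram whose bar length grows with log10(count + 1)."""
    counts = Counter(bucket(d) for d in distances)
    for label in LABELS:
        n = counts[label]  # Counter returns 0 for missing labels
        bar = "#" * round(10 * math.log10(n + 1))
        print(f"{label:>14} | {n:>5} {bar}")
    return counts


# Toy distances (km) with a long, thin tail.
toy = [0.0] * 800 + [12.5] * 110 + [150.0] * 60 + [900.0] * 20 + [7000.0] * 9 + [18330.0]
counts = log_histogram(toy)
```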


Zoom in on Device & Continent


Compared with desktop/tablet, IP-anonymised visits from mobile devices generally have a higher city-level accuracy rate across all continents.


Zoom in on Country

City Level Accuracy grouped by Country

EU specific - City Level Accuracy grouped by Country





Zoom in on City


Where did the sessions go?

After IP anonymisation, almost half of the sessions that were originally attributed to Atlanta (US) were no longer attributed to Atlanta. So where did all those sessions go?

Please check the details here.
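Tracing where the Atlanta sessions went amounts to building a transition table: filter sessions by the city the full-IP property reported, then count the cities the aip=1 property reported for them. A minimal sketch, assuming paired (full-IP city, anonymised-IP city) tuples are available; the toy counts and the placeholder "nearby city A" are hypothetical.

```python
from collections import Counter
from typing import Iterable


def where_did_they_go(
    pairs: Iterable[tuple[str, str]], origin_city: str
) -> list[tuple[str, float]]:
    """For sessions the full-IP property placed in `origin_city`, return the share
    assigned to each city by the anonymised-IP property, largest first."""
    destinations = Counter(anon for full, anon in pairs if full == origin_city)
    total = sum(destinations.values())
    if total == 0:
        return []
    return [(city, n / total) for city, n in destinations.most_common()]


# Toy data: (full-IP city, anonymised-IP city), one tuple per session.
pairs = (
    [("Atlanta", "Atlanta")] * 52
    + [("Atlanta", "nearby city A")] * 30
    + [("Atlanta", "(not set)")] * 18
    + [("Boston", "Boston")] * 5
)
result = where_did_they_go(pairs, "Atlanta")
for city, share in result:
    print(f"{city}: {share:.0%}")
```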


Geo-location can’t be identified by Google Analytics (not set)

% of Sessions with Unidentified Location
Geo Level | Full IP | Anonymised IP | Percentage Change
City | 3.62% | 4.01% | 10.78%
Country | 0.10% | 0.10% | -1.55%

The proportion of sessions with a (not set) value in the city field increased by 10.78% (relative) after IP anonymisation, from 3.62% to 4.01% of sessions. Interestingly, at the country level, the proportion of sessions with an unknown country decreased by 1.55% when anonymised IPs were used.
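The 10.78% and -1.55% figures are relative changes in a share, not percentage-point changes. The sketch below shows both calculations. Recomputing from the rounded table values gives 10.77% rather than 10.78% (and 0% for the country row), presumably because the reported changes were computed from unrounded shares.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change of a share, in percent of its starting value."""
    return (after - before) / before * 100


def point_change(before: float, after: float) -> float:
    """Absolute change of a share, in percentage points."""
    return after - before


city_before, city_after = 3.62, 4.01  # % of sessions with a (not set) city
print(f"relative: {relative_change_pct(city_before, city_after):+.2f}%")
print(f"absolute: {point_change(city_before, city_after):+.2f} pp")
```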



Methodology Notes:

  • Website: Data was collected from a global brand headquartered in Europe, categorised in the “sports” vertical, which received approximately 3M sessions per month during the study period (May/June 2018).
  • Session Data: Collected in duplicate via GTM. That is, each hit was sent to a standard (full-ip) Google Analytics property, and an identical hit was sent to an anonymizedIP (aip=1) property in the same GA account.
  • Google Analytics Configuration: All property and view configuration was identical, in particular: “Enable Users Metric in Reporting” (property level); Filters applied and their order (view level); Robots excluded via Google’s auto-blocking method (view level).
  • Anomalies:
    • Loss of Users, Rise in Sessions. With anonymizedIP, there was an unexplained small loss of users (-0.10%) and a small rise in session count (+0.07%). This may be attributable to a “rogue” tag in the full-ip property that was not configured in the aip=1 property (the GA account contained 63 GTM tags at the time of the study).
    • Bounce Rate. There was a significant difference in the reported bounce rate between the full-ip and aip=1 properties: 22.84% versus 35.07% respectively. I attribute this to the impact of the “Exclude all hits from known bots and spiders” setting in each view. Although the method Google uses for this is unknown, it is likely to be a machine learning method in which historical data has a significant impact when determining whether a session is from a robot or spider. At the time of this research, the full-ip property was 7 years old, while the aip=1 property was less than 1 month old. Hence my hypothesis that property age is the cause. This was verified by creating a new full-ip-cc (carbon copy) property and comparing it with the older full-ip property: the same discrepancy, in the same direction, was observed.
  • Data volume: The test was conducted on 957253 sessions.
  • Discrepancy distance in this assessment is the distance between cities, so it does not go below the city level (it does not reflect which street or block the visitors are actually in).
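For context on what the aip=1 setting does: Google's Universal Analytics documentation describes IP anonymisation as setting the last octet of IPv4 addresses and the last 80 bits of IPv6 addresses to zero before the address is stored or processed, so geolocation works from a truncated address. The sketch below mimics that masking with Python's standard `ipaddress` module; it illustrates the documented behaviour and is not Google's implementation. The example addresses come from documentation-reserved ranges.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the last 8 bits of an IPv4 address or the last 80 bits of an IPv6 address."""
    addr = ipaddress.ip_address(ip)
    keep_bits = 24 if addr.version == 4 else 48  # 32 - 8 and 128 - 80 bits retained
    network = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.57"))                           # IPv4 example
print(anonymize_ip("2001:db8:85a3:1234:5678:8a2e:370:7334"))  # IPv6 example
```

All visitors who share the first 24 bits of an IPv4 address become indistinguishable, which is consistent with continent and country being almost unaffected while the city can shift to a neighbouring one.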