I must admit that the new Google Analytics documentation about the GDPR-specific Data Retention setting is confusing! The whole area is surprisingly badly worded by Google…
The assumption is that ALL data older than NN months is going to be deleted, i.e. a cliff-edge effect where any report older than, say, 26 months is going to be empty. The setting is important, but its effect is not as dramatic as you might first think…
The Reset Setting Explained
In the Google Analytics admin interface, click Tracking Info > Data Retention (see top image). You are then presented with a screen similar to this. I have highlighted the 3 key phrases:
Now let’s focus on Reset on new activity.
This little switch means the data retention period can be applied either per data point (OFF) or per user (ON).
With “Reset on new activity” set to OFF, the retention period is based on the age of the data points. Any data older than the retention period is purged each month. A cliff-edge effect.
However, if “Reset on new activity” is set to ON, the data retention setting works as if applied on a per user basis. That is, if the user returns before the purge, their retention is renewed to current_date + retention_period. The data retention period is then applied as a period of inactivity from your visitors. For example, a user returning every month effectively never has their data purged because they keep renewing the lease, so to speak.
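To make the difference between the two modes concrete, here is a minimal Python sketch of how I read the setting. It is a model of the behaviour described above, not Google's actual implementation (which is not public, and which purges in monthly batches rather than on an exact date). The function names add_months and expiry_date are my own, for illustration only.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months (day clamped to 28 so it is always valid)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def expiry_date(hit_date: date, last_activity: date,
                retention_months: int, reset_on_new_activity: bool) -> date:
    """Date after which a user-level data point becomes eligible for purging.

    Assumption: OFF counts from the age of the data point itself,
    ON counts from the user's most recent activity.
    """
    anchor = last_activity if reset_on_new_activity else hit_date
    return add_months(anchor, retention_months)


first_hit = date(2018, 5, 25)
last_seen = date(2019, 1, 10)

# OFF: the hit expires 26 months after it was recorded.
print(expiry_date(first_hit, last_seen, 26, reset_on_new_activity=False))
# ON: the same hit expires 26 months after the user was last seen.
print(expiry_date(first_hit, last_seen, 26, reset_on_new_activity=True))
```

With these example dates, the OFF case gives 2020-07-25 and the ON case gives 2021-03-10, so the later visit buys the older data point roughly eight more months.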
Note, data retention settings do NOT affect aggregate reports which are the vast majority of GA reports. That is, unless you are doing very user-specific tracking (why?), you will be hard pushed to notice any reporting impact.
What about GDPR compliance?
Allowing a visitor to be forgotten is an important principle of GDPR compliance. That is, it is not reasonable to hold a user’s tracking information forever, so some retention setting needs to be set – I am against the “do not automatically expire” setting. A value of 26 months seems a reasonable time limit to me, as organisations need to understand year-on-year trends and users generally appreciate and expect that level of data retention (assuming all tracking is anonymous and first party).
Bear in mind that the chances of your visitor retaining your original tracking cookie diminish rapidly with time (consider yourself lucky if it is still around 12 months later!).
If a visitor returns to your website on a regular basis I see no reason why retaining their data should be a problem. Hence I recommend Reset on new activity=ON.
(Remember I am not a legal expert.)
Example scenarios with Reset on new activity=ON:
Assuming your settings match my screenshot above, if a visitor does not return to your site within 26 months, their user-specific data is dropped, i.e. the cookie ID and some other user-specific identifiers are removed, and their individual click-stream data cannot be stitched back together.
However, the vast majority of Google Analytics reports are preprocessed and therefore do not make use of click-stream data – for reasons of data privacy that date back to when Urchin/GA was formed. So the impact is minimal.
Conversely, if a visitor were to return to your website in month 25, their data retention period is renewed for another 26 months. So if they continued to return every 25 months, their user-specific data would never be deleted.
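The scenarios above can be checked with a small simulation. This is only a sketch under my own assumptions: time is counted in whole months from the first visit, a return exactly on the expiry month still counts as in time, and first_purge_month is a hypothetical helper, not anything from Google Analytics.

```python
def first_purge_month(visit_months, retention_months=26, horizon=None):
    """Return the month in which user-level data would be dropped, or None.

    Models Reset on new activity = ON: every visit pushes the expiry
    to visit_month + retention_months.
    """
    expiry = None
    for month in sorted(visit_months):
        if expiry is not None and month > expiry:
            return expiry  # the visitor came back too late
        expiry = month + retention_months
    if horizon is not None and expiry is not None and horizon > expiry:
        return expiry  # no return visit before the observation window ended
    return None


# One visit, then nothing for 30 months: dropped at month 26.
print(first_purge_month([0], horizon=30))
# Returning every 25 months: never dropped within the window.
print(first_purge_month([0, 25, 50, 75], horizon=100))
```

The first call prints 26 and the second prints None, which matches the two scenarios: a lapsed visitor is forgotten, a regular one keeps renewing.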
BTW, if you are interested in what I am building in this space – a forensic GA data auditing tool with an emphasis on GDPR compliance – visit verified-data.com.