Noise or Music?

Clarifying the Analytics Data Retention Settings

Categories: GDPR & Privacy / Comments: 24


I must admit that the new Google Analytics documentation about the GDPR-specific Data Retention setting is confusing! Many clients assume that ALL data older than NN months is going to be deleted, i.e. a cliff-edge effect where any report covering a period older than, say, 26 months will be empty. The setting is important, but its effect is not as dramatic as you might first think…

In the screenshot below, taken from the Admin area of Google Analytics, I have highlighted the three key phrases to focus on:

[Screenshot: Clarifying the analytics data retention settings]

The Settings Clarified

1. Data retention is a “per user” setting, defined as a period of inactivity by your visitors. Once a visitor has been inactive for longer than that period, their user-level data is deleted.

2. These settings do NOT affect aggregate reports, which make up the vast majority of GA reports. So unless you are doing very user-specific tracking (why?), you will be hard-pressed to notice any reporting impact.

3. “Reset on new activity” means that the data retention period is reset whenever your visitor returns. Think of reset as renewed.
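To make point 2 concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not a description of how Google Analytics works internally: the `Hit` record, its hypothetical `client_id` field and the month-by-country aggregate are all illustrative. It shows why deleting user-level rows leaves a pre-aggregated report unchanged.

```python
from collections import Counter
from dataclasses import dataclass

# Toy model only: NOT how Google Analytics stores data internally.
@dataclass
class Hit:
    client_id: str  # hypothetical user-level identifier (think cookie ID)
    month: int      # month the hit was recorded, as a simple integer index
    country: str

hits = [
    Hit("cid-1", 0, "UK"),
    Hit("cid-1", 0, "UK"),
    Hit("cid-2", 0, "FR"),
    Hit("cid-3", 1, "FR"),
]

# Pre-aggregated report, built when the hits are processed.
# It keeps counts per (month, country) and no client IDs at all.
aggregate = Counter((h.month, h.country) for h in hits)

# Retention purge: delete user-level rows for clients whose retention expired.
expired_clients = {"cid-1", "cid-2"}
user_level = [h for h in hits if h.client_id not in expired_clients]

# The aggregate report is unaffected by the purge...
print(aggregate[(0, "UK")])  # 2

# ...but an ad-hoc query that has to re-scan user-level rows now finds nothing.
print(sum(1 for h in user_level if h.month == 0 and h.country == "UK"))  # 0
```

The second query stands in for the ad-hoc segments and custom reports that need user-level data; the first stands in for the standard, preprocessed reports.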

Example scenario:

If a visitor does not return to your site for 26 months, their user-specific data is dropped, i.e. the cookie ID (and some other user-specific identifiers) is removed, so their individual click-stream data can no longer be stitched back together.

However, as per point 2, the vast majority of Google Analytics reports are preprocessed and therefore do not make use of click-stream data, for reasons of data privacy that date back to when Urchin/GA was formed.

Conversely, if a visitor returns to your website every month, their data retention period is renewed for another 26 months each time. So if they continued to return each month, their user-specific data would never be deleted.
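Points 1 and 3, together with the scenario above, can be modelled as a simple expiry check. This is a hedged Python sketch, not Google's implementation: time is counted in whole months as integers, `is_expired` and its parameters are hypothetical names, treating exactly 26 months of inactivity as expired is an assumption about the boundary, and the behaviour with reset switched off (the clock running from when the data was first collected) is likewise assumed.

```python
RETENTION_MONTHS = 26  # the example period used in this post


def is_expired(first_seen: int, last_seen: int, now: int,
               retention_months: int = RETENTION_MONTHS,
               reset_on_new_activity: bool = True) -> bool:
    """Return True if a user's user-level data would be due for deletion.

    Hypothetical model. Times are integer month indexes (an assumption).
    With reset on, the clock restarts at the user's last activity.
    With reset off, this sketch ASSUMES the clock runs from first collection.
    Treating exactly `retention_months` as expired is also an assumption.
    """
    start = last_seen if reset_on_new_activity else first_seen
    return now - start >= retention_months


# Visitor A: seen once at month 0 and never returns.
print(is_expired(first_seen=0, last_seen=0, now=26))  # True

# Visitor B: first seen at month 0, returns every month up to month 40.
print(is_expired(first_seen=0, last_seen=40, now=41))  # False

# Visitor B again, with reset switched off: the clock runs from month 0.
print(is_expired(first_seen=0, last_seen=40, now=41,
                 reset_on_new_activity=False))  # True

# A visitor who returns every month: check expiry just before each visit.
monthly_visitor_expired = any(
    is_expired(first_seen=0, last_seen=m - 1, now=m) for m in range(1, 100)
)
print(monthly_visitor_expired)  # False
```

With reset switched on, only the gap since the visitor's last activity matters, which is why a monthly visitor never reaches the deletion threshold.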

Please let me know if this has clarified the issue for you, i.e. put your mind at rest…

BTW, if you are interested in what I am building in this space (an automated GA data auditing tool), visit verified-data.com.


Comments (most recent first)

  1. If I have a user-level custom dimension, e.g. sending the User-ID to GA as a custom dimension (at hit level, and only when they are logged in, of course ;-)), or am sending the GA Client ID as a custom dimension, can Google identify this data for deletion? I guess it’s easy for them to identify the GA Client ID, but if you are not using the User-ID view, how can they know this is a user-level dimension rather than something else?

    I really don’t have an issue with Google deleting whatever they are going to delete, it’s more that I want certainty that they can identify and delete all user level data.

    Also, what about Transaction IDs? Those are user-level too.

    • Hello Nick – I can’t answer that specific question, but I am sure they have figured it out 😉

      I would suggest any “custom field” defined either within GA or by the user is going to be dropped after NN months. If you don’t want that, then set it to not expire. However, check with your legal advisors before doing that. Essentially, the days of never-expiring data are over…

      • Expiry of data is a good development, for sure. The concept I struggle with is that this has been brought about by GDPR which is in essence an EU law. There are financial penalties to companies outside the jurisdiction of the EU for non-compliance when they sell from their websites to customers inside the jurisdiction of the EU. I live in NZ and companies that don’t do much business with the EU still have to jump through all the hoops to comply in that market due to a small percentage of customers that are EU, even though they have no physical presence in the EU.

        Of course, the moral argument of data expiry applies globally, but there are also arguments around sovereignty and the fact that those EU citizens are choosing to fulfil their needs from a website in another country. The EU’s penalty enforcement process pulls in International Law and the potential need for a “representative” in the EU if Article 9 is triggered due to the sensitive nature of the data.

        However, the companies I work with that sell to the EU mostly don’t trigger Article 9, but they still feel that they need to comply with GDPR because it’s unclear whether the EU can or will fine them under international law or not. I think it’s unlikely that this will happen for most non-EU websites out there, but there’s so much hysteria around GDPR that everyone feels that they have to bend over backwards to comply if they do any business with EU customers.

        Sorry, this is off-topic from the blog post but triggered by your “the days of never-expiring data are over” response.

        Thanks Brian.

          • I understand your frustration Nick, though I would argue that if you wish to benefit from the global trading that the internet brings, you need to abide by global rules/standards, and I am pretty sure that GDPR will become a global standard in the not-too-distant future (when the dust has settled).

          I like to use the following analogy: treat data with the same seriousness and rigour as you treat money/income/revenue. Orgs hire accountants to audit their finances and discuss in great detail what to do with them (pay bonuses, pay dividends, save, invest, purchase etc.), and now it is expected that orgs will hire data auditors for analogous discussions around compliance. That is a good thing imo, but it obviously puts a disproportionate strain on small businesses. A good question to ask yourself: “Do we really need this data to do business?” Then drop any data collection techniques that have low value to you.

          Remember, GDPR is *not* the internet police. It’s a legal framework supporting existing laws on EU citizens’ human right to privacy by default. It will be used by complainants to go after people/orgs that abuse that principle, i.e. it’s not intended to go after orgs that take data privacy seriously and have a process to prove it.

          HTH

          • Thanks Brian. I agree with your points, but my point is that GDPR is not a global rule yet, but somehow overseas companies are treating it as such if they have any EU customers, regardless of how harmless the data they capture is. I’m grateful for the additional clarity your blog post has shed on this subject. Bring on 25th of May! I’ll be building some “unusual” custom reports to see what happens!

          • Well, GDPR is global in the sense that if your business is capturing data on EU citizens… Essentially, if you sell a product into country-X, you need to abide by the laws of country-X, not the other way around.

            Big business already has that process in place; this is just tightening the data aspects of it. It’s a lot harder for smaller businesses of course (I am one!), but it’s still applicable.

            Good luck with your custom reports. Please feel free to share your insights 🙂

  2. Franz says:

    Hi Brian and thank you for your post!
    I was wondering, does the “Do not automatically expire” option remove all the doubts we have about the data retention period? I mean, if we choose this option, does it not mean that everything remains as it is? Thanks!

    • Yes, “Do not automatically expire” keeps things as they are. Though check whether that is an option for your org. Essentially, the principle of keeping data forever is over in a GDPR world. It’s the same principle as applied to a business contract, that is, you cannot sign a person up for life; every contract must have an expiry date/clause.

  3. Simon James says:

    What I’m still not clear on is whether there are any implications/penalties (from Google) for setting the retention period to a higher level, particularly “Do not automatically expire”.

    • No implications from Google – it is the responsibility of the website owner (the Controller in GDPR terminology) to ensure their data retention policy meets the applicable laws.

      Bear in mind that GDPR is applicable to the organisation – not any one specific tool or technology.

      • Simon James says:

        Hi Brian

        I’ve heard people voicing concerns on social media that if they set “Do not automatically expire”, the Google gods will look unfavourably on them because G wants to use this opportunity to reduce the volume of data currently stored on their servers.
        Do you think it’s just an urban myth then?

        Best
        Simon

  4. Sam Vining says:

    Hi Brian, thanks for writing this post! I think the biggest issue is the inability to use custom dimensions or segments with any reports that contain data past the retention limit, which I imagine will be a significant limitation for many users.

    Agree that it’s not the end-of-the-world, but think people should be aware of the potential to lose functionality.

    • I think most segmenting will work – of course it depends, but segments in general will still work with older data e.g. show me a report for EU visitors only is still an aggregate report at the end of the day.

      • Sam Vining says:

        Maybe I’m interpreting the definition too literally, but my understanding is that segments / custom dimensions will not work at all. From the support page: “The user and event data managed by this setting is needed only when you use certain advanced features like applying custom segments to reports or creating unusual custom reports.”

        I guess we’ll find out what “unusual” means on the 25th…!

        • Steven says:

          This is the biggest thing. I’ve heard multiple people give different interpretations, but nobody truly knows without Google telling us or waiting until then.

        • Segments will still work, just not all segments.

          • Simo Ahava says:

            My interpretation is that if all user and event data is deleted for a specific user, then that user’s data will not be available to ANY ad hoc calculations such as segments, custom reports, etc.

            So a segment for EU users would not include deleted users’ data because segment queries are not precalculated or aggregated.

          • Thanks for the input Simo; my EU country example could well end up not working… My thought was that not all segments are built dynamically. So if ‘sessions by country’ is a pre-aggregated table, then perhaps combining multiple country tables does not require a new ad-hoc query to the backend…

          • Simo Ahava says:

            Brian, you are actually probably correct (I guess we both are). IF GA has an aggregated data set that matches the segment + table combination you are querying, the report will contain all available data.

            So in some tables, having the EU segment might work, if it’s a data set that’s already available via an aggregated report. But I don’t know what that type of data set would be. I tested a couple of reports with Country === Finland as the session segment, and all the reports were sampled.

          • This help article may shed some light: https://support.google.com/analytics/answer/2637192?hl=en

            Ad-hoc reports
            “Analytics first goes to the aggregated data tables to see if all of the requested information from your ad-hoc query is available there. If the information is not available there, Analytics queries the complete, unfiltered set of data to satisfy the query request.”


© Brian Clifton 2018