Google Analytics Data Retention Settings Explained

I must admit that the new Google Analytics documentation about the GDPR-specific Data Retention setting is confusing! The whole area is surprisingly badly worded by Google…

The common assumption is that ALL data older than NN months is going to be deleted, i.e. a cliff-edge effect where any report covering data older than, say, 26 months is going to be empty. The setting is important, but its effect is not as dramatic as you might first think…

The Reset Setting Explained

In the Google Analytics admin interface, click Tracking Info > Data Retention (see top image). You are then presented with a screen similar to this. I have highlighted the 3 key phrases:

Clarifying the analytics data retention settings

Now let’s focus on Reset on new activity.

This little switch means the data retention setting is applied either per data point (OFF) or per user (ON).

With “Reset on new activity” set to OFF, the retention period is based on the age of the data points. Any data older than the retention period is purged each month. A cliff-edge effect.

However, if “Reset on new activity” is set to ON, the data retention setting works as if applied on a per user basis. That is, if the user returns before the purge, the expiry is renewed to current_date + retention_period. The data retention period is then applied as a period of inactivity from your visitors. For example, a user returning every month effectively never has their data purged because they keep renewing the lease, so to speak.
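The two modes can be sketched in a few lines of Python. This is only an illustration of the logic described above, not Google's implementation: the function names, the calendar-month arithmetic and the day clamping are my own assumptions.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months.

    Simplification (assumption): the day is clamped to 28 so the result
    is always a valid date, whatever the target month.
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(d.day, 28))


def data_point_expiry(hit_date: date,
                      user_last_active: date,
                      retention_months: int,
                      reset_on_new_activity: bool) -> date:
    """Hypothetical model of when a user-level data point is purged.

    OFF: the clock starts at the age of the data point itself.
    ON:  the clock restarts from the user's most recent activity.
    """
    anchor = user_last_active if reset_on_new_activity else hit_date
    return add_months(anchor, retention_months)


hit = date(2018, 1, 15)          # when the data point was collected
last_seen = date(2019, 6, 10)    # the user's most recent visit

print(data_point_expiry(hit, last_seen, 26, reset_on_new_activity=False))
print(data_point_expiry(hit, last_seen, 26, reset_on_new_activity=True))
```

With the switch OFF, the January 2018 hit expires 26 months after it was collected. With it ON, the same hit lives until 26 months after the user's last visit.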

Note, data retention settings do NOT affect aggregate reports which are the vast majority of GA reports. That is, unless you are doing very user-specific tracking (why?), you will be hard pushed to notice any reporting impact.

What about GDPR compliance?

Allowing a visitor to be forgotten is an important principle of GDPR compliance. That is, it is not reasonable to hold a user’s tracking information forever, so some retention setting needs to be set – I am against the “do not automatically expire” setting. A value of 26 months seems a reasonable time limit to me, as organisations need to understand year-on-year trends and users generally appreciate and expect that level of data retention (assuming all tracking is anonymous and first party).

Bear in mind that the chances of your visitor retaining your original tracking cookie diminish rapidly with time (consider yourself lucky if it is still around 12 months later!).

If a visitor returns to your website on a regular basis I see no reason why retaining their data should be a problem. Hence I recommend Reset on new activity=ON.

(Remember I am not a legal expert.)

Example scenarios with Reset on new activity=ON:

Assuming your settings match my screenshot above, if a visitor does not return to your site within 26 months, their user-specific data is dropped i.e. the cookieID is removed (and some other user-specific identifiers), and their individual click-stream data cannot be stitched back together.

However, the vast majority of Google Analytics reports are preprocessed and therefore do not make use of click-stream data – for reasons of data privacy that date back to when Urchin/GA was formed. So the impact is minimal.

Conversely, if a visitor were to return to your website in month 25, their retention period is renewed for another 26 months. So if they continued to return every 25 months, their user-specific data would never be deleted.
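The scenarios above can be replayed with a small simulation. It is a rough sketch under my own assumptions: time is counted in whole months from the first visit, a visit landing exactly on the expiry month is treated as a renewal, and `purge_month` is a made-up helper name rather than anything exposed by Google Analytics.

```python
def purge_month(visit_months, retention_months=26):
    """Month (counted from the first visit) at which the user-specific
    data collected so far would be dropped, with Reset on new activity ON.

    Each visit that arrives before the current expiry pushes the expiry
    out to visit_month + retention_months. A visit that arrives after
    the expiry cannot rescue data that has already been purged.
    """
    visits = sorted(visit_months)
    expiry = visits[0] + retention_months
    for month in visits[1:]:
        if month > expiry:
            break  # returned too late: the earlier data is already gone
        expiry = month + retention_months
    return expiry


print(purge_month([0]))               # visitor never returns
print(purge_month([0, 25, 50, 75]))   # visitor returns every 25 months
print(purge_month([0, 30]))           # visitor returns after the purge
```

The visitor who never returns loses their user-specific data at month 26. The visitor returning every 25 months keeps pushing the expiry out, so the purge never catches up with them while the pattern continues.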

BTW, if you are interested in what I am building in this space – a forensic GA data auditing tool with an emphasis on GDPR compliance – visit verified-data.com.

Looking for a keynote speaker, or wish to hire Brian…?

If you are an organisation wishing to hire me and my team, please view the Contact page. I am based in Sweden and advise organisations in Europe as well as North America.


26 Comments

  1. Jamie Lum

    Do you know if there is a way to not track data in GA for anyone coming from the EU, or to purge data for EU traffic? We have about 40 customers in the EU compared to over 40k customers in Australia. The EU isn’t an important market for us, but short of blocking any EU traffic from even reaching our site, is there a way we can just purge EU data? We don’t need to track them in GA but we do need to track our Aussie customers.

    • Brian Clifton

      Hello Jamie – assuming you use GTM for all your tracking pixels i.e. GA, Facebook, Marketo etc., you essentially need to block triggers firing based on geolocation. Purging will not be compliant. That is, collecting the data first and then deleting later means the problem has already occurred. Even if it was a solution, it is very inefficient – you would need to do this for every platform a tracking hit is sent to.

      Have a look at tools such as Cookiebot.com – they can load a banner based on the geolocation of your visitor’s IP address.

      Also, take a look at verified-data.com to understand your current data governance health i.e. it’s not just about GDPR compliance, customers expect businesses to take their privacy seriously…

  2. Nick Guebhard

    If I have a user-level custom dimension, e.g. sending the User-ID to GA as a custom dimension (at hit-level when they are logged in only of course ;-)) or am sending the GA Client ID as a custom dimension, can Google identify this to be deleted or not? I guess it’s easy for them to identify the GA Client ID, but if you are not using the User-ID View, how can they know this is a user-level dimension versus something else?

    I really don’t have an issue with Google deleting whatever they are going to delete, it’s more that I want certainty that they can identify and delete all user level data.

    Also, what about Transaction IDs? Those are user-level too.

    • Brian Clifton

      Hello Nick – I can’t answer that specific question, but I am sure they have figured it out 😉

      I would suggest any “custom field” defined either within GA or by the user is going to be dropped after NN months. If you don’t want that, then set to not expire. However, check with your legal advisors about doing that. Essentially the days of never-expiring data are over…

      • Nick Guebhard

        Expiry of data is a good development, for sure. The concept I struggle with is that this has been brought about by GDPR which is in essence an EU law. There are financial penalties to companies outside the jurisdiction of the EU for non-compliance when they sell from their websites to customers inside the jurisdiction of the EU. I live in NZ and companies that don’t do much business with the EU still have to jump through all the hoops to comply in that market due to a small percentage of customers that are EU, even though they have no physical presence in the EU.

        Of course, the moral argument of data expiry applies globally, but there are also arguments around sovereignty and the fact that those EU citizens are choosing to fulfil their needs from a website in another country. The EU’s penalty enforcement process pulls in International Law and the potential need for a “representative” in the EU if Article 9 is triggered due to the sensitive nature of the data.

        However, the companies I work with that sell to the EU mostly don’t trigger Article 9, but they still feel that they need to comply with GDPR because it’s unclear whether the EU can or will fine them under international law or not. I think it’s unlikely that this will happen for most non-EU websites out there, but there’s so much hysteria around GDPR that everyone feels that they have to bend over backwards to comply if they do any business with EU customers.

        Sorry, this is off-topic from the blog post but triggered by your “the days of never-expiring data are over” response.

        Thanks Brian.

        • Brian Clifton

          I understand your frustration Nick, though I would argue that if you wish to benefit from global trading that the internet brings, you need to abide by global rules/standards – and I am pretty sure that GDPR will become a global standard in the not too distant future (when the dust has settled).

          I like to use the following analogy – treat data with the same seriousness and rigour as you treat money/income/revenue. Orgs hire accountants to audit finances and discuss in great detail what to do with it (pay bonuses, pay dividends, save, invest, purchase etc), and now it is expected that orgs will hire data auditors for analogous discussions around compliance. That is a good thing imo, but it obviously puts a disproportionate strain on small businesses. A good rule is to ask yourself: “Do we really need this data to do business?” and drop any techniques for collecting data that have low value to you.

          Remember GDPR is *not* the internet police. It’s a legal framework to support existing laws on EU citizens’ human right to privacy by default. It will be used by complainants to go after people/orgs that abuse that principle i.e. it’s not intended to go after orgs that take data privacy seriously and have a process to prove that.

          HTH

          • Nick Guebhard

            Thanks Brian. I agree with your points, but my point is that GDPR is not a global rule yet, but somehow overseas companies are treating it as such if they have any EU customers, regardless of how harmless the data they capture is. I’m grateful for the additional clarity your blog post has shed on this subject. Bring on 25th of May! I’ll be building some “unusual” custom reports to see what happens!

          • Brian Clifton

            Well GDPR is global in the sense that if your business is capturing data on EU citizens… Essentially if you sell a product into country-X you need to abide by the laws of country-X, not the other way around.

            Big business already has that process in place; this is just tightening the data aspects of it. It’s a lot harder for smaller businesses of course (I am one!), but it’s still applicable.

            Good luck with your custom reports. Please feel free to share your insights 🙂

  3. Franz

    Hi Brian and thank you for your post!
    I was wondering, does the “Do not automatically expire” option remove all the doubts we have about the data retention period? I mean, if we choose this option, does it not mean that everything remains as it is? Thanks!

    • Brian Clifton

      Yes, “Do not automatically expire” keeps things as is. Though check if that is an option for your org. Essentially, the principle of keeping data forever is over in a GDPR world. It’s the same principle as applied to a business contract – that is, you cannot sign a person up for life, every contract must have an expiry date/clause.

  4. Simon James

    What I’m still not clear on is whether there are any implications/penalties (from Google) for setting the retention period to a higher level, particularly “Do not automatically expire”.

    • Brian Clifton

      No implications from Google – it is the responsibility of the website owner (the Controller in GDPR terminology) to ensure their data retention policy meets the applicable laws.

      Bear in mind that GDPR is applicable to the organisation – not any one specific tool or technology.

      • Simon James

        Hi Brian

        I’ve heard people voicing concerns on social media that if they set “Do not automatically expire”, the Google gods will look unfavourably on them because G wants to use this opportunity to reduce the volume of data currently stored on their servers.
        Do you think it’s just an urban myth then?

        Best
        Simon

        • Brian Clifton

          Very much a myth. “Data is the new currency”. Think of it this way – the more data Google collects/stores that you (as the Controller) feels is valuable, the more likely you are to stay with Google and buy further services from them…

          • Aurelie Pols

            the Google gods 😉
            data is not a currency Brian, not in Europe: it can’t be

            having said that, totally agree with “the more likely you are to stay with Google and buy further services from them…”

          • Brian Clifton

            Just a metaphor e.g. in a similar vein, Twitter has never made a profit, but it’s still valuable…

  5. Sam Vining

    Hi Brian, thanks for writing this post! I think the biggest issue is the inability to use custom dimensions or segments with any reports that contain data past the retention limit, which I imagine will be a significant limitation for many users.

    Agree that it’s not the end-of-the-world, but think people should be aware of the potential to lose functionality.

    • Brian Clifton

      I think most segmenting will work – of course it depends, but segments in general will still work with older data e.g. “show me a report for EU visitors only” is still an aggregate report at the end of the day.

      • Sam Vining

        Maybe I’m interpreting the definition too literally, but my understanding is that segments / custom dimensions will not work at all. From the support page: “The user and event data managed by this setting is needed only when you use certain advanced features like applying custom segments to reports or creating unusual custom reports.”

        I guess we’ll find out what “unusual” means on the 25th…!

        • Steven

          This is the biggest thing. I’ve heard multiple people give their interpretation differently, but nobody truly knows without Google telling or waiting until that time.

        • Brian Clifton

          Segments will still work – just not all segments

          • Simo Ahava

            My interpretation is that if all user and event data is deleted for a specific user, then that user’s data will not be available to ANY ad hoc calculations such as segments, custom reports, etc.

            So a segment for EU users would not include deleted users’ data because segment queries are not precalculated or aggregated.

          • Brian Clifton

            Thanks for the input Simo – my EU country example could well end up not working… My thought was that not all segments are built dynamically. So if ‘sessions by country’ is a pre-aggregated table, then perhaps combining multiple country tables does not require a new ad-hoc query to the backend…

          • Simo Ahava

            Brian, you are actually probably correct (I guess we both are). IF GA has an aggregated data set that matches the segment + table combination you are querying, the report will contain all available data.

            So in some tables, having the EU segment might work, if it’s a data set that’s already available via an aggregated report. But I don’t know what that type of a data set would be. I tested around in a couple of reports having Country === Finland as the session segment, and all the reports were sampled.

          • Brian Clifton

            This help article may shed some light: https://support.google.com/analytics/answer/2637192?hl=en

            Ad-hoc reports
            “Analytics first goes to the aggregated data tables to see if all of the requested information from your ad-hoc query is available there. If the information is not available there, Analytics queries the complete, unfiltered set of data to satisfy the query request.”
