
Remove PII from Google Analytics – The Smart Way

Categories: GA & GTM, GDPR & Privacy / Comments: 67


This is my extension to the GTM Tips post by the excellent Simo Ahava (his post: Remove PII From Google Analytics Hits).

Essentially, I had been looking for a way to block Personally Identifiable Information (PII) at the collection level i.e. using GTM, before the hit is sent to Google Analytics. Why do this? Putting the obvious requirement to not gather PII to one side, if you are adding filters to your GA Views in order to delete PII it is too late – the problem has already occurred. That is, if you have already sent the personal data to Google, then you have already broken GDPR compliance!

Previously, by using GTM I would simply drop any hits containing page URLs with an @ symbol i.e. in case the URL contained an email address. Apart from being quite blunt (not all URLs with an @ symbol contain an email address), this approach would not tackle email addresses being present in other hit types e.g. events, e-commerce data etc. It also did not tackle other PII types – such as telephone numbers, zip codes, usernames etc. Hence, the much better approach of Simo’s method – using GTM’s new customTask feature – was very interesting to me!

In this post, I extend his method by building out the regex more – for a more sophisticated email detection, and to capture other PII types…

Redact, rather than remove PII

The important thing here is to remember we are redacting the PII – not blocking or removing it. This is an important distinction. If PII is present, it is almost certain that the same PII is being logged elsewhere on your network – your web server logfile at the very least. Reporting this in your Google Analytics in redacted form means you have a monitoring system that flags the issue to your web dev/IT team, so they can fix it and keep on top of it. Essentially, to be compliant, PII issues need to be fixed at their source by your organisation. Alternatively, if you deleted the PII data from your reports or simply stopped collecting it in GA, you would metaphorically be sweeping the problem under the carpet.

Here is my adjusted code for your Custom JavaScript variable.

IMPORTANT: This is a straight replacement for Simo’s code. Replace example\.com with the domain of your website (lines 7 and 11 of the code; the explanations further down refer to this placeholder as domain\.com). More on what this is for later. Thank you to the excellent David Vallejo for his JavaScript help – my skills are simply too rusty nowadays! As always, when working with code it’s up to you to test it and ensure it works correctly. No liability accepted!

UPDATE: This code was rewritten 29-Aug-2018 for better handling of the GA hit. In particular, it now works with GTM’s native YouTube trigger.  Simply swap out the original code for this new one.

function() {
  return function(model) {
    try{
      // Add the PII patterns into this array as objects
      var piiRegex = [{
        name: 'EMAIL',
        regex: /[^\/]{4}(@|%40)(?!example\.com)[^\/]{4}/gi,
        group: ''
      },{
        name: 'SELF-EMAIL',
        regex: /[^\/]{4}(@|%40)(?=example\.com)[^\/]{4}/gi,
        group: ''
      },{
        name: 'TEL',
        regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi,
        group: '$1'
      },{
        name: 'NAME',
        regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi,
        group: '$1'     
      },{
        name: 'PASSWORD',
        regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi,
        group: '$1'
      },{
        name: 'ZIP',
        regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi,
        group: '$1'
      }

    ];		    
      // Fetch reference to the original sendHitTask
      var originalSendTask = model.get('sendHitTask');
      var i, hitPayload, data, val;


      model.set('sendHitTask', function(sendModel) {
          hitPayload = sendModel.get('hitPayload');
          // Let's convert the current querystring into a key,value object
          data = hitPayload.replace(/^\?/, '').split('&').reduce(function(acc, pair) { var n = pair.split('='); acc[n[0]] = n[1]; return acc; }, {});
          // We'll be looping through all keys and values now
          for (var key in data) {

              // Let's have the value decoded before matching it against our array of regexes
              val = decodeURIComponent(data[key]);
              piiRegex.forEach(function(pii) {
                // Does the value match? Redact it in val (so earlier redactions are kept) and store it re-encoded
                if (val.match(pii.regex)) {
                  val = val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
                  data[key] = encodeURIComponent(val);
                }
              });

          }
          // Going back to roots, convert our data object into a querystring again =)
          sendModel.set('hitPayload', Object.keys(data).map(function(key) { return key + '=' + data[key]; }).join('&'), true);
          // Send the hit with the redacted payload
          originalSendTask(sendModel);
      });
    }catch(e){}
  };
}
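
Since testing is left to you, one way to exercise a customTask outside GTM is to call it with a fake model. The following is only a sketch: piiCustomTask is a cut-down, hypothetical stand-in for the variable above (EMAIL pattern only), and makeFakeModel is an invented helper that mimics just the get() and set() methods of the analytics.js model.

```javascript
// Hypothetical, cut-down stand-in for the GTM variable: EMAIL pattern only.
function piiCustomTask() {
  return function (model) {
    var originalSendTask = model.get('sendHitTask');
    model.set('sendHitTask', function (sendModel) {
      var redacted = sendModel.get('hitPayload').split('&').map(function (pair) {
        var parts = pair.split('=');
        var val = decodeURIComponent(parts[1] || '');
        val = val.replace(/[^\/]{4}(@|%40)(?!example\.com)[^\/]{4}/gi, '[REDACTED EMAIL]');
        return parts[0] + '=' + encodeURIComponent(val);
      }).join('&');
      sendModel.set('hitPayload', redacted, true);
      originalSendTask(sendModel);
    });
  };
}

// Invented test double: stores fields and records whatever would be sent.
function makeFakeModel(payload, sent) {
  var fields = {
    hitPayload: payload,
    sendHitTask: function (m) { sent.push(m.get('hitPayload')); }
  };
  return {
    get: function (name) { return fields[name]; },
    set: function (name, value) { fields[name] = value; }
  };
}

var sent = [];
var model = makeFakeModel(
  'v=1&t=pageview&dp=' + encodeURIComponent('/thanks?email=brian@me.com'),
  sent
);
piiCustomTask()(model);          // the library runs customTask first
model.get('sendHitTask')(model); // and later runs sendHitTask with the model
console.log(decodeURIComponent(sent[0]));
```

The same harness can be pointed at your own copy of the full variable to check each pattern before publishing the container.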

Edit Your Tags

In order to function as intended, the customTask field needs to be added to ALL Google Analytics tags. That of course is cumbersome and does not scale with the volume of tags used. Therefore it is much better to apply this as a one-time fix in a Google Analytics settings variable. You can read more about the power of the Universal Analytics settings variable approach from Simo.

 

Now any hits sent by these tags will be parsed by this variable, which replaces the instances of PII with the string [REDACTED pii_type]. For example, a URL with path:

/test?tel=+44012345678&email=brian@me.com&other=bclifton@DOMAIN.com&firstName=brian&password=hello

would be replaced with:

/test?tel=[REDACTED TEL]&email=b[REDACTED EMAIL]om&other=bcli[REDACTED SELF-EMAIL]IN.com&firstName=[REDACTED NAME]&password=[REDACTED PASSWORD]
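
To check a substitution like this by hand, the patterns can be run against the sample path in any JavaScript console. This is a rough sketch rather than the variable itself: redact is a hypothetical helper, and domain\.com is assumed to be the site's own domain so that the sample exercises the SELF-EMAIL rule.

```javascript
// Patterns as in the Custom JavaScript variable, with domain\.com
// assumed to be the site's own domain for this example.
var piiRegex = [
  { name: 'EMAIL',      regex: /[^\/]{4}(@|%40)(?!domain\.com)[^\/]{4}/gi, group: '' },
  { name: 'SELF-EMAIL', regex: /[^\/]{4}(@|%40)(?=domain\.com)[^\/]{4}/gi, group: '' },
  { name: 'TEL',        regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi, group: '$1' },
  { name: 'NAME',       regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi, group: '$1' },
  { name: 'PASSWORD',   regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi, group: '$1' },
  { name: 'ZIP',        regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi, group: '$1' }
];

// Hypothetical helper: applies every pattern in turn to one decoded value,
// carrying each redaction forward to the next pattern.
function redact(value) {
  piiRegex.forEach(function (pii) {
    value = value.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
  });
  return value;
}

var sample = '/test?tel=+44012345678&email=brian@me.com&other=bclifton@DOMAIN.com&firstName=brian&password=hello';
console.log(redact(sample));
```

The label inside the brackets is taken from each pattern's name value, and the email patterns swallow four characters on either side of the @, which is why a few characters of the address remain visible.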

The Regex Changes Explained

-Extending the Email regex

For the EMAIL check, I make two changes to Simo’s original regex:

regex: /[^\/]{4}@(?!domain\.com)[^\/]{4}/gi,

Firstly, this matches any character that is not a forward slash / 4 times, followed by @. Then, so long as this is not followed by domain.com, it matches the next 4 characters which are not a forward slash.

So apart from looking for an email address, I am doing two extra things:

1. I exclude any “innocent” links that may be captured as outbound links containing an @. Common examples are Google Maps and Flickr links, which contain a forward slash – the [^\/] part. Example links:

  • www.google.com/maps/place/University+of+San+Francisco+-+Folger+Bldg,+101+Howard+St,+San+Francisco,+CA+94105/@37.7908871,-122.3925594,17z/data=!3m1!
  • www.flickr.com/photos/123456@N06/sets/721576344/

2. I exclude the domain of the website itself from this check using a negative look ahead – the (?!….) part. Remember to replace domain\.com with your own domain e.g. brianclifton\.com in my case. I match for this separately next.
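
As a quick illustration of the first point, the sketch below runs the EMAIL pattern over a shortened version of the Google Maps link, the Flickr link and a made-up URL containing an email address. looksLikeEmail is a hypothetical helper, and the pattern is written without the g flag so that repeated test() calls do not carry state between strings.

```javascript
// EMAIL pattern from the explanation above (domain\.com is the placeholder).
var emailRegex = /[^\/]{4}@(?!domain\.com)[^\/]{4}/i;

// Hypothetical helper, used only for this demonstration.
function looksLikeEmail(value) {
  return emailRegex.test(value);
}

var samples = [
  // Shortened Google Maps link: the @ comes straight after a slash
  'www.google.com/maps/place/University+of+San+Francisco/@37.7908871,-122.3925594,17z',
  // Flickr link: a slash appears within four characters after the @
  'www.flickr.com/photos/123456@N06/sets/721576344/',
  // Made-up URL with a real-looking email address
  '/thanks?email=simo@hissite.com'
];

samples.forEach(function (s) {
  console.log((looksLikeEmail(s) ? 'match:    ' : 'no match: ') + s);
});
```

Only the last string is reported as a match; the two "innocent" links fail because of the [^\/] requirement on either side of the @.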

My suggestion for a separate regex is to catch and redact any payloads containing the SAME email domain as the site itself, with a different “name” value to the regular email redaction. That way such emails will be reported differently in Google Analytics, allowing the site owner to ignore these and monitor real PII infringements.

For example:

  • If a visitor comes to my site and I capture their email address e.g. simo@hissite.com, the redaction message is [REDACTED EMAIL]
  • If a visitor comes to my site and I capture my own email address, for example as an outbound click-through to the site owner e.g. mysite@brianclifton.com, the redaction message is [REDACTED SELF-EMAIL]

As the site owner, the first message is the one I should be paying attention to. The second message (not really PII as it belongs to the site owner) keeps me compliant with Google’s terms of service.

For the SELF-EMAIL check, the regex is almost identical:

regex: /[^\/]{4}@(?=domain\.com)[^\/]{4}/gi,

The difference now is that I do wish to include my own domain in the match and this is achieved via a positive look ahead – the (?=….) part.
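
The effect of the two look aheads can be seen side by side in a small sketch. classify is a hypothetical helper, and domain\.com again stands in for the site's own domain.

```javascript
var emailRegex     = /[^\/]{4}@(?!domain\.com)[^\/]{4}/i; // negative look ahead
var selfEmailRegex = /[^\/]{4}@(?=domain\.com)[^\/]{4}/i; // positive look ahead

// Hypothetical helper: reports which of the two patterns a value triggers.
function classify(value) {
  if (selfEmailRegex.test(value)) { return 'SELF-EMAIL'; }
  if (emailRegex.test(value)) { return 'EMAIL'; }
  return 'none';
}

console.log(classify('simo@hissite.com'));
console.log(classify('mysite@domain.com'));
console.log(classify('/contact/us'));
```

One thing to keep in mind: the look ahead only checks that the text after the @ starts with the domain, so an address at something like domain.company would also be treated as SELF-EMAIL unless the pattern is tightened.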

-Extending the regex to capture other PII

The original post by Simo was a simple pattern match – easy to use and maintain when you know the structure of the match you are looking for e.g. an @ symbol to match email addresses, or a well structured set of characters and numbers for strings like personal ID and social security numbers. However, I want to extend this to match less structured PII, for example people’s names, addresses, telephone numbers, zip codes etc.

To do this, we need a regex anchor. That is, a common string likely to contain such PII. I am assuming all such matches are contained within URL strings as query parameters (though name=value pairs in the URL path are also matched) e.g.

/test?tel=+46(0)12398765&firstname=Brian&zip=abc123

The anchor is the query name and we match for common PII culprits – these are tel, firstname and zip in my example. Of course these should be adjusted for your particular language. Anchors are the reason why the group key is required:

name: 'ZIP',
regex: /((postcode=)|(zipcode=)|(zip=))[^\/\?&]+/gi,
group: '$1'

In this case, $1 is the value of the string (our anchor) just before and including the = sign. We keep this in place for the data hit, and redact what follows. Without applying the grouping, the entire name=value pair would be redacted making troubleshooting difficult. I use [^&\/\?] in order to conclude the match within paths, or query parameters…
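
To see what the group key changes, and how the anchors might be adjusted for another language, here is a small sketch. makeAnchorPattern and redactWith are hypothetical helpers, the parameter names are assumed to be plain alphanumerics (no regex escaping is done), and postnummer is only an illustrative Swedish addition.

```javascript
// Hypothetical helper: builds an entry shaped like those in piiRegex
// from a list of query parameter names.
function makeAnchorPattern(name, params, keepAnchor) {
  var anchors = params.map(function (p) { return '(' + p + '=)'; }).join('|');
  return {
    name: name,
    regex: new RegExp('(' + anchors + ')[^&\\/\\?]+', 'gi'),
    group: keepAnchor ? '$1' : ''
  };
}

// Hypothetical helper: the same replace call the variable performs.
function redactWith(pii, value) {
  return value.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
}

var url = '/test?tel=+46(0)12398765&firstname=Brian&zip=abc123';
var zipNames = ['postcode', 'zipcode', 'zip', 'postnummer'];

// With group '$1' the anchor survives: ...&zip=[REDACTED ZIP]
console.log(redactWith(makeAnchorPattern('ZIP', zipNames, true), url));
// With group '' the whole name=value pair is replaced: ...&[REDACTED ZIP]
console.log(redactWith(makeAnchorPattern('ZIP', zipNames, false), url));
```

Keeping the anchor tells you which parameter leaked, which is usually the first thing the web dev team will ask.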

Happy compliance testing 🙂

BTW, you do know I am building a data auditing and compliance tool to measure and monitor Google Analytics data quality, right?


Comments (most recent first)

  1. Nathan says:

    Hi,

    Am currently trying to achieve something similar but more comprehensive. Originally I was trying to come up with a way to change the “Exclude Query Parameters” function to “Include Query Parameters”.

    For this, I arrived at a filter set.

    1. Change the query prefix of things you want e.g. ?q= or &q= to some different symbol (e.g. ~~~)
    2. Apply a blanket filter for any item ? or & to be removed completely (replace with blank)
    3. Change anything with ~~~ back to a query string form

    Really annoying, but theoretically sound. I was about to test this though, when I realised there will be a difference between human or tech error (e.g. when your email platform accidentally sends a name etc) and malicious injection (e.g. I type exampledomain.com/personsname/email/phone) which results in data being collected that is personal.

    Wondering if there is a combination of the two needed. Or do we all agree that we’ll never be able to protect against external bad actors?

    Nathan

    • The thing is… Anything done within GA is too late. The PII has already been collected and so adding filters post collection is only really hiding the issue.

      • Nathan says:

        Ok, but if you miss a piece of PII through this method, since you’re essentially manually declaring it then you are also in contravention of PII. Further, under GDPR it goes beyond PII to personal data, so IPs, pseudonymous identifiers (e.g. within email etc).

        Not criticising either approach, just wondering how we work to a solution. Seems like something Google needs to get involved in.

  2. Billy H says:

    Hi Brian,

    Thanks for this in depth explanation, it was exactly what I needed. However, I ran into an issue with the provided javascript. I’m unsure if my issue was isolated to my site or if it has something to do with a recent browser standard change or similar issue (I’m using Chrome), but I was able to find a solution and thought I’d share it here in case anyone else runs into something similar.

    The code, as provided, was only redacting the last entry of the PII Regex list, so only the ZIP was being redacted. After spending a bit of time in GTM’s preview mode and a lot of uses of console.log(), I realized that the page URI’s encoding within the ‘hitpayload’ was preventing it from being split using “&” as the delimiter (line 40). Because of that, the entire URI was being tested as a single Key Value, and line 50 was editing the original Key Value with every iteration, overwriting previous changes.

    The simplest solution I found was to add “val = decodeURIComponent(data[key]);” after line 46, so that the loop would be working with the updated value after each iteration, thereby maintaining previous changes.

    Still not sure if this was an edge case situation, but perhaps I’ll save someone some troubleshooting.

    Thanks!

  3. Nice article.

    I have 1 question. Lets say my regex is

    regex: /((id=)|(cusid=))[^&\/\?]+/gi,

    This matches:
    all that ends with id= like:
    testid=
    tokenid=

    I want it to only match id= and cusid=
    What do i need to change to get that to work?

    • Your first check is causing you this issue: id=

      Let us assume you are referring to query params to match against, in which case you can check as your first match: (&|\?)id=

      HOWEVER, remember the customTask is matching against the entire GA data hit. Therefore be careful in what you match for. For example, id= would match UA-ID= and that could prevent your data even reaching your GA account…

  4. Stu Bowker says:

    Hi Brian, love this but I’ve found the PII custom variable prevents some tags from being sent to GA despite being triggered correctly. The GA code would get so far, then in the console we’d see ‘[Violation] ‘setTimeout’ handler took 105ms’.

    This happened for YouTube and Scroll events using GTMs built in functionality. Both were based on users reaching a certain percentage. These events incurred the error. However, the YouTube start and pause events worked fine.

    I’ve added a temporary fix for the PII variable to only work if the event = gtm.js. Can you recommend something more robust in case there’s events that include PII.

    • Hello Stu – are you using the latest version? I updated it a week or so ago so that YouTube can be tracked. It is working for me, but no guarantees

      • Stu Bowker says:

        Hi Brian,
        Thanks for the update. It works great for YouTube now, but there’s still something not right as it’s not allowing Scroll Depth events to be sent 🙁

        Any ideas?

        • Scroll depth – yuck! Why would anyone want to track that?

          Seriously, the script is something I developed for myself and share with others freely as-is. It isn’t something I support.

        • Hi Stu, without further details it’s hard to know what’s going on. Please change line 60 from:

          }catch(e){}
          to
          }catch(e){ console.log("PII SCRIPT ERROR", e); }

          And let us know what error is being thrown to the console.

          Another helpful detail would be knowing which category/action/label values you are using for your events, in order to be able to replicate your setup

  5. Novak Mirkovic says:

    I can see [REDACTED EMAIL] in the real-time report, however, when I try to look back and perform content drill-down all those visits are missing. Most likely I am missing something. Looking forward to your input. Thank you!

    • Do you literally mean the “Site Content/Content Drilldown” report in GA? If so, that is only the performance of directories. Page performance is one-up on the side menu: All Pages. That is where you will find the redacted URLs and Page Titles. Note, the technique redacts ALL of the data hit. So it redacts where ever the issue is (event, custom dimension, e-commerce fields etc.)

  6. Peter says:

    Hi Brian,

    When implementing the script above I encountered an issue on my site today. Maybe you can give me your thoughts. For some reason, the beacon is not submitted to Google anymore on “/” and “/de/”. As soon as I reset the customTask (remove) things work smoothly again.

    Do you have any idea, why this might happen?
    p.s.: have the script implemented on another site with same CMS where I don’t see this odd behaviour.

    Thanks in advance

  7. Christina says:

    Hi Brian,

    I really love your modified version of Simo’s script.
    One question – Since you are specifying the website domain. How would this work on a rollup account that holds data for multiple domains?

    Thanks much!
    Christina

  8. Bryan says:

    Hi Brian,

    I am currently using the “Exclude URL Query Parameters” option in my GA view settings to remove PII from GA (not a filter). Will this prevent PII from getting recorded on GA’s servers?

    • That will work, BUT that removes the important signal that something is wrong i.e. the PII will be logged elsewhere – such as web server log files and routers around the internet. Better to redact PII by default – see this post: https://brianclifton.com/blog/2017/09/07/remove-pii-from-google-analytics/, setup a GA alert for when this appears in your reports, then fix the issue at source.

      That would be GDPR compliant i.e. you have a process to monitor and fix PII issues. Your quick fix is not…

      • Dear Brian,

        I believe you are mistaken. No view settings (not a filter, not the exclusion of query parameters, not any view setting) will effectively prevent PII from being recorded in GA, since a view setting only alters the way you SEE your data. The captured data, including PII, will still be stored on a property level, which is still prohibited by the terms of GA.

        So Bryan, applying any sort of view setting to hide/filter/exclude your PII won’t prevent the recording of PII on the Google servers. The only way to prevent PII being recorded in GA is preventing that it even enters GA by either using cool scripts (in GTM) like Brian’s or Simo’s, or just fixing your website.

        Hope it helps.

      • Bryan says:

        Thank you for the reply, Brian! I will implement your custom JavaScript in GTM to prevent PII from getting on GAs servers.

        Is there something I can do to remove the PII that’s been getting pulled into GA, or is it too late?

      • Bryan says:

        Hi Brian,

        Should customTask popup in the Field Name dropdown of the “More Settings” in my Universal Analytics tag? I’m not seeing it. Does customTask need to be added to my GTM container? I’m a little confused here.

        Would you be able to point me in the right direction?

  9. Lifan Shiu says:

    Hi,

    Thanks for the post. It works, but I want to extend this and want to use this to redact IBAN (banknumber within Europe). An IBAN number consist of:
    – Countrycode (2 letters) for example: NL
    – 2 check numbers for example 53
    – Bank code (4 characters) for example: ABNA
    – Bankaccountnumber (10 numbers): 1234567890

    The whole IBAN number would be: NL53ABNA1234567890

    I tried the following regex: /^[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}$/gi;

    but that didn’t work. I don’t know a lot of regex so if someone can help me out, that would be great.

    Thanks in advance.

    • Looks like you are missing a set of brackets i.e. to wrap the preceding token before the quantifier {0,16}

      So: ^(everything in here){0,16}$/gi

      Remember this only works in GA. So if you have an issue collecting this type of data, better to fix the underlying problem as this is also likely to be in your server/router logfiles.

      HTH

  10. Jeroen says:

    Great article so many thanks for sharing.

    Can you give any advice on how to test this in preview before I publish?

    Is this even possible?

    Thanks in advance

  11. John says:

    When you say replace “domain\.com” with your domain, does that mean “example.com” becomes “example\.com” ?

  12. Bob says:

    Hi Brian,

    Is there a step by step video or article to walk me through this? I’m new to GTM and not overly familiar with the process of setting up triggers with variables.

    Thanks,

  13. Peter says:

    Great article so many thanks for sharing.
    Got a quick question on the script itself. Was wondering, if there is a way to get the full functionality scope of anonymizing PII, but avoid special characters in title tags etc. to be encoded in the reports. (characters like ü, ä, etc.)

    Any advice on that would be much appreciated
    thanks

    • Hello Peter, are you seeing an issue with these chars? I live in Sweden and have used this method with åäö chars without problems…

      • Peter says:

        Hi Brian,

        Yes, I do indeed. If I roll back the customTask implementation in GTM everything goes back to “normal”. That’s why I believe its related to the PII script in your post. Results are like: https://prnt.sc/joyft8
        So the main characters that get messed up are the German ü,Ü,ä,Ä,ö,Ö but also symbols like “»”. Onsite Meta Tags are decoded correctly, just in the reports they are encoded for some reason

        • Did you use the “Update” notice below the code i.e. changing line 41?

          I am using that change (some servers handle such chars differently) and it works for me.

          • Peter says:

            UPDATE: Apologies Brian! Your line 41 update actually solved it. My mistake. Sorry again and thanks for helping out

  14. nicolaos says:

    Thanks for this Awesome post ! Quick question: Why not treat EMAIL query as all others like firstname and telephone.
    In other words, why not remove all the value of the email query instead of just the 4 characters before and after the @

    Thanks

    • Thanks for the feedback. The simple answer is because you can i.e. there is a nice anchor (the @) to use that other fields don’t have. That way, the owner of the data is able to understand exactly what email addresses are being captured. For example, it could be legitimate mailto links to resellers i.e. not really PII. There is just no way to do this with other potential PII.

      • nicolaos says:

        Superb! Thanks.

        Any ideas how to implement this for Adwords too?

        Unfortunately couldn’t add a customTask for Adwords the same way I did for GA.

        Thanks in advance

        • No, this is specific to GA at present…

          • Hi nicolaos and Brian,

            What we’ve done in our case to prevent PII being sent to AdWords is to simply create a GTM variable which compares the audited URL with the original page URL. If both of them are the same, nothing was audited (URL is OK); if they are not the same, PII was likely detected.

            Using the output from this variable either in your trigger for firing the AdWords tag(s), or by embedding this variable in an exception trigger, you can control whether the AdWords tag is fired or not; preventing PII to be sent to AdWords.

            Yes, you may miss some remarketing / conversion information, but at least PII will no longer be sent to AdWords (assuming that you’ve tackled all forms of PII possible on your webpage).
            Using Google Analytics you can find out what the problematic page is and make sure it will be fixed by the web developer. Ideally there should never be any personal information present in the page URL.

  15. Ankit says:

    Great post.
    Just one simple question.
    Do you know how I can use customTask for ‘AdWords Remarketing’ tags (to redact PII)? I use it very well for ‘Universal Analytics’ tags.

  16. Marco says:

    Hi Brian,

    Thank you very much for posting this useful information.

    I am quite interested on implementing this mechanism on my website, which is coded over PHP instead of Javascript. Do you know if this is a limitation to implement GTM (i think the frame of code is just available on .js coding?) and consequently your custom JavaScript variable?

  17. Travis B says:

    Any way to go about doing this in GA without using Tag Manager?

    • You really want to do this at the point of data collection i.e. via GTM (or other tag manager solution) and not once the data is already in your GA account. Essentially, once you have collected the PII data you have already broken data privacy laws – regardless of where it is then stored/processed.

      However, note this technique is a monitoring system for you – using GA to redact and then flag up PII issues with your website. Ultimately these need to be flagged to the IT/Web Dev team to sort out the underlying issue, as even if the data does not get into GA it is very likely to be in web server log files, router log files and firewall log files etc.

      • Travis B says:

        Thanks Brian! I’m actually having to use analytics.js just for the sake of the way our e-commerce tracking is set up. I found out I can load analytics.js and then the ecommerce.js plugin without sending a page view. and then use GTM for page views and setting up this PII flagging. Great post! Thanks for the response.

  18. Matt says:

    Thanks for sharing! I really like the idea of capturing more types of PII. However, it seems like this code snippet only works for the EMAIL and SELF-EMAIL patterns. For the other types of PII, it seems like the regex pattern isn’t getting applied to the right piece.

    It looks like the line
    ```
    hitPayload = sendModel.get('hitPayload').split('&')
    ```

    breaks the URL into an array of strings, removing and splitting it at any `&`’s.

    So, a starting URL of:
    ```
    /test?tel=+44012345678&email=brian@me.com&other=bclifton@DOMAIN.com&firstName=brian&password=hello
    ```

    Would look like this as a JavaScript array:
    ```
    [
    "/test?tel=+44012345678",
    "email=brian@me.com",
    "other=bclifton@DOMAIN.com",
    "firstName=brian",
    "password=hello"
    ]
    ```

    You then cycle through this array with a for loop:
    ```
    for (i = 0; i < hitPayload.length; i++) {

    }
    ```

    Inside the for loop, you break each string into another array of strings by the `=`:
    ```
    parts = hitPayload[i].split('=');
    ```

    So a piece of the larger array like:
    ```
    "firstName=brian"
    ```

    Becomes:
    ```
    ["firstName", "brian"]
    ```

    You then decode and assign to `val` the second item in the array.
    ```
    val = decodeURIComponent(unescape(parts[1]));
    ```

    Here’s what looks like a problem to me: only "val" gets the regex pattern applied to it:
    ```
    piiRegex.forEach(function (pii) {
    val = val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
    });
    ```

    The code seems to be cycling through the PII regex patterns and applying both the pattern and replacement only to the values of any query parameters, but not the query. The anchor needs to be searching for terms like `firstName`, but I think it’s only seeing values like "brian" all the way down.

    You then encode the value again, join each part back by the =, and join everything back up into one url with the &.
    ```
    parts[1] = encodeURIComponent(val);
    hitPayload[i] = parts.join('=');
    } // end for loop
    sendModel.set('hitPayload', hitPayload.join('&'), true);
    ```

    Am I missing something? It seems like the regex pattern is only getting applied to the second part of `parts` or the value in the query parameter (e.g. "brian"), but never the full query parameter (e.g. "firstName=brian")

  19. Simms says:

    Great post but one small question.

    Is it possible to just make this apply to GA and not our own software system. Is this code a blanket block for all tags or could some tags be on. White list?

    • Hello @simms – this method uses a Custom JS variable that is applied via Google Analytics’s new customTask feature. You apply it to which ever GA tags you need, but this will not work with non-GA tags unless the customTask feature is available to use. Note: customTask is a very new feature (Aug 2017) and I am not aware of other tags that can use this.

      That said, if you are collecting PII in plain text e.g. within URLs (regardless of where you are sending it), then you have a privacy issue as it will be logged by default on your webserver and on every router the URL passes through…

  20. Stu Bowker says:

    Thanks Brian. Might be worth adding ‘username’ too.

    • Yes exactly. I would customise the regex for your environment. Accountname, uname, user_name, customer etc., are all English possibilities. And if you work in multiple markets you will need to consider others as well e.g. kund(er) for the Swedish market. Essentially, a generic regex for all users doesn’t really work, it needs to be tailored…

  21. Jon Hibbitt says:

    Thanks! Awesome post and nice enhancements on Simo’s original. This will be going into the toolbox for sure.

  22. Hi Brian,

    I’m surprised that you included [redacted self-email] as an issue with regards to GA TOS. I would argue that the Terms of Service are concerned with personally identifying visitors to your site, and are not concerned with the personal details of the users of the website itself.

    As such, I find the customTask to be way too broad with regards to how it redacts information. My take is that doing things such as tracking who the user is trying to contact is not problematic. That applies to phone numbers tapped and mailto: links clicked. Additionally, while I would agree that a user’s address is out of bounds for data collection in GA, I don’t see how a postal code is a problem. Please argue the other side with me in the comments; I’m open to hearing it.

    Hope you’re doing well.

    Yehoshua

    • Hej @Yehoshua – although capturing your own orgs’ email addresses with GA reports is not strictly PII, it would likely be picked up by Google’s compliance robots. And who knows what that would result in – possibly data being deleted, your account suspended etc. Although you would be correct to argue the fine nuance of the reality, my suggestion is to avoid it in the first place i.e. good luck talking to a human compliance/legal officer at Google to make your case!

      In terms of zip codes, these can get very specific. For example, in the UK they can be limited to a set of 5 houses. Also, I found this article specific to Canada:

      “…it was found that 87.9% of the postal code locations were within 200 meters of the true address location (straight line distances) and 96.5% were within 500 meters of the address location (straight line distances).”

      https://ij-healthgeographics.biomedcentral.com/articles/10.1186/1476-072X-3-5

      • Stu Bowker says:

        I believe that capturing the first part of a UK postcode is OK because it’s at a city level, not street. For example, if the postcode equals “BA1 1AA”, then collecting just “BA1” in GA is fine.

        • Most likely true for the UK (always check such things with your legal/compliance team) and the regex can be customised accordingly. I would suggest if the post code is present, then the street address should also be checked and redacted…

      • Milos says:

        Precisely. As one GA expert once told me, don’t f*k with it 🙂
        If there is a remote possibility that GA will interpret the data as PII, don’t import it. Great post.


© Brian Clifton 2018