
How do I exclude internal traffic in Google Analytics?


Excluding or filtering internal (i.e. staff) visits from Google Analytics is an interesting challenge, as it’s not as straightforward as you may first think. This is because there is no obvious way to identify a staff visitor… Hence this detailed post showing a possible alternative.

In this article I discuss why applying a filter is flawed and suggest a quick fix that may help you overcome its limitations. However, the smarter and more robust way is to use a visitor label, a.k.a. a Custom Dimension.

Segments versus Filters

Firstly, before embarking on this exercise, ask yourself: do you really want to remove your own staff visits? Typically you just want them out of your marketing reports rather than deleted, which is what a filter does. Excluding visits from certain reports is a segmentation exercise, i.e. identifying the target audience in your tsunami of data. Use segments to temporarily modify your reports and apply them retroactively, rather than relying on the permanent effect of filters that only apply moving forward in time.

Essentially, segments that can be turned on/off on a per report basis are much more flexible than filters which permanently modify your data and only in a forwards direction:

using a segment in Google Analytics

 

Also ask yourself, is it worth it…?

Consider the volume of traffic you are actually trying to remove/segment. For example, if it only constitutes a small fraction of your total traffic, it probably is not worth the effort (see below). Ensure you understand how your staff use your site first and how it differs from your non-staff visitors before making that decision.

Important: Filters are destructive, therefore use with care! Filtering incoming hits permanently includes, excludes, or alters your hits for the view they are applied to. Always maintain an unfiltered view of your data so you have access to your unfiltered data-set.

Why it’s not so obvious to filter staff visits

The “standard” way of excluding staff visits is to filter via user IP address. And from the smart work being done at verified-data.com, 80% of audited websites have some kind of filter to try to exclude internal visits. However, using IP addresses has a few flaws you need to be aware of:

  • A) User IP addresses are rarely fixed – Remember remote workers are likely to have dynamic IP addresses at home or in small offices. Staff accessing your site via a mobile device on the move will also have a dynamic IP.
  • B) IP addresses can get unmanageable – With an IP address for each user, the number to manage can get out of control. Forget it if you are going to require more than 20 or so regex filters to capture these. Remember IP addresses can and do change – usually you receive no warning of this and likely you will never know there has been a change.
  • C) Anonymise IP voids IP filters – If your GA settings are set to anonymise IP addresses, then IP filters using the full IP address will fail, as anonymise IP removes the last octet (the final group of up to three digits). You could of course reset the filter to match only aaa.bbb.ccc, but that is very blunt. Note, in some EU countries anonymising IP addresses is the law. That position is likely to become more the norm as GDPR gets augmented with the ePrivacy Regulation that is due sometime in 2020. So it’s best practice to get comfortable with this setting and avoid IP filters.
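
To make point C concrete, here is a minimal JavaScript sketch of why a full-address filter stops matching once anonymisation is switched on. It assumes the IPv4 behaviour of setting the last octet to zero. The helper name and the address (taken from the 203.0.113.0/24 documentation range) are illustrative only.

```javascript
// Hypothetical helper: mimics IPv4 anonymisation by zeroing the last octet.
function anonymiseIPv4(ip) {
    const octets = ip.split('.');
    if (octets.length !== 4) {
        throw new Error('Expected a dotted IPv4 address: ' + ip);
    }
    octets[3] = '0';
    return octets.join('.');
}

// An exclude filter written against the full office address (illustrative).
const fullIpFilter = /^203\.0\.113\.42$/;
// A blunter filter written against the first three octets only.
const subnetFilter = /^203\.0\.113\./;

// The address as it looks after anonymisation: '203.0.113.0'
const anonymisedIp = anonymiseIPv4('203.0.113.42');

console.log(fullIpFilter.test(anonymisedIp)); // false: the staff hit is no longer excluded
console.log(subnetFilter.test(anonymisedIp)); // true: but this also matches every other address in the range
```

The second pattern is the "blunt" option: it catches your staff, along with anybody else who shares those first three octets.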

BTW from my Google Analytics Audits – An Enterprise Study, fewer than 20% of EU-based organisations turn on anonymise IP. This is surprising considering the impact of using anonymise IP is considered minimal. However, I do expect anonymise IP usage to increase as more data managers become aware of it.

Alternative 1: A Potential Quick Fix…

For large organisations, a quick fix is to review the “Audience / Technology / Network” report – see screenshot. Specifically, look at the Service Provider report, which lists the ISPs visitors are using to reach your website. Search through this list for your company name and any potential abbreviations. Also ask your IT department how this name is used for remote users/offices, VPN etc. Any static name that is specific to your business is what you should use in your filter:

Google Analytics service provider report

 

If you find your own “company service provider” name in this report, apply an exclude filter based on this:

service provider filter

NOTE: This is a quick fix that can prove more reliable than IP address filters. Importantly it will work even if you use anonymise IP. However it is not bulletproof, as of course it relies on ALL your staff visitors using the same service provider(s). That may be the case if your staff mostly connect from the office and rarely work from home, or your organisation enforces a VPN policy.
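
As a rough sketch of how such a filter pattern behaves, the snippet below tests a regular expression against a handful of service provider names. The provider names and the pattern are invented for illustration; substitute the strings you actually find in your own Service Provider report.

```javascript
// Hypothetical service provider names, as they might appear in the report.
const providers = [
    'acme corporation ltd',
    'acme corp vpn',
    'big national telecom',
    'mobile carrier plc'
];

// Illustrative filter pattern: any provider name containing "acme".
// The "i" flag makes the match case-insensitive.
const staffProviderPattern = /acme/i;

// Hits from matching providers are the ones an exclude filter would drop.
const excluded = providers.filter(name => staffProviderPattern.test(name));
const kept = providers.filter(name => !staffProviderPattern.test(name));

console.log(excluded); // [ 'acme corporation ltd', 'acme corp vpn' ]
console.log(kept);     // [ 'big national telecom', 'mobile carrier plc' ]
```

A staff member on home broadband or a mobile network appears under that provider's name, so they land in the "kept" list. That is the limitation described in the note above.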

Alternative 2: Visitor Labelling

This is a much more robust approach but does require a little more work. Essentially, you set up your website to drop a persistent “label” on visitors that you know are staff. What I call a visitor label is termed a custom dimension in Google Analytics.

Custom dimensions come in four scopes – hit, session, user and product. It is the user scope that is persistent. That is, it uses a cookie to label a visitor. That means when the same visitor returns to your website the label can be read and matched with previous visits e.g. existing customer, staff etc.

Of course cookie limitations apply – that is, they are device specific and they do not last forever! However, the custom dimension label can provide a robust approach, so long as your visitors return quite often e.g. weekly.

You can then filter your Google Analytics data based on this label, or use GTM to send the labelled data to a different Google Analytics property. Essentially using custom dimensions is a great segmentation technique for all your visitors.
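
The idea of sending labelled traffic to a different property can be sketched as a simple lookup. This is only an illustration of the logic: the property IDs are placeholders and the helper name is hypothetical. In GTM itself the same mapping could be handled with something like a Lookup Table variable rather than hand-written code.

```javascript
// Placeholder property IDs; replace with your own.
const PUBLIC_PROPERTY_ID = 'UA-XXXXXXX-1';
const STAFF_PROPERTY_ID = 'UA-XXXXXXX-2';

// Hypothetical helper: choose the destination property from the visitor label.
function selectPropertyId(internalUser) {
    // Accept the boolean true or the string "true" in any letter case.
    const isStaff = String(internalUser).toLowerCase() === 'true';
    return isStaff ? STAFF_PROPERTY_ID : PUBLIC_PROPERTY_ID;
}

console.log(selectPropertyId(true));      // 'UA-XXXXXXX-2'
console.log(selectPropertyId(undefined)); // 'UA-XXXXXXX-1'
```

Anything that is not explicitly labelled as staff falls through to the public property, which keeps unlabelled visitors in your main reports.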

APPLYING A CUSTOM DIMENSION

There are multiple ways to achieve this, but here is my method when using GTM.

Note for this to work, we have to make some assumptions about your website. I have found these fit with many corporate websites, WordPress and edu sites, but these will not fit every case.

First, the checklist summary:

  1. Ask your web developers to push a variable to the data layer when it is possible to determine a staff visitor.
  2. Define a data layer variable in your GTM container that will scoop up this variable value if/when set.
  3. Add the data layer variable to your GA settings variable in GTM.
  4. Set up a user-scoped custom dimension in your Google Analytics property so that reports can be built with it.

A more detailed explanation with screenshots…

1. Push a variable to the data layer

This is the most difficult step, so good to get this done first! It’s not that it is rocket science, but you do need to go to your web dev team to explain your needs. The clever part they will be looking to achieve is identifying what constitutes a staff visitor.

For example, a login event for a staff visit is an obvious one that can be applied to future sessions from the same visitor – even if they do not log in. Of course you need some type of login to do this, perhaps a CMS. On this website the following cookie exists for me, and I can safely say that if any other visitor had this cookie present they would be a part of my team i.e. internal visitors:

login cookie from wordpress

Another example is checking for an existing cookie left by staff using the organisation’s intranet site – this assumes the intranet shares the same parent domain as your public site e.g. intra.corpsite.com under corpsite.com. Perhaps the fact that such a cookie exists is enough to confirm a staff member. Note, the cookie will need to be one set for that shared domain, as it is not possible to access cookies set by other domains – that’s a built-in browser security feature.

Staff could also be identified as the result of loading a very specific page i.e. one that is not available to visitors unless they know the URL (and excluded by robots.txt).
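
A minimal sketch of the cookie check might look like the following. The cookie name staff_member and both helper names are hypothetical; use whatever your intranet or CMS actually sets. The functions take the cookie string as an argument, so the logic can be tried outside a browser.

```javascript
// Hypothetical cookie name; replace with the one your intranet or CMS sets.
const STAFF_COOKIE_NAME = 'staff_member';

// Returns true if a cookie with the given name is present in a
// document.cookie-style string such as "a=1; b=2".
function hasCookie(cookieString, name) {
    return cookieString
        .split(';')
        .map(pair => pair.trim().split('=')[0])
        .includes(name);
}

// Builds the object to push to the data layer, or null for non-staff visitors.
function staffLabelFor(cookieString) {
    return hasCookie(cookieString, STAFF_COOKIE_NAME)
        ? { internalUser: true }
        : null;
}

// In the browser this would run before the GTM container snippet, e.g.:
//   var label = staffLabelFor(document.cookie);
//   if (label) {
//       window.dataLayer = window.dataLayer || [];
//       window.dataLayer.push(label);
//   }
```

One caveat: cookies flagged HttpOnly are not visible to document.cookie. If your login cookie is set that way, the check has to happen server-side and the page template must output the label instead.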

Once it is known that the visitor is a staff member, the JavaScript snippet to push this variable into the data layer, BEFORE the GTM container snippet, is as follows:

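<script>
    // Create the data layer if it does not exist yet
    window.dataLayer = window.dataLayer || [];
    // Label this visitor as staff so GTM can read the value
    window.dataLayer.push({
        'internalUser' : true
    });
</script>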

In my example, the variable name is internalUser and its value is true. You can use any variable name you wish, just be consistent in its use from here on.

If your web team has finer-grained information than true/false, use it. For example, you may have visitorType with values such as: full-time staff, part-time staff, intern, investor, partner etc. Of course no personally identifiable information is allowed – both under the Google terms of service and under GDPR.
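
One way to keep personally identifiable information out of the label is to only ever push values from a fixed list. The sketch below assumes the visitorType values mentioned above; the helper name and the allowlist approach are my own illustration, not a GTM requirement.

```javascript
// Allowlist of non-identifying visitor types (taken from the examples above).
const ALLOWED_VISITOR_TYPES = [
    'full-time staff',
    'part-time staff',
    'intern',
    'investor',
    'partner'
];

// Hypothetical helper: returns a data layer object for a known visitor type,
// or null otherwise, so free-text values (which might contain PII) are never pushed.
function visitorTypeLabel(type) {
    const normalised = String(type).trim().toLowerCase();
    return ALLOWED_VISITOR_TYPES.includes(normalised)
        ? { visitorType: normalised }
        : null;
}

console.log(visitorTypeLabel('Intern'));           // { visitorType: 'intern' }
console.log(visitorTypeLabel('jane@example.com')); // null
```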

2. Within GTM define your data layer variable

With Step 1, the most important and potentially most difficult step, out of the way, it’s plain sailing from here :). Within your GTM account define a data layer variable that will scoop up the variable value when set. Be sure to set a default value so that if no value is set in step 1, the default applies.

Tip: I often format the case of variable values to lowercase in order to maintain consistency in your GA reports. That is, you don’t want separate data rows just because somebody used “True” as a value in a different part of your site.
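
The combined effect of a default value and lowercase formatting can be sketched in plain JavaScript. This is a simplified stand-in for what the GTM variable does, not GTM's actual implementation; the helper name is hypothetical, and the data layer is passed in as an ordinary array so the logic can be run anywhere.

```javascript
// Hypothetical helper mirroring a data layer variable with a default value
// and lowercase formatting.
function readLabel(dataLayer, key, defaultValue) {
    let value;
    // Later pushes override earlier ones for the same key.
    for (const entry of dataLayer) {
        if (entry && Object.prototype.hasOwnProperty.call(entry, key)) {
            value = entry[key];
        }
    }
    if (value === undefined) {
        value = defaultValue;
    }
    // Convert to a lowercase string so "True", "TRUE" and true share one report row.
    return String(value).toLowerCase();
}

console.log(readLabel([{ internalUser: 'True' }], 'internalUser', 'false')); // 'true'
console.log(readLabel([], 'internalUser', 'false'));                         // 'false'
```

With a default of false, every visitor carries a value for the dimension, which makes building an "exclude staff" segment more predictable than matching on a missing value.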

3. Within GTM add your data layer variable to your GA settings variable

Again this is straightforward, but to make these types of changes super-simple, use a Google Analytics Settings variable. In this way the values set can be applied to ALL your tags in bulk. Otherwise you will need to add the data layer variable to each tag that may fire for your staff visitor. Potentially, as we are setting a user-scoped custom dimension, your general pageview tag that fires on an “All pages” trigger will suffice. However, it is worth double checking such a tag will fire under all circumstances of a staff visit. Tip: Avoid this worry by implementing GA using the Google Analytics settings variable.

As before, ensure you are consistent in your variable name (the dimension value). Also use an index that is available in your Google Analytics property settings – see next step. If you have no other custom dimensions set up you will use index 1.

custom dimension in GTM
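
For reference, the index and value pair entered in GTM ends up as a field named dimension followed by the index, e.g. dimension1, on the hit sent to Google Analytics. The hypothetical helper below only illustrates that mapping; GTM builds the field for you.

```javascript
// Hypothetical helper: builds the analytics.js-style field for a custom
// dimension index, e.g. index 1 becomes "dimension1".
function customDimensionField(index, value) {
    if (!Number.isInteger(index) || index < 1) {
        throw new Error('Custom dimension index must be a positive integer');
    }
    return { ['dimension' + index]: String(value) };
}

console.log(customDimensionField(1, true)); // { dimension1: 'true' }
```

The index used here must match the index of the custom dimension defined in your property settings, otherwise the value is recorded against the wrong dimension or not at all.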

4. Within Google Analytics set up your custom dimension

So everything is now in place in terms of data collection. All that is required is to set up the user-scoped custom dimension in your Google Analytics property so that reports can be built with it.

As a Google Analytics admin, simply define the custom dimension in your property settings, as per the Google documentation:

custom dimension in GA

 

Setup Complete – Now include/exclude staff from your reports

staff visits segment

Summary

Mitigating your internal staff visits is tricky, though not impossible. There are alternative routes to IP filtering, but assess your requirements and the potential impact in detail before investing the effort. Even the more robust method proposed here is not bulletproof, as it depends on what features you have available on your site to provide a “hook” for labelling your visitors. Completely “staff free” reports are therefore never 100% achievable, so ask yourself first if it is worth trying.

BTW, if you are interested in what I am building in this space – a forensic Google Analytics auditing tool with an emphasis on GDPR compliance – visit verified-data.com.


© Brian Clifton 2019