What Is PII versus Personal Data?

In summary, GDPR law means that organisations need to keep records of all personal data, be able to prove that consent was given, show where data is going, what it’s being used for, and how it is being protected. And it applies to anyone processing data on EU citizens.

But what defines personal data…? And is PII different…?

Personal Identifiable Information (PII):

The jigsaw effect of collecting anonymous data

This is a term used to group the obvious suspects of name, address, telephone details, email address etc., as personal data. That is, they very quickly identify me as a unique individual. However, how far can we go down this path of collecting data that becomes less and less unique and hence less likely to identify me? For example, are social media handles considered PII? Mine are because I use my name, but say my Twitter handle was @PrivacyMan98…

As you move away from collecting obvious PII, the connection to a person gets weaker. At some point you would expect the collected data to be fully anonymised. That is, it would have no direct connection that could identify me. However, that turns out to be impossible, due to the vast quantity of data points being continuously collected about people’s browsing habits. It’s called data triangulation, or the jigsaw puzzle effect.

Personal Data:
This is data that at first appears benign, such as a visitor’s gender, age, language, type of car owned, demographic group etc. These in isolation are harmless in that they do not identify an individual. However, string them together and associate them with a web visitor who may be searching for a very specific item in a geographic location, or accessing their Facebook/LinkedIn/any_social_network account, and pretty soon you can identify who that person is. Therefore:

ALL “click-stream” data is personal data.

I repeat it to emphasise just how important this concept is – ALL “click-stream” data is personal data. By click-stream, I am referring to the clicks and pageviews of a person browsing the web. If that data can be tied to a single person, no matter how anonymised it is, the jigsaw effect means that person can ultimately be identified. In fact it is trivial for the tech giants to do this, and it’s the basis of pretty much all online advertising today.

PII can therefore be defined as a very specific sub-set of personal data, but it is meaningless to consider it separate from the whole.

Data Triangulation – The Jigsaw Effect

A classic case of this happening was the AOL data scandal of 2006. This initially involved the authorised release of a large volume of “anonymised” search query data – intended for research purposes. However, New York Times journalists (and others) were able to analyse this and subsequently identify individuals by data triangulation.

See also: nature.com/articles/s41467-019-10933-3 (only 15 data points needed!) and zdnet.com/article/mozilla-research-browsing-histories-are-unique-enough-to-reliably-identify-users/ (150 URL histories).

Data triangulation connects the dots between anonymous data so that it becomes personal data.
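
To make the jigsaw effect concrete, here is a minimal sketch using a small, entirely made-up visitor table (every record, field name and value below is invented for illustration, not taken from any real dataset). It counts how many visitors become unique as more “benign” attributes are strung together. No names, emails or IP addresses are involved.

```python
from collections import Counter

# Hypothetical click-stream records: no names, emails, or IP addresses.
VISITORS = [
    {"id": "v1", "gender": "f", "age_band": "30-39", "language": "sv", "car": "Volvo", "city": "Uppsala"},
    {"id": "v2", "gender": "f", "age_band": "30-39", "language": "sv", "car": "Volvo", "city": "Lund"},
    {"id": "v3", "gender": "f", "age_band": "30-39", "language": "en", "car": "Saab", "city": "Uppsala"},
    {"id": "v4", "gender": "m", "age_band": "30-39", "language": "sv", "car": "Volvo", "city": "Uppsala"},
    {"id": "v5", "gender": "f", "age_band": "40-49", "language": "sv", "car": "Volvo", "city": "Uppsala"},
    {"id": "v6", "gender": "m", "age_band": "40-49", "language": "en", "car": "Saab", "city": "Lund"},
]

FIELDS = ["gender", "age_band", "language", "car", "city"]


def unique_visitors(records, fields):
    """Return ids of records whose combination of `fields` matches no other record."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return [r["id"] for r, k in zip(records, keys) if counts[k] == 1]


# Add one attribute at a time and watch how many visitors are singled out.
for n in range(1, len(FIELDS) + 1):
    used = FIELDS[:n]
    singled_out = unique_visitors(VISITORS, used)
    print(f"{' + '.join(used)}: {len(singled_out)} of {len(VISITORS)} visitors are unique")
```

In this toy table nobody is unique on gender alone, yet all six visitors are singled out once the five fields are combined. Real click-stream data has far more fields and far more rows, but the mechanism is the same.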

Put simply, I do not know of a single commercial website that does not collect personal data at some level. Without it, half your marketing budget would be wasted and we would still be using sites that looked like they were built in 1998, i.e. with no data on which to base decisions such as improving the UX.

For obvious reasons I am not against collecting click-stream data. It is also expected by customers/visitors as standard practice. That is, they want a continuously improved user experience and relevant offers, so there is nothing wrong with collecting such data per se. However, as a website or app owner (or DPO) you must determine, and be able to verify, whether consent from your website visitors is required in order to track them. Essentially, taking no action on this point is not an option if you serve customers from the largest single trading bloc in the world.

IMPORTANT: Note that “Consent required” means explicit consent i.e. the visitor explicitly opts in, not implied or assumed. And that means obtaining consent before you track them. See my consent whitepaper on the best practice approach for doing this while maintaining a high opt-in rate.
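
As a rough sketch of what “explicit, and before tracking” means in code, the hypothetical wrapper below refuses to record anything until the visitor has actively opted in. The class and method names are assumptions made for illustration and do not belong to any real analytics library. Note that “no answer yet” is treated the same as “no”.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConsentGatedTracker:
    """Hypothetical tracker wrapper: nothing is recorded without an explicit opt-in."""

    # None = visitor has not answered yet; True = opted in; False = declined.
    consent: Optional[bool] = None
    sent: list = field(default_factory=list)

    def set_consent(self, opted_in: bool) -> None:
        """Store the visitor's explicit choice from the consent banner."""
        self.consent = opted_in

    def track(self, event: str) -> bool:
        """Record the event only if consent is explicitly True.

        Events seen before the opt-in are dropped, not queued, so no
        data is collected ahead of the visitor's decision.
        """
        if self.consent is not True:
            return False
        self.sent.append(event)
        return True


tracker = ConsentGatedTracker()
tracker.track("pageview:/home")      # no answer yet: dropped
tracker.set_consent(True)
tracker.track("pageview:/pricing")   # explicit opt-in: recorded
print(tracker.sent)                  # ['pageview:/pricing']
```

The design choice worth noticing is that pre-consent events are discarded rather than buffered. Buffering them and sending later would still amount to collecting data before consent was given.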

What About Google Analytics…?

From a previous post (Google Analytics, GDPR and Consent), I recommend you request consent to track your web visitors by default. If you don’t do this, then the onus is on you to verify consent is not required by auditing ALL the tracking pixels on ALL your pages – and do this regularly to confirm compliance. That is doable, but a really huge undertaking.

My reason for requesting consent by default is that the GDPR is applicable to your organisation and therefore its website(s) as a whole. The law is not specific to any tool or technology used for tracking. If, by auditing all potential pixels, you are able to confirm that no other tracking collects personal data on your website (or, if any does, that each one is compliant), the Google Analytics position is as follows:

Google’s consent policy for Google Analytics:

When using Google Analytics Advertising Features, Google state you must also comply with the European Union User Consent Policy.

Taken from: Policy requirements for Google Analytics Advertising Features

The Google Analytics Advertising features include Demographics and Interest Reports, Remarketing with GA, and DCM Integration. The reasoning is that these features require the use of 3rd-party cookies i.e. the sharing of data with organisations other than the website being visited itself. Hence the privacy implications.

Summary of Google’s Advice:
If you use these Advertising features in Google Analytics, you must request explicit consent. If you do not, then Google considers its tracking tool as collecting benign data – hence no consent required.
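
The audit logic described in this section can be sketched as a small decision function. This is only an illustration of the reasoning as stated here: the `Pixel` structure and its field names are hypothetical, and `verified_compliant` stands for the outcome of your own audit, which no code can decide for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pixel:
    """One tracking script found by a site audit (hypothetical structure)."""

    name: str
    is_google_analytics: bool = False
    ga_advertising_features: bool = False
    verified_compliant: bool = False  # outcome of your own audit


def consent_required(pixels: list[Pixel]) -> bool:
    """Return True if any audited pixel triggers the need for explicit consent."""
    for p in pixels:
        if p.is_google_analytics:
            # Google's stated position: consent is needed once Advertising Features are on.
            if p.ga_advertising_features:
                return True
        elif not p.verified_compliant:
            # Any other tracker that has not been verified means consent is needed.
            return True
    return False


ga_only = [Pixel("Google Analytics", is_google_analytics=True)]
ga_with_ads = [Pixel("Google Analytics", is_google_analytics=True, ga_advertising_features=True)]
mixed = ga_only + [Pixel("unaudited-widget")]

print(consent_required(ga_only))      # False
print(consent_required(ga_with_ads))  # True
print(consent_required(mixed))        # True
```

A single unaudited pixel anywhere on the site is enough to flip the answer, which is why requesting consent by default is the simpler route.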

Conclusions

1. When it comes to website data, GDPR is clear in that the law is applicable not just to PII data i.e. the obvious types of name, email address etc., it also applies to any data considered personal – which is ALL click-stream data. These are data points that at first glance may appear benign, but when combined with other benign data i.e. triangulated, they can identify an individual.

2. Every commercial website collects personal data at some level (possibly every website does), hence my interpretation of the GDPR is that website owners should request explicit consent from all EU visitors, and do so before tracking begins.

3. GDPR is not specific to any tool or technology. Therefore, unless you can verify that ALL tracking scripts are compliant, or can verify that the only tracking pixel on your website is Google Analytics and its setup does not include Google’s advertising features, then you need to request tracking consent from your visitors.

Update Jan 2022: There is a growing consensus across EU data protection authorities that Google Analytics cannot be considered benign, even with Advertising Features disabled. Essentially Google is more than capable of stitching together such benign data to identify an individual and there is no formal documentation from them to deny this. For example, see this page from the French authority CNIL.

4. Although the GDPR is specific to EU citizens wherever they may roam, in my opinion this is very likely to become a global data standard. After all, many have said that data is the new currency. Hence just like the financial markets, regulation is required and indeed desired by the vast majority of ordinary people – not just from within the EU.

BTW, if you are interested in what I am building in this space – a forensic GA data auditing tool with an emphasis on GDPR compliance – visit verified-data.com.

Looking for a keynote speaker, or wish to hire Brian…?

If you are an organisation wishing to hire me and my team, please view the Contact page. I am based in Sweden and advise organisations in Europe as well as North America.


14 Comments

  1. Roi

    Assuming my App collects your geo location, search query (in full) and search engine used (all under a Unique User ID, no names, IPs, or anything of sort), would you consider it PI or PII?

  2. Zoi

    Hi Brian. Excellent job! Thank you! I would like your view on the following: According to the GDPR, if a website uses a GDPR-compliant web analytics solution and it does not send data over to Google Maps or YouTube on page load, consent is not required, right? Is that so for governmental websites as well?

    Also, there is the possibility that the website allows for consent once the embedded YouTube video loads on the page, in which case the user is informed that data will be sent when/if they click “Play”. See example at: https://ec.europa.eu/home-affairs/

    Is that approach GDPR compliant?

    • Brian Clifton

      Thanks for the feedback Zoi. Firstly I would say be a little careful when interpreting GDPR – it is about organizations, not tools. I am applying GDPR to an org’s website data collection methods – both intended and unintended data collection. Therefore, it is the site as a whole that needs to be compliant. Also, the law applies to ALL websites – commercial, non-profit, personal and government – that collect data on EU citizens (item 3 on my conclusions).

      My point with this article is that although benign tracking with no data sent to any 3rd-party would *not* need to ask for consent – as it could be argued this is “necessary and proportionate”, the triangulation issue plus the plethora of embedded 3rd-party content means this is unattainable in practice. For users of Google Analytics, the issue is it runs on Google’s omnipotent platform. For example, consider Google Signals.

      Related LinkedIn article:
      https://www.linkedin.com/pulse/why-i-think-sky-falling-web-analytics-brian-clifton-phd-/

  3. Sol

    If one were to write an article for an external website such as Medium.com or even Quora, would it be legally acceptable to include 3rd party tracking links in said article? TIA

    • Brian Clifton

      I would say that medium.com is responsible for all content on their site i.e. the data controller, and I would be very surprised if they allow *any* embedded widgets/tracking on user generated content. It is most likely locked down technically. If not, I would check their T&Cs, or simply ask them.

  4. Martin

    “Are your visitors connecting from the EU?” is not really the correct question because (as you say in conclusion 4) it’s EU citizenship, not location, that is important to GDPR. So a German businessman in Singapore needs a banner and a US businesswoman visiting Spain does not.

    In practice this probably means everyone needs a banner regardless of IP as there is no way to know citizenship otherwise.

    • Brian Clifton

      An interesting scenario. I am not attempting to be a legal expert, but my view would be that a citizen of country-X visiting country-Y needs to obey the laws of country-Y, not the other way around… so the German business man in Singapore is bound by the law in Singapore. And the US woman gets extra privacy protection while visiting Spain…

      • Roi

        I tend to agree with Brian. Plus, GDPR’s definition of ‘EU Data Subject’ isn’t about your citizenship or place of residence, but the location where the data is collected.
        Therefore, a US businesswoman visiting Spain will subject any US companies that provide her with services and collect data, whether via mobile apps or websites, to GDPR practices.

  5. Jacob

    I’m not sure I 100% agree with your Google Analytics recommendation. Wouldn’t you be home safe with IP anonymization, because nothing else sent to GA is PII, and it cannot be strung together without the specific IP?

    • Brian Clifton

      Hello Jacob – Bear in mind GDPR is not tool specific…

      Assuming GA is the only tracking pixel on a site, then I am simply reflecting Google’s own advice (see link). That is, if you enable their ad features you are sharing data with 3rd-parties, and that requires consent.

      AnonymiseIP is a good thing to have by default (and I will be publishing a study soon on the impact of it), though it is Google’s tracking cookie (clientID) that is user/device specific. Hence why you can see individual users in the User Explorer report regardless of whether anonymiseIP is used.

      • Jacob

        You’re completely right – GDPR is not tool specific. 🙂

        And sharing PII with 3rd party should be an opt-in and advertised. But. When you anonymize the IP there is no information linked to any specific person. Yes you have a clientID but that’s not specified as PII? Clear cookies and you will get a new clientID.

        I don’t think sharing anonymized data is against GDPR?

