What Is PII versus Personal Data?

Under the EU General Data Protection Regulation (GDPR) and the ePrivacy Directive (ePD), organisations need to keep records of all personal data processing, show they have a legal basis to collect and store it (for example, prove that consent was given), and show where the data is going, what it is being used for and how it is being protected. These laws apply to anyone processing data about identifiable EU individuals.

But what defines personal data…? And is Personally Identifiable Information (PII) any different…?

What is Personal Data?

Under GDPR, any information that relates to an identified or identifiable individual qualifies as personal data.

This definition is quite broad and can include direct identifiers (names, email addresses), indirect identifiers (IP addresses, device IDs, geolocation) and behavioural data (clickstream patterns if linkable to a person).

Indirect identifiers and behavioural data are the categories most associated with tools such as Google Analytics and marketing tracking scripts. They are also the most difficult to define because, at first, such data can appear harmless: for example, a visitor’s gender, age, language, type of car owned or demographic group.

As isolated data points, these are harmless – they do not identify an individual. However, string them together and associate them with a unique ID for a specific web visitor, and you can identify a person.
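A minimal Python sketch can make this concrete. The visitor records, attribute names and the `unique_visitors` helper below are all invented for illustration; the point is only to show how quickly a combination of harmless-looking attributes can single out one visitor.

```python
from collections import Counter

# Hypothetical visitor records: no names, emails or IP addresses, only
# attributes that look harmless in isolation.
VISITORS = [
    {"visitor_id": "a1", "gender": "F", "age_band": "35-44", "language": "sv", "car": "Volvo"},
    {"visitor_id": "b2", "gender": "F", "age_band": "35-44", "language": "sv", "car": "Tesla"},
    {"visitor_id": "c3", "gender": "F", "age_band": "25-34", "language": "en", "car": "Volvo"},
    {"visitor_id": "d4", "gender": "M", "age_band": "35-44", "language": "sv", "car": "Volvo"},
    {"visitor_id": "e5", "gender": "M", "age_band": "25-34", "language": "en", "car": "Tesla"},
    {"visitor_id": "f6", "gender": "F", "age_band": "25-34", "language": "sv", "car": "Tesla"},
]


def unique_visitors(records, attributes):
    """Return IDs of visitors whose combination of `attributes` is shared with nobody else."""
    def key(record):
        return tuple(record[a] for a in attributes)

    counts = Counter(key(r) for r in records)
    return [r["visitor_id"] for r in records if counts[key(r)] == 1]


if __name__ == "__main__":
    for attrs in (["gender"],
                  ["gender", "age_band"],
                  ["gender", "age_band", "language", "car"]):
        print(attrs, "->", unique_visitors(VISITORS, attrs))
```

In this toy dataset, gender alone singles out nobody, gender plus age band singles out two visitors, and all four attributes together single out every visitor. Real datasets are far larger, but they also carry far more attributes per visitor.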

The jigsaw effect

Hence: Unless aggregated, ALL individual “click-stream” data is personal data.

By click-stream, I mean the clicks, actions and page views of a person viewing a website. If that data can be tied to a single person – no matter how anonymised it appears – the jigsaw effect means that person can ultimately be identified (see below).

In fact, it is trivial for the tech giants to do this and it’s the basis of pretty much all online advertising today.
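To illustrate the "unless aggregated" distinction, here is a small Python sketch. The hit data, the `aggregate_page_views` helper and the `min_visitors` threshold are assumptions made for this example, not a legal test: whether a given report still counts as personal data remains a judgement for your DPO.

```python
# Hypothetical hit-level click-stream: every row carries a visitor identifier,
# so each row can be tied back to one person's journey.
HITS = [
    {"visitor_id": "a1", "page": "/pricing"},
    {"visitor_id": "a1", "page": "/signup"},
    {"visitor_id": "b2", "page": "/pricing"},
    {"visitor_id": "c3", "page": "/pricing"},
    {"visitor_id": "c3", "page": "/blog/gdpr"},
]


def aggregate_page_views(hits, min_visitors=2):
    """Count distinct visitors per page, dropping the IDs and suppressing small groups."""
    visitors_per_page = {}
    for hit in hits:
        visitors_per_page.setdefault(hit["page"], set()).add(hit["visitor_id"])
    # Only the counts leave this function; pages seen by fewer than
    # `min_visitors` people are suppressed because a count of 1 is still
    # effectively one person's behaviour.
    return {
        page: len(visitors)
        for page, visitors in visitors_per_page.items()
        if len(visitors) >= min_visitors
    }


if __name__ == "__main__":
    print(aggregate_page_views(HITS))
```

The raw `HITS` list is individual click-stream data; the returned dictionary contains only page-level totals with no visitor identifier attached.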

What is Personally Identifiable Information (PII)?

The term PII is predominantly a U.S. term used in privacy and security contexts. It generally refers to direct identifiers that uniquely identify or can be used to distinguish or trace a person’s identity, such as:

name
address
telephone details
email address
social security number, etc.

These identify a user (or a very small cohort, such as a family) as a unique person.

Hence: PII is effectively a sub-set of personal data.

All PII (as commonly understood) falls within GDPR’s concept of personal data when processed about an identifiable EU individual.

In other words, what you think of as traditional direct identifiers (name, email, phone number) are definitely personal data under GDPR. However, PII is not a legal term defined in the GDPR.

The Jigsaw Effect – Data Triangulation

A classic case of triangulating indirect identifiers and behavioural data was the AOL data scandal of 2006. This involved the authorised release of a large volume of “anonymised” search query data, intended for research purposes. However, New York Times journalists (and others) were able to analyse it and identify individuals by data triangulation.

Other studies confirming the same triangulation effect:

nature.com/articles/s41467-019-10933-3 (only 15 data points needed!)
Mozilla research: Browsing histories are unique enough to reliably identify users (150 URL histories).

Data triangulation connects the dots between anonymous data so that it becomes personal data.
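As a rough sketch of how that linking can work, the Python below joins a pseudonymous dataset to a second dataset that carries names, using only the attributes they share. Both datasets, the field names and the `triangulate` helper are invented for illustration; this is not a reconstruction of how any of the cases above were actually analysed.

```python
# "Anonymised" behavioural data: names removed, pseudonymous user ID kept.
ANON_SEARCHES = [
    {"user": "u-101", "postcode_area": "114", "birth_year": 1975, "query": "knee surgery recovery"},
    {"user": "u-102", "postcode_area": "114", "birth_year": 1988, "query": "cheap flights"},
    {"user": "u-103", "postcode_area": "752", "birth_year": 1975, "query": "debt advice"},
]

# A second, unrelated dataset that does carry names (hypothetical).
KNOWN_PEOPLE = [
    {"name": "Person A", "postcode_area": "114", "birth_year": 1975},
    {"name": "Person B", "postcode_area": "752", "birth_year": 1975},
    {"name": "Person C", "postcode_area": "114", "birth_year": 1988},
    {"name": "Person D", "postcode_area": "114", "birth_year": 1988},
]


def triangulate(anon_rows, known_rows, keys):
    """Link a pseudonymous user to a name when exactly one known person shares the key values."""
    matches = {}
    for row in anon_rows:
        candidates = [p["name"] for p in known_rows if all(p[k] == row[k] for k in keys)]
        if len(candidates) == 1:
            matches[row["user"]] = candidates[0]
    return matches


if __name__ == "__main__":
    print(triangulate(ANON_SEARCHES, KNOWN_PEOPLE, ["birth_year"]))
    print(triangulate(ANON_SEARCHES, KNOWN_PEOPLE, ["postcode_area", "birth_year"]))
```

With birth year alone nothing can be linked, but adding one more shared attribute is enough to attach a name, and with it the search query, to two of the three "anonymous" users.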

Put simply, all commercial websites these days collect personal data at some level. I am not against collecting click-stream data: customers and visitors expect it as standard practice for improving the user experience and the relevance of offers, so there is nothing wrong with collecting such data per se. However, as a website or app owner (or DPO) you must determine, and be able to verify, whether you have a legal basis to track your website visitors – for example, by obtaining their consent*.

Essentially, taking no action on this point is not an option if you serve customers from the largest single trading bloc in the world.

*Under the current GDPR/ePrivacy framework, consent is just one of six lawful bases for processing personal data (Article 6(1)). I refer to consent as it is the most common basis for a commercial website, i.e. the visitor explicitly opts in (not implied or assumed consent), and this happens before any tracking commences. See my consent whitepaper on the best practice approach for doing this while maintaining a high opt-in rate.
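The two requirements in that footnote (an explicit opt-in, and no tracking before it) can be sketched in a few lines of Python. The `ConsentGate` and `ConsentRecord` classes, the purpose names and the policy version string are hypothetical, and a production consent tool would need far more than this (storage, expiry, an audit trail). Treat it as an illustration of the logic only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Evidence that a visitor opted in: who, to what, under which policy, and when."""
    visitor_id: str
    purposes: frozenset
    policy_version: str
    given_at: datetime
    withdrawn_at: Optional[datetime] = None


@dataclass
class ConsentGate:
    """Hypothetical helper that refuses to track until an explicit opt-in is on record."""
    records: dict = field(default_factory=dict)
    sent_hits: list = field(default_factory=list)

    def opt_in(self, visitor_id, purposes, policy_version):
        record = ConsentRecord(
            visitor_id, frozenset(purposes), policy_version, datetime.now(timezone.utc)
        )
        self.records[visitor_id] = record
        return record

    def withdraw(self, visitor_id):
        # Keep the record as proof of the earlier consent, but mark it withdrawn.
        record = self.records.get(visitor_id)
        if record is not None and record.withdrawn_at is None:
            record.withdrawn_at = datetime.now(timezone.utc)

    def track(self, visitor_id, purpose, event):
        record = self.records.get(visitor_id)
        if record is None or record.withdrawn_at is not None or purpose not in record.purposes:
            return False  # no valid consent for this purpose: nothing is collected
        self.sent_hits.append((visitor_id, purpose, event))
        return True


if __name__ == "__main__":
    gate = ConsentGate()
    print(gate.track("v1", "analytics", "page_view"))  # before opt-in
    gate.opt_in("v1", {"analytics"}, policy_version="2024-01")
    print(gate.track("v1", "analytics", "page_view"))  # after opt-in
```

The design choice worth noting is that the default is "do not track": the absence of a record, a withdrawn record, or a purpose the visitor never agreed to all lead to the same outcome.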

What About Google Analytics…?

In a previous post (Google Analytics, GDPR and Consent), I recommended that you request consent to track your web visitors by default. If you don’t do this, then the onus is on you to verify that consent is not required by auditing ALL the tracking pixels and cookies on ALL your pages – and to do this regularly to confirm compliance. That is doable, but a huge undertaking.
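To give a feel for what even the simplest part of such an audit involves, the Python sketch below lists the third-party hosts referenced by script, image and iframe tags in a page's HTML. The sample page, the placeholder IDs in it and the `audit_page` helper are assumptions for illustration. It only inspects static HTML, so anything injected at runtime by a tag manager, and any cookies set by scripts, would be missed; a real audit needs to load each page in a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical page source; the IDs in the URLs are placeholders.
SAMPLE_HTML = """
<html><head>
<script src="/js/app.js"></script>
<script src="https://www.googletagmanager.com/gtag/js?id=G-XXXX"></script>
</head><body>
<img src="https://cdn.example.com/logo.png">
<img src="https://www.facebook.com/tr?id=123&ev=PageView" height="1" width="1">
<iframe src="https://www.youtube.com/embed/abc"></iframe>
</body></html>
"""


class ThirdPartyResourceFinder(HTMLParser):
    """Collects hosts of scripts, images and iframes not served from the site itself."""

    TAGS = {"script", "img", "iframe"}

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host
        self.third_party_hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.TAGS:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).hostname
        # Relative URLs have no hostname and are treated as first-party,
        # as are subdomains of the site's own domain.
        is_first_party = (
            host is None
            or host == self.first_party_host
            or host.endswith("." + self.first_party_host)
        )
        if not is_first_party:
            self.third_party_hosts.add(host)


def audit_page(html, first_party_host):
    """Return the sorted third-party hosts referenced by one page's static HTML."""
    finder = ThirdPartyResourceFinder(first_party_host)
    finder.feed(html)
    return sorted(finder.third_party_hosts)


if __name__ == "__main__":
    for host in audit_page(SAMPLE_HTML, "example.com"):
        print(host)
```

Running this across every page of a site, on a schedule, and then checking what each listed host actually collects is the "huge undertaking" referred to above.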

My reason for requesting consent by default is that the GDPR is applicable to your organisation and therefore its website(s) as a whole. The law is not specific to any tool or technology used for tracking.

Interestingly, Google’s position is that if you use Advertising features in Google Analytics, you must request explicit consent. But if you do not use these features, then Google considers its tracking tool to be collecting benign data, i.e. no consent is required. The Google Analytics Advertising features include Demographics and Interest Reports, Remarketing and DoubleClick (DCM) Integration.

However, I do not recommend the Google approach.

Google’s reference docs:

Google state you must also comply with the European Union User Consent Policy.
Policy requirements for Google Analytics Advertising Features

Update Jan 2022: There is a growing consensus across EU data protection authorities that Google Analytics cannot be considered benign, even with Advertising Features disabled. Essentially Google is more than capable of stitching together such benign data to identify an individual and there is no formal documentation from them to deny this. For example, see this page from the French authority CNIL.

Conclusions

This article is phrased from a practical perspective that all behavioural data used by adtech giants is personal data because it can ultimately be tied to an individual. That is a reasonable interpretative stance in many real-world tracking scenarios, though I appreciate it is not a strict legal definition that the GDPR states in exactly those terms.

1. When it comes to website data, the GDPR is clear that the law applies not just to PII, i.e. the obvious types such as name and email address. It also applies to any data considered personal – which is ALL click-stream data if not aggregated. These are data points that at first glance may appear benign but, when combined with other benign data (i.e. triangulated), can identify an individual.

2. Every commercial website collects personal data at some level (possibly every website does), hence my interpretation of the GDPR is that website owners require explicit consent from all EU visitors, and before tracking begins.

3. GDPR is not specific to any tool or technology. Therefore, unless you can verify that ALL tracking scripts are compliant, or can verify that the only tracking pixel on your website is Google Analytics and its setup does not include Google’s advertising features, you need to request tracking consent from your visitors. See my Jan 2022 update above.

4. Although the GDPR is specific to the EU, in my opinion this is very likely to become a global data standard. After all, many have said that data is the new currency. Hence just like the financial markets, regulation is required and indeed desired by the vast majority of ordinary people – not just from within the EU.

Brian Clifton is a measurement and data privacy strategist, author, and founder of Verified Data — the leading data quality and compliance audit platform. Formerly Google’s Head of Web Analytics for EMEA, Brian has spent over two decades helping organisations build trust in their data. He is the author of the best-selling books Successful Analytics and Advanced Web Metrics with Google Analytics and a certified member of the European Association of Data Protection Professionals.


14 Comments

Roi on December 21, 2020 at 10:37 am

Assuming my app collects your geolocation, search query (in full) and search engine used (all under a Unique User ID, no names, IPs, or anything of the sort), would you consider it PI or PII?
Zoi on November 11, 2020 at 3:31 pm

Hi Brian. Excellent job! Thank you! I would like your view on the following: According to the GDPR, if a website uses a GDPR-compliant web analytics solution and it does not send data over to Google Maps or YouTube on page load, consent is not required, right? Is that so for governmental websites as well?

Also, there is the possibility that the website allows for consent once the embedded YouTube video loads on the page, in which case the user is informed that data will be sent when/if they click “Play”. See example at: https://ec.europa.eu/home-affairs/

Is that approach GDPR compliant?
- Brian Clifton on November 12, 2020 at 1:09 pm
  
  Thanks for the feedback Zoi. Firstly I would say be a little careful when interpreting GDPR – it is about organizations, not tools. I am applying GDPR to an org’s website data collection methods – both intended and unintended data collection. Therefore, it is the site as a whole that needs to be compliant. Also, the law applies to ALL websites – commercial, non-profit, personal and government – that collect data on EU citizens (item 3 on my conclusions).
  
  My point with this article is that although benign tracking with no data sent to any 3rd-party would *not* need to ask for consent – as it could be argued this is “necessary and proportionate”, the triangulation issue plus the plethora of embedded 3rd-party content means this is unattainable in practice. For users of Google Analytics, the issue is it runs on Google’s omnipotent platform. For example, consider Google Signals.
  
  Related LinkedIn article:
  https://www.linkedin.com/pulse/why-i-think-sky-falling-web-analytics-brian-clifton-phd-/
Sol on November 17, 2019 at 4:53 pm

If one were to write an article for an external website such as Medium.com or even Quora, would it be legally acceptable to include 3rd party tracking links in said article? TIA
- Brian Clifton on November 18, 2019 at 9:37 am
  
  I would say that medium.com is responsible for all content on their site, i.e. they are the data controller, and I would be very surprised if they allow *any* embedded widgets/tracking on user generated content. It is most likely locked down technically; if not, I would check their T&Cs, or simply ask them.
Martin on July 10, 2018 at 8:21 am

“Are your visitors connecting from the EU?” is not really the correct question because (as you say in conclusion 4) it’s EU citizenship, not location, that is important to GDPR. So a German businessman in Singapore needs a banner and a US businesswoman visiting Spain does not.

In practice this probably means everyone needs a banner regardless of IP as there is no way to know citizenship otherwise.
- Brian Clifton on August 20, 2018 at 9:51 pm
  
  An interesting scenario. I am not attempting to be a legal expert, but my view would be that a citizen of country-X visiting country-Y needs to obey the laws of country-Y, not the other way around… so the German businessman in Singapore is bound by the law in Singapore. And the US woman gets extra privacy protection while visiting Spain…
  - Roi on December 21, 2020 at 10:33 am
    
    I tend to agree with Brian. Plus, GDPR’s definition of ‘EU Data Subject’ isn’t about your citizenship or place of residence, but the location where the data is collected.
    Therefore, such a US businesswoman visiting Spain will subject US companies that provide her with services and collect data, either via mobile apps or websites, to GDPR practices.
Jacob on May 24, 2018 at 12:50 pm

I’m not sure I 100% agree with your Google Analytics recommendation. Wouldn’t you be home safe with IP anonymization, because nothing else sent to GA is PII, and it cannot be strung together without the specific IP?
- Brian Clifton on May 24, 2018 at 5:34 pm
  
  Hello Jacob – Bear in mind GDPR is not tool specific…
  
  Assuming GA is the only tracking pixel on a site, then I am simply reflecting Google’s own advice (see link). That is, if you enable their ad features you are sharing data with 3rd-parties, and that requires consent.
  
  AnonymiseIP is a good thing to have by default (and I will be publishing a study soon on the impact of it), though it is Google’s tracking cookie (clientID) that is user/device specific. Hence why you can see individual users in the User Explorer report regardless of whether anonymiseIP is used.
  - Jacob on May 24, 2018 at 6:34 pm
    
    You’re completely right – GDPR is not tool specific. 🙂
    
    And sharing PII with 3rd party should be an opt-in and advertised. But. When you anonymize the IP there is no information linked to any specific person. Yes you have a clientID but that’s not specified as PII? Clear cookies and you will get a new clientID.
    
    I don’t think sharing anonymized data is against GDPR?
    - Brian Clifton on May 24, 2018 at 7:36 pm
      
      Cookie IDs are classed as PII in Europe… Refreshing them is analogous to refreshing your IP, which happens to most home users when they connect.
      - Jacob on May 25, 2018 at 5:49 pm
        
        Do you have a reference for that? I’m not able to find any.
      - Brian Clifton on May 31, 2018 at 10:20 am
        
        Lots of references to this via Google: https://www.google.se/search?q=%2Bgdpr+are+cookie+ids+personal+data&oq=%2Bgdpr++are+cookie+ids+personal+data