In summary, the GDPR requires organisations to keep records of all personal data, be able to prove that consent was given, and show where data is going, what it is being used for, and how it is being protected. It applies to anyone processing data on EU citizens.
But what defines personal data…? And is PII different…?
Personal Identifiable Information (PII):
This is a term used to group the obvious suspects of name, address, telephone details, email address etc., as personal data. That is, they very quickly identify me as a unique individual. However, how far can we go down this path of collecting data that becomes less and less unique, and hence less likely to identify me? For example, are social media handles considered PII? Mine are because I use my name, but say my Twitter handle was @PrivacyMan98…
As you move away from collecting obvious PII, the connection to a person gets weaker. At some point you would expect the collected data to be fully anonymised. That is, it could no longer be used to identify me. However, that turns out to be impossible, due to the vast quantity of data points being continuously collected about people’s browsing habits. It’s called data triangulation, or the jigsaw puzzle effect.
This is data that at first appears benign, such as a visitor’s gender, age, language, type of car owned, demographic group etc. These in isolation are harmless in that they do not identify an individual. However, string them together and associate them with a web visitor who may be searching for a very specific item in a geographic location, or accessing their Facebook/LinkedIn/any_social_network account, and pretty soon you can identify who that person is. Therefore:
ALL “click-stream” data is personal data.
I repeat it to emphasise just how important this concept is – ALL “click-stream” data is personal data. By click-stream, I am referring to the clicks and pageviews of a person browsing the web. If that data can be tied to a single person, no matter how anonymised it is, the jigsaw effect means that person can ultimately be identified. In fact it is trivial for the tech giants to do this, and it’s the basis of pretty much all online advertising today.
PII can therefore be defined as a very specific sub-set of personal data, but it is meaningless to consider it separate from the whole.
Data Triangulation – The Jigsaw Effect
A classic case of this happening was the AOL data scandal of 2006. This initially involved the authorised release of a large volume of “anonymised” search query data – intended for research purposes. However, New York Times journalists (and others) were able to analyse this and subsequently identify individuals by data triangulation.
See also: nature.com/articles/s41467-019-10933-3 (only 15 data points needed!) and zdnet.com/article/mozilla-research-browsing-histories-are-unique-enough-to-reliably-identify-users/ (150 URL histories).
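To make the jigsaw effect concrete, here is a minimal Python sketch. It is not taken from any of the studies above: the visitor records, field names and both helper functions are invented purely for illustration. It shows how attributes that each describe several people can single out one person once they are combined.

```python
from collections import Counter

# Entirely synthetic visitor records; every field name and value is made up.
VISITORS = [
    {"gender": "F", "age_band": "25-34", "language": "en", "car": "hatchback", "city": "Leeds"},
    {"gender": "F", "age_band": "25-34", "language": "fr", "car": "suv", "city": "Leeds"},
    {"gender": "M", "age_band": "25-34", "language": "en", "car": "suv", "city": "Lyon"},
    {"gender": "M", "age_band": "35-44", "language": "fr", "car": "hatchback", "city": "Lyon"},
    {"gender": "F", "age_band": "35-44", "language": "en", "car": "suv", "city": "Lyon"},
    {"gender": "M", "age_band": "35-44", "language": "fr", "car": "hatchback", "city": "Leeds"},
]
FIELDS = ["gender", "age_band", "language", "car", "city"]


def matching(records, **known):
    """Return the records consistent with everything we 'know' about a visitor."""
    return [r for r in records if all(r.get(k) == v for k, v in known.items())]


def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same values for `fields`.

    A result of 1 means at least one visitor is unique on those fields.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())


# Each attribute on its own is shared by several visitors...
for f in FIELDS:
    print(f, smallest_group(VISITORS, [f]))

# ...but adding "benign" facts one at a time narrows the crowd to one person.
print(len(matching(VISITORS, gender="F")))
print(len(matching(VISITORS, gender="F", age_band="25-34")))
print(len(matching(VISITORS, gender="F", age_band="25-34", language="en")))
```

In this toy data set every single attribute is shared by three visitors, yet three attributes together are enough to isolate one record. Real click-stream data has far more columns, which is why the effect is so hard to avoid.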
Put simply, I do not know of a single commercial website that does not collect personal data at some level. Without it, half your marketing budget would be wasted and we would still be using sites that looked like they were built in 1998, i.e. with no data on which to base decisions such as improving the UX.
For obvious reasons I am not against collecting click-stream data. It is also expected by customers/visitors as standard practice. That is, they want a continuously improved user experience and relevant offers, so there is nothing wrong with collecting such data per se. However, as a website or app owner (or DPO) you must determine, and be able to verify, whether consent from your website visitors is required in order to track them. Essentially, taking no action on this point is not an option if you serve customers from the largest single trading bloc in the world.
IMPORTANT: Note that “Consent required” means explicit consent, i.e. the visitor explicitly opts in; it is not implied or assumed. And that means obtaining consent before you track them. See my consent whitepaper on the best practice approach for doing this while maintaining a high opt-in rate.
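As a rough illustration of "consent before tracking", the Python sketch below models the rule in the simplest way I can think of. The ConsentGatedTracker class and its method names are hypothetical, not part of Google Analytics or any consent tool. The two points it tries to capture are that the default state is "no consent", and that events seen before opt-in are dropped rather than stored for later.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentGatedTracker:
    """Hypothetical tracker that only records events after an explicit opt-in."""

    consented: bool = False                    # default is NO consent, never assumed
    sent: list = field(default_factory=list)   # stands in for hits sent to an analytics tool

    def grant_consent(self) -> None:
        # Call this only in response to an explicit visitor action (e.g. clicking "Accept").
        self.consented = True

    def withdraw_consent(self) -> None:
        self.consented = False

    def track(self, event: str) -> bool:
        """Record the event if consent exists; return whether it was recorded."""
        if not self.consented:
            # Dropped, not queued: holding pre-consent events for later
            # would still be collecting data before consent was given.
            return False
        self.sent.append(event)
        return True


tracker = ConsentGatedTracker()
tracker.track("pageview:/home")      # ignored, the visitor has not opted in yet
tracker.grant_consent()              # the visitor explicitly opts in
tracker.track("pageview:/pricing")   # recorded
print(tracker.sent)
```

On a real site the same gate would sit in front of every tracking tag, not just one tool, and the consent decision itself would need to be stored so you can prove it was given.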
What About Google Analytics…?
From a previous post (Google Analytics, GDPR and Consent), I recommend you request consent to track your web visitors by default. If you don’t do this, then the onus is on you to verify consent is not required by auditing ALL the tracking pixels on ALL your pages – and do this regularly to confirm compliance. That is doable, but a really huge undertaking.
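To give a feel for what a first pass at such an audit involves, here is a minimal Python sketch using only the standard library. The function name, the sample page and the ad-network host are all made up for illustration. It lists the third-party hosts referenced by script, img and iframe tags in a page's static HTML. It will not see tags injected at runtime (for example via a tag manager), so treat it as a starting point, not a compliance check.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collects the hostnames that script/img/iframe tags load from."""

    TAGS = {"script", "img", "iframe"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.TAGS:
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin resolves relative and protocol-relative URLs against the page URL.
            host = urlparse(urljoin(self.page_url, src)).hostname
            if host:
                self.hosts.add(host)


def third_party_hosts(page_url, html):
    """Return the sorted hosts in `html` that differ from the page's own host.

    Comparison is an exact hostname match, so your own subdomains
    (e.g. a CDN) would also be listed and need reviewing by hand.
    """
    first_party = urlparse(page_url).hostname
    collector = ResourceCollector(page_url)
    collector.feed(html)
    return sorted(h for h in collector.hosts if h != first_party)


# Made-up page: one first-party script, one tag-manager script, one tracking pixel.
SAMPLE_HTML = """
<html><head>
<script src="/static/app.js"></script>
<script src="https://www.googletagmanager.com/gtag/js?id=G-EXAMPLE"></script>
</head><body>
<img src="//pixel.adnetwork.example/track.gif" width="1" height="1">
</body></html>
"""

print(third_party_hosts("https://www.example.com/", SAMPLE_HTML))
```

Every host on the resulting list is a vendor whose data collection you would need to understand. Repeat that across all pages, on a regular schedule, and the scale of the undertaking becomes clear.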
My reason for requesting consent by default is that the GDPR is applicable to your organisation and therefore its website(s) as a whole. The law is not specific to any tool or technology used for tracking. If, by auditing all potential pixels, you are able to confirm that no other tracking collects personal data on your website (or, if any does, that each one is compliant), the Google Analytics position is as follows:
Google’s consent policy for Google Analytics:
When using Google Analytics Advertising Features, Google states that you must also comply with the European Union User Consent Policy.
The Google Analytics Advertising features include Demographics and Interest Reports, Remarketing with GA and DCM Integration. The reasoning is that these features require the use of 3rd-party cookies i.e. the sharing of data with organisations other than the website being visited itself. Hence the privacy implications.
Summary of Google’s Advice:
If you use these Advertising features in Google Analytics, you must request explicit consent. If you do not, then Google considers its tracking tool as collecting benign data – hence no consent required.
1. When it comes to website data, the GDPR is clear that the law applies not just to PII, i.e. the obvious types of name, email address etc.; it also applies to any data considered personal, which is ALL click-stream data. These are data points that at first glance may appear benign, but when combined with other benign data, i.e. triangulated, they can identify an individual.
2. Every commercial website collects personal data at some level (possibly every website does), hence my interpretation of the GDPR is that website owners should request explicit consent from all EU visitors, and do so before tracking begins.
3. GDPR is not specific to any tool or technology. Therefore, unless you can verify that ALL tracking scripts are compliant, or can verify that the only tracking pixel on your website is Google Analytics and its setup does not include Google’s advertising features, then you need to request tracking consent from your visitors.
Update Jan 2022: There is a growing consensus across EU data protection authorities that Google Analytics cannot be considered benign, even with Advertising Features disabled. Essentially Google is more than capable of stitching together such benign data to identify an individual and there is no formal documentation from them to deny this. For example, see this page from the French authority CNIL.
4. Although the GDPR is specific to EU citizens wherever they may roam, in my opinion this is very likely to become a global data standard. After all, many have said that data is the new currency. Hence just like the financial markets, regulation is required and indeed desired by the vast majority of ordinary people – not just from within the EU.
BTW, if you are interested in what I am building in this space – a forensic GA data auditing tool with an emphasis on GDPR compliance – visit verified-data.com.