How to assess whether data contain personal data?

Date of last review: 2022-08-23

Whether your data contain personal data depends on which data you are collecting (nature) and under which circumstances (context). A date like “12 December 1980” is not personal data on its own; it is just a date. However, that date becomes personal data if it refers to someone’s birthday.

In assessing whether data are personal, you should take into account all the means reasonably likely to be used, by you or by others, to identify your data subjects, considering factors such as the required money and time and (future) developments in technology (GDPR, Recital 26).

Data can be identifiable when:

  • They contain directly identifying information.
    For example: name, image, video recording, audio recording, patient number, IP address, email address, phone number, location data, social media data.
  • It is possible to single out an individual.

    This can happen when there are unique data points or unique behavioural patterns which can only apply to one person.

    Examples:

    • You have a data subject who is 2.10 meters tall. If nobody else in your dataset has this height, this value distinguishes them from everyone else and can thus make them identifiable.
    • You have a data subject who only follows far-right accounts on Twitter. If they are the only one in your dataset who does so, this behaviour distinguishes them from everyone else and can make them identifiable.

  • It is possible to infer information about an individual from information in your dataset.
    For example:
    • Inferring a medical condition based on registered medications.
    • Guessing that someone lives in a certain neighbourhood based on where they go to school.
  • It is possible to link records relating to an individual.

    This can happen when you combine multiple variables within your dataset (e.g., demographic information, indirect identifiers). It can also happen when you combine your dataset with other datasets (the “Mosaic effect”). In that case, your data still contain personal data, even if the data in your own dataset are not identifiable by themselves.

    Linkage is often possible with demographic information (age, gender, country of origin, education, workplace information, etc.) and indirect identifiers (pseudonyms, device IDs, etc.). For example:

    • A study published in 2000 found that 87% of the United States population was likely to be uniquely identifiable by the combination of their ZIP code, gender and date of birth. You can see for yourself on this website.
    • An agricultural company’s Uniek Bedrijfsnummer (UBN, unique business number) can be used to look up the company’s address in the I&R mobile app. Often, this address is also the owner’s home address.
    • Geographical data that track individuals are particularly sensitive because they consist of many data points, which together often form a pattern that fits only one person. This video explains why.

  • The de-identification can be reversed.
    This is often the case with pseudonymised data when there is still a way to link them back to identifiable data, for example because a key file linking names to pseudonyms still exists.
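Singling out, whether by one unusual value or by a combination of variables such as ZIP code, gender and date of birth, can be checked by counting how often each value or combination occurs in your dataset. Below is a minimal Python sketch, assuming your records are a list of dictionaries; the function names, column names and example values are hypothetical. A combination that occurs only once (a group size of 1) singles out one person, which is the idea behind k-anonymity. Passing such a check does not make data anonymous, because inference and linkage with other datasets remain possible.

```python
from collections import Counter
from typing import Iterable, Sequence


def _combinations(records: Sequence[dict], columns: Iterable[str]) -> list[tuple]:
    """Build one tuple of values per record from the chosen (hypothetical) columns."""
    cols = list(columns)
    return [tuple(r[c] for c in cols) for r in records]


def unique_combinations(records: Sequence[dict], columns: Iterable[str]) -> list[dict]:
    """Return the records whose combination of values occurs exactly once."""
    combos = _combinations(records, columns)
    counts = Counter(combos)
    # A count of 1 means this combination distinguishes one data subject from all others
    return [r for r, c in zip(records, combos) if counts[c] == 1]


def smallest_group_size(records: Sequence[dict], columns: Iterable[str]) -> int:
    """Size of the smallest group sharing a combination (the 'k' in k-anonymity)."""
    counts = Counter(_combinations(records, columns))
    return min(counts.values())


# Made-up example data, for illustration only
participants = [
    {"id": "P01", "zip": "3512", "gender": "F", "birth_year": 1980, "height_m": 1.80},
    {"id": "P02", "zip": "3512", "gender": "F", "birth_year": 1980, "height_m": 1.80},
    {"id": "P03", "zip": "3584", "gender": "M", "birth_year": 1975, "height_m": 2.10},
    {"id": "P04", "zip": "3584", "gender": "M", "birth_year": 1991, "height_m": 1.80},
]

print([r["id"] for r in unique_combinations(participants, ["height_m"])])  # ['P03']
print([r["id"] for r in unique_combinations(participants, ["zip", "gender", "birth_year"])])  # ['P03', 'P04']
print(smallest_group_size(participants, ["zip", "gender", "birth_year"]))  # 1
```

Here no single variable except height stands out, yet the combination of ZIP code, gender and birth year singles out two of the four participants.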
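Why a remaining key makes pseudonymisation reversible can be shown with a minimal Python sketch. The hypothetical `pseudonymise` function replaces names with random pseudonyms, but also returns a key that maps each pseudonym back to the original name; the names and scores are made up. As long as that key exists, anyone holding it can restore the identities, so the pseudonymised records remain personal data.

```python
import secrets


def pseudonymise(records: list[dict], id_field: str = "name") -> tuple[list[dict], dict[str, str]]:
    """Replace the identifying field with random pseudonyms; also return the key."""
    key: dict[str, str] = {}     # pseudonym -> original identifier (the "key file")
    lookup: dict[str, str] = {}  # original identifier -> pseudonym, so repeats stay consistent
    result = []
    for record in records:
        original = record[id_field]
        if original not in lookup:
            pseudonym = "S-" + secrets.token_hex(4)
            while pseudonym in key:  # regenerate in the unlikely case of a collision
                pseudonym = "S-" + secrets.token_hex(4)
            lookup[original] = pseudonym
            key[pseudonym] = original
        copy = dict(record)  # leave the input records unchanged
        copy[id_field] = lookup[original]
        result.append(copy)
    return result, key


def reidentify(records: list[dict], key: dict[str, str], id_field: str = "name") -> list[dict]:
    """Reverse the pseudonymisation using the key."""
    return [{**r, id_field: key[r[id_field]]} for r in records]


survey = [
    {"name": "Alice Jansen", "score": 7},
    {"name": "Bob de Vries", "score": 4},
]

pseudo, key = pseudonymise(survey)
print(reidentify(pseudo, key) == survey)  # True
```

Deleting the key removes this particular route back to the data subjects, but singling out, inference and linkage may still make the pseudonymised data identifiable.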

You can assume that you are processing personal data when you collect data directly from people, even if the results of that collection are anonymous. The same applies when you use data that are observed from or derived from people, even if those data were previously collected, made public or used for non-research purposes.

In short, even if you cannot find out someone’s real identity (name, address), the data you process can still contain personal data under the GDPR. The examples on this page are not exhaustive; there are many other kinds of personal data. If you need help assessing whether your data contain personal data, please contact your privacy officer.