Step-by-step de-identification

On this page: anonymous, pseudonymous, step-by-step, workflow, deidentification, safeguard, protection measure
Date of last review: 2023-05-02

Below is a step-by-step workflow that you can use to de-identify your data. You can also use this de-identification plan template to plan and document your de-identification steps. Whether de-identification results in a pseudonymised or an anonymised dataset depends heavily on the characteristics of the dataset and the context in which it was obtained.

  1. Perform the de-identification in a safe storage or processing environment: remember that you are working with personal data, and as long as the data are not anonymous, they will be subject to the GDPR!

  2. Identify any potentially identifying information in your data.

  3. Assess whether you need to collect this information at all. For example:
    1. Do you really need IP addresses in your survey data?
    2. Do you really need to record audio or video?
    3. Do you really need a consent form with a name, contact information, and signature on it?
    4. Can you replace names with pseudonyms in filenames and within the data?

  4. If you do not need directly identifying information to answer your research question but do need it for other purposes, such as contacting data subjects:
    1. Separate directly identifying information from the research data.
    2. Use pseudonyms or hashes to refer to individuals instead of names.
    3. Create a keyfile to link the pseudonyms to the names.
    4. Store the directly identifying information and the keyfile separately from the research data and/or in encrypted form.

  5. Consider which types of information may lead to indirect identification, such as demographic information (age, education, occupation, etc.), geolocation, specific dates, medical conditions, unique personal characteristics, open text responses, etc.

  6. De-identify the directly and indirectly identifiable data using (a selection of) the techniques described on the next page.
    1. Before you start, save a copy of the raw, untouched dataset, in case anything in the process goes wrong.
    2. Document the steps you took, for example in a programming script or in a README file that always accompanies the data.
    3. Whether you can delete the raw (non-pseudonymised) version of the dataset depends on whether it needs to be preserved for verification purposes. Specific restrictions may also apply if the Dutch Medical Research Involving Human Subjects Act (WMO) and/or Good Clinical Practice apply to your research.

  7. Treat the data according to their sensitivity. If the data are not fully anonymised, they are pseudonymous and must therefore still be handled in accordance with the GDPR!
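
To make step 4 more concrete, the following is a minimal Python sketch of separating directly identifying information from the research data and creating a keyfile. It is an illustration under assumptions, not a prescribed method: the function name `pseudonymise`, the column names, and the pseudonym format (`P` followed by random hex characters) are all hypothetical. The sketch uses random pseudonyms rather than hashes, because an unsalted hash of a name can be reproduced by anyone who guesses that name.

```python
import secrets


def pseudonymise(rows, direct_fields, id_field="pseudonym"):
    """Split records into research data and a keyfile.

    rows: list of dicts, one per record.
    direct_fields: names of the directly identifying columns (assumed known).
    Returns (research_rows, key_rows).
    """
    research_rows, key_rows = [], []
    assigned = {}   # identity (tuple of direct identifiers) -> pseudonym
    used = set()    # pseudonyms handed out so far, to avoid duplicates

    for row in rows:
        identity = tuple(row[field] for field in direct_fields)
        if identity not in assigned:
            # Draw a random pseudonym that cannot be derived from the name.
            pseudonym = "P" + secrets.token_hex(4)
            while pseudonym in used:
                pseudonym = "P" + secrets.token_hex(4)
            used.add(pseudonym)
            assigned[identity] = pseudonym
            # The keyfile links the pseudonym back to the person.
            key_row = {id_field: pseudonym}
            key_row.update({field: row[field] for field in direct_fields})
            key_rows.append(key_row)

        # The research data keep only the pseudonym and the other columns.
        research_row = {id_field: assigned[identity]}
        research_row.update(
            {k: v for k, v in row.items() if k not in direct_fields}
        )
        research_rows.append(research_row)

    return research_rows, key_rows


# Made-up example records
raw = [
    {"name": "Alice Example", "email": "alice@example.org", "score": "7"},
    {"name": "Bob Example", "email": "bob@example.org", "score": "4"},
    {"name": "Alice Example", "email": "alice@example.org", "score": "9"},
]
research, keyfile = pseudonymise(raw, ["name", "email"])
```

Here `research` contains only pseudonyms and scores, while `keyfile` holds one row per person. In practice the two would be written to different locations, with the keyfile stored separately from the research data and/or encrypted, as described in step 4.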

How de-identified is de-identified enough? You can read more about this in the chapter Statistical approaches to privacy.
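
As a rough first check on the indirect identifiers from step 5, you could count how many records share each combination of indirect identifiers: a combination that occurs only once or a few times may single out an individual. The sketch below is only an illustration; the function name `small_groups`, the column names, and the threshold are hypothetical, and a count like this does not by itself show that a dataset is anonymous.

```python
from collections import Counter


def small_groups(rows, indirect_fields, threshold=5):
    """Return combinations of indirect identifiers shared by fewer than
    `threshold` records, with the number of records for each."""
    counts = Counter(
        tuple(row[field] for field in indirect_fields) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up example records; the threshold of 2 is an arbitrary choice
records = [
    {"age_group": "30-39", "occupation": "teacher"},
    {"age_group": "30-39", "occupation": "teacher"},
    {"age_group": "60-69", "occupation": "pilot"},
]
risky = small_groups(records, ["age_group", "occupation"], threshold=2)
```

In this example only the combination of "60-69" and "pilot" is flagged, because it occurs in a single record. Flagged combinations are candidates for the de-identification techniques mentioned in step 6.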