Data pseudonymisation
On this page: pseudonymous, de-identification, replacement, open science, reuse
Date of last review: 2023-03-30
YOUth (Youth of Utrecht) is a longitudinal child cohort study that collects data about the behavioural and cognitive development of children in the Utrecht area. The study follows about 4000 children and their parents in two cohorts. One from birth until around the age of six, one from around 9-years-old until adolescence. YOUth collects a wide variety of data types, ranging from questionnaires to biological samples. Because of the large amount of data and the sensitive nature of the data and the participants (minors), the data can be considered as very sensitive, and thus should be pseudonymised where possible.
General steps
YOUth is committed to sharing their data for reuse, and thus the datasets that they share should contain as little personal information as possible. For that purpose, the YOUth data manager implements a number of measures:
- All data are pseudonymised as much as possible (see below).
- Every dataset that is shared for reuse is first checked for identifiable information. Special category information is taken out of the datasets as much as possible, and no unnecessary information such as date of birth is shared.
- Using the tool AnonymoUUs, participant pseudonyms are replaced with artificial pseudonyms, and all dates with a fake date, each time a new set of data is prepared for sharing. This limits the ability of external researchers to link multiple requested datasets together and thus to form a more complete image of each participant. It also prevents singling out participants based on the day they visited the research centre.
Pseudonymisation per data type
Below is an overview of the data types and pseudonymisation measures taken by the YOUth data manager. Besides these pseudonymisation measures, YOUth has implemented a data request procedure which delineates the conditions under which researchers can access the data, and the steps they have to take to request access.
Questionnaire data (tabular)
Children and their parents/caretakers (sometimes their teacher) fill out several questionnaires about, among others, their mental and physical development, living conditions, and social environment.
Pseudonymisation measures:- A script removes unnecessary (special category) personal data from the shared dataset where possible, such as religion, ethnicity and open text responses.
- If a researcher needs demographic information only to describe the sample, the data manager shares a frequency table of the requested information, for example for ethnicity and socio-economic status, instead of sharing the raw responses.
- The Anonymouus tool replaces the pseudonym and date in the questionnaire data and file names.
In the future, the data manager would like to share only scale scores, instead of responses to individual questions in standardised questionnaires.
Computer tasks (tabular)
Logbook- and experiment book data (tabular)
Video tasks (video recording)
During two tasks (the Hand game and the Delay of gratification task), children are video- and audiotaped to be able to analyse their behaviour. Parents may also be visible in the background, as well as a research assistant.
To pseudonymise these data, both the videos from the Hand game and the Delayed gratification task will be coded/scored on the variables of interest (e.g., does the child take the candy out of the bag or not). This way, no actual video recordings will need to be shared with other researchers.
Parent-child interaction (video recording)
Magnetic Resonance Imaging (MRI) data (3D image)
MRI data of children are collected to study structural (3D image of the brain, skull, and outer layers of the head) and functional (brain activity) properties of the brain.
To pseudonymise the MRI data, structural MRI scans (DICOM) are defaced using mri_deface (v1.22), resulting in NIfTi files. Additionally, the AnonymoUUs tool replaces the pseudonym in the filenames.