Date of last review: 2023-05-15
People leave all kinds of digital traces, for example on social media, smartphones, search engines, email, banks, energy providers, and online shops. Article 15 of the GDPR gives individuals the right to access the personal data that organisations, including these platforms, collect and store about them, and to obtain a copy of those data. Platforms comply with such requests by making the digital trace data available as individual “Data Download Packages” (DDPs). DDPs can contain a great deal of personal information that may be too sensitive to share, but they can also be of high value for scientific research.
The data donation approach as developed by Boeschoten et al. (2020) allows researchers to automatically analyse the digital traces found in DDPs, while preserving the privacy of research participants. The workflow is as follows:
1. Data subjects are recruited as respondents, as in a regular research project.
2. The researcher determines which DDPs are relevant for the research question and writes a script to extract the relevant information.
3. Each data subject requests their DDPs from the selected providers and stores them locally on their own device.
4. The stored DDPs are then processed locally with the software PORT to extract the relevant research variables. PORT executes the script provided by the researcher and extracts the data from the DDP on the data subject's own device. PORT uses Pyodide, which runs Python inside the web browser, so the script runs in a secure, sandboxed browser environment that is separated from the rest of the device. This environment is destroyed as soon as the browser page is closed.
5. The data subject inspects the information resulting from the analysis and is asked to give informed consent to share it with the researcher.
6. If the data subject consents, the derived information is encrypted and sent to the researcher for further analysis.
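To make the researcher's extraction script more concrete, here is a minimal, hypothetical sketch. Because PORT runs scripts through Pyodide, a Python runtime for the browser, the sketch is in Python, but it does not follow PORT's actual script interface. It assumes a DDP delivered as a ZIP archive containing a JSON file called `search_history.json`, holding a list of entries that each have an ISO 8601 `timestamp` field. The file name, field names and the `extract_searches_per_month` function are all assumptions for illustration. The script reduces the raw DDP to monthly counts, so only this aggregate, not the search terms themselves, would be shown to the data subject and, after consent, shared.

```python
import io
import json
import zipfile
from collections import Counter

# Hypothetical name of the relevant file inside the DDP archive (an assumption).
# Real providers use their own file names and folder layouts.
TARGET_FILE = "search_history.json"


def extract_searches_per_month(ddp_zip: bytes) -> dict:
    """Return the number of searches per month ("YYYY-MM") found in a DDP.

    Only these aggregated counts are returned; the raw entries, including
    the search terms, never leave this function.
    """
    with zipfile.ZipFile(io.BytesIO(ddp_zip)) as archive:
        # Match on the file name only, because folder names can differ per DDP.
        matches = [n for n in archive.namelist() if n.endswith(TARGET_FILE)]
        if not matches:
            return {}
        entries = json.loads(archive.read(matches[0]))

    counts = Counter()
    for entry in entries:
        timestamp = entry.get("timestamp", "")
        if isinstance(timestamp, str) and len(timestamp) >= 7:
            # "2023-01-15T10:00:00Z" -> "2023-01"
            counts[timestamp[:7]] += 1
    return dict(sorted(counts.items()))


# Try the function on a tiny fake DDP built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ddp/search_history.json", json.dumps([
        {"timestamp": "2023-01-15T10:00:00Z", "query": "weather"},
        {"timestamp": "2023-01-20T08:30:00Z", "query": "bus times"},
        {"timestamp": "2023-02-02T19:45:00Z", "query": "recipes"},
    ]))

print(extract_searches_per_month(buffer.getvalue()))
# {'2023-01': 2, '2023-02': 1}
```

Whatever the script returns is what the data subject inspects before consenting, so keeping that output small and readable also makes informed consent easier.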
When to use
The data donation approach can be used:
- As an alternative or in addition to surveys to study human behaviour.
- To analyse data that are too sensitive to transfer in raw form.
- To allow data subjects a large degree of control over their (personal) data.
- To access data that are representative of a population of interest, in contrast to, for example, data retrieved from APIs, which often pertain to a non-random subset of a platform’s users.
- As a user-centric approach that is independent of platforms or data controllers: private companies cannot suddenly withdraw from a collaboration or restrict access to a dataset, because the data are not obtained directly through them. It is important, however, to review the Terms of Service of the platforms involved to check whether they restrict the use of data for scientific research.
Implications for research
- DDPs may be large and contain different types of information. For both the analysis (writing the script) and the informed consent (informing data subjects specifically), it is important that you know which specific data are of interest.
- The structure of DDPs varies by provider and by person, which makes it difficult to write generic analysis scripts. Moreover, DDPs change over time, so analysis scripts should be checked and updated regularly (Boeschoten et al., 2021).
- Analysis scripts are usually developed based on sample data. However, due to the sensitive content, DDP sample data are difficult to obtain. As sample data, you could use your own DDP, synthetic data (example), or already available open data (example).
- Make sure that data subjects understand what they are consenting to when you present them with the results that will be shared with you (step 5). Do they understand the risks involved, if any? We recommend consulting a privacy officer and/or testing this among data subjects before you start your data donation project.
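Because DDP structures differ between people and change over time, one way to keep an analysis script maintainable is to make it tolerant of the layouts you already know, to make it fail loudly on unknown ones, and to check it against synthetic DDPs whenever something changes. The sketch below illustrates this in Python under stated assumptions: the two layouts (a bare list of entries, or entries under an `items` key), the alternative field names `timestamp` and `time`, the file name `search_history.json` and all function names are invented for illustration and do not describe any real provider's DDP.

```python
from __future__ import annotations

import io
import json
import zipfile

# Hypothetical layout variants, for illustration only. In practice they are
# found by inspecting DDPs (your own, synthetic or open data) and they change.
TIMESTAMP_KEYS = ("timestamp", "time")
TARGET_FILE = "search_history.json"


def load_entries(raw: bytes) -> list:
    """Return the list of entries for any of the known layouts."""
    data = json.loads(raw)
    if isinstance(data, list):  # variant A: a bare list of entries
        return data
    if isinstance(data, dict) and isinstance(data.get("items"), list):
        return data["items"]  # variant B: entries stored under "items"
    # Fail loudly so a changed DDP format is noticed instead of silently
    # producing empty results.
    raise ValueError("Unknown DDP layout: the script needs updating")


def get_timestamp(entry: dict) -> str | None:
    """Return the timestamp under whichever known key is present, if any."""
    for key in TIMESTAMP_KEYS:
        if key in entry:
            return entry[key]
    return None


def count_timestamped_entries(ddp_zip: bytes) -> int:
    with zipfile.ZipFile(io.BytesIO(ddp_zip)) as archive:
        name = next((n for n in archive.namelist() if n.endswith(TARGET_FILE)), None)
        if name is None:
            raise ValueError(f"{TARGET_FILE} not found in DDP")
        entries = load_entries(archive.read(name))
    return sum(1 for entry in entries if get_timestamp(entry) is not None)


def make_synthetic_ddp(payload) -> bytes:
    """Build an in-memory ZIP archive that mimics a DDP, using synthetic data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(f"ddp/{TARGET_FILE}", json.dumps(payload))
    return buffer.getvalue()


# Regression checks on synthetic data: rerun whenever the script or a format changes.
variant_a = [{"timestamp": "2023-01-15T10:00:00Z"}, {"timestamp": "2023-02-01T09:00:00Z"}]
variant_b = {"items": [{"time": "2023-01-15T10:00:00Z"}, {"note": "no time field"}]}
assert count_timestamped_entries(make_synthetic_ddp(variant_a)) == 2
assert count_timestamped_entries(make_synthetic_ddp(variant_b)) == 1

try:
    count_timestamped_entries(make_synthetic_ddp({"unexpected": True}))
except ValueError as err:
    print(f"Detected format change: {err}")
```

Raising an error on an unknown layout is a deliberate choice: a script that quietly returns nothing would produce missing data that is hard to distinguish from genuinely inactive participants.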
Examples and resources
- Several projects have made use of the data donation approach, such as one using Google semantic location history data and one using WhatsApp data.
- A more elaborate data donation platform is being developed in the PDI-SSH-funded Digital Data Donation Infrastructure (D3I) project.
- Read further about the framework, the proof of concept of the PORT software, a comparison with a browser plug-in approach, and promises and pitfalls of the approach for social media data.
- Or read more about how Data Download Packages can be de-identified.
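To give a flavour of what de-identification can involve, the sketch below applies one common technique, keyed-hash (HMAC) pseudonymisation, to email addresses in a piece of text. It is not necessarily the method described in the work referred to above. The regular expression, the `person_` prefix and the `pseudonymise_emails` function are assumptions for illustration, and real DDPs contain many other identifiers (names, phone numbers, usernames, locations) that this sketch does not detect.

```python
import hashlib
import hmac
import re

# Simple email pattern (an assumption); it does not cover every valid address,
# and other kinds of identifiers are not handled at all.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymise_emails(text: str, secret_key: bytes) -> str:
    """Replace each email address with a keyed-hash pseudonym.

    For a given key, the same address (ignoring case) always maps to the same
    pseudonym, so patterns such as how often a contact appears are preserved.
    """
    def to_pseudonym(match):
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(secret_key, address, hashlib.sha256).hexdigest()
        return "person_" + digest[:10]

    return EMAIL_PATTERN.sub(to_pseudonym, text)


key = b"replace-with-a-secret-key"
print(pseudonymise_emails("Message from alice@example.com to bob@example.org", key))
```

A keyed hash is used rather than a plain hash because common email addresses could otherwise be recovered simply by hashing guesses. Anyone holding the key can still do this, so the key has to be kept secret.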