privacy-engineering-tools

De-identification Tools - User interfaces

Here you can find tools for de-identifying (research) data and/or checking how well the de-identification process went. None of them require programming experience in R or Python: most of the tools listed below are user interfaces.

Important note

Some of the tools below promise data “anonymization”, but there is no guarantee that the dataset resulting from the de-identification process will actually be anonymous under the GDPR. You can read more about this in the Data Privacy Handbook.

Tabular data

Sorted alphabetically, here are the most relevant de-identification tools we found:

| Name | Description | Data type | Privacy models | More info | License | Maintenance | GitHub stars |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Amnesia | Anonymization using generalization and masking, can export to Zenodo and Dataverse | Relational tables, set collections, DICOM metadata | k-anonymity, k^m-anonymity | Online demo, tutorials, documentation | BSD-3 Clause | Active | 100-500 |
| ARX | Anonymization using generalization and top- and bottom-coding + analyze privacy risks and utility | Tabular (CSV, Excel, Database files) | k-anonymity, l-diversity, t-closeness, β-likeness, δ-disclosure, k-Map, δ-presence, differential privacy, game-theoretic model | Instruction video | Apache 2.0 | Active | 500-1000 |
| Datacheck | Open source R package and web app to check the presence of common identifiers | Tabular data (CSV) | - | Project report and demo | MIT | Active | 0-10 |
| mu-Argus | Statistical Disclosure Control from CBS | Microdata (individual-level) | Global recoding, local suppression, top and bottom coding, PRAM | User manual, more manuals | EUPL-1.2 | Active | 0-10 |
| OpenPseudonymiser | Hashing software (registration required) | Tabular (CSV) | Hashing (SHA-256) | Documentation | GPL-v3 | Inactive | NA |
| sdcMicro | R package and web app to apply generalization, top- and bottom coding, recoding + analyze privacy risks and utility | Tabular data (.Rdata, .sav, .sas7bdat, .csv, .txt, .dta) | k-anonymity | Documentation, demo | GPL-v2 | Active | 10-100 |
| tau-Argus | Statistical Disclosure Control from CBS | Tables (e.g., frequency tables) | Recoding, suppression (hypercube, modular, network, optimal) | User manual, more manuals | EUPL-1.2 | Active | 0-10 |

You can find a more detailed comparison between ARX, mu-Argus, sdcMicro and Amnesia in Stenersen (2020) (persistent link).
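
Several of the tools above report k-anonymity. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. For readers who want to see the idea in code, here is a minimal Python sketch. It is not how any of the listed tools implement the check; the function name `k_anonymity`, the column names and the sample rows are all made up for illustration.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for all quasi-identifiers (0 for an empty dataset)."""
    groups = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already generalized records (age bands, masked postcodes).
rows = [
    {"age": "30-39", "postcode": "35**", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "35**", "diagnosis": "asthma"},
    {"age": "40-49", "postcode": "35**", "diagnosis": "flu"},
    {"age": "40-49", "postcode": "35**", "diagnosis": "diabetes"},
    {"age": "40-49", "postcode": "35**", "diagnosis": "flu"},
]

# The smallest group (age 30-39, postcode 35**) has two records.
print(k_anonymity(rows, ["age", "postcode"]))  # prints 2
```

Which columns count as quasi-identifiers is a judgment call that depends on the dataset and its context. A high k for the chosen columns does not by itself make the data anonymous under the GDPR.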

Textual data

| Name | Description | Data type | More info | License | Maintenance | GitHub stars |
| --- | --- | --- | --- | --- | --- | --- |
| Stanford Named Entity Recognizer | Java-based recognition interface of English entities (e.g., Person, Organization, Location) | Text | - | GNU-GPL-v2 | Inactive | - |
| Text Anonymization Helper | Microsoft Word plugin for finding and labelling potentially identifiable text | Text | Download | Custom license | Inactive | - |
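
To give a rough idea of what "finding and labelling potentially identifiable text" can look like, here is a small Python sketch that replaces e-mail addresses and phone-like numbers with labels. It uses simple pattern matching, which is a different and much cruder technique than the named entity recognition the tools above rely on. The patterns, the labels and the function name `label_identifiers` are assumptions made for this example and are not taken from any listed tool.

```python
import re

# Hypothetical, deliberately simple patterns; real data needs more careful rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def label_identifiers(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact Jane at jane.doe@example.org or +31 6 1234 5678."
print(label_identifiers(sample))
# prints: Contact Jane at [EMAIL] or [PHONE].
```

Note that the name "Jane" is left untouched: names, organizations and locations do not follow a fixed pattern, which is why entity recognition tools exist.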