Here you can find tools for de-identifying (research) data and/or checking how well the de-identification process went. These tools do not require programming experience in R or Python: most of the tools listed below come with a user interface.
Some of the tools below promise data “anonymization”, but there is no guarantee that the dataset resulting from the de-identification process will actually be anonymous under the GDPR. You can read more about this in the Data Privacy Handbook.
Sorted alphabetically, here are the most relevant de-identification tools we found:
Name | Description | Data type | Privacy models | More info | License | Maintenance | GitHub stars |
---|---|---|---|---|---|---|---|
Amnesia | Anonymization using generalization and masking, can export to Zenodo and Dataverse | Relational tables, set collections, DICOM metadata | k-anonymity, km-anonymity | Online demo, tutorials, documentation | BSD-3 Clause | Active | 100-500 |
ARX | Anonymization using generalization and top- and bottom-coding + analyze privacy risks and utility | Tabular (CSV, Excel, Database files) | k-anonymity, l-diversity, t-closeness, β-likeness, δ-disclosure, k-Map, δ-presence, differential privacy, game-theoretic model | Instruction video | Apache 2.0 | Active | 500-1000 |
Datacheck | Open source R package and web app to check the presence of common identifiers | Tabular data (CSV) | - | Project report and demo | MIT | Active | 0-10 |
mu-Argus | Statistical Disclosure Control from CBS | Microdata (individual-level) | Global recoding, local suppression, top- and bottom-coding, PRAM | User manual, more manuals | EUPL-1.2 | Active | 0-10 |
OpenPseudonymiser | Hashing software (registration required) | Tabular (CSV) | Hashing (SHA-256) | Documentation | GPL-v3 | Inactive | NA |
sdcMicro | R package and web app to apply generalization, top- and bottom-coding, recoding + analyze privacy risks and utility | Tabular data (.Rdata, .sav, .sas7bdat, .csv, .txt, .dta) | k-anonymity | Documentation, demo | GPL-v2 | Active | 10-100 |
tau-Argus | Statistical Disclosure Control from CBS | Tables (e.g., frequency tables) | Recoding, suppression (hypercube, modular, network, optimal) | User manual, more manuals | EUPL-1.2 | Active | 0-10 |
You can find a more detailed comparison between ARX, mu-Argus, sdcMicro and Amnesia in Stenersen (2020) (persistent link).
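
None of the tools above require programming, but for readers curious about what a privacy model such as k-anonymity actually measures, here is a minimal sketch in Python. It is not how any of the listed tools implement it: the function name `k_anonymity`, the toy records, and the choice of `age` and `postcode` as quasi-identifiers are all assumptions made for illustration. A dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records, so the sketch counts the size of each such group and reports the smallest one.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for all quasi-identifiers (0 for an empty dataset)."""
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


# Hypothetical toy records; age and postcode are already generalized.
records = [
    {"age": "30-39", "postcode": "35**", "diagnosis": "asthma"},
    {"age": "30-39", "postcode": "35**", "diagnosis": "flu"},
    {"age": "40-49", "postcode": "35**", "diagnosis": "flu"},
    {"age": "40-49", "postcode": "35**", "diagnosis": "diabetes"},
    {"age": "40-49", "postcode": "35**", "diagnosis": "asthma"},
]

# The smallest group ("30-39", "35**") contains 2 records.
print(k_anonymity(records, ["age", "postcode"]))  # 2
```

A result of 2 means that every individual in this toy dataset shares their age range and postcode with at least one other record. As noted above, a high k does not by itself make a dataset anonymous under the GDPR.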
Name | Description | Data type | More info | License | Maintenance | GitHub stars |
---|---|---|---|---|---|---|
Stanford Named Entity Recognizer | Java-based tool for recognizing named entities (e.g., Person, Organization, Location) in English text | Text | - | GNU-GPL-v2 | Inactive | - |
Text Anonymization Helper | Microsoft Word plugin for finding and labelling potentially identifiable text | Text | Download | Custom license | Inactive | - |