We found the following Python packages to de-identify data, sorted alphabetically:
Name | Description | Data type | More info | License | Maintenance | GitHub stars |
---|---|---|---|---|---|---|
anonymoUUs | Replaces identifiable strings in multiple files and folders at once using pattern matching | Text (.txt, .html, .json, .csv) | - | MIT | Active | 0-10 |
deduce | Rule-based (pattern matching) de-identification for Dutch clinical text | Text | Article, documentation | GPL-v3 | Active | 10-100 |
DeepPrivacy2 | Replaces real people in images and videos with synthetic, GAN-generated people | Images, videos | Documentation, article, video presentation | Apache 2.0 | Active | 100-500 |
Masked-Piper | Masks personal identities in video recordings while preserving multimodal information; outputs kinematic overlays and time series | Video | Article, notebook | MIT | Active | 0-10 |
MaskMyPy | Donut and street masking of geographic data | GeoDataFrames | Documentation | MIT | Active | 0-10 |
mysto | Generalization and masking of identifiers | Tabular data | Package on PyPI | Apache 2.0 | Inactive | 0-10 |
presidio | Automated detection and anonymization of sensitive data, with customization options | Text, images | Documentation | MIT | Active | 1000+ |
pynonymizer | Replaces sensitive data with fake data generated by Faker | Relational databases | Package on PyPI | MIT | Active | 10-100 |
Textwash | Machine-learning-based, automated replacement of identifiable information | Unstructured text | The FAMTAFOS project is creating a version of this tool for Dutch text | GPL-v3 | Active | 0-10 |
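Several of these tools (for example anonymoUUs and deduce) rely on pattern matching. To illustrate the idea only, and not either package's API, the sketch below replaces e-mail addresses and Dutch-style postal codes with placeholder tags using Python's standard `re` module. The patterns, the tag names, and the function `mask_text` are hypothetical choices for this example; real clinical or administrative text needs many more rules plus manual review.

```python
import re

# Hypothetical rules for illustration only: each regex maps to a placeholder tag.
# These are NOT the rules shipped with anonymoUUs or deduce.
RULES = [
    # Simplified e-mail pattern: local part, "@", then dot-separated domain labels.
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    # Dutch postal codes look like "1234 AB" (four digits, optional space, two capitals).
    (re.compile(r"\b\d{4}\s?[A-Z]{2}\b"), "<POSTCODE>"),
]


def mask_text(text: str) -> str:
    """Apply each rule in order, replacing every match with its tag."""
    for pattern, tag in RULES:
        text = pattern.sub(tag, text)
    return text


if __name__ == "__main__":
    note = "Contact j.jansen@example.nl, Dorpsstraat 1, 1234 AB Utrecht."
    print(mask_text(note))
```

Rule order matters in this kind of approach: broader patterns applied first can consume text that a more specific rule would otherwise have tagged, which is one reason rule-based tools need careful testing on real data.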
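MaskMyPy offers donut masking of geographic data. In general, donut masking moves each location a random distance between a minimum radius (so no point stays too close to its true position) and a maximum radius (so the data stay spatially useful), in a random direction. The stdlib-only sketch below is not MaskMyPy's implementation; it assumes projected coordinates in metres, and the function name `donut_mask` and the radii used are hypothetical. It samples uniformly over the ring's area; other implementations may draw the distance uniformly instead.

```python
import math
import random


def donut_mask(x: float, y: float, r_min: float, r_max: float,
               rng: random.Random) -> tuple[float, float]:
    """Displace (x, y) by a random distance in [r_min, r_max] at a random angle.

    Assumes projected coordinates in metres, not latitude/longitude degrees.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Taking the square root of a uniform draw over [r_min^2, r_max^2]
    # spreads points evenly over the annulus area instead of clustering
    # them near the inner radius.
    distance = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the example is reproducible
    points = [(155000.0, 463000.0), (121000.0, 487000.0)]  # hypothetical coordinates
    masked = [donut_mask(x, y, 50.0, 500.0, rng) for x, y in points]
    for original, moved in zip(points, masked):
        print(original, "->", moved, f"({math.dist(original, moved):.1f} m)")
```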
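mysto is listed for generalization and masking of identifiers in tabular data. The following sketch does not use mysto; it only shows what generalization commonly means in practice: exact values are coarsened into ranges or prefixes so that fewer records stand out, and direct identifiers are dropped. The field names (`name`, `age`, `postcode`, `diagnosis`), the 10-year age bands, and the two-character postcode prefix are all hypothetical choices.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_postcode(postcode: str, keep: int = 2) -> str:
    """Keep the first `keep` characters (spaces removed) and mask the rest with '*'."""
    compact = postcode.replace(" ", "")
    return compact[:keep] + "*" * max(len(compact) - keep, 0)


def generalize_record(record: dict) -> dict:
    """Drop the direct identifier and generalize the quasi-identifiers.

    Field names are hypothetical; adapt them to your own data.
    """
    return {
        # "name" is suppressed entirely by not copying it.
        "age": generalize_age(record["age"]),
        "postcode": generalize_postcode(record["postcode"]),
        "diagnosis": record["diagnosis"],  # analytic attribute kept as-is
    }


if __name__ == "__main__":
    row = {"name": "Jan Jansen", "age": 34, "postcode": "1234 AB", "diagnosis": "flu"}
    print(generalize_record(row))
```

Wider bands and shorter prefixes lower re-identification risk but also lower analytic value, so the right settings depend on the dataset and its intended use.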