We found the following Python packages to de-identify data, sorted alphabetically:
| Name | Description | Data type | More info | License | Maintenance | GitHub stars |
|---|---|---|---|---|---|---|
| anonymoUUs | Replace identifiable strings in multiple files and folders at once using pattern matching | Text (.txt, .html, .json, .csv) | - | MIT | Active | 0-10 |
| deduce | Rule-based (pattern matching) de-identification for Dutch clinical text | Text | Article, documentation | GPL-v3 | Active | 10-100 |
| DeepPrivacy2 | Replaces real people with synthetic, GAN-generated people in images and videos | Images, videos | Documentation, article, video presentation | Apache 2.0 | Active | 100-500 |
| Masked-Piper | Masks personal identities in visual recordings while preserving multimodal information; outputs kinematic overlays and time series | Video | Article, notebook | MIT | Active | 0-10 |
| MaskMyPy | Donut and street masking of geographic data | Geodataframes | Documentation | MIT | Active | 0-10 |
| mysto | Generalization and masking of identifiers | Tabular data | Package on PyPI | Apache 2.0 | Inactive | 0-10 |
| presidio | Automated sensitive data recognition, includes customization options | Text, images | Documentation | MIT | Active | 1000+ |
| pynonymizer | Replaces sensitive data with fake data using faker | Relational databases | Package on PyPI | MIT | Active | 10-100 |
| Textwash | Machine-learning-based, automated replacement of identifiable information | Unstructured text | The FAMTAFOS project is creating a version of this tool for Dutch text | GPL-v3 | Active | 0-10 |
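
Several of the tools above rely on pattern matching. To give a feel for what that means, here is a minimal sketch using only Python's standard `re` module. It is not the API of any package in the table: the `PATTERNS` dictionary, the `deidentify` function, and the two regular expressions are hypothetical and chosen purely for illustration; real data would need patterns tuned to it.

```python
import re

# Hypothetical patterns for illustration only; they are deliberately simple
# and will miss many real-world formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a placeholder such as <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(deidentify("Contact Jane at jane.doe@example.org or +31 6 1234 5678."))
# prints: Contact Jane at <EMAIL> or <PHONE>.
```

Note that the name "Jane" is left untouched: plain patterns like these only catch identifiers with a predictable shape. Names and other free-form identifiers typically need a lookup list or a recognition model, which is where the rule-based and machine-learning-based tools in the table come in.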