privacy-engineering-tools

De-identification Tools in Python

We found the following Python packages to de-identify data, sorted alphabetically:

| Name | Description | Data type | More info | License | Maintenance | GitHub stars |
| --- | --- | --- | --- | --- | --- | --- |
| anonymoUUs | Replaces identifiable strings in multiple files and folders at once using pattern matching | Text (.txt, .html, .json, .csv) | - | MIT | Active | 0-10 |
| deduce | Rule-based (pattern matching) de-identification for Dutch clinical text | Text | Article, documentation | GPL-v3 | Active | 10-100 |
| DeepPrivacy2 | Replaces real people with synthetic (GAN-generated) people in images and videos | Images, videos | Documentation, article, video presentation | Apache 2.0 | Active | 100-500 |
| Masked-Piper | Masks personal identities in visual recordings while preserving multimodal information, outputting kinematic overlays and time series | Video | Article, notebook | MIT | Active | 0-10 |
| MaskMyPy | Donut and street masking of geographic data | Geodataframes | Documentation | MIT | Active | 0-10 |
| mysto | Generalization and masking of identifiers | Tabular data | Package on PyPI | Apache 2.0 | Inactive | 0-10 |
| presidio | Automated sensitive data recognition, includes customization options | Text, images | Documentation | MIT | Active | 1000+ |
| pynonymizer | Replaces sensitive data with fake data using faker | Relational databases | Package on PyPI | MIT | Active | 10-100 |
| Textwash | Machine-learning-based, automated replacement of identifiable information | Unstructured text | The FAMTAFOS project is creating a version of this tool for Dutch text | GPL-v3 | Active | 0-10 |
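
Several of the text tools listed above are described as using pattern matching. As a rough illustration of that general technique, here is a minimal sketch that uses only Python's standard `re` module. It does not reproduce the API or rule set of any listed package; the `PATTERNS` table, the `deidentify` function, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical patterns for illustration only; real tools use far richer rules.
# Order matters: dates are masked before phone numbers so that a date such as
# 2024-03-01 is not mistaken for a digit sequence that looks like a phone number.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact Jan at jan.jansen@example.org or +31 6 1234 5678 before 2024-03-01."
masked = deidentify(sample)
print(masked)
# Contact Jan at [EMAIL] or [PHONE] before [DATE].
```

Note that the name "Jan" is left untouched: fixed patterns like these only catch identifiers with a predictable shape, which is one reason to prefer a dedicated package over hand-written expressions.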