CLI tool to clean various data out of an Atlassian support zip
Even though Atlassian removes some data when creating a Confluence support zip, plenty of information about your system and users (including personal information) remains.
This tool is designed to clean out the base URL and usernames (see How it works for what exactly this means).
(May work with other support zip files, but is only tested for Confluence.)
Python ≥ 3.5
./supportcleaner.py BASEURL [--supportzip SUPPORT_ZIP] [--filterfile FILTER_FILE]
- MAX_TMP_DIR_SIZE: maximum allowed size, in bytes, of the temporary directory used for extraction.
- DELETE_AFTER_DAYS: maximum age of files, in days, after which they are flagged for deletion.
./supportcleaner.py my-base-url.net --supportzip Confluence_support_2019-02-14-11-43-28.zip
This tool cleans the support zip by
- extracting the zip file or, if no supportzip is supplied, giving instructions for placing the relevant files manually in the temporary directory
- replacing the following in files (defined via LOGDIR, default: all):
Example | Replacement |
---|---|
subdomain.BASEURL | URL_CLEANED |
userName: NAME_WITHOUT_SPACE | userName: USERNAME_CLEANED |
customer@business.com | EXTERNAL_EMAIL_SHA256:b6c4586387_CLEANED |
employee@URL_CLEANED | INTERNAL_EMAIL_SHA256:2fdc017705_CLEANED |
*.smhss.de | SMEDIA_DOMAIN_CLEANED |
127.0.0.1 | 127.0.0.IP_CLEANED |
----- BEGIN RSA PRIVATE KEY ----- ... | PRIVATE_KEY_CLEANED |
----- BEGIN CERTIFICATE ----- ... | CERTIFICATE_CLEANED |
Big Business LLC | BUSINESS_CLEANED |
Herr Müller | NAME_CLEANED |
- removing files that are older than DELETE_AFTER_DAYS days (default: 180) (interactive)
- removing files that are among the LARGEST_PERCENT of files (default: largest 10%) (interactive)
- applying additional filters supplied via the filterfile argument, which takes a file path:
  - each line of the file is treated as one filter
  - each filter is split at "||" into two parts
  - first: search pattern (Python regex), second: replacement
- if supportzip was supplied: creating a new zip named "cleaned.zip" (if it already exists, it is deleted at program start)
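The two interactive removal steps above can be pictured with the following minimal sketch. It is not the tool's actual code: the helper names are hypothetical, "largest percent" is interpreted here as a share of the file count (rounded up), and the sketch only selects candidates instead of prompting for confirmation.

```python
import os
import time

DELETE_AFTER_DAYS = 180  # default mentioned above
LARGEST_PERCENT = 10     # default mentioned above


def list_files(root):
    """Return (path, size_in_bytes, mtime) for every file below root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            entries.append((path, stat.st_size, stat.st_mtime))
    return entries


def older_than(entries, days, now=None):
    """Paths whose modification time is more than `days` days before `now`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return [path for path, _size, mtime in entries if mtime < cutoff]


def largest_percent(entries, percent):
    """Paths of the largest `percent` percent of files (assumption: by count, rounded up)."""
    count = (len(entries) * percent + 99) // 100  # integer ceiling
    by_size = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [path for path, _size, _mtime in by_size[:count]]
```

A caller would pass the result of `list_files(tmp_dir)` to both helpers and ask the user before deleting anything they return.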
Several filters are already defined in filters.txt. These can and should be modified and extended to suit your specific needs.
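To illustrate the filter format (one filter per line, search pattern and replacement separated by "||"), here is a minimal sketch of how such lines could be parsed and applied. The function names and the two example patterns are hypothetical and are not taken from filters.txt; splitting at the first "||" and skipping blank lines are assumptions.

```python
import re


def parse_filters(lines):
    """Turn 'PATTERN||REPLACEMENT' lines into (compiled_regex, replacement) pairs."""
    filters = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        # assumption: split at the first "||" only
        pattern, replacement = line.split("||", 1)
        filters.append((re.compile(pattern), replacement))
    return filters


def apply_filters(text, filters):
    """Apply every filter in order and return the cleaned text."""
    for regex, replacement in filters:
        text = regex.sub(replacement, text)
    return text


# Hypothetical filter lines, written the way they would appear in a filter file.
example_lines = [
    r"userName: \S+||userName: USERNAME_CLEANED",
    r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.)\d{1,3}||\1IP_CLEANED",
]
filters = parse_filters(example_lines)
print(apply_filters("userName: jdoe connected from 127.0.0.1", filters))
```

Because the replacement is passed to `re.sub`, backreferences such as `\1` work in the second part of a filter, as the IP example shows.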
As is, there are filters for cleaning:
- Usernames (in a specific format, replacement includes truncated SHA256 hash)
- Base URL (including subdomains)
- Mail addresses (differentiates between internal addresses, which contain URL_CLEANED, and external ones; the replacement includes a truncated SHA256 hash of the local part)
- smhss.de domains
- IP addresses (only the last byte is redacted)
- Private keys and certificates (in a specific format)
- Business names
- Personal names in addresses (starting with 'Frau' or 'Herr')
Examples of these filters are shown in the table above. Some extended documentation is provided alongside the regexes in filters.txt.
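The mail address replacement combines a regex match with a hash, which a plain "search||replacement" line cannot express on its own. The following is a rough sketch of the idea, not the tool's implementation: the simplified address pattern, the function names, the UTF-8 encoding and the truncation to 10 hex characters (chosen to match the length shown in the table) are all assumptions.

```python
import hashlib
import re

# Simplified, hypothetical address pattern: local part, "@", dot-separated domain.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*)")


def clean_email(match):
    """Build the replacement for one matched address."""
    local_part, domain = match.group(1), match.group(2)
    # Truncated SHA256 of the local part (assumption: first 10 hex characters).
    digest = hashlib.sha256(local_part.encode("utf-8")).hexdigest()[:10]
    # An address counts as internal if the base URL filter already ran on its domain.
    kind = "INTERNAL" if "URL_CLEANED" in domain else "EXTERNAL"
    return "{}_EMAIL_SHA256:{}_CLEANED".format(kind, digest)


def clean_emails(text):
    """Replace every address in text with its hashed placeholder."""
    return EMAIL_RE.sub(clean_email, text)


print(clean_emails("from customer@business.com to employee@URL_CLEANED"))
```

Hashing the local part keeps occurrences of the same address recognizable as the same person across log lines without exposing the name itself.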
Please be aware that, because a new zip is created, all original timestamps as well as file owners and permissions are lost.
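As a rough illustration of the re-zipping step, a directory of cleaned files could be packed with Python's standard `zipfile` module as sketched below. Whether the tool does it this way is an assumption, and `zip_directory` is a hypothetical helper. Each entry is written from the file currently on disk, so the archive reflects the metadata of the extracted copy rather than what was stored in the original support zip.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path="cleaned.zip"):
    """Pack all files below src_dir into zip_path (mode "w" overwrites an existing file)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Store paths relative to src_dir so the archive has no absolute paths.
                arcname = os.path.relpath(full_path, src_dir)
                archive.write(full_path, arcname)
    return zip_path
```

The target zip should live outside `src_dir`, otherwise the walk would pick up the archive while it is being written.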
Please be aware that this tool cannot guarantee that all personal information or information related to your company is cleaned up.
MIT © 2019-2020 //SEIBERT/MEDIA GmbH