# hd_MailToolbox
This is a collection of tools we use to handle too much mail. At Hudora we have users receiving more than 30 GB of mail per year and there are issues in effectively handling that load.
The whole code collection is somewhat ad hoc, but you might find one or another tool helpful.
Currently the following tools are published:
- RemoveAttachments - archives and removes attachments
- departicularifier - Extracts files from message/partial MIME messages generated by NRG MP 161 scanfax appliances.
- gmailsignature - Set all signatures for all users on a Google Apps account.
- imap2html - Archive mails.
All tools require Python; see requirements.txt for the required Python libraries. If you have pip installed, type sudo pip install -r requirements.txt to install all requirements.
Several tools also need a running CouchDB instance and access to Amazon S3. Set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your Amazon access credentials. Set S3BUCKET to the name of the S3 bucket you want to use, e.g. email.example.com (don't use exactly that name, it's already taken). Then create the S3 bucket you named in S3BUCKET.
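Before running a tool, it can help to confirm that the three environment variables named above are actually set. The following is only a minimal sketch and not part of the toolbox; the helper name `missing_settings` is hypothetical.

```python
import os

# Environment variables the README asks you to set for the S3-backed tools.
REQUIRED_SETTINGS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3BUCKET")


def missing_settings(environ=None):
    """Return the names of required settings that are unset or empty.

    `missing_settings` is a hypothetical helper, not a function of the toolbox.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_SETTINGS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Please set: " + ", ".join(missing))
    else:
        print("All S3 settings are present.")
```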
RemoveAttachments is a utility to help you handle too many email attachments.
It can archive attachments in a CouchDB/Amazon S3 database, remove attachments from emails, or both. Some filtering options allow you to restrict the search to mails sent after a certain date, or to mails that are larger than a certain size. When an attachment is removed, it is replaced by a link where users can download the attachment. When modifying emails to remove attachments, this script attempts to preserve the message flags and the internal date.
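To illustrate what preserving flags and the internal date involves, here is a rough sketch based on Python's standard `imaplib` helpers. It is not the script's actual code; the function names `extract_flags_and_date` and `format_flags` are hypothetical, and the sample FETCH response is made up.

```python
import imaplib


def extract_flags_and_date(fetch_response):
    """Pull the flags and the internal date out of a raw FETCH response line."""
    flags = imaplib.ParseFlags(fetch_response)  # tuple of bytes, e.g. (b'\\Seen',)
    internal_date = imaplib.Internaldate2tuple(fetch_response)  # local struct_time or None
    return flags, internal_date


def format_flags(flags):
    """Build a flag list for APPEND; \\Recent is dropped because clients cannot set it."""
    names = [f.decode("ascii") for f in flags if f.lower() != b"\\recent"]
    return "(" + " ".join(names) + ")"


# Made-up response, shaped like the answer to: conn.fetch(num, "(FLAGS INTERNALDATE)")
sample = b'1 (FLAGS (\\Seen \\Answered) INTERNALDATE "17-Jul-2009 02:44:25 +0000")'
sample_flags, sample_date = extract_flags_and_date(sample)

# With a live connection, the modified message could then be re-added roughly like this:
# conn.append(mailbox, format_flags(sample_flags),
#             imaplib.Time2Internaldate(sample_date), modified_message_bytes)
```

Whether a server honours every flag on APPEND depends on the server, which is why the README says the script only attempts to preserve them.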
Typical use is to call the script on an account which is getting too big and remove all attachments from mails older than 6 months and bigger than 100 kB. This usually greatly reduces mailbox size while still keeping old mails searchable. The attachments can still be accessed through the embedded Archive-IDs.
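The "older than 6 months and bigger than 100 kB" selection can be expressed with the standard IMAP SEARCH keys BEFORE and LARGER. The sketch below only shows how such a criteria string might be built; it is not taken from the script, the helper name `build_search_criteria` is hypothetical, and 180 days and 100 * 1024 bytes are assumed stand-ins for "6 months" and "100 kB".

```python
from datetime import date, timedelta

# English month abbreviations as required by IMAP, independent of the locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(today, older_than_days, min_size_bytes):
    """Return an IMAP SEARCH criteria string for old, large messages.

    BEFORE compares against the server's internal date; SENTBEFORE would
    compare against the Date: header instead.
    """
    cutoff = today - timedelta(days=older_than_days)
    imap_date = f"{cutoff.day}-{_MONTHS[cutoff.month - 1]}-{cutoff.year}"
    return f"(BEFORE {imap_date} LARGER {min_size_bytes})"


criteria = build_search_criteria(date(2010, 3, 15), 180, 100 * 1024)
# With an imaplib connection this could be used as: conn.search(None, criteria)
```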
This script will recurse through all folders (mailboxes) on the server that you connect it to.
The CouchDB database will be created if it does not already exist.
Run the script with the --help parameter to see the options. It is self-explanatory from there.
You almost certainly want to use --gmail when contacting a Gmail IMAP server.
Known issues:
- httplib2-0.5.0 has a bug which badly breaks couchdb-python. Use v0.4.0 instead; see http://code.google.com/p/couchdb-python/issues/detail?id=85
- There might be some minor behavioural issues with some email clients. For example, Mozilla Thunderbird only stores local records of manually-marked-as-unread emails, so those emails might be marked as read by this script when it removes attachments.
- You can't delete mails over Gmail's IMAP interface; you have to use the web interface. The default remove-attachments settings will not quite work: removing the email from the IMAP server deletes it from all labels, but it remains stored in "All Mail" and keeps occupying space. The --gmail option works around this by copying the mails into the Trash folder before removing them from the original folder. Gmail will then delete the mail after 30 days, or when you manually empty the trash from the web interface.
- The star on Gmail starred messages is lost when attachments are deleted. I can't find a way to apply this flag through the IMAP interface, and Gmail does not apply the "\Flagged" mail flag on request.
- Because of the dance that we have to do to work with Gmail, mails that are marked as unread in Gmail are displayed as read after this script removes the attachments. The Gmail server does not comply with the standards here, and it does not seem possible to retain this flag without hacks that I wouldn't trust in the long run.
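The copy-to-Trash workaround described for the --gmail option can be pictured with a few `imaplib` calls. This is a sketch of the idea only, not the script's implementation; the function name `remove_via_trash` is hypothetical, and the folder name "[Gmail]/Trash" is an assumption, since Gmail localises its system folder names.

```python
def remove_via_trash(conn, message_num, trash_folder="[Gmail]/Trash"):
    """Copy a message to the Trash folder, then delete it from the selected folder.

    `conn` is expected to behave like an imaplib.IMAP4 connection with a
    folder already selected. The default trash folder name is an assumption.
    """
    typ, _ = conn.copy(message_num, trash_folder)
    if typ != "OK":
        # Do not delete the original if the copy did not succeed.
        raise RuntimeError("could not copy message %s to %s" % (message_num, trash_folder))
    conn.store(message_num, "+FLAGS", r"(\Deleted)")
    conn.expunge()


# Example with a real connection (not executed here):
# import imaplib
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login(user, password)
# conn.select("INBOX")
# remove_via_trash(conn, "1")
```

Checking the result of the copy before flagging the original as deleted keeps a failed copy from losing the mail.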
Django 1.1 application to display mails archived by RemoveAttachments.
departicularifier extracts files from message/partial MIME messages generated by NRG MP 161 scanfax appliances.
See this blogposting for further details.
gmailsignature sets all signatures for all users on a Google Apps account.
imap2html archives mails.