Using msg extractor In Your Own Code
Before anything else, you will need to import the module.
import extract_msg
All example code assumes that you have imported the module like so.
It is highly recommended that you do not open an MSG file directly using a class, but rather use openMsg to do this. This function automatically determines whether the MSG file is supported in any form and, if it is, which class to use for reading it.
msg = extract_msg.openMsg('path/to/msg/file.msg')
MSG files can be closed in the same way that a normal file can, simply using the close method of the class.
msg.close()
MSGFile, the base class for all MSG file types, supports the __enter__ and __exit__ magic functions, allowing you to use instances of it in a with statement. At the end of the with block, the file will automatically be closed.
with extract_msg.openMsg('path/to/msg/file.msg') as msg:
    # Do some stuff
    pass
openMsg takes a filename for its first argument. It can also take the raw bytes that would make up an MSG file or a file-like object. At minimum, a file-like object requires read, seek, tell, and close methods. The read method MUST return bytes and MUST return at most the number of bytes requested.
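As a rough illustration of the file-like requirements above, the sketch below checks that an object exposes the four methods before it is handed to openMsg. The helper name has_msg_file_interface is hypothetical and not part of extract_msg, and io.BytesIO is used only as a convenient in-memory example.

```python
import io

# Methods the text above says a file-like object needs at minimum.
REQUIRED_METHODS = ('read', 'seek', 'tell', 'close')


def has_msg_file_interface(obj):
    """Return True if obj has callable read, seek, tell, and close methods.

    Hypothetical helper for illustration; it is not part of extract_msg.
    """
    return all(callable(getattr(obj, name, None)) for name in REQUIRED_METHODS)


# An in-memory buffer satisfies the interface; a plain string does not.
buffer = io.BytesIO(b'placeholder bytes, not a real MSG file')
print(has_msg_file_interface(buffer))
print(has_msg_file_interface('path/to/msg/file.msg'))
```

With real data, either the raw bytes or a buffer wrapping them could then be passed as the first argument, for example extract_msg.openMsg(io.BytesIO(data)). The check only looks at method names; it does not verify that read returns bytes.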
While most of the classes support saving, not all of them do. If you try to call a save method on a class that doesn't support it, the method will raise a NotImplementedError. The save method requires no arguments, but it is likely that you will want to use some of them. The method also returns a reference to the current instance, allowing you to chain certain methods directly. However, given the possibility of errors, it is not recommended to chain opening, saving, and closing all together, as this could lead to a file handle that doesn't get closed when you expect it to.
msg.save()
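One way to follow the advice above is to keep opening and closing inside a with block and call save on its own line, catching NotImplementedError for classes that cannot save. This is only a sketch: save_if_supported and its opener parameter are hypothetical names introduced here so the function can be exercised without a real MSG file. When no opener is given, it falls back to extract_msg.openMsg.

```python
def save_if_supported(source, opener=None):
    """Open source, try to save it, and always close the file.

    Returns True if save() ran, or False if the class raised
    NotImplementedError. Hypothetical helper, not part of extract_msg.
    """
    if opener is None:
        # Imported lazily so the sketch can be tested with a stand-in opener.
        import extract_msg
        opener = extract_msg.openMsg

    # The with block closes the file even if save() raises.
    with opener(source) as msg:
        try:
            msg.save()
        except NotImplementedError:
            return False
    return True
```

A call such as save_if_supported('path/to/msg/file.msg') would then save with the default arguments and report whether saving was available for that file's class.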
Additionally, MSGFile has the saveAttachments method
While the previous section covered the absolute basics of using the module, this section goes over slightly more in-depth usage, much of which covers the advanced details of the functions mentioned in the Basic Usage section.
The openMsg function takes a number of keyword arguments that allow you to customize the way it (and the MSGFile instance it produces) behaves. The first and most important keyword argument is strict. If this is set to False, the function will return an instance of MSGFile (not a subclass) in the event that it cannot find a more suitable class to open it with. This allows your code to have at least minimal support for a file that would otherwise not be supported, so long as it uses the MSG standard. This argument is set to True by default.
While strict customizes the behavior of openMsg directly, it also affects the behavior of attachments that are, themselves, embedded MSG files. While normally an unsupported embedded MSG file would be a source of NotImplementedErrors, setting strict to False will stop those. This is because all keyword arguments given to openMsg are given to the MSGFile when it is opened, and these keyword arguments (with the exception of those noted below) get used when opening an embedded MSG file.
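The behavior of strict described above could be wrapped as in the following sketch. The helper open_lenient and its opener parameter are hypothetical and exist only for illustration and testing; by default the call goes to extract_msg.openMsg with strict=False, and any other keyword arguments are passed through unchanged.

```python
def open_lenient(source, opener=None, **kwargs):
    """Open source with strict=False so unsupported types fall back to MSGFile.

    Hypothetical helper; opener defaults to extract_msg.openMsg.
    """
    if opener is None:
        # Imported lazily so the sketch can be tested with a stand-in opener.
        import extract_msg
        opener = extract_msg.openMsg

    # Force strict=False even if the caller passed a different value.
    options = dict(kwargs, strict=False)
    return opener(source, **options)
```

Because the keyword arguments are also used for embedded MSG files, a message opened this way should not raise NotImplementedError for an unsupported embedded MSG attachment. Printing type(msg).__name__ on the result is a simple way to see which class was chosen.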
The rest of the keyword arguments are exclusively used for customizing the behavior of the class used to open the MSGFile, and their descriptions are as follows, with an * marking those that do not apply to embedded MSG files of the file you are opening.
Arguments for all MSGFiles:
- prefix*: An advanced argument that is used internally for handling embedded MSG files. The value tells the code where to find the directory containing the data for the embedded MSG file. If you know exactly where in the main MSG file your desired file is and want to reduce the number of MSGFile instances produced, then this argument is for you. Otherwise, this should probably be left alone.
- parentMsg*: Another advanced argument that is used internally for handling embedded MSG files. It's used for ensuring that only one OleFileIO instance is created when opening an MSGFile (except in the case of signed messages, where embedded MSG files are not stored in the same way) and for syncing the named properties. Embedded MSG files have the details of the named properties stored in the top level MSG file, which reduces the data for shared streams. This also means that we can get away with parsing that data once and sharing the result. It is not recommended to ever set this yourself outside of the internal code.
- attachmentClass: The class the MSGFile will use for attachments, should they be supported by it. If not set, the default Attachment class will be used. If you need to change the behavior of the Attachment class in any way for your file, this is the way to do it.
- delayAttachments: Delays the initialization of attachments until the user attempts to retrieve them. Setting this to True is one of the ways MSG files with bad/unsupported attachments can be loaded.
- filename: The filename to be used by default when saving. This is related to specific save arguments. If the argument used to open the MSG file is an actual path, this will default to that path.
- attachmentErrorBehavior: The behavior to use in the event of an error when parsing the attachments. Should be an int or an instance of extract_msg.enums.AttachErrorBehavior. This is the other method of opening an MSG file with bad/unsupported attachments.
- overrideEncoding: An encoding to use that overrides the value that is found in the MSG file. This is used if something is wrong with the value in the file that causes issues decoding the data, including no value being set. If you have manually set this and are getting encoding errors, do not report them.
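Several of the arguments in this list are often combined when dealing with problematic files. The sketch below collects them into a dictionary that can be unpacked into openMsg. The function build_open_kwargs and its snake_case parameter names are hypothetical; only the dictionary keys come from the list above, and 'utf-8' is just a placeholder value.

```python
def build_open_kwargs(delay_attachments=False, override_encoding=None,
                      attachment_error_behavior=None, filename=None):
    """Collect openMsg keyword arguments, omitting any that were not set.

    Hypothetical helper for illustration; it is not part of extract_msg.
    """
    kwargs = {}
    if delay_attachments:
        kwargs['delayAttachments'] = True
    if override_encoding is not None:
        kwargs['overrideEncoding'] = override_encoding
    if attachment_error_behavior is not None:
        # Expected to be an int or an AttachErrorBehavior member.
        kwargs['attachmentErrorBehavior'] = attachment_error_behavior
    if filename is not None:
        kwargs['filename'] = filename
    return kwargs


options = build_open_kwargs(delay_attachments=True, override_encoding='utf-8')
print(options)
```

The result could then be used as extract_msg.openMsg('path/to/msg/file.msg', **options). Leaving unset options out of the dictionary means the library's own defaults still apply to them.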
Arguments for subclasses of MessageBase:
- recipientSeparator: The separator string to use between recipients.
- ignoreRtfDeErrors: If set to True, tells the code to silently ignore any errors from the RTFDE module. RTFDE is used for extracting encapsulated HTML from the RTF body should no HTML body be found; however, it is not perfect. Currently there are several critical errors, and the last commit was in January 2022. The developer is working on fixing many of these errors with a significant rewrite, but until then this argument can be used to try to work around it. Alternatively, the next argument can be used to override how deencapsulation works.
- deencapsulationFunc: A callable that will override the way that HTML/text is deencapsulated from the RTF body. This function must take exactly 2 arguments, the first being the RTF body from the message (a bytes instance) and the second being an instance of the enum extract_msg.enums.DeencapType which will tell the function what data has been requested. This function must return a string if the requested data is text, otherwise it must return bytes for the HTML. If an error occurs, the function must either return None or raise one of the appropriate exceptions from extract_msg.exceptions to signify what happened. If any other exceptions are thrown, they will not be caught by the class.
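To make the deencapsulationFunc contract more concrete, here is a minimal sketch of an adapter that wraps two user-supplied extractors and enforces the return types described above. Everything named here is hypothetical: make_deencap_func, extract_html, and extract_text are not part of extract_msg, and the check on the member name HTML is an assumption about how DeencapType names its members, so verify it against your installed version.

```python
def make_deencap_func(extract_html, extract_text):
    """Build a callable that follows the deencapsulationFunc contract.

    extract_html and extract_text are hypothetical user-supplied callables
    that each take the RTF body as bytes.
    """
    def deencapsulate(rtf_body, deencap_type):
        # Assumption: the enum member for HTML requests is named HTML.
        wants_html = getattr(deencap_type, 'name', None) == 'HTML'
        extractor = extract_html if wants_html else extract_text
        try:
            result = extractor(rtf_body)
        except Exception:
            # Returning None signals that deencapsulation failed.
            return None
        # HTML must come back as bytes and text as str; treat anything
        # else as a failure rather than handing back the wrong type.
        expected = bytes if wants_html else str
        return result if isinstance(result, expected) else None

    return deencapsulate
```

The returned function could be passed as extract_msg.openMsg(path, deencapsulationFunc=func). Converting every failure to None is a deliberately conservative choice; raising one of the exceptions from extract_msg.exceptions instead would give the caller more detail.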
Arguments for subclasses of MessageSignedBase:
- recipientSeparator: The separator string to use between recipients.
- signedAttachmentClass: Like attachmentClass, except specifically used for handling signed attachments. Defaults to the SignedAttachment class.