Tuesday, February 13, 2007

What is an Outlook MSG file?




This is going to get technical real fast. But since you really wanted to know, here is your answer.




Short Answer:


Outlook MSG files are Outlook messages saved as files. Each one is stored as a COM structured storage (OLE2) compound document, the same container technique used by other Microsoft applications such as Word and Excel.
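One practical consequence: a compound document can be recognized by its first eight bytes. Here is a minimal Python sketch, assuming the well-known compound file signature D0 CF 11 E0 A1 B1 1A E1. The function names are made up for illustration, and a match only tells you the container is a compound file, not that it actually holds an Outlook message.

```python
# The 8-byte signature found at the start of an OLE2 compound file.
COMPOUND_FILE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(data: bytes) -> bool:
    """Return True if data begins with the compound file signature."""
    return data[:8] == COMPOUND_FILE_SIGNATURE


def file_looks_like_compound_file(path: str) -> bool:
    """Read only the first 8 bytes of a file and check the signature."""
    with open(path, "rb") as f:
        return looks_like_compound_file(f.read(8))
```

An old-style Word or Excel file would pass the same check, which is exactly the point: they all share the container.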



Um, OK, that's great, but it does not help me...




Sorry, but it is what it is. If you want to really understand all this, then put on your propeller hat, give it a good hard spin, fill your pocket protector with fresh pens, re-tape your glasses, and read on...




So what is Structured Storage?



Structured storage (also known as COM structured storage or OLE structured storage) is a technology developed by Microsoft as part of its Windows operating system for storing hierarchical data within a single file. Strictly speaking, the term structured storage refers to a set of COM interfaces that a conforming implementation must provide, not to a specific implementation or to a specific file format such as Outlook MSG (in fact, a structured storage implementation need not store its data in a file at all). In addition to providing a hierarchical structure for data, structured storage may also provide a limited form of transactional support for data access. Microsoft provides one implementation that supports transactions and one that does not. The latter, called simple-mode storage, is limited in other ways as well, although it performs better.

Structured storage is widely used in Microsoft Office applications, although newer releases (starting with Office 2007) use a new XML-based format by default. It is also an important part of both COM and the related Object Linking and Embedding (OLE) technologies. Other notable applications of structured storage include Microsoft SQL Server, the Windows shell, and many third-party CAD programs.

What was the motivation to use Structured Storage?

Structured storage addresses some inherent difficulties of storing multiple data objects within a single file. One difficulty arises when an object persisted in the file changes in size due to an update. If the application that reads and writes the file expects the objects to remain in a certain order, everything following that object's representation in the file may need to be shifted backward to make room if the object grows, or forward to fill in the leftover space if the object shrinks. If the file is large, this can be a costly operation. Of course, there are many possible solutions to this difficulty, but often the application programmer does not want to deal with low-level details such as binary file formats.

Structured storage provides an abstraction known as a stream, represented by the interface IStream. A stream is conceptually very similar to a file, and the IStream interface provides methods for reading and writing similar to file input/output. A stream could reside in memory, within a file, within another stream, etc., depending on the implementation. Another important abstraction is that of a storage, represented by the interface IStorage. A storage is conceptually very similar to a directory on a file system. Storages can contain streams, as well as other storages.
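To make the directory analogy concrete, here is a toy in-memory model in Python. It is only a sketch of the idea: the Storage class and its methods are invented for illustration and are not the COM IStorage/IStream interfaces, and the stream names are made up rather than taken from a real MSG file.

```python
import io


class Storage:
    """Toy stand-in for a storage: a named container of streams and storages."""

    def __init__(self):
        self.streams = {}   # name -> io.BytesIO (behaves like a small file)
        self.storages = {}  # name -> Storage (behaves like a subdirectory)

    def create_stream(self, name):
        stream = io.BytesIO()
        self.streams[name] = stream
        return stream

    def create_storage(self, name):
        child = Storage()
        self.storages[name] = child
        return child

    def walk(self, prefix=""):
        """Yield (path, size_in_bytes) for every stream below this storage."""
        for name, stream in sorted(self.streams.items()):
            yield prefix + name, len(stream.getvalue())
        for name, child in sorted(self.storages.items()):
            yield from child.walk(prefix + name + "/")


# Build a small hierarchy: one stream at the root, one inside a nested storage.
root = Storage()
root.create_stream("subject").write(b"Hello")
attachment = root.create_storage("attachment0")
attachment.create_stream("data").write(b"\x00\x01\x02")

listing = list(root.walk())
```

Here listing comes out as the two paths "subject" and "attachment0/data" with their sizes, much like a recursive directory listing.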

If an application wishes to persist several data objects to a file, one way to do so would be to open an IStorage that represents the contents of that file and save each object in its own IStream. One way to accomplish the latter is through the standard COM interface IPersistStream. OLE depends heavily on this model to embed objects within documents.
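The "one object per stream" pattern can be sketched in Python as follows. This is an analogy, not IPersistStream itself: the Point class, its save and load methods, and the save_all helper are all hypothetical names chosen for the example, and plain in-memory byte streams stand in for real IStream objects.

```python
import io
import struct


class Point:
    """Example object that knows how to save itself to a stream and load back."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def save(self, stream):
        # Two little-endian 32-bit signed integers, 8 bytes in total.
        stream.write(struct.pack("<ii", self.x, self.y))

    def load(self, stream):
        self.x, self.y = struct.unpack("<ii", stream.read(8))


def save_all(objects):
    """Persist each named object into its own in-memory stream."""
    streams = {}
    for name, obj in objects.items():
        stream = io.BytesIO()
        obj.save(stream)
        streams[name] = stream
    return streams


streams = save_all({"origin": Point(0, 0), "corner": Point(3, 4)})

# Restore one object without touching the other object's stream.
restored = Point()
streams["corner"].seek(0)  # rewind before reading
restored.load(streams["corner"])
```

Because each object lives in its own stream, one of them can grow, shrink, or be reloaded without the others needing to know.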

The format

Microsoft's implementation uses a file format known as compound files, and all of the widely deployed structured storage implementations read and write this format. Compound files use a FAT-like structure to represent storages and streams. Chunks of the file, known as sectors (these may or may not correspond to sectors of the underlying file system), are allocated as needed to add new streams and to increase the size of existing streams. If streams are deleted or shrink, leaving unallocated sectors, these sectors can be reused for new streams.
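The sector-chaining idea can be sketched in a few lines of Python. This is a simplified illustration, not the real compound file layout: the SectorFile class, the FREE and END marker values, and the tiny 4-byte sector size are all assumptions made so the behavior is easy to follow.

```python
SECTOR_SIZE = 4   # tiny on purpose; real compound files use much larger sectors
FREE = -1         # marks an unallocated sector in the table
END = -2          # marks the last sector of a chain


class SectorFile:
    def __init__(self):
        self.sectors = []  # sector index -> bytes stored in that sector
        self.fat = []      # sector index -> next sector index, FREE, or END

    def _allocate(self):
        """Reuse a free sector if one exists, otherwise grow the file."""
        if FREE in self.fat:
            return self.fat.index(FREE)
        self.sectors.append(b"")
        self.fat.append(FREE)
        return len(self.fat) - 1

    def write_stream(self, data):
        """Store data as a chain of sectors and return the first sector index."""
        first = previous = None
        for offset in range(0, len(data), SECTOR_SIZE):
            index = self._allocate()
            self.sectors[index] = data[offset:offset + SECTOR_SIZE]
            self.fat[index] = END            # tentatively the end of the chain
            if previous is None:
                first = index
            else:
                self.fat[previous] = index   # link the previous sector to this one
            previous = index
        return first

    def read_stream(self, first):
        """Follow the chain from the first sector and join the pieces."""
        parts, index = [], first
        while index not in (None, END):
            parts.append(self.sectors[index])
            index = self.fat[index]
        return b"".join(parts)

    def delete_stream(self, first):
        """Mark every sector in the chain as free so it can be reused."""
        index = first
        while index not in (None, END):
            following = self.fat[index]
            self.fat[index] = FREE
            index = following


cf = SectorFile()
first = cf.write_stream(b"first stream")  # 12 bytes, occupies 3 sectors
second = cf.write_stream(b"second")       # 6 bytes, occupies 2 sectors
cf.delete_stream(first)                   # its 3 sectors become free
third = cf.write_stream(b"reused!!")      # 8 bytes, lands in the freed sectors
```

After the last write the file still has only five sectors: the new stream went into the space the deleted one left behind, and nothing had to be shifted around.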




Conclusion




Although you could, with some effort, write code that opens an MSG file, the exact structure of the streams within the file is not documented. You would therefore have no reference for what the individual streams represent, what type of data each stream holds, or how to consume that data properly. Furthermore, since Microsoft could change the way data is stored in the MSG file at any time, you would be plagued with support for the rest of your life for whatever solution you try to build on your own. For this reason, Microsoft does not document the file structure, and the only method they provide for working with Outlook MSG files is the Outlook client.


There is a happy ending to this story!

So, if you need tools to work with the Outlook MSG file format, and don't have time to play pin the tail on the donkey, visit the MSG Technologies page for a complete set of applications and tools for working with Outlook MSG files. They have done all the hard work so you can simply focus on delivering your solution without needing to understand the underlying structure of an Outlook MSG file.
Good news, you can toss that propeller hat and lose the pocket protector!
Cheers!

2 comments:

Anonymous said...

This post rocks! Nice work...

Anonymous said...

This is great info to know.