Thursday, January 17, 2008

Microsoft to make binary file formats documentation public

I originally heard about this on the POI mailing list, and it was later carried on Slashdot. To summarize:
  1. Microsoft will let anyone download the full documentation for the binary file formats.
  2. Microsoft will start an open source project to create a tool to convert from the old binary formats into the new ones.

All of this is supposed to happen on or before Feb 15. I think it's very cool and exciting. It frees me from my NDA. This means I can participate in open source projects that do stuff with the Office binary file formats.

I'm not sure what this means for everyone else. I wouldn't expect APIs that work with the file formats to just start magically appearing. The documentation for the Office 97 file formats has been available since... well, 1997. That is what most of my POI work was based on. Also, Microsoft would send a copy of the most recent file format documentation to anyone who emailed them and asked for it.

That's what I did when I formed my company. They sent me hard copies of 5 specs within a week (Word 97-2007, Excel 2007, Excel 97-2003, PowerPoint 97-2007, and the Microsoft Drawing format). I was already familiar with these documents because I used them when I worked at Microsoft and at SoftArtisans. The specs have not improved much since '97. New records have been added for the various new features, but the documents still leave a lot to be desired, with notable features being undocumented or underdocumented.

I'm more interested in seeing how this open source project goes. It could be really useful. Haha, I can see them just ripping out the C++ code from Word that does the conversion between doc and docx, slapping a main method on it, and dumping it into a Subversion repository.

2 comments:

Michael said...

Any plans to update textmining?

Ryan Ackley said...

With the next release of my product there will be an update to textmining.org. I'm shooting for the end of Jan. No promises though. I will be creating a SourceForge project for it, so that it's easier for the community to contribute.