Tuesday, February 26, 2008

Joel Spolsky's very bad advice

I was pointed to an essay by good ol' Joel on file format release

It's kind of interesting, he was working for Microsoft in the relatively early days of Office so there are some insights if you're into that sort of thing. I had to laugh when I saw this:

You have a web-based application that needs to output existing Word files in PDF format. Here’s how I would implement that: a few lines of Word VBA code loads a file and saves it as a PDF using the built in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It’ll work. The first time you launch Word it’ll take a few seconds. The second time, Word will be kept in memory by the COM subsystem for a few minutes in case you need it again. It’s fast enough for a reasonable web-based application.

I've done almost exactly what he recommends and it was a mess. One of my responsibilities at an earlier job was to write and maintain a program to automate converting Word documents, along with other file types to PDF in an enterprise server-based application. It seems so simple on the surface but it is a technical mine field.

It was way before Word 2007 (Word 2000 I think) and we used Adobe Acrobat Distiller. This is a "printer" that you can print to and it creates a pdf. There are free ones all over the place now. In Windows, there is a library function called ShellExecute that you can pass in a file name and a verb like "print", "open", etc. and Windows will take care of the rest. This is what we used for most applications.

Distiller had some special features for Word, Excel, and Powerpoint. For example, in Word, it would convert heading levels to bookmarks. These special features were integrated as part of the application and you had to initiate the conversion directly from the application. So this means we had to use VBA to initiate the PDF conversion for Word, Excel, and Powerpoint files.

The first thing we ran into was that the process would appear to hang for no reason, we would go to the server room and there would be a dialog box waiting for someone to click "OK". VBA scripting is very similar to automating a user interface. When Office was launched using VBA, it behaved identically to if a user sitting at the server terminal launched it. We searched for an application or VBA setting to repress any dialogs or other UI interactions. We exhaustively researched this and came to the conclusion that there was no way to prevent it from happening. I had to write an entire scripting language to listen for various window creation events and interact with the UI.

The next problem we ran into was that occassionally, an application would just hang for no reason. It wouldn't bring down the server or anything but it would hold up the conversion queue. We solved this by launching a timer thread that would time out and forcibly kill the process we had launched. This solution led to memory leaks and de-stabilized the server.

Finally, when we would close the application through VBA, sometimes, the process would stick around. This was completely unpredictable. The close function returned normally but if you ran task manager and saw 50 instances of WINWORD.EXE processes you knew sometithing was up.

Last time I talked to my old boss, he told me they had put this on a virtual server. Any time they have problems they just reboot their vm instance. Pretty cool. He did say that my scripting language has grown :-)

2 comments:

Anonymous said...

Pretty hilarious actually because I had to do something similar to get PDF output from Crystal Reports in the days before it had that option. The PDF library was awful and we'd have similar instances where you'd have dialogs on the screen (waiting for OK), many hung processes, etc.

Joel is a smart guy but that doesn't mean all of his ideas are going to be good ones ;) There are good PDF libraries out there now that are designed for server environments. Better to use those than hack something together with Word and VBA. Sheesh.

Anonymous said...

Oh...that library was ActivePDF when it first came out. Looks like they're still around but I haven't looked at that product in years.