Wednesday, February 27, 2008

SQL Server Reporting Services 2008, I wrote that feature!

The latest CTP for SQL Server 2008 was released today and it contains a feature I wrote from start to finish. You can now render reports to Microsoft Office Word format.

Why would someone want a report in the Word format? The scenario we pictured was a user would want to create a report they could then edit. For example, a specific use case would be a salesperson that generates an invoice using reporting services and then wants to tweak it a little (apply discounts, add some comps, etc.) before passing it on to the customer.

A friend at Microsoft sent me an email telling me that Steve B. made a point to mention it at the launch event. Cool! I'm sure many readers aren't big fans of Microsoft or binary formats but as a developer, I get a rush knowing that millions of people will be using something I created.

I spent most of my time at Microsoft working on a team that completely re-wrote the reporting engine for SQL Server 2008 for performance. I wrote the Word renderer towards the end of my time there after the re-write was tapering off.

Since performance was a central theme of the SSRS 2008 release, the Word renderer was written to be extremely fast and use as little memory as possible. With my casual testing, it would generate large reports in less time and with less memory than Word would use to open the file (I'm sure someone will end up proving me wrong on this :-p).

Anyway, I'm done patting myself on the back. Getting back to work now.

Tuesday, February 26, 2008

Joel Spolsky's very bad advice

I was pointed to an essay by good ol' Joel on file format release

It's kind of interesting, he was working for Microsoft in the relatively early days of Office so there are some insights if you're into that sort of thing. I had to laugh when I saw this:

You have a web-based application that needs to output existing Word files in PDF format. Here’s how I would implement that: a few lines of Word VBA code loads a file and saves it as a PDF using the built in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It’ll work. The first time you launch Word it’ll take a few seconds. The second time, Word will be kept in memory by the COM subsystem for a few minutes in case you need it again. It’s fast enough for a reasonable web-based application.

I've done almost exactly what he recommends and it was a mess. One of my responsibilities at an earlier job was to write and maintain a program to automate converting Word documents, along with other file types to PDF in an enterprise server-based application. It seems so simple on the surface but it is a technical mine field.

It was way before Word 2007 (Word 2000 I think) and we used Adobe Acrobat Distiller. This is a "printer" that you can print to and it creates a pdf. There are free ones all over the place now. In Windows, there is a library function called ShellExecute that you can pass in a file name and a verb like "print", "open", etc. and Windows will take care of the rest. This is what we used for most applications.

Distiller had some special features for Word, Excel, and Powerpoint. For example, in Word, it would convert heading levels to bookmarks. These special features were integrated as part of the application and you had to initiate the conversion directly from the application. So this means we had to use VBA to initiate the PDF conversion for Word, Excel, and Powerpoint files.

The first thing we ran into was that the process would appear to hang for no reason, we would go to the server room and there would be a dialog box waiting for someone to click "OK". VBA scripting is very similar to automating a user interface. When Office was launched using VBA, it behaved identically to if a user sitting at the server terminal launched it. We searched for an application or VBA setting to repress any dialogs or other UI interactions. We exhaustively researched this and came to the conclusion that there was no way to prevent it from happening. I had to write an entire scripting language to listen for various window creation events and interact with the UI.

The next problem we ran into was that occassionally, an application would just hang for no reason. It wouldn't bring down the server or anything but it would hold up the conversion queue. We solved this by launching a timer thread that would time out and forcibly kill the process we had launched. This solution led to memory leaks and de-stabilized the server.

Finally, when we would close the application through VBA, sometimes, the process would stick around. This was completely unpredictable. The close function returned normally but if you ran task manager and saw 50 instances of WINWORD.EXE processes you knew sometithing was up.

Last time I talked to my old boss, he told me they had put this on a virtual server. Any time they have problems they just reboot their vm instance. Pretty cool. He did say that my scripting language has grown :-)

Saturday, February 9, 2008

The full-text search feature of your typical web app needs work.

In spite of a couple of dozen alternatives, Google has web-search pretty much locked up at the moment. What I think the world needs is a developer component you could drop into your application and use as the full-text search piece of your software that works as well as Google.

The problem is that any time I use the embedded search feature on a forum, wiki, blog, etc. It reminds me of doing a web search in 1997. It takes a lot of time to narrow down your search terms enough and/or click through pages of results to find what you are looking for. It seems very primitive. Yes, some web-facing apps can be indexed by Google so that is a workaround but many of these sit behind a firewall on intranets.

The Google mini is another solution but it's not something you can ship with your products. So your customers end up having to connect to it and maintain it and this adds to the overall cost of your product. Also, are the relevancy algorithms in the Google mini the same as web search? I would think there should be different heuristics for each app that should be tweakable by the developer.

There is Lucene, but I've been disappointed by the search results in applications that use Lucene for search, so I am assuming that there hasn't been a lot of progress for relevancy algorithms outside of term frequency. Although, it does indexing and searching really well.

This could be an area to focus on for one of the startups who are trying to challenge Google in the web-search market.