Saturday, March 15, 2008
Why did Java fail as an internet client platform?
Monday, March 3, 2008
To test my software on a Mac, I have to buy one?
I haven't used a Mac since college. They look beautiful and I am looking forward to having one. However, my mind has been trained to view a computer as a sum of upgradeable parts not a single shrink-wrapped device like an iPod. What would happen if Apple became the dominant computer system. They would have a software + hardware monopoly. Very scary.
Wednesday, February 27, 2008
SQL Server Reporting Services 2008, I wrote that feature!
Why would someone want a report in the Word format? The scenario we pictured was a user would want to create a report they could then edit. For example, a specific use case would be a salesperson that generates an invoice using reporting services and then wants to tweak it a little (apply discounts, add some comps, etc.) before passing it on to the customer.
A friend at Microsoft sent me an email telling me that Steve B. made a point to mention it at the launch event. Cool! I'm sure many readers aren't big fans of Microsoft or binary formats but as a developer, I get a rush knowing that millions of people will be using something I created.
I spent most of my time at Microsoft working on a team that completely re-wrote the reporting engine for SQL Server 2008 for performance. I wrote the Word renderer towards the end of my time there after the re-write was tapering off.
Since performance was a central theme of the SSRS 2008 release, the Word renderer was written to be extremely fast and use as little memory as possible. With my casual testing, it would generate large reports in less time and with less memory than Word would use to open the file (I'm sure someone will end up proving me wrong on this :-p).
Anyway, I'm done patting myself on the back. Getting back to work now.
Tuesday, February 26, 2008
Joel Spolsky's very bad advice
It's kind of interesting, he was working for Microsoft in the relatively early days of Office so there are some insights if you're into that sort of thing. I had to laugh when I saw this:
You have a web-based application that needs to output existing Word files in PDF format. Here’s how I would implement that: a few lines of Word VBA code loads a file and saves it as a PDF using the built in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It’ll work. The first time you launch Word it’ll take a few seconds. The second time, Word will be kept in memory by the COM subsystem for a few minutes in case you need it again. It’s fast enough for a reasonable web-based application.
I've done almost exactly what he recommends and it was a mess. One of my responsibilities at an earlier job was to write and maintain a program to automate converting Word documents, along with other file types to PDF in an enterprise server-based application. It seems so simple on the surface but it is a technical mine field.
It was way before Word 2007 (Word 2000 I think) and we used Adobe Acrobat Distiller. This is a "printer" that you can print to and it creates a pdf. There are free ones all over the place now. In Windows, there is a library function called ShellExecute that you can pass in a file name and a verb like "print", "open", etc. and Windows will take care of the rest. This is what we used for most applications.
Distiller had some special features for Word, Excel, and Powerpoint. For example, in Word, it would convert heading levels to bookmarks. These special features were integrated as part of the application and you had to initiate the conversion directly from the application. So this means we had to use VBA to initiate the PDF conversion for Word, Excel, and Powerpoint files.
The first thing we ran into was that the process would appear to hang for no reason, we would go to the server room and there would be a dialog box waiting for someone to click "OK". VBA scripting is very similar to automating a user interface. When Office was launched using VBA, it behaved identically to if a user sitting at the server terminal launched it. We searched for an application or VBA setting to repress any dialogs or other UI interactions. We exhaustively researched this and came to the conclusion that there was no way to prevent it from happening. I had to write an entire scripting language to listen for various window creation events and interact with the UI.
The next problem we ran into was that occassionally, an application would just hang for no reason. It wouldn't bring down the server or anything but it would hold up the conversion queue. We solved this by launching a timer thread that would time out and forcibly kill the process we had launched. This solution led to memory leaks and de-stabilized the server.
Finally, when we would close the application through VBA, sometimes, the process would stick around. This was completely unpredictable. The close function returned normally but if you ran task manager and saw 50 instances of WINWORD.EXE processes you knew sometithing was up.
Last time I talked to my old boss, he told me they had put this on a virtual server. Any time they have problems they just reboot their vm instance. Pretty cool. He did say that my scripting language has grown :-)
Saturday, February 9, 2008
The full-text search feature of your typical web app needs work.
The problem is that any time I use the embedded search feature on a forum, wiki, blog, etc. It reminds me of doing a web search in 1997. It takes a lot of time to narrow down your search terms enough and/or click through pages of results to find what you are looking for. It seems very primitive. Yes, some web-facing apps can be indexed by Google so that is a workaround but many of these sit behind a firewall on intranets.
The Google mini is another solution but it's not something you can ship with your products. So your customers end up having to connect to it and maintain it and this adds to the overall cost of your product. Also, are the relevancy algorithms in the Google mini the same as web search? I would think there should be different heuristics for each app that should be tweakable by the developer.
There is Lucene, but I've been disappointed by the search results in applications that use Lucene for search, so I am assuming that there hasn't been a lot of progress for relevancy algorithms outside of term frequency. Although, it does indexing and searching really well.
This could be an area to focus on for one of the startups who are trying to challenge Google in the web-search market.
Thursday, January 17, 2008
Microsoft to make binary file formats documentation public
- Microsoft will let anyone download the full documentation for the binary file formats
- Microsoft will start an open source project to create tool to convert from the old binary formats into the new ones.
All of this is supposed to happen on or before Feb 15. I think it's very cool and exciting. It frees me from my NDA. This means I can participate in open source projects that do stuff with the Office binary file formats.
I'm not sure what this means for everyone else. I wouldn't expect APIs to work with the file formats to just start magically appearing. The documentation for the Office 97 file formats is been available since...well 1997. That is what most of my POI work was based on. Also, Microsoft would send anyone a copy of the most recent file format documentation if you sent an email to them and asked for it.
That's what I did when I formed my company. They sent me a hard copy of 5 specs within a week(Word 97-2007, Excel 2007, Excel 97-2003, and Powerpoint 97-2007, Microsoft Drawing format). I was already familiar with these documents because I used them when I worked at Microsoft and at SoftArtisans. They have not improved much since '97. New records have been added for the various new features but they still leave a lot to be desired with notable features being undocumented or underdocumented.
I'm more interested in seeing how this open source project goes. It could be really useful. Haha, I can see them just ripping out the C++ code from Word that does the conversion between doc and docx, slapping a main method on it and dumping it into a subversion repository.