Wednesday, August 22, 2007

ASP.NET vs. JBoss, Hibernate, Spring, WebWork, Velocity, Sitemesh...

I saw this blog post about Red Hat Developer Studio. One comment struck a chord with me:
Wicket + Spring + Hibernate/JPA + Tomcat, all running in MyEclipse. JSF? Seam? Ugh. Layers upon layers, and a configuration nightmare.

I love Java. I worked for Microsoft for almost two years, and it was so refreshing to return to Java development. There is a rich community here and a lot of interesting things happening. That being said, I think our platform can be intimidating to newcomers because we have layers upon layers of abstractions for web applications. Compare this to the competition, ASP.NET. They have one version of the truth. As a developer on the Microsoft platform, you have to learn a single web application technology, and you're set. Wherever you go to work, they use the same thing. This leads me to conclude that, for a business looking to hire developers, Windows developers are cheaper to find: the chances of finding someone who is an expert at ASP.NET are better than the chances of finding an expert at Java web framework X. This worries me.

One thing I thought was odd about ASP.NET is that they seem to ignore the problem of ORM (Object-Relational Mapping). SQL queries are happily embedded throughout various source files. I guess many .NET devs just don't see it as that big a problem. This is probably because Visual Studio provides a GUI for creating and editing SQL and for binding data to UI elements.

It's actually really slick for about 75% of what you need to do. That last 25% is so painful. It's difficult to do something in ASP.NET that Microsoft didn't anticipate you doing. For example, in ASP.NET you can drag and drop a few controls here and there and voila, you have a complete user management system with encrypted passwords that hooks right into the forms authentication scheme for your web app. However, one time I wanted to associate my users with a record in a database table in my application. It turned out to be ridiculously difficult because the user database was completely opaque and outside of my control.

I've found that Java web frameworks are technically more elegant solutions. I believe this is because the most popular ones have fought their way to the top on their merits instead of riding a monopoly. But having a unified vision isn't so bad either. I like ASP.NET's simplicity for basic web apps. I wish we had that for Java.

Wednesday, August 1, 2007

The Word file format as a career

As I mentioned in an earlier post, I used to work on a technical document management system for NASA. We did a simple analysis, and 95% of the hundreds of thousands of documents in our system were binary Word documents (.doc). So I started writing a Java library for reading and writing binary Word documents in my spare time, intending to eventually sell it as a commercial product. I worked on it on and off, and eventually I moved on to other things. Around this time the POI project was formed under Apache Jakarta by Andrew C. Oliver and Marc Johnson. I contacted Andy and donated what I had done so far to POI. This became the codebase for the Word piece of POI (HWPF).

Over the course of two years, the project never really took off at POI. I never had the time to make it into what it could be, and I was pretty much the only contributor. It barely reached a semi-working state. I launched a spin-off project (http://www.textmining.org/) so Lucene users would have a simple way to extract text from Word documents.

In 2004, the project I was working on at NASA was waning. I was getting bored, so I contacted SoftArtisans and applied for a job. The main selling point was my expertise with the Word file format, and SoftArtisans loved that because one of their products was an API for creating reports in the Word file format. I joined SoftArtisans and became the lead developer for their WordWriter product, part of the OfficeWriter suite of products. We released a complete API for reading and writing the Microsoft Word binary file format in late spring 2005. In November of that same year, the OfficeWriter intellectual property was purchased by Microsoft. I moved to Redmond to work on SQL Server Reporting Services as a Microsoft employee.

I left Microsoft in April of this year. What has surprised me over the last few years is that most interest in the Word file format has been for reporting applications. I originally learned the format because I saw the need to extract information from a proprietary format for collaboration. Not the other way around.

A side note about Microsoft. You may be asking yourself why Microsoft would buy an API that reads and writes their own binary format. We were bought by the SQL Server group, and my take on it was that the Microsoft Office team is very much against doing anything with the binary formats outside of their respective applications. I'm guessing that from an engineering point of view, the formats (especially Word) have become unmaintainable behemoths. Every person I came in contact with from the Office team told me, without exception: "Do not, I repeat, do not under any circumstances attempt to write the binary Word file format outside of Microsoft Word; use the new XML format." I chuckle when I see the stories on Slashdot about the file formats (and other MS topics) because it just isn't the way people think it is. They are true believers in the new XML formats ;-)

Ironically, for my own product, I wrote absolutely zero code to work with the Word file format. I'm using a third-party Java component called Aspose.Words. When I worked at SoftArtisans, they were our main competitor. It has an easy-to-understand API and, most importantly, excellent support. Most questions I had were already answered in their support forums, but when I did post a question, I always received a response from an employee. With all of the hubbub about Office 2.0 and Web 2.0, I'm surprised I haven't seen them mentioned more. They have a full line of Java products that work with Office files on the server. Go check them out.


Introducing the ultimate Wiki editor...Word

I just released a beta of my product that allows you to use Microsoft Word as a WYSIWYG wiki editor for Confluence. It's called Word/DAV. Go check it out at http://www.benryan.com/.

Personally, I enjoy writing wiki text. It's very much the programmer in me. There is a satisfaction in writing plain text and then seeing it transformed into something else. Five years ago, I was working on a technical document management system for NASA, and I saw firsthand a lot of the pain points of enterprise collaboration. The biggest roadblock to better online collaboration was that most documents were a black box. You couldn't get content in and out of them without the host application. This made it impossible to automate a lot of interesting things people wanted, such as:
  • Diffing and merging rich content
  • Building a document from multiple smaller documents
  • Email notifications that include the actual content that changed
  • Approving or rejecting changes in the browser

Around this time I discovered wikis and fell in love. I saw the potential for wikis to be a format-neutral way to store information and to do these kinds of things. Confluence is by far the best wiki out there, and it does some of the above things... but not as well as it could.

I would like to see wikis replace traditional document management systems in the enterprise. Word/DAV is my first step towards that goal. It creates a bridge between the old way of doing things and the new. I hope someone out there finds it useful :-)