Archive for December, 2007

evolving towards git

December 4th, 2007

I remember the first time I heard about version management. It was a bit like discovering that there is water: afterwards, the idea that something so basic might not exist at all seemed astounding. This was at a time when I hadn't done a lot of coding yet, so I hadn't suffered the pains of sharing code *that* much, but I had suffered enough to understand what version management meant. Emailing those zip files of code around had been plenty annoying.

This happened just before we embarked upon a 6-month journey in college to write an inventory management application in Java. There were four of us. We had done Java before, but this was the first "big" application, and the first to put our newly acquired knowledge of databases to use. God only knows why, but the school actually ran (and probably still does) an Oracle server for these projects. I don't know what an academic license for that costs, but it's no doubt in the thousands of dollars. Taking the initiative in my group, I persuaded the guys (not that it took much effort) to forget about the Oracle server, which didn't come with any developer tools, just the awful command line interface, and to use PostgreSQL instead. (We rejected MySQL just to be on the safe side, as at the time it lacked support for a bunch of things like views and foreign keys.) I also introduced another deviation from standard practice: CVS. I would be running both off my tried and tested Pentium home server. We were the only group that used version management, which baffled me (why was it not offered as a service by the department?). I heard that the department started offering CVS for projects the following year. Incidentally, when you deviate from the path, you obviously expect to get some academic credit for the extra effort. ;)

So that was 6 months with CVS, and it worked quite well. We didn't do anything complicated with it; we didn't know about tags or branches, just did linear development and that was it. It was also helpful (if not efficient) for tracking binary files (Word documents, bleh). But it had some annoying quirks. Not that we were all that concerned with tracking history back then; it was just about getting the thing finished, after which, as school projects go, it would obviously never be used again. Still, the lack of support for renaming and moving files in CVS was silly.

It's a bit funny, actually. CVS was the standard for the longest time, and people put up with its problems until fairly recently, when a version management boom began. Why is that? I have a hunch that the relative calm in version management was kept intact by a predominantly corporate culture (and corporations are famously conservative). But once enough people had gotten involved in open source projects, the need for better tools set things in motion. One of the first "new generation" systems was Subversion, the replacement for CVS. Subversion was adopted slowly, but quite steadily, and these days it's probably the "standard" for version management. I registered galleryforge with Berlios precisely because they offered svn services. Sourceforge also has it now, people around the net have started talking in terms of svn a lot, and my current school uses it too.

With version management gaining momentum, a lot of other ideas have surfaced besides those captured in Subversion. Arch and Darcs are two systems that employ the distributed paradigm. I don't think they have had much success, though, and looking back they may turn out to have been stepping stones rather than end products. A new (third, if you will) generation of systems has appeared quite recently. Monotone is a newer system with its own network synchronization protocol. Mercurial seems to have enough support to replace Subversion/CVS down the line (not that it couldn't happen today, but people hate giving up something they're used to, so these things drag out). And there is Bazaar, which seems to forgo the goal of being the best system and simply aims to be easy and pleasant to use for small projects.

And somewhere in that mix we find Git, except that it has a much higher potential for success than the other new systems. It was launched into the biggest (or at least the noisiest) open source project, the Linux kernel, and made to carry the heaviest loads right from the beginning, which is a good way to convince people that your product is robust. As a result, it seems to me that git has had the quickest adoption of any system. It was launched in 2005 and has already become a household name around the interwebs. So I thought git was worth a look. At this rate I may very well end up using it on a project someday.

I was also encouraged by regular people saying it worked for them. A version management system for kernel developers is nice, but that doesn't necessarily make it right for everyone else. But here were people with one-man projects who liked it, which was a good sign. In fact, one of the things I kept reading about git was "I like the simplicity".

What I discovered was that my dictionary was out of date, because it didn't carry that particular meaning of the word "simple". Git is not simple; it takes getting used to. It's one of those things you try and don't understand, then you come back, try again, make a little more progress, and so on. I decided to open a git repo for undvd, both because I need to track its changes and so that I can use git and get a feel for it.

The special thing about git is that it has well-developed capabilities for interoperating with other systems. In other words, git doesn't assume it will wipe out everything else; it accepts the reality that some people will keep using whatever they're using (even CVS), and it can deal with that. Predicting the future is tricky, obviously, but I get the feeling that Subversion has hit that comfort spot where most people are satisfied with what it does and would need a lot of persuasion to learn something new. So I expect Subversion to be around for quite a while, and git's svn tools may therefore come in very handy. I haven't really looked at that part yet, but it's on my todo list.

So what does the future hold? I would say Subversion, Git, and maybe Mercurial. Version management systems are like programming languages: the more of them you use, the easier it is to learn a new one. But switching is hard. When I want to check the state of my working copy, I still begin typing svn st... before I realize I'm using git.

what can you do with django?

December 2nd, 2007

So I'm reading about Django, which is a really groovy web framework for Python. As the Django people themselves say, there have been *a lot* of web frameworks for Python alone, and it's clear that a lot of lessons have been learned along the way. Personally, I had only looked briefly at Zope before, and I was a bit intimidated by how complicated it seemed. I also once inherited a project that used Webware for Python, which was pretty nice to work with, but radically different from what I'd known, because pages were constructed using a sort of abstract html notation (basically every html tag is a function, etc.).

There is a lot to like about Django. The MVC pattern is implemented quite elegantly (in Djangoland it's called MTV, for model-template-view). Templates are written in what resembles a server-side language, but it's actually a specialized Python-like templating language, both restricted and enhanced compared to Python. Templates can inherit from each other, filling in placeholders in a cascading manner, which is a very interesting design choice.
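To make the cascading idea concrete, here is a toy sketch of block-based inheritance in plain Python. It is not Django's template engine: the `{% block name %}...{% endblock %}` syntax mirrors Django's, but `blocks` and `render` are hypothetical helpers, and the sketch handles only one level of inheritance with no nested blocks. The parent declares named placeholders with default content, and the child overrides only the ones it cares about.

```python
import re

# Matches {% block name %}...{% endblock %}. Non-greedy, so blocks must not nest.
BLOCK_RE = re.compile(r"\{% block (\w+) %\}(.*?)\{% endblock %\}", re.DOTALL)


def blocks(template: str) -> dict:
    """Return {block name: block body} for every block in a template."""
    return dict(BLOCK_RE.findall(template))


def render(parent: str, child: str) -> str:
    """Fill the parent's blocks with the child's overrides (one level only)."""
    overrides = blocks(child)

    def fill(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        # Use the child's version of the block if it has one, else the parent's default.
        return overrides.get(name, default)

    return BLOCK_RE.sub(fill, parent)


base = (
    "<html><head><title>{% block title %}My site{% endblock %}</title></head>"
    "<body>{% block content %}Nothing here yet.{% endblock %}</body></html>"
)
page = "{% block content %}<p>Hello from the child.</p>{% endblock %}"

print(render(base, page))
```

Blocks the child leaves out (here, `title`) fall back to the parent's default, which is the cascading behaviour that makes a site-wide base template so handy.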

Then there is a nice separation between objects and persistence. Data is declared in terms of Python classes (called "the model"), which are then translated, somehow, on the fly into the SQL du jour of your database backend. You can even run a Python shell to interact with your model and, in effect, edit your database.
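As a rough illustration of "classes in, SQL out", here is a stdlib-only sketch. The `Field`, `CharField`, `IntegerField` and `Model` classes are hypothetical stand-ins (Django's real ones live in `django.db.models` and do far more), and deriving the table name from the lowercased class name is my own simplification. Each field knows its column type, and the model walks its class attributes to build a `CREATE TABLE` statement, which is then run against an in-memory SQLite database.

```python
import sqlite3


class Field:
    """Hypothetical base field: knows how to describe itself as a SQL column."""
    sql_type = "TEXT"

    def __init__(self, null: bool = False):
        self.null = null

    def column_sql(self, name: str) -> str:
        return f"{name} {self.sql_type}" + ("" if self.null else " NOT NULL")


class CharField(Field):
    def __init__(self, max_length: int, null: bool = False):
        super().__init__(null)
        self.sql_type = f"VARCHAR({max_length})"


class IntegerField(Field):
    sql_type = "INTEGER"


class Model:
    @classmethod
    def fields(cls) -> dict:
        # Class attributes that are Field instances, in declaration order.
        return {k: v for k, v in vars(cls).items() if isinstance(v, Field)}

    @classmethod
    def create_table_sql(cls) -> str:
        cols = ["id INTEGER PRIMARY KEY"]  # implicit primary key, as an assumption
        cols += [f.column_sql(name) for name, f in cls.fields().items()]
        return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(cols)})"


class Item(Model):
    name = CharField(max_length=100)
    quantity = IntegerField()


sql = Item.create_table_sql()
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute(sql)
```

Supporting another backend would mean changing the type names each field emits, which is presumably where the per-database translation lives in the real thing.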

The model and the templates are tied together by "views", which are the application code you write for your webapp. A cool thing here is that URLs are decoupled from the code that handles them; the two are mapped via regular expressions, so you get something like a mod_rewrite declaration (but nicer) that dispatches to functions.
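The URL mapping boils down to a list of (regex, function) pairs tried in order, with named groups handed to the function as keyword arguments. Here is a minimal sketch of that dispatch idea using only `re`; `URL_PATTERNS`, `dispatch` and the two view functions are hypothetical names, and Django's actual URLconf machinery is richer than this.

```python
import re
from typing import Callable, List, Optional, Tuple


def index(request: dict) -> str:
    return "front page"


def article_detail(request: dict, year: str, slug: str) -> str:
    return f"article {slug} from {year}"


# Hypothetical URL table: (regex, view) pairs, tried top to bottom.
URL_PATTERNS: List[Tuple[str, Callable[..., str]]] = [
    (r"^$", index),
    (r"^articles/(?P<year>\d{4})/(?P<slug>[\w-]+)/$", article_detail),
]


def dispatch(path: str, request: Optional[dict] = None) -> str:
    """Call the first view whose regex matches, passing named groups as kwargs."""
    for pattern, view in URL_PATTERNS:
        match = re.match(pattern, path)
        if match:
            return view(request or {}, **match.groupdict())
    return "404 Not Found"


print(dispatch("articles/2007/django-notes/"))
```

A path that matches no pattern falls through to a 404, and adding a page is just a matter of appending another pair, without touching any of the view code.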

All the same, if I were to work with Django, I would have a few concerns up front:

  1. Templates are written in html with inline Python-like expressions. This makes them both simple and transparent, because the html that will be served is written by hand. However, considering the present state of html and browser support, it would be nice if Django could enforce or validate html. But this would probably imply an abstraction over html (to prevent raw html from being written) and a layer of html code generation (which may or may not be easy, or even possible, to cache efficiently).
  2. And that brings me to the second point. I have seen no mention of CSS anywhere. It appears that Django leaves the fairly heavy burden of supplying styles to the designer, without any intelligent mechanisms to help.
  3. Data models are represented in Python code, which is very convenient and provides much-sought-after quick access to the contents of your database. Django will create tables for you based on a model, but only when they don't exist yet. It doesn't seem to enforce or update the correspondence between the Python model and the database schema. The Django Book mentions strategies for keeping the two in check, but that particular section is missing.
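On the third point, a check for drift between model and schema doesn't have to be elaborate. Here is a sketch of the kind of thing I have in mind, not something Django provides: `MODEL_COLUMNS` is a hypothetical column-to-type description of the model, and `schema_drift` compares it against what SQLite reports via `PRAGMA table_info`, listing missing, extra and retyped columns.

```python
import sqlite3

# Hypothetical description of the current model: column name -> declared SQL type.
MODEL_COLUMNS = {"id": "INTEGER", "name": "VARCHAR(100)", "quantity": "INTEGER", "price": "REAL"}


def schema_drift(conn: sqlite3.Connection, table: str, expected: dict) -> dict:
    """Compare a model's expected columns with the table SQLite actually has."""
    # PRAGMA arguments can't be bound as parameters, so `table` must be trusted.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    actual = {row[1]: row[2] for row in rows}  # row = (cid, name, type, notnull, default, pk)
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),  # in the model, not in the DB
        "extra": sorted(set(actual) - set(expected)),    # in the DB, not in the model
        "changed": sorted(c for c in shared if expected[c].upper() != actual[c].upper()),
    }


conn = sqlite3.connect(":memory:")
# A table created from an older version of the model: no price, quantity stored as TEXT.
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name VARCHAR(100), quantity TEXT, notes TEXT)"
)
print(schema_drift(conn, "item", MODEL_COLUMNS))
```

Detecting the mismatch is the easy part; actually migrating the table to match the model is the harder problem.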

There is no doubt that Django embodies a lot of what we've learnt about building web applications, but currently I would say it seems a bit incomplete (it is also still pre-1.0), if terribly elegant in some parts.

Another question concerns performance. Ruby on Rails has been known to suffer when pitted against the effective but ugly PHP. Does Django fall in the same bracket, or does it have lower overhead? In general Python is an older and more refined environment, so one might expect Django to scale better.