Archive for 2007

evolving towards git

December 4th, 2007

I remember the first time I heard about version management. It was a bit like discovering that water exists. Afterwards, the idea that something so basic could simply not exist seemed altogether astounding. This was at a time when I hadn't done a lot of coding yet, so I hadn't suffered the pains of sharing code *that* much, but I had felt enough of them to understand the implications of version management. Emailing those zip files of code had been plenty annoying.

This happened just before we embarked upon a 6-month journey in college to write an inventory management application in Java. There were four of us. We had done Java before, but this was the first "big" application that also employed our newly acquired knowledge about databases. God only knows why, but they actually ran (and probably still do) an Oracle server at school for these projects. I don't know how much an academic license for that runs, but it's in the thousands of dollars no doubt. As the initiative taker of my group, I persuaded the guys (not that it took much effort) to forget about the Oracle server (which didn't come with any developer tools, just the awful command line interface) and use PostgreSQL (we rejected MySQL just to be on the safe side, as it lacked support for a bunch of things like views and foreign keys). I also introduced another deviation from the standard practice: CVS. I would be running both off of my tried and tested Pentium home server. :proud: We were the only group that used version management, which baffled me (why was it not offered as a service by the department?). I heard that the year after, they started offering CVS for projects. Incidentally, when you deviate from the path, you do expect to get some academic credit for the extra effort, obviously. ;)

So that was 6 months with CVS, and it worked quite well. We didn't do anything complicated with it, didn't know about tags or branches, just ran a linear development on it and that was it. And it was helpful (but not efficient) for tracking binary files (Word documents, bleh). But it had some annoying quirks. Not that we were all that concerned with tracking history back then; it was just about getting it finished, and then obviously it would never be used, as school projects go. But the lack of support for renaming and moving files in CVS was silly.

It's a bit funny, actually. CVS has been the standard for the longest time, and people have put up with its problems, until recently, when a version management boom began. Why is that? I have a hunch that the relative calm in version management was maintained by a predominantly corporate culture (and the corporates are obviously super conservative). But once enough people had gotten involved in open source projects, the need for better tools set things in motion. One of the first "new generation" systems was Subversion, the replacement for CVS. Subversion was adopted slowly, but quite steadily. Currently it's probably the "standard" for version management. I registered galleryforge with Berlios precisely because they offered svn services. Sourceforge also has it now, and people around the net have started talking in terms of svn a lot. My current school also uses it.

But with the momentum of version management systems in play, a lot of other ideas have surfaced besides those captured in Subversion. Arch and Darcs are two that employ the distributed paradigm. I don't think they have had much success though, and perhaps looking back they will have played the role of stepping stones rather than end products. A new (a third, if you will) generation of systems has appeared quite recently. Monotone is a newer system with its own network synchronization protocol. Mercurial seems to have enough support to replace Subversion/CVS down the line (not that it couldn't happen today, but people hate giving up something they're used to, so these things drag out). And there is Bazaar, which seems to discard the goal of being the best system and just aim to be easy and pleasant for small projects.

And somewhere in that mix we find Git, except that it has a much higher potential for success than the other new systems, because it was launched into the biggest (or noisiest, at least) open source project and made to carry the heaviest loads right from the beginning, which is a good way to convince people that your product is robust. As such, it seems to me that git has seen the quickest adoption of any system. It was launched in 2005 and has already become one of the household names around the interwebs. So I thought git was worth looking at. At this rate I may very well be using it on a project someday.

I was also encouraged by regular people saying it worked for them. A version management system for kernel developers is nice, but that doesn't necessarily make it right for everyone else. But here were people with one-man projects liking it, which is a good sign. In fact, one of the things I kept reading about git was "I like the simplicity".

What I discovered was that my dictionary was out of date, because it didn't carry that particular meaning of the word "simple". Git is not simple; it takes getting used to. It's one of those things you try and don't understand, then you come back, try again, make a little more progress, and so on. I decided to open a git repo for undvd, both because I need to track the changes and so that I can use git and get a feel for it.
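For the record, opening the repo itself is the easy part (the commit message here is made up):

    git init
    git add .
    git commit -m "initial import"

It's everything after that where the learning curve kicks in.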

The special thing about git is that it has well-developed capabilities for interoperating with other systems. In other words, git doesn't assume it will wipe out everything else; it accepts the reality that some people will continue using whatever it is they're using (even CVS), and git can deal with that. Obviously, predicting the future is tricky, but I get the feeling that Subversion has hit that comfort spot where most people are satisfied with what it's doing and would need a lot of persuasion to learn something new. In other words, I expect Subversion to be around for quite a while. And therefore git's svn tools may come in very handy. I haven't really looked at that part yet, but it's on my todo list.
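From what I've read (I haven't tried this myself yet), the git-svn workflow goes roughly like this; the repository URL is obviously made up:

    # mirror an existing svn repository into a local git repo
    git svn clone http://svn.example.com/project
    # ...work and commit locally with git as usual...
    # fetch new svn revisions and rebase local work on top of them
    git svn rebase
    # replay local git commits back into the svn repository
    git svn dcommit

If that works as advertised, you can use git privately against a project whose official repository stays on Subversion.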

So what is the future? I would say Subversion, Git, and maybe Mercurial. Version management systems are like programming languages: the more of them you use, the easier it is to learn a new one. But switching is hard. When I want to check on something in version management I still begin typing svn st.. before I realize I'm using git.

what can you do with django?

December 2nd, 2007

So I'm reading about Django, which is a really groovy web framework for Python. As the Django people themselves say, there have been *a lot* of web frameworks for Python alone, and it's clear that a lot of lessons have been learned along the way. Personally I've only looked briefly at Zope before, and I was a bit intimidated by how complicated it seemed. I also once inherited a project that used Webware for Python, which was pretty nice to work with, but radically different from what I'd known, because pages are constructed using a sort of abstract html notation (basically, every html tag is a function, and so on).

There is a lot to like about Django. The MVC pattern is executed quite elegantly (or, as they call it in Djangoland, the MTV (model - template - view) pattern). Templates are written in what resembles a server-side language, but is actually a specialized, both restricted and enhanced, Python-like templating language. Templates can inherit from each other to fill in placeholders in a cascading manner, which is a very interesting design choice.
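To illustrate the inheritance bit, here is a minimal sketch (file names and content made up): a base template declares named blocks, and a child template overrides just the blocks it cares about. The base, say base.html:

    <html>
      <body>
        <h1>{% block title %}my site{% endblock %}</h1>
        {% block content %}{% endblock %}
      </body>
    </html>

and a child page (the {% extends %} tag has to come first):

    {% extends "base.html" %}
    {% block title %}about{% endblock %}
    {% block content %}<p>Hello there.</p>{% endblock %}

Change base.html and every page that extends it picks up the change.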

Then there is a nice separation between objects and persistence. Data is declared in terms of Python classes (called "the model"), which are then somehow translated on the fly into the sql-du-jour of your database backend. You can even run a Python shell to interact with your model and, de facto, edit your database.
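As a sketch of what that looks like (the model here is made up, and I'm going by the documentation):

    from django.db import models

    # a made-up model: one class per table, one field per column
    class Book(models.Model):
        title = models.CharField(max_length=100)
        year = models.IntegerField()

and then from the interactive shell (manage.py shell):

    >>> Book.objects.create(title="Dune", year=1965)
    >>> Book.objects.all()

which reads and writes the underlying table directly, no sql in sight.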

The model and the template mechanism are tied together by "views", which is the application code you write for your webapp. A cool thing here is that urls are decoupled from execution code; the two are mapped to each other via regular expressions, so you have something like a mod_rewrite declaration (but nicer) that dispatches to functions.
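A sketch of what such a mapping looks like (the view names are hypothetical):

    from django.conf.urls.defaults import *

    urlpatterns = patterns('',
        # the regex captures become arguments to the view function
        (r'^articles/(\d{4})/$', 'myapp.views.year_archive'),
        (r'^articles/(\d{4})/(\d{2})/$', 'myapp.views.month_archive'),
    )

So /articles/2007/12/ would call myapp.views.month_archive with "2007" and "12" as arguments, and the url itself is never hardcoded anywhere in the view code.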

All the same, if I were to work with Django I have a couple of concerns up front:

  1. Templates are written in html with inline Python-like expressions. This makes them both simple and transparent, because the html code that will be served is written by hand. However, considering the present situation with html and browser support, it would be nice if Django could enforce/validate html. But this would probably imply an abstraction over html (to prevent raw html from being written) and a layer of code generation for html (which may or may not be easy/possible to cache efficiently).
  2. And that brings me to the second point. I have seen no mention of CSS anywhere. It appears that Django leaves the fairly heavy burden of supplying styles to the designer, without any intelligent mechanisms to help.
  3. Data models are represented in Python code, which is very convenient and provides much sought-after quick access to the contents of your database. Django will instantiate tables for you based on a model, but only at creation time. It doesn't seem to enforce or update the correspondence between the Python model and the database schema. The Django book mentions strategies for keeping the two in sync, but that particular section is missing.

There is no doubt Django embodies a lot of what we've learnt about building web applications, but currently I would say it seems a bit incomplete (it's also still pre-1.0), if terribly elegant in places.

Another question concerns performance. Ruby on Rails has been known to suffer when pitted against the effective but ugly php. Does Django fall into the same bracket, or does it have lower overhead? In general Python is an older and more refined environment, so one might expect Django to scale better.

do all cooks leave a mess?

November 30th, 2007

The answer is obviously no. Or at least one can hope.

I've never enjoyed cooking. Somehow I've never seen the joy in manipulating food from a raw state into something edible. I don't know what it is people get so excited about. One of the reasons I don't like it is how messy it is. I really get no satisfaction out of immersing my hands in that dough only to spend 10 minutes washing it off afterwards. In fact, the whole process is just a big mess. I stop to think about how much washing up there is going to be, and that alone makes me appreciate someone other than me doing the cooking.

Are some people more conscious of cleanliness than others? People say you just don't want to get your hands dirty, and it's true. There's no reason for my hands to be dirty; when they are, I go wash them. Meanwhile, there are people in this world whose hands are visibly greasy and they don't even seem to notice.

latex sucks! (for presentations)

November 28th, 2007

Hold your fire.

Got your attention there, didn't I? :D

Latex is good at what it was built to do: marking up big documents. It's fine for writing reports, articles, books. The problem is that we have no other tools nearly as good, so we end up using latex for things it wasn't meant for. Like presentation slides.

Why is this? Well, let's see. There's Powerpoint. It's a toy. It's one step up from drawing each slide separately in Paint: it has no structure, it makes you duplicate all common elements, no indexing, no outline. What does it offer you? Oh yeah, a spell checker. It sucks. Then there's latex. You reuse everything you already know about it, it has a ton of helpful packages, and with beamer you can actually write slides without hurting too much. Except.. it's still latex.

Here is the problem. Presentation slides are not the same as prose. Everything latex is good at has to do with handling text. Everything a presentation is supposed to be is not centered on text. Giving a presentation is just as much about showing as it is about telling. And you're already telling when you're talking, so you should compose your slides to show people what you mean.

We have horrid tool support for writing presentations. No wonder almost every talk you see is frame after frame of bullet points that no one will remember anyway. People actually paste parts of their text from articles into slides, as if that is supposed to produce a good visual device! :googly:

I have given 8 presentations in the last 12 months, and I'm set to do 4 more by the end of the year. Almost all of them written in latex with beamer. And you know what? I didn't enjoy myself. On the surface it looks promising. The beamer class has a ton of features, you use regular latex, and you output a pdf. But when you get down to the details...

The ratio of illustrations, diagrams, source code and other example materials is much higher than it is in prose. It's supposed to be that way. And this is exactly what latex is awkward at. Presentations are visual, so the first thing people think about is the slide design. Well, guess what: if you already have the perfect design, good for you, but if you don't, you're stuck editing latex styles, which no one understands. Strike one.

Then there is importing images into your slides. Raster images work fairly well, but obviously you have the resize problem: a raster image displayed at a non-native resolution will look jagged. You don't want that. So you hack up a vector image in something like dia. But you can't use that directly in latex; you have to convert it to either eps or pdf. I have found the latter the least painful of late. So now not only do you have to know latex and dia, you have to author Makefiles to get a sane build process out of this. That is, if you can remember to include just the right latex packages in your document, and to make sure they are installed both locally and anywhere else you may want to do a last-minute recompile. Strike two.
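For what it's worth, the Makefile glue boils down to a couple of pattern rules like these (file names made up, and assuming dia's command line export and the epstopdf tool):

    # dia source -> eps -> pdf, so pdflatex can include the figure
    %.eps: %.dia
    	dia --export=$@ $<

    %.pdf: %.eps
    	epstopdf $<

    slides.pdf: slides.tex diagram.pdf
    	pdflatex slides.tex

Not rocket science, but one more moving part to cart around from project to project.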

Speaking of Makefiles, everyone knows you have to recompile a latex document to see the changes you've made. But with presentations these changes are much more often concerned with layout than with content. And it's a huge pain to recompile again and again to debug something as simple as a two-column layout (use \minipage and set the widths just right so the columns fit next to each other). So now I have a compile loop running every second that invokes make, and I watch the updates in kpdf. All because I want to tweak the layout of my content. Speaking of compiling, have you seen those latex error messages lately? They're not meant for humans. And as far as languages go, latex isn't exactly the most... robust of them all. Make one tiny mistake and your compile detonates. Strike three.
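For the record, the two-column incantation I keep rewriting goes something like this (the widths are picked by trial and error, which is exactly the problem):

    \begin{frame}{Two columns, the hard way}
      \begin{minipage}[t]{0.55\textwidth}
        Some text on the left.
      \end{minipage}%
      \begin{minipage}[t]{0.40\textwidth}
        \includegraphics[width=\textwidth]{diagram}
      \end{minipage}
    \end{frame}

Leave out that % at the end of the first minipage and the implicit space pushes the second column onto the next line. The kind of thing you find out by recompiling five times.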

It is a pain. It is a pain to have to recheck old documents to remind myself of how I included a piece of source code, or how I handled a particular image. Latex is extremely hackish. It makes it friggin' hard to generalize anything so that it becomes reusable. I find myself copy-pasting code all the time from one document to another. There is nothing sensible about how things are done; it's just a matter of writing something that works and keeping a copy of it somewhere. The more presentations I write, the more difficult it is to remember which trick I used where. An unmaintainable mess.

And then there's time pressure. When you're writing a technical document, you have the time to work out the little quirks you encounter. It's a permanent document; it will continue to exist for years after the fact, so it's worth the effort. But presentation slides are not like that. You just want to write them fairly quickly so that you can give the talk and move on. You're not going to invest time in this; it's a one-time thing. And yet writing slides requires a lot more patience with latex than writing prose does.

Never mind that: writing slides is a creative process. It is not filling in predefined sections of a document, the way an article for a conference is. You want to be able to experiment, to try things just for the hell of it. Latex is horrid for this. Let's see if I can fit this figure next to the text here. Okay, let me first look up how to do that (how do you actually find this out? I already covered that earlier). Then try it, debug the layout, and finally I can see if I got what I wanted. No, that didn't work out; let's switch it back to the way it was. Compile error. Goddamn it, now what? Okay, let me gradually undo back to a state that does compile, so I can figure out what's causing it.

Not. how. it. was. meant. to. be. done.

The Recruit: very refreshing

November 23rd, 2007

A different kind of spy movie. James (Colin Farrell) is the college kid who gets recruited into the CIA. We see him go from civilian to operative-in-training: in boot camp, learning the basic skills, not to mention picking up some basic psychological devices.

What I really like about this story is how raw the characters are. This is not a story about a veteran in the field à la Jason Bourne, who knows everything, feels nothing, never fails. It's about a guy you can relate to, someone who is learning to understand, but still living through the pain of all the things he's being put through. The plot isn't brilliant; it's neat and tidy, almost simple. But the appeal of the movie isn't in that, it's in the characters: how real they are and how unrefined their reasoning and emotional responses are.

The idea is that nothing is what it seems, but they don't take this very far; they hold back so that the characters can tag along at their own pace. And that's.. nice. Most other stories try to push the envelope, and many of them fail to resolve well. This one doesn't, and you can appreciate that they care more about character development than about having a complicated plot.

Enjoyable as it is, I just have to point out a few technical things. Forgive me, but it's too much to look the other way; it's downright embarrassing at times. Okay, what the bad guys are after is "a virus", a computer virus. It turns out, though, that this is a magical virus that you can deploy right into the power socket in your house, and it will infest the nation's whole power grid. Okay, that's just beyond bad. And James is the MIT kid who is a computer wiz. And we are supposed to buy this. Had they said it had something to do with electricity or whatever, and had they tried hard enough, maybe I'd have bought it. But a computer virus on the power grid? Seriously, do 15 minutes of research before you put that in a movie.

Secondly, as the guy duly explains, the CIA headquarters are well guarded; you can't take anything out of there. That's why the computers don't have disk drives. Yes, that's a quote. What kind of disk drives? Floppies? CDs? Zip drives? And you also can't print anything, because they don't have printers. Uh-huh. Well, that still leaves about a dozen different media you could use, including usb drives. And what do you know, that's precisely how the super clever villain gets this "virus" out of the building: hiding a usb stick in a coffee mug. Now, your typical computer virus is probably less than 1mb of source code; these things have to be small to go unnoticed. But this mastermind smuggles it out... in pieces. What have you got there, Photoshop? :D As if the whole thing wouldn't fit on the usb stick, so you have to do it in turns. Okay, c'mon, I'm on candid camera; who the hell would believe this?