Archive for the ‘reviews’ Category

OpenID deserves to die

Tuesday, May 27th, 2008

Here’s my perspective on it. We all have ideas, some good and some bad. Now it’s understandable that people who have invested themselves in a bad idea, especially if they thought it was good, are reluctant to walk away from it. It’s painful to have to realize that. But by that logic we would have to keep up the myth of Santa Claus because, well, so many kids believe in him that we can’t let them down. Bad ideas deserve to die for the good of everyone.

The first thing a good idea must have is a real problem to solve. OpenID does very well here. The point of OpenID is to solve a common problem of the internet age: many websites, many accounts, many usernames and passwords. This is probably why OpenID still strikes some people as a good idea.

Here’s how they do it. Instead of keeping track of your accounts on all the sites you’re a member of, you let one site keep all your account records (sound ominous yet? it did to me). Now, whenever you want to log in to one of your sites, instead of using your username/password for that site, you use your OpenID, which looks like this: http://username.myid.net. This URL identifies your OpenID provider, i.e. the site you use to keep track of all your accounts. So the site you’re logging into sends you to your provider, where you log in with the username/password of your account on the provider site, and the provider then logs you into the site you were visiting. In other words, your account on the provider is the gatekeeper to all your accounts. Sounds simple, right?
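To make the hand-off concrete, here is a minimal Python sketch of the redirect a site might build when it sends you to your provider. It assumes the OpenID 2.0 `checkid_setup` request. It also assumes the provider's endpoint URL has already been discovered from your identifier. The endpoint and the example.com addresses are made up for illustration.

```python
from urllib.parse import urlencode, urlparse

def build_checkid_url(op_endpoint: str, claimed_id: str, return_to: str, realm: str) -> str:
    """Build the browser redirect for an OpenID 2.0 checkid_setup request."""
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",   # interactive: the provider may show a login page
        "openid.claimed_id": claimed_id,  # the identifier URL the user typed in
        "openid.identity": claimed_id,    # same value here; delegation is not modelled
        "openid.return_to": return_to,    # where the provider sends the browser back
        "openid.realm": realm,            # the site asking for the login
    }
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if urlparse(op_endpoint).query else "?"
    return op_endpoint + separator + urlencode(params)

url = build_checkid_url(
    "https://myid.net/openid/server",     # hypothetical endpoint, normally found via discovery
    "http://username.myid.net/",
    "https://example.com/openid/return",
    "https://example.com/",
)
print(url)
```

If you approve at the provider, it redirects the browser back to `return_to` with a signed `openid.mode=id_res` response. The site must verify that response before treating you as logged in. Your password only ever goes to the provider.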

I remember when I first heard about this idea years ago. My first concern was that for this to work, I need a provider to keep track of all my accounts. So I asked myself: whom do I trust to do this for me? The answer came back: myself. I don’t know about you, but the idea of some third party storing all my logins doesn’t make me feel warm and cuddly. As it happens, the “open” in OpenID means you can choose any provider you want, including yourself. You just set up some php scripts and voila, you can use http://mysite.com as your provider. So basically, instead of storing your accounts in some “account manager” program on your computer, you do the same thing on your server. This is where the concept of OpenID died for me. I don’t want my ability to log in to some other site to be contingent on my own OpenID provider being available and working properly at all times (which it isn’t; I have a little downtime like everyone else).

If you don’t want the hassle of being your own provider, you can pick a provider from a list. This is not an attractive fallback option, because now your account on the provider is your key to all your other accounts. If I have an account on some site and I forget my credentials, big deal, I only lose that one account. But if I lose my credentials on the provider, I lose everything.

In theory, OpenID tries to improve your overall security. The hassle of keeping track of accounts is known to us all, and we get around the problem by reusing the same (or similar) credentials on a lot of sites. This is obviously bad for security, because if someone gets your password to one site, they can access all your other accounts that use this password. So security people will always recommend that you use distinct credentials for every account. Suppose you do this, and you use OpenID to ease the record keeping. Now OpenID actually works against you. Your account on the OpenID provider is the key to everything. With a different password on every site, you’re that much less likely to remember any of them, so your account on the provider becomes proportionally more valuable.

There is a strange irony at play here. Supposedly, the more accounts you manage with OpenID, the more useful it is. But on the other hand, the more accounts you manage with it, the more you depend on it. It also becomes the single gateway to all your online identities, whether for a potential attacker or for abuse by a dishonest or incompetent provider.

Most importantly, however, OpenID’s solution to the login problem isn’t a very clever solution at all. Typing http://username.myid.net is not a big improvement over filling in a username/password form. My browser already gives me the option to log in without typing anything.

Those are my reasons why OpenID is a bad idea and should have died years ago. If you want more, Stefan Brands has an exhaustive laundry list of problems with OpenID.

clocking jruby 1.1

Monday, April 21st, 2008

Did you hear the exciting news? JRuby 1.1 is out! For real, you can call your grandma with the great news. :party: Wow, that was quick.

Okay, so the big new thing in JRuby is a bytecode compiler. As you may know, up to 1.0 it was just a Ruby interpreter in Java. Now you can actually compile Ruby modules to Java classes and no one will know the difference, very devious. :cool: Sounds like Robin Hood in a way, doesn’t it?

The JRuby guys are claiming that this makes JRuby on par with “regular Ruby” on performance, if not better. Hmm. Just to be on the safe side, what size shoes do you wear? Oh ouch, those are going to be tricky to fit in your mouth. :/ And Freud will say you’re stuck in the oral stage. Too much? Okay.

So here is my completely unvetted, dirty, real world test. No laboratory conditions here, you’re in the ghetto. First we need something *to* test. I don’t have a great deal of Ruby code at my disposal, but this should do the trick. How does scanning the raw filesystem for urls sound? The old harvest script actually does a half decent job of turning up a bunch of findings.
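The harvest script itself isn't reproduced here, so the following is only a rough Python sketch of the same idea, not its actual code. It reads the raw device as a byte stream in fixed-size chunks and pulls out anything that looks like a URL. The regex is a deliberately loose assumption. The one subtle part is carrying a partial match over to the next chunk, so that a URL straddling a chunk boundary isn't cut in two.

```python
import io
import re
from typing import BinaryIO, Iterator

# Loose, assumed URL pattern: scheme plus a run of characters allowed in URLs.
URL_RE = re.compile(rb"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")
PREFIX_LEN = len(b"https:/")  # longest partial prefix that cannot match on its own yet

def harvest(stream: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield URL-looking byte strings from a raw binary stream read in chunks."""
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        # By default keep the last few bytes, in case a URL prefix is split.
        keep_from = max(0, len(data) - PREFIX_LEN)
        for m in URL_RE.finditer(data):
            if m.end() == len(data):
                # The match runs into the chunk boundary and may continue.
                keep_from = m.start()
                break
            yield m.group()
            keep_from = max(keep_from, m.end())
        tail = data[keep_from:]
    # Whatever is left at the end of the stream is complete by definition.
    yield from (m.group() for m in URL_RE.finditer(tail))

# Tiny self-check: URLs straddle 16-byte chunk boundaries, with binary noise around them.
sample = io.BytesIO(b"\x00\x01junk http://example.com/a/b \xffmore https://x.org")
found = list(harvest(sample, chunk_size=16))
print(found)
```

If you fed it `sys.stdin.buffer` instead of the sample, it could sit at the end of the same `sudo cat /dev/sda5 | ...` pipeline used in the races.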

Now introducing the contenders. First up, his name is JRuby, you know him from occasional mentions on obscure blogs and the programming reddit past the top 500 entries. He promises to free all Java slaves by giving away free Rubies to everyone!

Aaand the incumbent, the famous… Ruby! You know him, your parents know him, every family would adopt him as their own child if they could. He’s the destroyer of kingdoms and the creator of empires, he’s bigger than Moses himself!

Our two drivers will be racing across hostile territory. Your track is a 25gb ext3 live file system. During the race, I can promise you that only Firefox is likely to be writing new urls to disk, but I could be lying eheheh. Due to the unpredictable nature of this rally track, regulations allow only one racer at a time, but you will be clocked.

First up is the new kid on the block, Jay….Ruby. The Ruby code will not be compiled before execution; we’ll let the just-in-time compiler do its thing.

$ time ( sudo cat /dev/sda5 | bin/jruby harvest.rb --url > /tmp/fsurls.jruby )
real 39m26.547s
user 37m19.072s
sys 1m28.406s

Not too shabby for a first run, but since this is a brand new venue, we have no frame of reference yet. Let’s see how Ruby does here.

$ time ( sudo cat /dev/sda5 | harvest.rb --url > /tmp/fsurls.ruby )
real 78m42.186s
user 62m12.537s
sys 2m18.721s

Well, look at that! The new kid is pretty slick, isn’t he? Sure is giving the old man a run for his money. Let’s see how they answered the questions.

$ lh
-rw-r--r-- 1 alex alex 86M 2008-04-21 18:29 fsurls.jruby
-rw-r--r-- 1 alex alex 8.6G 2008-04-21 20:58 fsurls.ruby

Yowza! No less than a hundred times more output from Ruby. What is going on here? Did Jay just race to the finish line, dropping the vast majority of his parcels? Or did father Ruby see double and triple and quadruple, ending up with lots and lots of duplicates? Well, we don’t really *know* how many urls exist in those 25gb of data, but it seems a little bit suspect that there would be in excess of 8gb of them.

One way or the other, it’s pretty clear that the regular expression semantics are not entirely identical. In fact, you might be sweating a little right now if your code uses them heavily.

UPDATE: Squashing duplicates in both files actually produces two files of very similar size (13mb), and the disparity in unique entries is a very reasonable 4% (considering the file system was being written to in the process). The question remains: how did Ruby produce 8gb of output?
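The post doesn't say how the duplicates were squashed or how the 4% was measured. Below is one way to do both, sketched in Python. It assumes "disparity" means the share of distinct URLs that turn up in only one of the two outputs. Only the unique entries are held in memory, so even the 8.6G file can be streamed line by line.

```python
from typing import Iterable

def unique(lines: Iterable[bytes]) -> set[bytes]:
    """Collapse duplicate lines, like `sort -u` without the sorting."""
    return {line.rstrip(b"\n") for line in lines}

def disparity(a: set[bytes], b: set[bytes]) -> float:
    """Fraction of all distinct entries that appear in only one of the two sets."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0

# For the real outputs: with open("/tmp/fsurls.ruby", "rb") as f: ruby = unique(f)
ruby = unique([b"http://a\n", b"http://a\n", b"http://b\n", b"http://c\n"])
jruby = unique([b"http://a\n", b"http://b\n", b"http://d\n"])
print(len(ruby), len(jruby), disparity(ruby, jruby))  # 3 3 0.5
```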

evolving towards git

Wednesday, December 5th, 2007

I remember the first time I heard about version management. It was a bit like discovering that there is water. The idea that something so basic could actually not exist seemed altogether astounding afterwards. This was at a time when I hadn’t done a lot of coding yet, so I hadn’t suffered the pains of sharing code *that* much, but it was already relatable enough to understand the implications of version management. Emailing those zip files of code had been plenty annoying.

This happened just before we embarked upon a 6-month journey in college to write an inventory management application in Java. There were four of us. We had done Java before, but this was the first “big” application that also employed our newly acquired knowledge about databases. God only knows why, but they actually ran (and probably still do) an Oracle server at school for these projects. I don’t know how much an academic license runs on that, but it’s in the thousands of dollars no doubt. As the one who took the initiative in my group, I persuaded (not that it took much effort) the guys to forget about the Oracle server (which didn’t come with any developer tools, just the awful command line interface) and use PostgreSQL (we rejected MySQL just to be on the safe side, as it lacked support for a bunch of things like views and foreign keys). I also introduced another deviation from the standard practice: CVS. I would be running both off my tried and tested Pentium home server. :proud: We were the only group that used version management, which baffled me (why was it not offered as a service at the department?). I heard that the following year they started offering CVS for projects. Incidentally, when you deviate from the path, you do expect to get some academic credit for the extra effort, obviously. ;)

So that was 6 months with CVS, and it worked quite well. We didn’t do anything complicated with it, didn’t know about tags or branches, just ran a linear development on it and that was it. And it was helpful (but not efficient) to track binary files (Word documents, bleh). But it had some annoying quirks. Not that we were all that concerned with tracking history back then, it was just about getting it finished and then obviously it would never be used, as school projects go. But the lack of renaming and moving in CVS was silly.

It’s a bit funny, actually. CVS has been a standard for the longest time, and people have put up with its problems, until recently when a version management boom began. Why is that? I have a hunch that the relative calm in version management was kept intact by a predominantly corporate culture (and the corporates are obviously super conservative). But then once a certain number of people had gotten themselves into open source projects the need for better tools put this in motion. One of the first “new generation” systems was Subversion, the replacement for CVS. Subversion was adopted slowly, but quite steadily. Currently it’s probably the “standard” for version management. I registered galleryforge with Berlios precisely because they offered svn services. Sourceforge also has it now, and people around the net have started talking in terms of svn a lot. My current school also uses it.

But with the momentum of version management systems in play, a lot of other ideas have surfaced besides those captured in Subversion. Arch and Darcs are two that employ the distributed paradigm. I don’t think they have had much success though, and perhaps looking back they will have played the role of stepping stones rather than end products. A new (a third, if you will) generation of systems has appeared quite recently. Monotone is a newer system with its own network synchronization protocol. Mercurial seems to have enough support to replace Subversion/CVS down the line (not that it couldn’t happen today, but people hate giving up something they’re used to, so these things drag out). And there is Bazaar, which seems to discard the goal of being the best system and just aim to be easy and pleasant for small projects.

And somewhere in that mix we find Git. It has a much higher potential for success than other new systems, because it was launched into the biggest (or noisiest, at least) open source project and made to carry the heaviest loads right from the beginning. That is a good way to convince people that your product is robust. As such, it seems to me that git has had the quickest adoption of any system. It was launched in 2005 and it has already become one of the household names around the interwebs. So I thought git was worth looking at. At this rate I may very well be using it on a project someday.

I was also encouraged by regular people saying it worked for them. A version management system for kernel developers is nice, but that doesn’t necessarily make it right for everyone else. But here were people with one-man projects liking it, good sign. In fact, one of the things I used to read about git was “I like the simplicity”.

What I discovered was that my dictionary was out of date, because it didn’t carry that particular meaning of the word “simple”. Git is not simple; it takes getting used to. It’s one of those things you try and you don’t understand, then you come back and try again and you make a little more progress and so on. I decided to open a git repo for undvd, both because I need to track the changes and because it lets me use git and get a feel for it.

The special thing about git is that it has well developed capabilities for interoperating with other systems. In other words, git doesn’t assume it will wipe out everything else, it accepts the reality that some people will continue using whatever it is they’re using (even CVS) and git can deal with that. Obviously, predicting the future is tricky, but I get the feeling that Subversion has hit that comfort spot where most people are satisfied with what it’s doing and they really would need a lot of persuasion to learn something new. In other words, I expect Subversion to be around for quite a while. And therefore, git’s svn tools may come in very handy. I haven’t really looked at that part yet, but it’s on my todo list.

So what is the future? I would say Subversion, Git, and maybe Mercurial. Version management systems are like programming languages: the more of them you use, the easier it is to learn a new one. But switching is hard. When I want to check on something in version management, I still begin typing svn st... before I realize I’m using git.

what can you do with django?

Sunday, December 2nd, 2007

So I’m reading about Django, which is a really groovy web framework for Python. As the Django people themselves say, there have been *a lot* of web frameworks for Python alone, and it’s clear that a lot of lessons have been learned along the way. Personally I’ve only looked briefly at Zope before, but I was a bit intimidated by how complicated it seemed to be. I also once inherited a project that used Webware for Python, which was pretty nice to work with, but radically different from what I’d known because pages are constructed using a sort of abstract html notation (basically every html tag is a function etc).

There is a lot to like about Django. The MVC pattern is executed quite elegantly (or as they call it in Djangoland, the MTV (model – template – view) pattern). Templates are written in what resembles a server side language, but it’s actually a specialized, both restricted and enhanced, Python-like templating language. Templates can inherit from each other to fill in placeholders in a cascading manner, very interesting design choice.
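Template inheritance is easier to picture with a toy. The sketch below is not Django's template engine. It is a few lines of Python that mimic the `{% block %}` syntax for a single level of inheritance, with no nested blocks and no `{% extends %}` chains. It just shows how a child fills in a parent's placeholders and falls back to the parent's defaults.

```python
import re

# Matches {% block name %}...{% endblock %}; non-greedy, so nested blocks are not supported.
BLOCK_RE = re.compile(r"{% block (\w+) %}(.*?){% endblock %}", re.DOTALL)

def render(child: str, parent: str) -> str:
    """Fill the parent's blocks with the child's overrides, keeping parent defaults otherwise."""
    overrides = {m.group(1): m.group(2) for m in BLOCK_RE.finditer(child)}
    return BLOCK_RE.sub(lambda m: overrides.get(m.group(1), m.group(2)), parent)

base = ("<title>{% block title %}My site{% endblock %}</title>"
        "<body>{% block content %}{% endblock %}</body>")
page = "{% block content %}<p>Hello</p>{% endblock %}"

result = render(page, base)
print(result)  # <title>My site</title><body><p>Hello</p></body>
```

In real Django the child names its parent with `{% extends "base.html" %}`. Chains can go several levels deep, with each level overriding only the blocks it cares about.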

Then there is a nice separation between objects and persistence. Data is declared in terms of Python classes (called “the model”) which are then translated somehow on-the-fly to sql-du-jour depending on your database backend. You can even run a Python shell to interact with your model and in effect edit your database.
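As a rough illustration of declaring data as a Python class and deriving the SQL from it, here is a toy Python sketch built on a dataclass and SQLite. It is not Django's ORM, which uses `models.Model` subclasses with field objects such as `models.CharField`. The `Poll` model and the type mapping are made up for the example.

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed Python-type to SQL-type mapping; a real ORM has many more field types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}

@dataclass
class Poll:  # hypothetical model
    question: str
    votes: int

def create_table_sql(model: type) -> str:
    """Derive a CREATE TABLE statement from a dataclass, with an implicit integer id."""
    columns = ["id INTEGER PRIMARY KEY"]
    columns += [f"{f.name} {SQL_TYPES[f.type]}" for f in fields(model)]
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)})"

conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Poll))
conn.execute("INSERT INTO poll (question, votes) VALUES (?, ?)", ("Django or Rails?", 3))
rows = conn.execute("SELECT id, question, votes FROM poll").fetchall()
print(rows)  # [(1, 'Django or Rails?', 3)]
```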

The model and the template mechanism are both tied together with “views”, which is the application code you write for your webapp. A cool thing here is that urls are decoupled from execution code, and the two things are mapped via regular expressions, so you have something like a mod_rewrite declaration (but nicer) that dispatches to functions.
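The URL-to-function mapping can be sketched in plain Python. This is not Django's actual URLconf API, only the same idea in miniature. It keeps an ordered list of regular expressions, each paired with a view function. The first match wins, and the captured groups become the function's arguments. The views and paths are hypothetical.

```python
import re
from typing import Callable

def year_archive(year: str) -> str:               # hypothetical view
    return f"Articles from {year}"

def article_detail(year: str, slug: str) -> str:  # hypothetical view
    return f"Article {slug} from {year}"

# Ordered (regex, view) pairs, in the spirit of a URLconf; the first match wins.
URLPATTERNS: list[tuple[re.Pattern, Callable[..., str]]] = [
    (re.compile(r"^articles/(\d{4})/$"), year_archive),
    (re.compile(r"^articles/(\d{4})/([\w-]+)/$"), article_detail),
]

def dispatch(path: str) -> str:
    """Call the first view whose pattern matches, passing captured groups as arguments."""
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            return view(*match.groups())
    return "404 Not Found"

print(dispatch("articles/2007/"))                # Articles from 2007
print(dispatch("articles/2007/django-review/"))  # Article django-review from 2007
print(dispatch("about/"))                        # 404 Not Found
```

The appeal of this design is that URLs can be reshuffled in one place without touching the functions that handle them.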

All the same, if I were to work with Django I have a couple of concerns up front:

  1. Templates are written in html with inline Python-like expressions. This makes them both simple and transparent, because the html code that will be served is written by hand. However, considering the present situation with html and browser support, it would be nice if Django could enforce/validate html. But this would probably imply an abstraction over html (to prevent raw html being written) and a layer of code generation for html (which may or may not be easy/possible to cache efficiently).
  2. And that brings me to the second point. I have seen no mention of CSS anywhere. It appears that Django leaves the fairly heavy burden of supplying styles to the designer, without any intelligent mechanisms to help.
  3. Data models are represented in Python code, which is very convenient and provides a much sought after quick access to the contents of your database. Django will instantiate tables for you based on a model, but only for creation. It doesn’t seem to enforce or update the correspondence between the Python model and the database schema. The Django book has a mention of strategies for keeping things in check, but this particular section is missing.
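On the third point, the kind of consistency check that seems to be missing can be sketched by hand. The following Python sketch is not a Django feature. It assumes an SQLite database and a model reduced to a plain list of field names, both simplifications. It only compares column names, not types or constraints.

```python
import sqlite3

def table_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Column names of an existing SQLite table (row[1] of PRAGMA table_info is the name)."""
    # The table name is interpolated directly, so only pass trusted names.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

def schema_drift(conn: sqlite3.Connection, table: str, model_fields: list[str]) -> dict[str, set[str]]:
    """Report fields the model has but the table lacks, and vice versa."""
    columns = table_columns(conn, table)
    return {
        "missing_in_db": set(model_fields) - columns,
        "missing_in_model": columns - set(model_fields),
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (id INTEGER PRIMARY KEY, question TEXT, votes INTEGER)")
# Hypothetical: the model later gained a pub_date field, but the table was never updated.
drift = schema_drift(conn, "poll", ["id", "question", "votes", "pub_date"])
print(drift)
```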

There is no doubt Django embodies a lot of what we’ve learnt about building web applications, but currently I would say it seems a bit incomplete (also pre v1.0 still), if terribly elegant in some parts.

Another question concerns performance. Ruby on Rails has been known to suffer when pitted against the effective but ugly php. Does Django fall in the same bracket or does it have a lower overhead? In general Python is an older and more refined environment, so one might expect Django to scale better.

latex sucks! (for presentations)

Wednesday, November 28th, 2007

Hold your fire.

Got your attention there, didn’t I? :D

Latex is good at what it was built to do – mark up big documents. Fine for writing reports, articles, books. The problem is that we have no other tools nearly as good, so we end up using latex for things it wasn’t meant for. Like presentation slides.

Why is this? Well, let’s see. There’s Powerpoint. It’s a toy. It’s one step up from drawing each slide separately in Paint: it has no structure, it makes you duplicate all common elements, no indexing, no outline. What does it offer you? Oh yeah, a spell checker. It sucks. Then there’s latex. You reuse everything you already know about it, it has a ton of helpful packages, and with beamer you can actually write slides without hurting too much. Except.. it’s still latex.

Here is the problem. Presentation slides are not the same as prose. Everything latex is good at has to do with handling text. Everything presentations are supposed to be is not centered on text. Giving a presentation is just as much about showing as it is about telling. And you’re already telling when you’re talking, so you should compose your slides so that you can show people what you mean.

We have horrid tool support for writing presentations. No wonder almost every talk you see is frame after frame of bullet points that no one will remember anyway. People actually paste parts of their text from articles into slides, as if that is supposed to produce a good visual device! :googly:

I have given 8 presentations in the last 12 months, and I’m set to do 4 more by the end of the year. Almost all of them written in latex with beamer. And you know what? I didn’t enjoy myself. On the surface it looks promising. The beamer class has a ton of features, you use regular latex, and you output a pdf. But when you get down to the details…

The ratio of illustrations, diagrams, source code and other example materials is much higher than it is in prose. It’s supposed to be that way. This is what latex is awkward at. Presentations are visual, so the first thing people think about is the slide design. Well guess what, if you have the perfect design already, good for you, but if you don’t, you’re stuck editing latex styles, which no one understands. Strike one.

Then there is importing images into your slides. Raster images work fairly well, but obviously you have the resize problem – a raster image displayed at a non-native resolution will look jagged. You don’t want that. So you hack up a vector image in something like dia. But you can’t use that directly in latex; you have to convert it to either eps or pdf. I have found the latter least painful of late. So now not only do you have to know latex and dia, you have to author Makefiles to get a sane build process out of this. That is, if you can remember to include just the right latex packages in your document and make sure they are installed both locally and in any other place you may want to do a last minute recompile. Strike two.

Speaking of Makefiles, everyone knows you have to recompile a latex document to see the changes you’ve made. But with presentations these changes are much more often concerned with layout than they are with content. And it’s a huge pain to recompile again and again to debug something as simple as a two column layout (use \begin{minipage} and set the width in absolute units just right so the columns fit next to each other). So now I have a compile loop running every second that invokes make, and I see the updates in kpdf. All because I want to tweak the layout of my content. Speaking of compiling, have you seen those latex error messages lately? They’re not meant for humans. And as far as languages go, latex isn’t exactly the most… robust of them all. Make one tiny mistake here and your compile detonates. Strike three.
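The one-second compile loop could at least be made to rebuild only when something has actually changed. Here is a minimal Python sketch. It assumes `make` is the build command and that you pass in the list of source files to watch (the `.tex` and `.dia` files, say). It polls modification times rather than using any OS-specific file notification API.

```python
import os
import subprocess
import time

def snapshot(paths: list[str]) -> dict[str, float]:
    """Map each existing path to its last modification time."""
    return {p: os.stat(p).st_mtime for p in paths if os.path.exists(p)}

def watch(paths: list[str], command: tuple[str, ...] = ("make",), interval: float = 1.0) -> None:
    """Poll the files every `interval` seconds and rerun `command` when any of them changes.

    Runs until interrupted with Ctrl-C.
    """
    last = snapshot(paths)
    while True:
        time.sleep(interval)
        current = snapshot(paths)
        if current != last:
            last = current
            subprocess.run(command)  # rebuild the pdf that the viewer is showing

# Example (not run here): watch(["slides.tex", "diagram.dia"])
```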

It is a pain. It is a pain to have to recheck old documents to remind myself of how I included a piece of source code. Or how I handled a particular image. Latex is extremely hackish. It makes it friggin’ hard to generalize anything so that it becomes reusable. I find myself copy pasting code all the time from one document to another. There is nothing sensible about how things are done; it’s just a matter of writing something that works, and having a copy of it somewhere. The more presentations I write, the more difficult it is to remember which trick I used where. An unmaintainable mess.

And then there’s time pressure. When you’re writing a technical document, you have the time to work out the little quirks you will encounter. It’s a permanent document, it will continue to exist for years after the fact, and it’s worth the effort. But presentation slides are not. You just want to write them fairly quickly so that you can give the talk and move on. You’re not going to be investing time in this; it’s a one time thing. And yet writing slides requires a lot more patience with latex than writing prose does.

Never mind that; writing slides is a creative process. It is not filling in predefined sections of a document, like an article for a conference is. You want to be able to experiment, to try things just for the hell of it. Latex is horrid for this. Let’s see if I can fit this figure next to the text here. Okay, let me first look up how I do that. How do you actually find this out? I already covered that earlier. Then try it, debug the layout, and finally I can see if I got what I wanted. No, that didn’t work out, let’s switch it back to the way it was. Compile error. Goddamn it, now what? Okay, let me gradually undo to a state that does compile, and that way I can figure out what’s causing it.

Not. how. it. was. meant. to. be. done.