undvd, now in perl!

October 5th, 2008

So it turns out you can do a whole lot with bash. More than I knew. But when you get to a point where you start hitting the limitations of your language*, it gets frustrating. The biggest problem with bash is that it doesn't have real functions. You can wrap a bunch of code and call it with arguments, but all it can hand back is an exit status, not a value. I've tried to come up with a hack to emulate functions returning values, but in the end there just aren't enough pieces in the box to build it from.

To date, undvd has been using various tricks to get around this. Let the function echo the value and capture this in the caller. But then what if you have a failing condition? Well, you can echo the value to stdout and echo the error to stderr, so it doesn't get captured as the result of the function. And then kill $$ to force an exit (you can't just exit, because a function whose output is being captured runs in a subshell, so exit only ends that subshell, which is equivalent to a return from the function).
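A minimal sketch of that trick, roughly as described above. The function names and the path are made up for illustration and are not taken from the undvd source:

```shell
# Print an error on stderr and stop the whole script.
# Inside $(...) the function runs in a subshell, so a plain `exit`
# would only end that subshell. $$ still expands to the PID of the
# top-level shell, so `kill $$` reaches the script itself.
die() {
    echo "error: $*" >&2
    kill $$
    exit 1
}

# "Return" a value by echoing it on stdout (hypothetical helper).
title_name() {
    [ -n "$1" ] || die "no path given"
    basename "$1" .iso
}

# The caller captures stdout as the function's result.
name=$(title_name "/media/My Movie.iso")
echo "$name"
```

Because the error message goes to stderr, it never ends up in the captured value; only the echoed result does.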

That kind of works, but eventually you break down when you have to return more than one string that may contain whitespace. Sure, you could quote them, let the caller find both strings based on where the quotes are, then chop off the quotes and voila. But all this just for a function call? It's too much, and it's unacceptable from a maintenance standpoint.
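To give an idea of the ceremony involved, here is a sketch of a slightly different workaround than the quoting one: the function prints one value per line and the caller reads the lines back. The function name and the values are hypothetical, and this variant breaks as soon as a value contains a newline:

```shell
# Hypothetical function that "returns" two strings, one per line.
title_and_lang() {
    printf '%s\n' "The Big Movie" "English (UK)"
}

# Read the two lines back into separate variables. The brace group
# runs in the current shell, so the variables survive afterwards.
{
    IFS= read -r title
    IFS= read -r lang
} <<EOF
$(title_and_lang)
EOF

echo "title: $title"
echo "lang: $lang"
```

That is a here-document, a command substitution and two reads just to get two strings out of one call.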

Bash's overall weak support for other features of a typical programming language makes it a challenge to write structured programs. undvd-0.6 is therefore pretty much a dead end from a development standpoint. It works well enough, but it's hard to get anything more out of bash. In order to keep evolving, undvd needs a new language.

Another substantial problem with bash is that you're executing commands in the shell; in other words, you build execution strings. There is a lot of potential for quoting bugs when you're dealing with filenames that have spaces and quotes in them. And not just when feeding them as arguments to executables, but on every "function call" just the same. I've spent a lot of time trying to safeguard against this, but all it takes is one instance where the strings aren't quoted correctly, and you have a fatal parsing error.
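A small illustration of how one missing pair of quotes changes what a call receives. The filename and the helper are invented for the example, and the counts assume the default IFS:

```shell
# Report how many arguments the call actually received.
count_args() {
    echo $#
}

file="My Movie (2008).avi"

# Unquoted: the shell splits the value on whitespace before the call.
unquoted=$(count_args $file)

# Quoted: the value arrives as a single argument.
quoted=$(count_args "$file")

echo "unquoted: $unquoted, quoted: $quoted"
```

The same filename shows up as three arguments in one call and as one argument in the other, and the same applies to every function call and every external command along the way.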

So it's time to think about porting to another language. A language that is close to the shell. A language that lets you run a subprocess by passing in the arguments as a list, not a string. A language that has basic programming constructs, like functions. That has good string handling. That can do simple floating point arithmetic. That is as widely available as possible. A language like... perl.

It sounds absurd, doesn't it? Porting to perl in the name of maintainability. But when you're in bash and most of what you're doing is string manipulation and calls to other executables, it's the right choice. And I bet you have perl on your box.

Not that it hasn't been fun. Bash was the right place to start, and I've learned a lot of things about bash on the way. I've also learned that you have to do obscene things like echo strings to bc to do simple floating point math.
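For the record, this is the sort of thing I mean. The numbers are just example values, not taken from undvd, and the sketch assumes bc is installed:

```shell
# Example frame size (hypothetical values).
width=720
height=576

# Shell arithmetic is integer-only, so the fraction is lost.
int_ratio=$(( width / height ))

# Echoing the expression to bc gives a decimal result;
# scale=2 asks bc for two digits after the decimal point.
ratio=$(echo "scale=2; $width / $height" | bc)

echo "integer: $int_ratio, via bc: $ratio"
```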

The port

It's a straight port. undvd 0.7 runs on perl, but it was written to reproduce 0.6 exactly. The code is completely new, obviously, but the functionality is the same.

As a result of running in perl, all the string/numerical processing logic has been internalized, and all the calls to awk, sed, bc and so on are gone. This makes it run faster, and the difference is especially noticeable in scandvd. The overall impact isn't big, since most of the work is done in mencoder, and that is still the same. Nevertheless, it's a welcome side effect. It also makes me happier, since undvd is now less dependent on all these outside tools.

In terms of size the code is about the same. The perl code is actually 5% bigger.

What this means for you

  • 0.6 and 0.7 are functionally equivalent.
  • If you find a bug in 0.6, it's probably also in 0.7.
  • If you find a bug in 0.7, try 0.6.
  • Please report bugs.

* I'm using the term "language" loosely here. I'm talking about the language, the implementation, and the execution environment (i.e. standard libraries, or in bash's case the GNU userland). Often we just pile all of this under "language", because it's easier to talk about it that way.
