Archive for August, 2006

fixing missing post slugs in wordpress

August 31st, 2006

If you've ever moved from one house to another, you know that the mess in your new house isn't limited to moving day; it drags on for a while until you get things sorted out. Lots of little details escape attention for days, weeks even. But eventually you track down every last one, and after a month or two you are 100% in order.

Now you're probably thinking what the hell does that have to do with the title of this entry?!? Well, migrating data from one system to the next is a lot like moving houses. Moving from BLOG:CMS to WordPress has not been entirely trivial, so I still spot the odd bug even though it's been a couple of weeks. One thing I neglected to consider when migrating the blog was missing post slugs. You see, WordPress uses post slugs to make urls more human-friendly. Instead of {blog_url}?p=34 to open post number 34, it allows you to use urls in the form {blog_url}/index.php/year/month/day/blog-entry-title (the part after the last slash is what WordPress calls a post slug). This is nice for people who link to a blog entry, because the latter url makes a lot more sense to a human than the former (which is just the ID of a row in a database).
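
To make the two url forms concrete, here is a minimal sketch in Python. The function names, the example.org address and the zero-padded month and day are assumptions made for illustration; only the two url patterns come from the description above.

```python
from datetime import date

def id_url(blog_url: str, post_id: int) -> str:
    """The numeric form: {blog_url}?p=<post id>."""
    return f"{blog_url}?p={post_id}"

def slug_url(blog_url: str, posted: date, slug: str) -> str:
    """The readable form: {blog_url}/index.php/year/month/day/<post slug>.
    Zero-padding month and day is an assumption, not something stated in the post."""
    return f"{blog_url}/index.php/{posted:%Y/%m/%d}/{slug}"

if __name__ == "__main__":
    blog = "http://example.org/blog"  # hypothetical blog address
    print(id_url(blog, 34))
    print(slug_url(blog, date(2006, 8, 31), "fixing-missing-post-slugs-in-wordpress"))
```

Without a slug, the second form has nothing to put after the last slash, which is how links to imported entries can end up broken.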

But. BLOG:CMS does not use post slugs (or didn't), so I've never had them. WordPress generates them automatically for new posts, but the old entries I imported into WordPress came without any. I realized all this when I migrated my blog entries, and I thought it was merely inconsistent and wouldn't have any repercussions. Well, it turns out some links were broken over this. So I realized today that I would have to fix this annoying little bug and put in post slugs for entries that don't already have them.

And for that purpose I wrote a little script. It's a quick and dirty fix: it forces all characters to lowercase, turns spaces into hyphens and strips everything that isn't an ASCII letter, a digit or a hyphen (so it will not work well with non-English post titles). But for my money it works well enough.

<?php

// Database credentials: fill these in before running.
$dbhost = '';
$dbuser = '';
$dbpass = '';
$dbname = '';

// Published posts that have no slug (post_name) yet.
$sql = 'SELECT ID, post_title'
        . ' FROM `wp_posts`'
        . ' WHERE post_status = \'publish\''
        . ' AND post_name = \'\''
        . ' ORDER BY ID ASC';

// mysqli and preg_replace stand in for the old mysql_* and ereg_* functions,
// which were removed in PHP 7.
$db = mysqli_connect($dbhost, $dbuser, $dbpass, $dbname)
	or die('Could not connect: ' . mysqli_connect_error());

$result = mysqli_query($db, $sql) or die('Query failed: ' . mysqli_error($db));
while ($row = mysqli_fetch_assoc($result))
{
	$id = (int) $row['ID'];

	// Build the slug: lowercase, spaces to hyphens, drop everything else,
	// then collapse runs of hyphens into one.
	$slug = strtolower(trim($row['post_title']));
	$slug = str_replace(' ', '-', $slug);
	$slug = preg_replace('/[^a-z0-9-]/', '', $slug);
	$slug = preg_replace('/-+/', '-', $slug);

	echo "ID : {$id} <br>" .
		"post_title : {$row['post_title']} <br>" .
		"post_name : {$slug} <br>";

	// The slug is a string value, so it goes in single quotes; backticks
	// would make MySQL read it as a column name.
	$sql_u = 'UPDATE `wp_posts` SET post_name = \'' . $slug . '\''
		. ' WHERE ID = ' . $id;
	echo '<br>' . $sql_u;
	mysqli_query($db, $sql_u) or die('Query failed: ' . mysqli_error($db));
}

mysqli_close($db);

?>
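
The title-to-slug steps in the script can be tried out without touching the database. Below is a minimal sketch of the same transformation in Python; the function name `make_slug` is hypothetical, and this mirrors the script's rules rather than WordPress's own slug generation.

```python
import re

def make_slug(title: str) -> str:
    """Mirror the script's steps: trim, lowercase, spaces to hyphens,
    drop anything outside a-z, 0-9 and '-', collapse repeated hyphens."""
    slug = title.strip().lower()
    slug = slug.replace(" ", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)
    slug = re.sub(r"-+", "-", slug)
    return slug

if __name__ == "__main__":
    for title in ("Fixing missing post slugs in WordPress",
                  "Project Newman :: An evaluation",
                  "Sørlandet"):
        print(title, "->", make_slug(title))
```

The last title shows the limitation with non-English titles: the "ø" is simply dropped. Note also that two titles differing only in punctuation end up with the same slug, and neither this sketch nor the script checks for that.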

Six degrees of separation: nothing like a good play

August 30th, 2006

I haven't been to that many plays and I could probably count the really good plays I've seen on the fingers of one hand. Because even though the theatre is a big thing, a lot of the stuff being played there is very mediocre. But there are good plays from time to time. The problem is that the form of putting them on is terribly outdated. I mean that whole thing where the actor talks to the audience and "no one can hear him"? No one is buying that. The theatre doesn't have the possibilities that movies have.

Which is precisely why taking a good play and making a movie of it can really bring out the bright points of that play. Like in the case of Six degrees of separation. I can imagine what it looks like on stage, which is why I like the movie all the more. The strongest point is that the plot is very good. Without trying to give anything away, the plot is hard to read and full of surprises. It doesn't make you guess what's coming; you don't feel compelled to. And there's no way to know either.

It is a bit of a cliché on rich people living hollow lives and having all kinds of petty problems, but it has enough depth to not make that aspect be any more than a secondary concern. The story is what drives it forward, and it is complex enough to fill 2 hours without boring you at all. The characters are 'tasteful': vivid enough to be palpable, but subtle enough to not make you get sick of them. (This is something I get in theatres a lot: characters that are so dominating that their depth is exhausted long before the play breaks 30 minutes, and if they are annoying too, well...)

Banlieue 13: poetic senseless violence

August 29th, 2006

banlieue_13_poster.jpg

Luc Besson strikes again. The guy has a passion for stunts and martial arts, but this movie from 2004 is far better than the Transporters and doesn't make the slightest effort to be funny or charming, in stark contrast to the Taxis. It is a far more focused effort - focused on long, intense action sequences that stir your imagination.

How often do you see a truly violent movie that ends in a morality tale saying that violence is not the solution to our problems? If you can stand to ignore your instincts for a tight plot, solid casting and a proper escalation of the story, there is a chance you may really enjoy Banlieue 13. The mass killing scenes do get tedious at times, but the movie features some groundbreaking stunts, based on parkour, the discipline of running at and jumping over urban obstacles at a high pace. It is a sight to behold, especially considering that most of these scenes were made without any kind of technical aid. So much cooler than yet another car chase.

Then, of course, comes the added benefit of bragging rights for watching classy European cinema ("a film, is what it is") rather than the same old Hollywood productions remade over and over. Until your friends see it and call your bluff, that is.

Project Newman :: An evaluation

August 29th, 2006

The thing about a project like Newman is that it's basically impossible to make it work perfectly. It has a difficult job, because there are so many potential sources of error. Servers may go offline, connections may fail, article formats may change and so on. It is as good as impossible to guarantee that Newman will do the right thing, because at the end of the day we are trying to analyze text, and computers are not good at doing that. Just look at spam filters - they have been improved upon for years, but everyone is still getting spam. Much less than before, of course, so the filters are definitely useful. And Newman too makes mistakes, but it does still succeed quite often.

Newman has been posting on Xtratime.org under the username Carsonne, a French female impersonator of Carson35's, it would seem. Carsonne has averaged about 15 posts a day since July 30; that is a little over 350 posts in all, 350+ news stories posted. While I haven't been keeping score to present statistical numbers, I have kept a close eye on Carsonne and I would estimate that upwards of 90% of the stories posted were correctly parsed, formatted and classified. In fact, I recall about 10-15 misposts among the ones I've seen (which I think is most). That is still an error rate no human poster would have: at an estimated 95% success rate, Carsonne makes mistakes at least ten times as often as a human poster would (i.e. I would claim that a human poster would have a >99.5% success rate at copy/pasting and classifying stories, fewer than 2 misposts in 350).
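
The arithmetic behind those figures can be checked in a few lines. This is a small sketch in Python using only the estimates quoted above (350 posts, 10-15 recalled misposts, a rounded 95% success rate for Carsonne and a claimed 99.5% for a human); the function names are hypothetical and none of the numbers are measured statistics.

```python
def error_rate(misposts: int, total: int) -> float:
    """Share of posts that went wrong."""
    return misposts / total

def expected_misposts(success_rate: float, total: int) -> float:
    """How many misposts a given success rate implies over `total` posts."""
    return (1 - success_rate) * total

if __name__ == "__main__":
    total = 350
    for misposts in (10, 15):
        print(f"{misposts} misposts in {total}: {error_rate(misposts, total):.1%} errors")
    print(f"human at 99.5%: {expected_misposts(0.995, total):.2f} misposts in {total}")
    # Ratio between the rounded 5% error rate and the claimed 0.5% human error rate.
    print(f"ratio: {0.05 / 0.005:.0f}x")
```

The recalled 10-15 misposts work out to an error rate of roughly 3-4%, so the 95% success figure is a slightly pessimistic rounding, and 0.5% of 350 is 1.75, which is where "fewer than 2 misposts" comes from.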

What about user input, then? Well, unfortunately Newman does come with a certain configuration cost; not everything can be automated. In particular, finding channels is something that would be wonderful to automate, given how quickly the forum climate changes. Newman also requires that sources be configured (and, if need be, updated) for the parsing to work. Of course, once that is in place, Newman can post at will. So that is still quite a limited set of abilities.

The screenshot below shows a typical run of Newman. Quite a few stories were fetched, some were selected for posting, and then posted. It also shows how Newman is fault tolerant - a parsing error was handled gracefully, as was a timeout from the forum web server.

newman_running.png

After 20+ days on the forum, Carsonne has been active long enough to stir up some reactions about "her" posting of news. Carson's long tenure has paved the way for posters like this, so Carsonne is seen by most as just another compulsive news poster. "She" has taken some heat over posting news in the wrong place (wrong classification), but beyond that it has been no worse than Carson gets daily.

So what have we learnt?

As is often the case, Project Newman seems to have yielded more questions than answers. Sure enough, it isn't too hard to automate posting on a forum, it isn't too hard to fetch stories from the web and parse them, it certainly isn't hard to automate this beyond any human's ability to keep up. But it is hard to decide what text means, it is hard to decide which story is relevant to what thread, it is hard to decide whether a word in a sentence is a name and so on.

The question is just how to do these things in a reliable way.

Thus endeth Project Newman. Download the code from the code page if you're interested.

This entry is part of the series Project Newman.

how to spend a lovely weekend down south

August 28th, 2006

Sørlandet (literally "south of the country") is just the nicest place to be in the summer. Tourists flock from all of Europe to experience the Norwegian summer in this region. It's been years since I've spent summers there myself, but this weekend I had a chance to go back to Mandal, a lovely little town with the best beach in the country.

trondheim_stavanger.png

First, catch a flight down south to Stavanger ('cause it's waaaay too far to drive). Then, get on the E39 towards Kristiansand. It's about 200km to Mandal, so set aside 3-4h for this. It's quite a scenic drive through the mountains and valleys, but you won't go fast 'cause the roads are narrow and there's traffic.

stavanger_mandal.png

In Mandal, you will discover a small, charming touristy town with classic white wooden houses and a vacation kind of atmosphere. (Make sure you don't back into the car with the German plates as you're getting out of your parking spot.) Once in Mandal, you may want to go for a coffee (or indeed an ice cream) at the town's best located ice cream bar, right on the main street. Then, why not take a stroll through town and pick up some nice paella for dinner at the fish market.

Sjøsanden camping is right outside town. Bring your camping gear or hire a bungalow if you don't have any. Head for the greatest beach in Norway, which is literally at the foot of the campsite (I tried pulling up the Google Earth imagery for it, but as luck would have it, the imagery stinks for that particular place).

Once you're done with the beach and dusk sets in, have some dinner and head to Kristiansand, a mere 40km away. Kristiansand is *the* city in the south of the country. Also a nice place to be in the summer, the old fort dominates the bay, but there are marinas and there's a beach too.

The next morning, go for a swim in the sea before breakfast. If the water feels chilly, then it probably means it's the hot time of year. On the drive back, stop off at the little town of Flekkefjord. It isn't so much of a tourist place, but it's more the kind of small-town Norway that much of this country is. As you drive into the center, crossing the bridge over the fjord, on the left you'll see a place called Kaffebørsen. They have a wide selection of coffee and in particular, their mocha is excellent.