Archive for the ‘newman’ Category

Project Newman :: An evaluation

Tuesday, August 29th, 2006

The thing about a project like Newman is that it’s basically impossible to make it work perfectly. It has a difficult job, because there are so many potential sources of error. Servers may go offline, connections may fail, article formats may change and so on. It is as good as impossible to guarantee that Newman will do the right thing, because at the end of the day we are trying to analyze text, and computers are not good at doing that. Just look at spam filters – they have been improved upon for years, but everyone is still getting spam. Much less than before, of course, so the filters are definitely useful. And Newman too makes mistakes, but it does still succeed quite often.

Newman has been posting on Xtratime.org under the username Carsonne, a French female impersonator of Carson35’s it would seem. :D Carsonne has averaged about 15 posts a day since July 30, which comes to a little over 350 posts in all, that is, 350+ news stories posted. While I haven’t been keeping score to present statistical numbers, I have kept a close eye on Carsonne and I would estimate that upwards of 90% of the stories posted were correctly parsed, formatted and classified. In fact, I recall about 10-15 misposts among the ones I’ve seen (which I think is most of them). That is an error rate no human poster would have. At an estimated 95% success rate, Carsonne’s error rate is at least an order of magnitude higher than a human poster’s (i.e. I would claim that a human poster would have a >99.5% success rate at copy/pasting and classifying stories, which is fewer than 2 misposts in 350).

(more…)

Project Newman :: Further ideas not implemented

Sunday, August 27th, 2006

Newman was meant to be a simple design that wouldn’t take too long to build (it took me about 2 weeks of afternoons to write) and just focus on the issues that are simple to handle without making a complicated mess of it. There are all kinds of ways in which it could be improved and I’ll mention some of the ideas that I decided to leave out.

(more…)

Project Newman :: The scheduler

Saturday, August 26th, 2006

Now that we’ve covered the reporter, the editor and the publisher, we have a functional Newman that can actually post stories. I set up Newman as a cron job (i.e. a job that runs at set intervals) to run every three hours, but then it occurred to me that it isn’t human behavior to post at 9am, then at 12pm, then at 3pm and so on. It just doesn’t look real. And if someone were to keep an eye on Newman, they might notice that it always posts at regular intervals, which looks odd. (The point here isn’t so much to fool people into believing that Newman is real, it is just to make it seem to exhibit a lot of human qualities.)

So I thought, why not add a scheduler to decide when Newman should run? The scheduler runs as a daemon (i.e. an application that runs 24/7 in the background, but only does actual work whenever it is called upon). The scheduler is given a time interval (for instance: 3 hours), and then it generates a random delay between 0 and 3 hours. That’s when Newman is going to run. Then it goes to sleep until that time. So if I start the scheduler at 10am and give it an interval of three hours, it may decide that Newman should run at 11:45. It then goes to sleep until 11:45, and then it runs Newman.

newman_scheduler.png

Another advantage of this method is that if the scheduler runs Newman and Newman crashes, it won’t make the scheduler crash. The scheduler will keep running and will run Newman again at the next interval. I’ve also made sure that the scheduler waits for Newman to finish, so that if Newman is taking a lot of time to complete and the next interval is in 5 minutes, Newman will not be started again until the current execution is finished.
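To make the idea concrete, here is a minimal sketch of such a scheduler loop in Python. It is not Newman's actual code: the function names (`pick_delay`, `run_job`, `schedule`) and the `newman.py` command in the usage comment are made up for illustration, and I am assuming Newman can be started as a separate process.

```python
import random
import subprocess
import time


def pick_delay(interval_seconds):
    """Pick a random delay between 0 and the interval length (in seconds)."""
    return random.uniform(0, interval_seconds)


def run_job(command):
    """Run the job as a separate process and wait until it has finished.

    A crash or non-zero exit in the job is reported as False instead of
    raising, so it cannot take the scheduler down with it.
    """
    try:
        return subprocess.run(command).returncode == 0
    except OSError:
        # e.g. the command does not exist or cannot be started
        return False


def schedule(command, interval_seconds, rounds=None,
             sleep=time.sleep, clock=time.monotonic):
    """Run `command` once per interval, at a random moment inside it.

    `rounds=None` means run forever (daemon style). `sleep` and `clock`
    are parameters only so the loop can be tried out without waiting.
    """
    window_start = clock()
    done = 0
    while rounds is None or done < rounds:
        run_at = window_start + pick_delay(interval_seconds)
        sleep(max(0.0, run_at - clock()))
        run_job(command)  # blocks, so two runs never overlap
        done += 1
        # The next window starts one interval later, or right now if the
        # job ran past the end of its window.
        window_start = max(window_start + interval_seconds, clock())


# Hypothetical usage, with a three hour interval:
# schedule(["python", "newman.py"], 3 * 60 * 60)
```

Running the job as a child process is what gives the crash isolation described above, and because `subprocess.run` only returns once the child has exited, a slow run simply delays the next one rather than overlapping with it.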

This entry is part of the series Project Newman.

Project Newman :: The publisher

Thursday, August 24th, 2006

Compared to what we’ve talked about so far, the publisher is a pretty simple piece of the puzzle. It receives a list of stories, each one assigned to one or more channels, and simply posts them on the selected target, that is Xtratime.org. For this to work, we must first prepare an account on the forum for Newman. Having done that, the publisher will log in the user, open the thread where the story should be posted and simply post it, adding some vBcode formatting to the text. The image below shows what a typical news post looks like.

newman_publisher.png

(more…)

Project Newman :: The editor

Tuesday, August 22nd, 2006

The editor is basically the “brain” of Newman. It’s the most complicated part, because it has to handle the most logic. Broadly speaking, it is the editor’s job to figure out which news articles to post where. Once the target is set (i.e. Xtratime.org), the editor has to figure out whether each of the articles delivered by the reporter should be published in any of the channels (i.e. threads) we have available. The illustration below shows the editor’s role in the chain.

newman_schematic.png

(more…)