fixing greedy emoticon matching in kopete

September 13th, 2006

I have a lot of admiration for the KDE project. The way that things come together and integrate into a common desktop with KDE is quite extraordinary. And all the time there are people interested in improving just about every bit of it. Now, of course, it's all about KDE4, the long awaited upgrade will come at some point in 2006, I guess the date hasn't been set yet.

Anyway, the beauty of free software is that if there's a bug that gets to you, you can fix it yourself. And one such bug irks me in Kopete. I've been testing the xtorg emoticon theme and with a fairly rich set of emoticons (82 images, 117 replacement strings), it's quite a good testset and exposes certain problems. The emoticon theme comes with a file called test_suite.txt, which just lists all the emoticon replacement strings, so that you can paste them into a chat client and see if they come up correctly. The special thing about the xtorg theme is that I've made sure to include the most common Msn Messenger strings, so that Windows people can reuse the ones they're used to already. In Kopete 0.12.2, using the test suite gives this result.

kopete_bugged.png

So evidently, Kopete's parsing of emoticons is not as good as it could be. I have examined the issue and found that the problem lies in non-greedy matching. This means that if : s and : s t a r : are both defined as emoticon strings, and : s just happens to appear before : s t a r : in Kopete's internal list of emoticons, : s t a r : will be parsed as [: s] t a r :, not as [: s t a r :]. This is not what the user expects, having defined a list of replacement strings, the user expects all of them to work.

This is not the kind of bug that will affect a lot of users, because the average user does not use big emoticon styles like this one (and will probably never encounter the error). Thus if I were to report the bug, it's not likely to be very high priority. Meanwhile it does bother *me*, so I thought I would try and fix it myself. So after a little hacking, I wrote a patch for Kopete, and it now does this.

kopete_fixed.png

I've reported this on KDE Bugzilla and Kopete developers willing, the fix will find itself into Kopete at some point. In the meantime, the patch is attached below.

diff -Naur kopete-0.12.2/kopete/libkopete/private/kopeteemoticons.cpp kopete-changed/kopete/libkopete/private/kopeteemoticons.cpp
--- kopete-0.12.2/kopete/libkopete/private/kopeteemoticons.cpp	2006-08-12 02:51:47.000000000 +0200
+++ kopete-changed/kopete/libkopete/private/kopeteemoticons.cpp	2006-09-13 07:20:28.000000000 +0200
@@ -48,6 +48,8 @@
 struct Emoticons::Emoticon
 {
 	Emoticon(){}
+	/* sort by longest to shortest matchText */
+	bool operator< (const Emoticon &e){ return matchText.length() > e.matchText.length(); }
 	QString matchText;
 	QString matchTextEscaped;
 	QString	picPath;
@@ -424,6 +426,7 @@
 		node = node.nextSibling();
 	}
 	mapFile.close();
+	sortEmoticons();
 }
 
 
@@ -492,9 +495,24 @@
 		node = node.nextSibling();
 	}
 	mapFile.close();
+	sortEmoticons();
 }
 
 
+void Emoticons::sortEmoticons()
+{
+	/* sort strings in order of longest to shortest to provide convenient input for
+		greedy matching in the tokenizer */
+	QValueList<QChar> keys = d->emoticonMap.keys();
+	for ( QValueList<QChar>::const_iterator it = keys.begin(); it != keys.end(); ++it )
+	{
+		QChar key = (*it);
+		QValueList<Emoticon> keyValues = d->emoticonMap[key];
+ 		qHeapSort(keyValues.begin(), keyValues.end());
+ 		d->emoticonMap[key] = keyValues;
+	}
+}
+
 
 
 
diff -Naur kopete-0.12.2/kopete/libkopete/private/kopeteemoticons.h kopete-changed/kopete/libkopete/private/kopeteemoticons.h
--- kopete-0.12.2/kopete/libkopete/private/kopeteemoticons.h	2006-08-12 02:51:47.000000000 +0200
+++ kopete-changed/kopete/libkopete/private/kopeteemoticons.h	2006-09-13 07:19:17.000000000 +0200
@@ -156,6 +156,12 @@
 	 * @see initEmoticons
 	 */
 	void initEmoticon_JEP0038( const QString & filename);
+	
+	/**
+	 * sorts emoticons for convenient parsing, which yields greedy matching on
+	 * matchText
+	 */
+	void sortEmoticons();
 
 
 	struct Emoticon;

EDIT: The original conclusion of this entry was that Gaim has parsing bugs too. This seems to be incorrect. A fresh new screenshot shows that Gaim handles the xtorg theme just fine.

gaim_emoticons.png

UPDATE: The patch was accepted verbatim into kopete svn, so the next release (kde-3.5.5), whenever it will be, should have this problem fixed.

:: random entries in this category ::