Archive for 2010

what is comprehension really?

December 11th, 2010

When we're talking about learning a language we seem to present an idealized picture of both what the language is like and what our learning is like. We imagine a script, which is written on paper, and which is recited with clear enunciation and proper pacing by a voice actor on our language tapes. This creates the impression that these two things are somehow equivalent, that the written text and the spoken message are two encodings that have the same content. This is the ideal case.

But as we all know, language is not ideal. There is a reason we have expressions like "what did you say" that we use with people who speak our own language perfectly well. Our communication is lossy, i.e. the sum total of what is received is less than what was transmitted. To make up the difference we have to reconstruct the message, and our reconstruction may not be accurate. Comprehension is the ability to reconstruct successfully, most of the time.

In order to belabor this point a bit, let's use an example message from an article:

Liu, he said to sustained clapping, "has exercised his civil rights. He has done nothing wrong. He must be released."

This is the ideal message, the message that was intended for you. If you speak English then this is perfectly clear to you. But when it becomes degraded in communication, what degree of error can you cope with?

What if the writer had excluded the commas?

Liu he said to sustained clapping "has exercised his civil rights. He has done nothing wrong. He must be released."

What if there were no quote marks?

Liu he said to sustained clapping has exercised his civil rights. He has done nothing wrong. He must be released.

No periods?

Liu he said to sustained clapping has exercised his civil rights He has done nothing wrong He must be released

No capitalization?

liu he said to sustained clapping has exercised his civil rights he has done nothing wrong he must be released

No spaces?

liuhesaidtosustainedclappinghasexercisedhiscivilrightshehasdonenothingwronghemustbereleased
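For anyone who wants to play with this, the degradation steps above can be sketched in a few lines of Python. This is only an illustration of the idea: the names `MESSAGE`, `drop`, `STEPS` and `degrade` are made up for the example, and the steps are applied cumulatively in the same order as above.

```python
MESSAGE = ('Liu, he said to sustained clapping, "has exercised his civil rights. '
           'He has done nothing wrong. He must be released."')


def drop(chars):
    """Return a step that deletes every character found in `chars`."""
    table = str.maketrans("", "", chars)
    return lambda text: text.translate(table)


# Each step throws away one more layer of information.
STEPS = [
    ("no commas", drop(",")),
    ("no quote marks", drop('"')),
    ("no periods", drop(".")),
    ("no capitalization", str.lower),
    ("no spaces", drop(" ")),
]


def degrade(message):
    """Apply the steps cumulatively and return (label, text) after each one."""
    versions = []
    text = message
    for label, step in STEPS:
        text = step(text)
        versions.append((label, text))
    return versions


if __name__ == "__main__":
    for label, text in degrade(MESSAGE):
        print(f"{label}: {text}")
```

Every step is trivial to apply and impossible to undo mechanically, which is exactly what makes the communication lossy.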

Now, think about it. When you hear speech in a language that you don't understand, does it have punctuation in it, does it even have spaces? I would suggest that speech is actually much more similar to the last version than to the first one. Sure, there is vocal "punctuation" in the audio, with things like tone of voice and pacing, but it's not as standardized and clear as punctuation in writing. Names are not capitalized. There are no parentheses. There are no quote marks (only the bandwidth heavy "and I quote"/"end of quote", which gets quite tiring to hear if there are many quotes).

And the above example only demonstrates information loss. It doesn't even begin to address errors introduced by faulty communication (mispronounced words, words stressed in the wrong place, words spoken with a foreign accent, etc.), yet your brain must be prepared to deal with those too.

Spoken language is never like the original message. Some time ago, a friend recommended a book to me, one that is available on the author's website but for some reason doesn't have any punctuation (I don't know if this is just the online version or the print version too, but the latter would surprise me). The only things capitalized are names. I first tried to read this a few months ago, but I found it too tough. I came back to it recently and this time I made more progress. That's when it dawned on me how well this illustrates how we idealize and underestimate language.

In reading this book the brain is forced to attempt something similar to what I try to do here with a little syntax highlighting:

noi abbiamo le pive nel sacco Malva è sconvolta ma Cocco non molla entriamo e la facciamo lo stesso in quanti siamo dice dobbiamo farla lo stesso tanto ormai non abbiamo più niente da perdere grida e cosi convinciamo gli altri a fare lo stesso l'assemblea entriamo tutti insieme e ci mettiamo in un'aula vuota del pianterreno è un minuto che siamo dentro e non abbiamo ancora cominciato a dire una parola che arriva Mastino sbraitando cosa fate qui tu tu e tu siete tutti quanti sospesi passate in presidenza uno alla volta e esce lasciando la porta aperta Scilla dà un calcio alla porta e poi la barrica ci spingiamo davanti due banchi restiamo un momento in silenzio dobbiamo fare qualcosa ci guardiamo negli occhi ma non sappiamo cosa fare ci sentiamo in trappola
Noi abbiamo le pive nel sacco. Malva è sconvolta, ma Cocco non molla. “Entriamo e la facciamo lo stesso in quanti siamo”, dice. “Dobbiamo farla lo stesso, tanto ormai non abbiamo più niente da perdere”, grida. E così convinciamo gli altri a fare lo stesso. L'assemblea entriamo tutti insieme e ci mettiamo in un'aula vuota del pianterreno. È un minuto che siamo dentro e non abbiamo ancora cominciato a dire una parola che arriva Mastino sbraitando “Cosa fate qui tu, tu e tu? Siete tutti quanti sospesi. Passate in presidenza uno alla volta”. E esce lasciando la porta aperta. Scilla dà un calcio alla porta e poi la barrica. Ci spingiamo davanti due banchi. Restiamo un momento in silenzio. Dobbiamo fare qualcosa. Ci guardiamo negli occhi, ma non sappiamo cosa fare. Ci sentiamo in trappola.

Is that the canonical version? No, it's not. It's my best attempt at reconstruction. I'm not certain that it is correct; I'm not even certain that the version on the website matches the printed version (there seem to be quite a few typos). But ultimately there is no final answer: all language comprehension is heuristic and hypothesis-based.
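To make the "hypothesis-based" part concrete, here is a toy sketch in Python of reconstructing the spaces in a spaceless message. It is not a model of how the brain does it, just a brute-force search; the mini-vocabulary `VOCAB` and the function name `segmentations` are invented for the example.

```python
from functools import lru_cache

# Hypothetical mini-vocabulary, just big enough for the example.
VOCAB = frozenset({"he", "has", "do", "done", "no", "nothing", "thing", "wrong"})


def segmentations(text, vocab=VOCAB):
    """Return every way of splitting `text` into words from `vocab`."""

    @lru_cache(maxsize=None)
    def split(start):
        # Reaching the end of the text means one complete hypothesis.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                for rest in split(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in split(0)]


if __name__ == "__main__":
    for hypothesis in segmentations("hehasdonenothingwrong"):
        print(hypothesis)
```

Even with eight known words, this fragment has two valid readings ("done no thing wrong" and "done nothing wrong"). Picking between them takes knowledge that isn't in the signal, and a string with no valid split at all simply yields no hypotheses.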

free will or not

September 13th, 2010

One of the classic topics for debate among us humans is the dilemma of free will. In a way I'm surprised that it comes up as much as it does, because I don't really think it's that interesting a question, in the sense that I don't see how we're making any progress on it.

It's as if everyone is convinced that we have free will, and yet... there is absolutely no empirical basis to think so. Never has anyone had the chance to go back in time and make a different decision. So why do we seem to think it must be so?

Here's the thing. It's my impression that much of the debate on free will is shaped by an unwillingness to really accept the premise of the question and take it to its logical conclusion.

All too often I've heard people argue things like "well if there is no free will, then people cannot be held responsible for their actions, so we should let everyone get away with it". What do you mean "let"? Here's the problem: our very way of expressing ourselves is based on the premise of free will. So to even discuss it without committing a fallacy, one has to be careful.

The whole "ethical problem" of determinism seems to me nothing more than a false dilemma. If the actor has no free will to commit a crime or not, then why are we debating the problem of "deciding" how to respond if we have no freedom of choice? If the actor has no choice, then neither do we; there is nothing to decide, there is no problem to solve. Whatever happens is purely a matter of inevitability, however much it may seem otherwise. If the crime committed was deterministic, then our after-the-fact discussion is deterministic and whatever action we will take is also deterministic. The only way there is a dilemma is if the other guy's actions were determined and yours somehow aren't. But that's not how the question is defined.

The free will topic also enters the religious domain often, and there too people make the same mistake. As in, if god is all powerful and all knowing, then he knows your future, thus your future is decided, thus "why are you praying to him hoping he will change his mind?" Wrong. If the future is decided then he has *already decided* that you will pray to him and that you will thank him etc, so your apparent gratitude to him is nothing else than him having decided to "program you" (if you will) to thank him.

"How a programmer reads your resume"

August 13th, 2010

tl;dr: Sometimes stereotypes are true.

I came across this rather excellent comic about how people see your resume depending on who they are. After glancing over it and appreciating it as one of my ~5-10 daily interweb funnies, I looked over it again and noticed that it's eerily accurate.


  1. Has written a compiler or OS for fun.
    That'd be a yes.
  2. Resume compiled from latex.
    Actually, from hand made xsl to latex. There was a time when I was all excited about single source publishing, so that's what I did here. Xml to html/pdf/txt. (Last time I checked the latex->html bridge was seriously lacking anyway.)
  3. Contributes to open source software.
  4. Has written compiler or OS for class.
  5. Has blog discussing programming topics.
    You're here.
  6. President of programming/robotics/engineering club.
  7. Participated in programming/robotics/engineering contest.
  8. Internship at Google or Microsoft.
    They know where to find me.
  9. Has written non-trivial programs in dynamic languages (perl/python/ruby).
  10. Knows 3 or more programming languages.
  11. Previous position demonstrates similar skills.
    Not really.
  12. Has internship.
  13. Founded a company.
    Only a pretend company, and we haven't been active for about 10 years.
  14. Personal web page uses Rails, PHP or Asp.Net.
    Been meaning to switch from PHP to Python, but there's just no pressing need for it.
  15. Email address at own domain.
    Not since 2005.
  16. Has modified programs in dynamic languages (perl/python/ruby).
    That's how I started out with dynamic web stuff in 1999, found perl scripts and tried to mod them without breaking them.
  17. Has personal web page.
  18. High grades, top of class, etc.


  1. Won scholarship.
    Have never applied for one.
  2. Lists job at fast food chain.
    Haven't had the pleasure.


  1. Looks kind of drunk in facebook picture.
    One of [apparently] few specimens in the human race who don't find unending ecstasy in alcohol.
  2. Has Ph.D.
    Not so much.
  3. Generic cover letter.
    Might be tempted.
  4. Mentions skills in Excel/Word.
    Over my dead body etc.
  5. Spelling or grammar errors on resume.
    My typing is a bit dodgy, but I tend to proofread.
  6. Resume font too small.
    Let's hope not.
  7. All programming experience in class.
  8. Knows only 1 programming language.
    Once upon a time.
  9. Resume more than three pages long.
    I try to make it in two.
  10. Includes irrelevant objective section.
    Never knew what the point was of that one.
  11. Took certification course in a technology.
    Never occurred to me.
  12. Low grades in relevant courses.
  13. Lists visual basic experience first.
    Don't have any.
  14. Topless in facebook picture.
    Only by mistake.
  15. Resume uses combination of tabs and spaces to indent sections.
    I'm clean, narc. (Actually, if you use Tab in vim with wildmenu, it's tricky to type a tab without a space first, because it will try to auto complete the current token. Haven't tried to fix that yet.)

I timebox and so can you

August 10th, 2010

Axiom: SRS is by far the most effective vocabulary learning method I've ever seen.
Corollary: There is no way I could have learned nearly as much vocabulary with my usual laid back attitude.

Don't get me wrong, I'm happy spending most of my life propping up the idea that "if a word wants me bad enough it will find a way to attach itself". It's a modest reward for scarce effort and I like it that way. There is so much more worth doing in life than learning lots of words. But there are times when a quick uptake of vocabulary is pretty crucial, namely in the opening stages of a new language. That's when you put in a lot of effort and, consequently, when seeing results matters a whole lot. But it matters not only for motivation. It also greatly impacts the quality of your early learning process if you can absorb the core vocabulary quickly.

I found this out last year when I was starting on Italian and I realized I had learned lots of not-all-that-interesting-but-important words that would have taken me several times longer without SRS. To my good fortune, I knew about Anki and I had read enough plaudits to try it.

Still, there is a problem. Anki may be effective, but I wouldn't call it fun. In fact, it's awfully tedious. So much so that even though I appreciate how helpful it is, most days I just can't persuade myself to click the icon that launches it. I get little thrill from returning to the same words that I saw yesterday and couldn't remember then. Plus the interaction itself is highly tedious; clicking those buttons like it's some kind of psychological survey, trying each time to pick the most appropriate choice.

Making it more fun

Alright, how? Khatzumoto writes about timeboxing and SRS tirelessly, and after reading through most of that I was ready to try it out. The idea is to go from "man this is boring, how much longer?" to "I only have 5 minutes, how much can I get done?" and it sounds like it's never going to work. And yet... okay, have you ever stayed at a nice place on vacation for just a bit too long, so long that you got bored? The idea is to leave wanting more; it's basic showmanship. Timeboxing, believe it or not, adds that element of urgency to the mix. You give yourself 5 minutes for Anki and that means you only have 5 minutes, however many decks you have.

Yeah, it's weird. But here's what it looks like to me. Before timeboxing I would start up Anki, gaze at all the decks I have and all the cards that are up for review and sigh. "What a pain it's gonna be to review all that." Now, all of a sudden, I have a different reaction. "Alright, I have all these decks, which one do I most want?" Then I start on one and keep going for a while, but not for too long since I also want to cover other ground. The 5 minutes is almost up and I still want to get more done, so it ends up being 7 minutes. 7 minutes, which, prior to timeboxing, seemed like a century of Anki.
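If you think in code, the timebox boils down to a loop with a deadline. The sketch below is purely illustrative and has nothing to do with how Anki works internally: `timeboxed_review`, its parameters and the `review` callback are all hypothetical names, and it simply walks the decks in whatever order you hand them over.

```python
import time


def timeboxed_review(decks, review, budget_seconds=300, clock=time.monotonic):
    """Review cards deck by deck until the time budget runs out.

    decks: dict mapping a deck name to a list of cards, in preferred order.
    review: callable invoked once per card (stand-in for the actual prompt).
    clock: function returning the current time in seconds (injectable for tests).
    Returns a dict with the number of cards reviewed per deck.
    """
    deadline = clock() + budget_seconds
    reviewed = {}
    for name, cards in decks.items():
        for card in cards:
            # Stop as soon as the box is used up, however many cards remain.
            if clock() >= deadline:
                return reviewed
            review(card)
            reviewed[name] = reviewed.get(name, 0) + 1
    return reviewed
```

The point of the design is that the stopping condition is the clock, not the size of the pile: cards left over stay left over until the next session.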

Decks - how to plan them out?

You could just put everything in one giant deck, but I don't like that idea. I did that at first and I found out that I like to have some notion of the context where the information was from. Is it from a textbook, a vocabulary list on a particular topic, from reading or watching stuff (i.e. passive learning) or what? That gives me a choice; I can pay close attention to some vocabulary set that's important right now. And if a particular deck is just annoying me I can remove it completely.

It's also a way to manage my morale. If I review lots of cards from a tough deck and I can't remember anything (i.e. the thing that makes me unhappy), I can counter that with an easy deck where I win easy points.


Steven Pinker: The stuff of thought

August 8th, 2010

The title is too vague and not very good. The subtitle is much better: Language as a window into human nature.


Go read it!

Dry and minute at times, but very rich. Don't get discouraged by the Wikipedia page; it doesn't begin to do justice to the content between the covers. Neither does Pinker's Google talk, where he's selling the book and gives a superficial impression of it.

A non-exhaustive feature list:

  • Why can you not interchange "give her a hand" with "give a hand to her"? Is there any logical explanation for these quirks? (Answer: yes. In fact, language is far more logical and far less arbitrary than we imagine most of the time.)
  • Linguistic determinism. Is it really true that our language/vocabulary is the language of our thoughts, and thus it can empower/limit what we are able to think? (Answer: no.)
  • Why do you say "the stars are out" when you can see them and "the lights are out" when you can't see them? (Answer: don't remember the explanation for this one, but it's a neat example, no?)

And so on. But Pinker doesn't just explain a whole bunch of riddles. He reaches deep into a whole range of topics, like how language routinely states time in terms of distance (and vice versa), how language expresses causation (and what this implies for our perception of causation) and many more. In short, he does exactly what he promised to do: he shows how language codifies our human nature.

If I never read a second book about linguistics, I still think this one will have been a pretty good choice.