python bytecode: classes

August 11th, 2013

New to the series? The previous entry was part 2.

Today we're going to see how classes work. Classes are a bit tricky as they provide the glue that makes functions into methods. They are also created dynamically, as we saw last time, so they must have some compiled bytecode that gets executed when the class actually gets created.

We'll do this in several steps because, unlike functions, classes don't come with code objects that we can inspect directly. So we'll use some trickery to see what happens step by step.

Executing a class body

Before we look at a class in constructed form, let's poke around its namespace.

We'll use this simple class as a running example.

import dis

class Person(object):
    default_age = 7

    def __init__(self, age):
        self.age = age or self.default_age

    def had_birthday(self):
        self.age += 1

    def how_old(self):
        return self.age

    # raises NameError if these are not bound
    default_age, __init__, had_birthday, how_old

    # We'll see what this function looks like
    dis.dis(how_old)

# disassemble
 13           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                0 (age)
              6 RETURN_VALUE

What happens here is that we let all the class members get defined. This is basically like making bindings at the module level. In this scope there is nothing to suggest that we are inside a class body. (Well, except that the name __module__ is also bound to "__main__", which would not be the case at module level.)

We disassemble one of the functions to show that there is nothing special about it - it doesn't contain any special sauce that would reveal it to be a method instead of a function. self is a local variable like any other that just happens to be called self.
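To make the "it's just bindings" point concrete, here is a small sketch that snapshots the class body's namespace from the inside. The attribute names bound_names and how_old_type are made up for this illustration, and the class is trimmed down from the running example. It should run unchanged on Python 2.6+ and Python 3.

```python
import types


class Person(object):
    default_age = 7

    def how_old(self):
        return self.age

    # Snapshot the names bound so far in the class body's namespace.
    # (bound_names and how_old_type are illustrative names, not part of the original example.)
    bound_names = sorted(locals())

    # At this point how_old is still an ordinary function object.
    how_old_type = type(how_old)


print(Person.bound_names)
print(Person.how_old_type is types.FunctionType)
```

The snapshot contains default_age and how_old alongside __module__, and the type check confirms that nothing method-like has happened to how_old yet.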

The class code

As mentioned, there is no way to get at the code given a class object. What we can do is put the class definition in a function body and retrieve the function's code object, which we know contains all the constants and variable names used in that function. One of those constants is bound to be the code object of the class-to-be-constructed.

import dis

def build_class():
    class Person(object):
        default_age = 7

        def __init__(self, age):
            self.age = age or self.default_age

        def had_birthday(self):
            self.age += 1

        def how_old(self):
            return self.age

cls_code = build_class.func_code.co_consts[2]
dis.disassemble(cls_code)

# disassemble
  4           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)

  5           6 LOAD_CONST               0 (7)
              9 STORE_NAME               2 (default_age)

  7          12 LOAD_CONST               1 (<code object __init__ at 0xb7533ba8, file "funcs.py", line 7>)
             15 MAKE_FUNCTION            0
             18 STORE_NAME               3 (__init__)

 10          21 LOAD_CONST               2 (<code object had_birthday at 0xb7533c80, file "funcs.py", line 10>)
             24 MAKE_FUNCTION            0
             27 STORE_NAME               4 (had_birthday)

 13          30 LOAD_CONST               3 (<code object how_old at 0xb7533e30, file "funcs.py", line 13>)
             33 MAKE_FUNCTION            0
             36 STORE_NAME               5 (how_old)
             39 LOAD_LOCALS         
             40 RETURN_VALUE

So we reach into the function's code, into its co_consts tuple, and grab the code object. It happens to be at index 2, because index 0 is None and index 1 is the string "Person".
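Relying on a hardcoded index is fragile, since the layout of co_consts is an implementation detail. A slightly more robust sketch is to scan the tuple for a code object with the right name. The helper find_class_code is a hypothetical name introduced here, and the class is trimmed down. It uses __code__, which is the same object as func_code on Python 2.6+ and the only spelling on Python 3.

```python
import types


def build_class():
    class Person(object):
        default_age = 7

        def how_old(self):
            return self.age

    return Person


def find_class_code(func, class_name):
    """Return the code object for class_name found in func's constants, or None.

    Hypothetical helper for illustration only.
    """
    for const in func.__code__.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == class_name:
            return const
    return None


cls_code = find_class_code(build_class, 'Person')
print(cls_code.co_name)
print(cls_code.co_names)
```

The co_names tuple of the result lists the names the class body binds, which should match the STORE_NAME instructions in the disassembly.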

So what does the class code do? Just like at module level, it binds all the names in its namespace, and it also binds the name __module__, because a class is supposed to know the module it's defined in.

And then? Once all those bindings have been made, it actually just returns them. So basically the class code builds a dict and returns it.

This helps complete the picture from last time. To recap, at module level the code first a) calls the class body as if it were a function, using CALL_FUNCTION (this "function" returns a dict, as we've just seen), and then b) runs BUILD_CLASS on that return value (i.e. on the dict), which wires everything together and produces an actual class object.
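The two steps can be imitated by hand. This is only a rough emulation, not what the interpreter literally runs: person_body is a made-up stand-in for the compiled class body, and calling type(name, bases, namespace) stands in for BUILD_CLASS, assuming the default metaclass.

```python
def person_body():
    # Stand-in for the compiled class body: bind names, then return the namespace.
    default_age = 7

    def __init__(self, age):
        self.age = age or self.default_age

    def how_old(self):
        return self.age

    return locals()


# Step a): "call the class body" and get a dict back.
namespace = person_body()

# Step b): build the class from name, bases and that dict.
Person = type('Person', (object,), namespace)

print(sorted(namespace))
print(Person(0).how_old())  # 7, falls back to default_age
```

The resulting Person behaves like the one written with a class statement, which is the sense in which the class body is "just" a function that returns a dict.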

Methods

Okay, now let's find out something else. We know that functions and methods are not the same type of thing. What about their code? We saw before how a function defined in a class body has no signs of being a method. Has it changed during the construction of the class? A function object replaced by a method object perhaps?

import dis

class Person(object):
    default_age = 7

    def __init__(self, age):
        self.age = age or self.default_age

    def had_birthday(self):
        self.age += 1

    def how_old(self):
        return self.age

dis.disassemble(Person.how_old.func_code)

# disassemble
 13           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                0 (age)
              6 RETURN_VALUE

The answer to that is no. The object is unchanged. In fact, the object stored in the class is a function. That's right, a function. Try reaching into Person.__dict__ to get it and what you get is a function object. It isn't until you do an attribute access on the class object (Person.how_old) that the method object appears. So the method is like a view on the function; it's not "native".

How does that work? You already know: descriptors.

func = Person.__dict__['how_old']
print func
# <function how_old at 0xb749e994>

print Person.how_old
# <unbound method Person.how_old>
print func.__get__(None, Person)
# <unbound method Person.how_old>

person = Person(4)
print person.how_old
# <bound method Person.how_old of <__main__.Person object at 0xb73f5eec>>
print func.__get__(person, Person)
# <bound method Person.how_old of <__main__.Person object at 0xb73f5eec>>

Function objects implement the descriptor protocol. Getting the function through the class (i.e. getting the method) is equivalent to calling the function object's __get__ method with None as the instance and that class object as the type. This returns an unbound method (meaning bound to a class, but not to an instance of that class).

If you also give it an instance of the class you get a bound method.
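The relationship between the stored function and the bound method can be checked directly. The sketch below uses a trimmed-down Person and only the bound case, so it should behave the same on Python 2.6+ and Python 3 (where unbound methods no longer exist). The variable names are illustrative.

```python
import types


class Person(object):
    def __init__(self, age):
        self.age = age

    def how_old(self):
        return self.age


func = Person.__dict__['how_old']      # the plain function stored in the class
person = Person(4)
bound = func.__get__(person, Person)   # what person.how_old does behind the scenes

is_plain_function = isinstance(func, types.FunctionType)
is_method = isinstance(bound, types.MethodType)
wraps_same_function = bound.__func__ is func
carries_instance = bound.__self__ is person
same_result = bound() == func(person)

print(is_plain_function, is_method, wraps_same_function, carries_instance, same_result)
```

The bound method is a thin wrapper: it holds a reference to the original function and to the instance, and calling it passes the instance as the first argument.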

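So there you have it, classes and methods. Simple, right? Well, ish. One last thing: is the bound/unbound method object created on the fly? As in: does Python perform an object allocation every time you access a method? It does: every attribute access goes through __get__ and produces a fresh method object. It can look like the same object with the same memory address, because a method object that is discarded right away frees its memory and the next one is often allocated in the same spot. Hold on to two of them at once and they turn out to be distinct objects that compare equal.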
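Whether each access allocates a new method object can be tested by holding two references at the same time and comparing identity. Comparing id() values of temporaries is misleading, because a freed object's address may be reused. This is a minimal sketch assuming CPython, with a trimmed-down Person and illustrative variable names.

```python
class Person(object):
    def __init__(self, age):
        self.age = age

    def how_old(self):
        return self.age


person = Person(4)

# Keep both method objects alive so their addresses cannot be recycled.
first = person.how_old
second = person.how_old

same_object = first is second                         # False: two separate method objects
equal = first == second                               # True: same function, same instance
shared_function = first.__func__ is second.__func__   # True: both wrap one function

print(same_object, equal, shared_function)
```

So each access builds a new wrapper, but the wrappers are cheap and all point at the single function object stored in the class.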
