Tuesday, September 11, 2007

Pythonic programming and the "self" keyword

Bruce Eckel recently posted an article expressing disappointment in the direction Python 3000 is heading. In particular, he seems to think the grandeur of a four-digit version name is too much for what Python 3.0 will actually bring. I can't really say that I care about some of his points (the GIL argument has popped up on Reddit several times in the past few days), so I only care to comment on one topic: the explicit self in Python (a naming convention rather than a true keyword). In his original post, he says the following:

This is something I really hoped to see in Python 3K, but the beloved self seems to be hanging on.

self is inappropriate noise in a language that lays claim to clarity and simplicity. No other mainstream OO language requires it everywhere like Python does, and it's a hurdle for people who try to come to Python from those languages. Maybe it's a significant reason that Java programmers seem to be more comfortable with Ruby; Ruby takes care of it for you just like C++ and Java do.

And posts later in a reply:

Exactly, and it's not the writing but the reading. Python generally makes code that's easier to read, but 'self' is an intrusion.

And parroting "explicit is better than implicit" is a misuse of that maxim. All languages provide abstractions; Python (generally) produces clear abstractions that tell you what's going on -- these abstractions are explicit in "the right places." But 'self' is something we don't need to see inside classes. You're in a class, so 'self' or 'this' can be implied, just as it is in every other OO language I know of. Ruby, I think, has it right on this one.

Those of you with really long memories, or too much spare time on your hands, may remember that I mentioned this topic in a blog post a long time ago. As you might guess from that, I disagree with Bruce's reasoning on both counts.

To begin with, I wonder just how much of a hurdle the required self for referencing instance members really is when learning Python. An explicit receiver may not be required in other languages, but it most certainly exists, usually in the form of this rather than self. In terms of using it inside method bodies, I don't think it's a very big deal. Specifying that a variable belongs to an object makes it clearer what's going on, especially in a dynamic language where variable declarations don't exist. Even Ruby, as mentioned in the thread, uses an @ prefix to mark instance variables.
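As a minimal sketch of that point (written in Python 3 syntax; the Point class and its attribute names are made up for illustration), nothing is declared ahead of time, so the self. prefix is the only thing telling a reader which names live on the object:

```python
class Point:
    def __init__(self, x, y):
        # No declarations anywhere: assigning through self creates the attributes.
        self.x = x
        self.y = y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy
        # The first assignment through self adds a brand-new attribute on the spot.
        self.moved = True


p = Point(1, 2)
before = sorted(vars(p))  # attribute names stored on the instance so far
p.move(3, 4)
after = sorted(vars(p))   # "moved" now appears alongside "x" and "y"
print(before, after)
```

Every name without the prefix (dx, dy) is a plain local, which you can see at a glance without hunting for a class definition.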

Bruce also argues that the "explicit is better than implicit" maxim is wrongly cited as a reason here, claiming that an implied self improves readability. I definitely don't agree here, either. In C++, Java, and C# (and probably more?), a class declares its members up front, so implicitness works fine. In Python, since you don't really have instance variable declarations, being implicit seems like a bad idea. If you were maintaining someone else's code, you'd have to make sure none of your local variables were actually instance variables, lest ye overwrite them and screw something up. GvR's response also points to a technical challenge in removing the required self.
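Here's a rough sketch of the situation I'm worried about (Python 3 syntax; the Account class and its method names are hypothetical). With the explicit prefix, a bare name is always a local and can never clobber instance state by accident:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def preview_fee(self, rate):
        # No "self." here, so this binds a new local that happens to be
        # called "balance". The instance attribute is left untouched.
        balance = self.balance * (1 - rate)
        return balance

    def apply_fee(self, rate):
        # With "self.", the assignment really does update the object.
        self.balance = self.balance * (1 - rate)
        return self.balance


acct = Account(100.0)
previewed = acct.preview_fee(0.5)
unchanged = acct.balance  # still the original balance after the preview
acct.apply_fee(0.5)
print(previewed, unchanged, acct.balance)
```

If self were implied, the assignment in preview_fee would be ambiguous: a maintainer couldn't tell whether it creates a local or overwrites the attribute without reading the rest of the class.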

Brandon Corfman also responded, saying:

The problem is that the 'self' plague doesn't stop there ... don't forget that self is required as the first parameter of every class method. For example, the following code that forgets to use self as the first parameter of printInfo:

class MyClass(object):
    def printInfo(s):
        print s
   
def main():
    m = MyClass()
    m.printInfo('Hello')

When running main, you get the following traceback from Python:

Traceback (most recent call last):
  File "", line 1, in 
  File "hello.py", line 7, in main
    m.printInfo('Hello')
TypeError: printInfo() takes exactly 1 argument (2 given)

Except I am giving one argument! What second one is it talking about? Oh yes, the interpreter is expecting 'self'. Totally confusing and brain-dead requirement. Why does this need to be the first parameter of every class method?

I think explicitly listing self as the first argument is a good idea, to some extent. It's kind of quirky (the name is only a convention; you can call it whatever you want and it will still work), but having self in the parameter list shows C programmers (and other, maybe less experienced OO programmers) how instance methods basically work. They aren't some voodoo magic higgity biggity; they really just pass the object as an invisible first argument to the method. In the above case, m.printInfo('Hello') is really just shorthand for MyClass.printInfo(m, 'Hello'). Python's explicit self as the first argument makes the difference between instance and static methods clear (even though they work differently in Python than in Java, et al). The exception message is still kind of confusing until you realize that m is basically the first argument, but this is a concept most programmers who deal with OO should probably know.
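To make that concrete, here's a small sketch in Python 3 syntax (Brandon's example is Python 2; the method names print_info, shout, and forgot_self are mine, for illustration only). It shows the shorthand, a first parameter named something other than self, and the error you get when the first parameter is left out:

```python
class MyClass:
    def print_info(self, s):
        return "info: " + s

    # The first parameter's name is only a convention; "this" works too.
    def shout(this, s):
        return s.upper()

    # Same mistake as in the quoted example: no parameter to receive the instance.
    def forgot_self(s):
        return s


m = MyClass()

# Calling through the instance passes m as the first argument behind the scenes...
via_instance = m.print_info("Hello")
# ...so it is equivalent to calling the function on the class and passing m yourself.
via_class = MyClass.print_info(m, "Hello")

shouted = m.shout("hello")

# m is passed as the first argument here too, so the method receives two
# arguments but only declares one parameter.
try:
    m.forgot_self("Hello")
    error = None
except TypeError as exc:
    error = exc

print(via_instance, via_class, shouted, type(error).__name__)
```

The exact wording of the TypeError message varies between Python versions, so the sketch only checks the exception type.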

4 comments:

papavb said...

you seem irritated dan, got some sand in your programming vagina?

jk, I agree on the scope thizang, it's not at all a hindrance, not even a minor encumbrance.

Anonymous said...

Thanks for this post. Until now, every single argument I've seen for an explicit self parameter has been quite vague, and often deals more with self.attribute notation (which I have no problem with). I do agree that it's "quirky" to call self as a first parameter (receive it, as you say). But at least you finally gave a damn good explanation.

That said, it should be upfront and easy to find this info instead of me having to google and read half a dozen pages just to find an answer. Being that a lot of people don't get the concept, you'd think the documentation could have something as simple as this listed as the reason why.

Victor said...

If there is no self parameter in the method, it means you don't access the instance members via dot notation. This means that __setattr__ and similar will not work. What I'm saying is that the 'self' convention might have deep connections with other Python mechanisms.

http://stackoverflow.com/questions/4432376/a-modules-setattr-and-getattr-when-accessing-globals
