The Zen of Python and Me
31 Mar 2012, 10:07 p.m. (updated: 14 Apr 2012, 2:17 a.m.)
import this is a brilliant easter egg in the Python interpreter. In just a few words it captures the philosophy behind what is considered "pythonic". One might argue that it describes the essence of what is so great about the language, its libraries and its community.
But for it to be more than just a bunch of aphorisms, a programmer has to ask herself: how do I conform to those guidelines in my daily work? I asked myself this question the other day and here are a few thoughts and opinions. Beware that this is at times very meta.
Beautiful is better than ugly.
In "Zen and the Art of Motorcycle Maintenance" Robert M. Pirsig writes about form and function. While most of his work discusses Quality, there is a part where he talks about perception. You can either see and name things by means of their form, or you can see and name them by considering their function.
A screwdriver can either be a special tool with a plastic handle that you buy in a specific store, or a piece of narrow metal you can stick into screws and turn around. Even the nicest looks won't help if the tool doesn't fit the screw. In that case an improvised piece of metal can get the job done much better. What does "beautiful" mean then?
For me, the key thing is that the tool performs its function, and it does it well. But having a choice among tools of similar performance one can still recognise some of them as more attractive than others. How do you know which are which?
In the case of Pythonic code, "The Zen of Python" provides a good checklist of what to watch for. Beautiful code should give you the impression of consistency, brilliance and predictability. But most importantly, it provides a solution that does the job well.
When using Python there's also one more thing. When your solution looks clumsy or the code is hard to read, it's quite possible that there is a better, "more beautiful" way to do it. This is especially true for programmers that come to Python with prior experience and expectations. Consider:
Python is not Ruby is not Java is not C++ is not Lisp is not Haskell.
Explicit is better than implicit.
You don't want to have to think about it. You just want to see it. Explicit is obvious and unambiguous. Explicit means fewer strings attached [1].
Clearly, explicit is the way to go. So why is implicit so seductive? Because explicit often seems ugly. It seems spurious, redundant and bloated. On the other hand, implicit seems clean, minimalistic and smart.
Looking at a single implicit solution in isolation makes it look preferable to the explicit alternative. But as soon as you gather enough implicitness in a single place, suddenly it no longer looks clean and smart. It starts to look like magic, with all its unpredictability. It's no longer deterministic because you lose the ability to see it through. At best you can call it a weather system.
When I program things, they always end up more complex than I anticipated. I run into problems and make mistakes. When I do, and the application stops doing what I thought it should, I prefer determinism. I want predictability and consistency. I don't want to have to think about it. I just want to see it.
Simple is better than complex.
Simplicity is a choice. It might not always be the best but it usually is. What would you rather have? A simple implementation solving 90% of the use cases or a complex one solving 100%? That obviously depends on the case, but additional implementation complexity had better be well justified. After all, you will have to maintain it and you will have to explain it to other people (including the future you when there's a sudden need to change or fix something after 2 years).
Complex is better than complicated.
You can't keep things simple forever as scale increases. It's also easy to lose the original thought behind the design along the way. That can be either because of haste and poor decisions or because you start designing for problems which haven't arisen yet. It's nice to build things that anticipate future needs. But it's nicer yet to build things that are used to their fullest potential.
"Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away."
— Antoine de Saint Exupéry
Flat is better than nested.
The Linux kernel is written in a coding style where indentations are 8 characters long. Sometimes people argue that this is too much and after a couple of levels of indentation it gets hard to read. Linus Torvalds answers these objections:
The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.
Although I sometimes fail to stick to just a few levels of indentation, I think this is great advice. Especially in Python where indentation denotes scope and the deeper you are, the larger the state you have to keep in your head to make sense of the code.
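One common way to cut down on indentation levels is to return early instead of nesting conditions. The following is only a sketch: the function names and the layout of the order dictionary are made up for illustration.

```python
def shipping_cost_nested(order):
    """Nested version: each condition adds a level of indentation."""
    if order is not None:
        if order.get("items"):
            if order.get("country") == "PL":
                return 10
            else:
                return 25
        else:
            return 0
    else:
        return 0


def shipping_cost_flat(order):
    """Flat version: guard clauses deal with the edge cases up front."""
    if order is None or not order.get("items"):
        return 0
    if order.get("country") == "PL":
        return 10
    return 25
```

Both functions return the same results, but in the flat one you never have to remember more than one condition at a time.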
But there's more than just nested levels of indentation. Programmers generally love to put things in hierarchies. They adore hash tables, tree structures and graphs. But along the way it's easy to enforce hierarchies where none were necessary. If you have a choice, and most of the time you do, keep things flat. This kind of design won't be as smart but on the other hand will be easier to reason about, simpler and more predictable. Have you noticed the pattern yet?
Sparse is better than dense.
At first I thought this simply applied to formatting where:
delta = round((new - old) / 2)
is preferable over the same expression with all the whitespace squeezed out.
But it bugged me that there's a separate generic remark just below that addresses formatting just as well. When I presented The Zen of Python to others I felt I missed something here. So eventually I went to ask the gurus. R. David Murray said it well so I think I'll just quote him verbatim:
It's about more than formatting. You should strive to minimize the number of concepts the programmer needs to keep in his head at once in order to understand a given bit of code.
Actually, it probably applies more widely than just "concepts", and if you generalize that term you wind up including the fact that the formatting shouldn't be too dense either :)
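To illustrate the point about concepts rather than whitespace, here is a small sketch. Both functions compute the same thing; the function names and the sample data are hypothetical.

```python
def top_scores_dense(rows):
    # Filtering, parsing, sorting and slicing all packed into one expression.
    return sorted([int(r.split(",")[1]) for r in rows
                   if r and not r.startswith("#")], reverse=True)[:3]


def top_scores_sparse(rows):
    # One concept per step, each with its own name.
    data_rows = [r for r in rows if r and not r.startswith("#")]
    scores = [int(r.split(",")[1]) for r in data_rows]
    scores.sort(reverse=True)
    return scores[:3]


rows = ["# name,score", "ann,7", "", "bob,12", "cid,3", "dee,9"]
print(top_scores_dense(rows) == top_scores_sparse(rows))  # True
```

The dense version is well formatted and still asks the reader to hold four ideas in their head at once.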
There's an old Chinese proverb that goes like this: "Code is read much more often than written". Or better yet:
"Programs should be written for people to read, and only incidentally for machines to execute."
— Harold Abelson and Gerald Jay Sussman, "Structure and Interpretation of Computer Programs"
Special cases aren't special enough to break the rules.
At first I thought this meant "if you find a special case, disregard it". Now I think it rather means: "if you find enough special cases, the rules chosen in the first place are poor". In that sense, it's an endorsement of refactoring and conscious design.
Although practicality beats purity.
So yeah, refactor and redesign. But don't overdo it. Remember that simple is better than complex. Nobody cares if it's more pure if it's harder to use.
Errors should never pass silently.
Failing fast and loud is another case of being explicit...
Unless explicitly silenced.
... but the reason for loud errors is minimising surprises. So when you already anticipate error scenarios in a specific piece of code, you can implement successful recovery or let the interpreter ignore the error case.
When you do silence exceptions, be careful not to overdo it. Never use bare except constructs like:
    try:
        ...  # a block of code
    except:
        pass  # catches KeyboardInterrupt, etc.
If you think except Exception is better, consider that it also catches MemoryErrors and lots of other exceptions you probably don't want to silence. When you catch yourself doing that, it most likely means you're not sure what can go wrong. In that case explicit is better than implicit. Don't hide what you don't know is there.
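You can check which handler would catch what by asking the exception classes themselves. A quick sketch, assuming Python 3:

```python
# MemoryError derives from Exception, so "except Exception" catches it.
print(issubclass(MemoryError, Exception))        # True

# KeyboardInterrupt and SystemExit derive directly from BaseException,
# so only a bare "except:" or "except BaseException:" catches them.
print(issubclass(KeyboardInterrupt, Exception))  # False
print(issubclass(SystemExit, Exception))         # False

# The full inheritance chain of any exception class is one attribute away.
print([cls.__name__ for cls in KeyError.__mro__])
```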
Using too large try-except blocks is a similar trap. When you catch a KeyError in a block of several tens of lines, it's very likely you'll hide a bug. So instead of:
    try:
        value = dictionary["key"]
        # ... 50 lines of code ...
    except KeyError:
        pass
write:
    try:
        value = dictionary["key"]
    except KeyError:
        pass
    else:
        # ... 50 lines of code ...
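When ignoring one specific error really is the intent, the silencing itself can be spelled out. Here is a minimal sketch using contextlib.suppress from the standard library (available since Python 3.4, so newer than this post); the helper name is made up for illustration.

```python
import contextlib
import os


def remove_if_present(path):
    """Delete path, explicitly ignoring the case where it is already gone."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

Only FileNotFoundError is silenced here. A permission problem, or a path that turns out to be a directory, still fails loudly.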
In the face of ambiguity, refuse the temptation to guess.
Should there be an implicit default value? Maybe yes, maybe not. But when I say: "I'm sure everybody wants this value to be the default" all that really means is "I'm sure I want this value to be the default". And when somebody says: "Nobody in their right mind would use it that way" all they mean is "I wouldn't use it that way because I haven't thought of a use case for it". The community is really diverse and assuming how people use the technology is bad guesswork.
Should my code guess about the input it receives? Should it guess what the user's intent is? Even if the guess was right, that would be dangerous implicitness. Python 2 sometimes does guess when it shouldn't, for instance in how Unicode and bytestrings implicitly coerce to each other at runtime. Python 3 fixes all that (and then some).
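A small sketch of what refusing to guess looks like in Python 3. The helper function is hypothetical and only there to show which operation fails.

```python
def try_concat(left, right):
    """Return the concatenation, or the name of the exception raised."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__


print(try_concat("caf", "\u00e9"))     # str + str works
print(try_concat(b"caf", "\u00e9"))    # bytes + str raises TypeError
print(b"caf\xc3\xa9".decode("utf-8"))  # explicit decoding names the encoding
```

Python 3 will not pick an encoding for you when bytes meet text. You have to say which one you mean.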
There should be one — and preferably only one — obvious way to do it.
That's downright contrary to what some other languages endorse. But think about it, the terrible, unreadable and scary code you just read may very well be yours, written a couple of months or years ago. Add to that more programmers and their own history. Do you really want everybody to have maximum freedom of expression, doing stuff the way they like at the time?
I love reading code by others which looks very much like my own. I like reading my own code from some time ago and still comprehending it. Python lets me do that because it goes far in providing preferred ways to do things. This means they become recognisable idioms. Once you have enough idioms, suddenly you start thinking in larger cognitive blocks, which lifts you to a yet higher level of abstraction.
Although that way may not be obvious at first unless you're Dutch.
When I first started contributing to Python I focused on the low-hanging fruit, i.e. things I considered long-standing problems with trivial solutions. On closer examination each and every one of them turned out to be far from trivial. Things look simple from afar. When I considered a problem closely from a couple of perspectives, the solution I initially viewed as obvious started to look wrong and naive instead.
One day I discussed the fact that defaultdict takes a lambda instead of just the value you want to use. It seems suboptimal to use:
    defaultdict(lambda: 0)
    defaultdict(lambda: None)
instead of just passing the value itself, as in defaultdict(0) or defaultdict(None).
It takes a while to notice that the latter form wouldn't work when a mutable object is passed: a hypothetical defaultdict([]) would use the same list for every missing key, whereas defaultdict(lambda: []) makes sure each missing key gets a brand new list. Obviously you could try to provide special cases for that, but now it doesn't look so obvious, simple and trivial anymore, does it?
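The sharing problem is easy to reproduce. The sketch below builds a value-based variant purely for illustration; SharedDefaultDict is a hypothetical class and not how collections.defaultdict works.

```python
from collections import defaultdict


class SharedDefaultDict(dict):
    """Hypothetical dict that stores one default *value* instead of a factory."""

    def __init__(self, default):
        super().__init__()
        self.default = default

    def __missing__(self, key):
        self[key] = self.default  # the very same object for every key
        return self.default


shared = SharedDefaultDict([])
shared["a"].append(1)
shared["b"].append(2)
print(shared["a"] is shared["b"])  # True: both keys point at one list

fresh = defaultdict(list)          # same effect as defaultdict(lambda: [])
fresh["a"].append(1)
fresh["b"].append(2)
print(fresh["a"], fresh["b"])      # [1] [2]
```

Since defaultdict accepts any callable, the built-in types work as factories too: defaultdict(int) gives 0 for missing keys and defaultdict(list) gives a fresh empty list.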
Guido, Fred, Raymond, Barry, Benjamin, Georg and others have tons of experience and ingenuity that enables them to see through such seemingly easy cases. This sometimes leads to solutions that don't give the nicest first impression but in fact are the best solution available given the circumstances.
Now is better than never.
I will do that better in the next version. This current one is clumsy and primitive but as soon as I finish designing this generic metaframework, everything will be better. You know, this second version will be much more configurable and will support a variety of workflows. There are still some problems left with the design and I hardly have time for it. But all of it shall be sorted out real soon now. So you know, these changes you need in the current version? Ain't gonna happen. Just wait for the real thing.
Although never is often better than right now.
Even the seemingly simplest decision may leave us with a maintenance burden for years to come. This is especially true for a piece of software that is used as widely as Python. Moreover, with such a diverse community, it's easy to occasionally overlook the needs of a group of users who are unlike myself. If I act too hastily, that might leave me with a headache.
This sounds very obvious but in fact judging when a design decision is thought over well enough is not trivial. That's why in Python we have the python-ideas mailing list for openly discussing fresh ideas. Later the discussion moves to the python-dev mailing list. But when things get really serious, someone writes a PEP (a Python Enhancement Proposal). This document is a rather formal report on what the problem is, what the possible solutions are, what the advantages and disadvantages of each solution are, what the research looked like, etc. The document usually gets edited several times before it's considered final. At each phase of the discussion the whole idea might simply get rejected. If it does, it's not because of ill will but because the process discovered real problems with the idea with no clear solutions.
If the implementation is hard to explain, it's a bad idea.
This sounds brave and a bit too simplistic. But one of the design principles of Python is that the best solutions to a given problem should look simple and concise. This is even pushed so far that Guido favors making the least performant approach visibly more verbose in code.
For instance, in discussions on Ruby's advantages an often cited one is the Uniform Access Principle, which basically means every message sent to an object is a method call. Attribute access is thus also done by means of calling methods under the hood. This leads to nice shortcuts, e.g. methods can be called without parentheses. Moreover, this reduces the vocabulary required to successfully interface with objects.
Python also supports custom implementations of the get/set/delete operations on attributes by means of its @property decorator. However, by default there is a distinction between a function call and accessing an attribute. The reason is that method invocation visibly tells the programmer that "this action may take a significant amount of time" whereas attribute access is assumed to be fast. Moreover, in Python a method is also just an object attribute that can be passed, set or deleted. The main difference from other attributes is that it's callable. In the compromise between smart and predictable Python leans towards the predictable.
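A short sketch of that distinction. The Circle class is a made-up example, not taken from any library.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius  # plain attribute: cheap to read

    @property
    def diameter(self):
        # Computed, but still cheap, so attribute syntax is a fair promise.
        return 2 * self.radius

    @diameter.setter
    def diameter(self, value):
        self.radius = value / 2

    def area(self):
        # A method call: the parentheses signal that work happens here.
        return math.pi * self.radius ** 2


c = Circle(2)
c.diameter = 10        # runs the setter
compute_area = c.area  # a method is just another attribute you can pass around
print(c.radius, c.diameter, compute_area())
```

Whether something deserves to be a property or a method is a judgment call. The convention is that reading an attribute should never surprise the caller with slow or side-effecting work.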
Getting back to the point though, how do you evaluate whether an implementation is hard to explain?
When you come from the Java world, you expect a body of documentation. One of the things that look unusual is how there's hardly any documentation generated straight from the source code. While there are JavaDoc-like tools for Python, the community favors Sphinx and narrative documentation. You wonder why that is and suspect it's not really solid and well thought over. Not very trustworthy in fact.
In reality though, the Python community discovered that auto-generated documentation very often gives you an illusion of comprehensive documentation. While a method-by-method API reference is useful, it doesn't cover the "how to?" of things, the bigger picture. And very often it's desirable to use a custom ordering of classes and functions in a module or members in a specific class to help a newcomer comprehend the thing.
An even better solution is to skip the "how" of things altogether and just document "what" and "why". Because in Python, to know how things really work, you are expected to read the source code. Don't worry though! Remember this is a language that makes 3rd party code look like yours.
That's true also for the interpreter itself and the standard library, even though the published documentation is comprehensive. When you really want to know what happens under the hood, you simply pop it open and examine it. You can even put pdb.set_trace() invocations and print() functions [2] in there. Don't worry, the sky won't fall on your head. This open source thing - it's like magic!
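You don't even have to leave the interpreter to read that source. A minimal sketch using the inspect module; it assumes the standard library is installed as regular .py files, which is the case for a typical CPython installation, and textwrap.dedent is just an arbitrary pure-Python function picked for the example.

```python
import inspect
import textwrap

# Fetch the source text of a standard library function.
source = inspect.getsource(textwrap.dedent)
print(source.splitlines()[0])

# Find out which file it lives in, e.g. to open it in an editor.
print(inspect.getsourcefile(textwrap))
```

This only works for code written in Python. For functions implemented in C, inspect.getsource raises TypeError and you need the CPython source tree instead.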
Getting back to the point though, how do you evaluate whether an implementation is hard to explain?
In my opinion, the simple answer to that is: whenever you feel like you have to explain the "how" instead of focusing on "what" and "why".
If the implementation is easy to explain, it may be a good idea.
"For every complex problem, there is a solution that is simple, neat, and wrong."
— H. L. Mencken
Namespaces are one honking great idea — let's do more of those!
But never is often better than right now. Think about what you're doing. One of the Zope guys (was it Martijn or Jim?) said once that he regrets choosing the zope namespace package because it's too long to type. Note, it's only 4 characters and still considered too long by many. Currently an increasing number of Zope packages are using the zc namespace for that reason. It's 50% shorter, a significant gain.
In all seriousness though, sometimes namespaces hide where you don't expect them. Thread locality is a namespace too! A function also defines a namespace for its local variables. Every object is a namespace for the data it holds. There are many other examples.
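Thread locality as a namespace can be sketched with threading.local. The names request_context, results and handle are made up for this example.

```python
import threading

request_context = threading.local()  # one namespace, separate contents per thread
results = {}


def handle(user):
    request_context.user = user  # visible only inside the current thread
    results[threading.current_thread().name] = request_context.user


threads = [threading.Thread(target=handle, args=(name,), name=name)
           for name in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)                           # each thread saw only its own value
print(hasattr(request_context, "user"))  # False: the main thread never set it
```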
Increasing the number of namespaces is often a nice solution to scalability or concurrency problems. For example, nowadays it seems people no longer rant about the GIL because they already learnt that using multiprocessing or other architectures with processes rather than threads is preferable and completely solves the problem.
Also, when you hear gurus complaining about excessive global state somewhere, they are really noticing that there should be more namespaces there. Why this is a problem depends on the case, but usually excessive global state makes it hard to test things, to run multiple configurations within the same process, and to customise or extend such code.
Speaking of which, I find it hard not to start a rant on why Singleton is really an anti-pattern but this is not the time and place.
Summing it all up
"The Zen of Python" would not be complete if even one of its statements were missing. It's also shocking how interconnected they are. One could argue that the ideas of simplicity, predictability, maintainability and sheer aesthetic beauty would constitute a great philosophy for much more than a programming language. But that would be a bit crazy... right?
[1] There always are strings attached.
[2] Or print statements for those poor bastards who didn't discover the One True Way. Remember, print() is a function, always should have been. Use the print_function future to change your muscle memory already. You'll thank me later.