What single-dispatch generic functions mean for you

June 5, 2013, 11:50 a.m.

PEP 443 has just been accepted. It brings a uniform API for creating and managing single-dispatch generic functions in Python 3.4. If the last sentence read to you like some Haskellish solution to a self-inflicted problem, this post is here to explain how generic functions may help you to organise real world code.

When you're confronted with a nested conditional in Python, your code quickly starts to look ugly. This would be even more evident if there was no elif, which is syntactic sugar hiding the fact that each subsequent condition is really a nested if statement. This kind of logic is sometimes necessary. The standard library provides many examples, for instance zipfile.py:

def _get_compressor(compress_type):
    if compress_type == ZIP_DEFLATED:
        return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
            zlib.DEFLATED, -15)
    elif compress_type == ZIP_BZIP2:
        return bz2.BZ2Compressor()
    elif compress_type == ZIP_LZMA:
        return LZMACompressor()
    else:
        return None

or smtpd.py (a simplified example, original here):

def smtp_HELP(self, arg):
    if not arg:
        self.push('250 Supported commands: EHLO HELO MAIL RCPT DATA '
                  'RSET NOOP QUIT VRFY')
        return

    extended = ' [SP <mail parameters]'
    lc_arg = arg.upper()
    if lc_arg == 'EHLO':
        self.push('250 Syntax: EHLO hostname')
    elif lc_arg == 'HELO':
        self.push('250 Syntax: HELO hostname')
    elif lc_arg == 'MAIL':
        msg = '250 Syntax: MAIL FROM: <address>'
        if self.extended_smtp:
            msg += extended
        self.push(msg)
    elif lc_arg == 'RCPT':
        msg = '250 Syntax: RCPT TO: <address>'
        if self.extended_smtp:
            msg += extended
        self.push(msg)
    elif lc_arg == 'DATA':
        self.push('250 Syntax: DATA')
    else:
        self.push('501 Supported commands: EHLO HELO MAIL RCPT '
                  'DATA RSET NOOP QUIT VRFY')

Other languages, like C, provide more forms of expressing multiple conditions (e.g. switch). In Python, you're often advised to move the code present in distinct branches to separate functions and put them in a dictionary. For example, the code above could be rewritten like this:

def _ehlo(self):
    return '250 Syntax: EHLO hostname'

def _helo(self):
    return '250 Syntax: HELO hostname'

def _mail(self):
    msg = '250 Syntax: MAIL FROM: <address>'
    extended = ' [SP <mail parameters]'
    if self.extended_smtp:
        msg += extended
    return msg

def _rcpt(self):
    msg = '250 Syntax: RCPT TO: <address>'
    extended = ' [SP <mail parameters]'
    if self.extended_smtp:
        msg += extended
    return msg

def _data(self):
    return '250 Syntax: DATA'

def _default(self, arg=None):
    code = '501' if arg else '250'
    return (code + ' Supported commands: EHLO HELO MAIL RCPT '
                   'DATA RSET NOOP QUIT VRFY')

smtp_help_impl = {
    'EHLO': _ehlo,
    'HELO': _helo,
    'MAIL': _mail,
    'RCPT': _rcpt,
    'DATA': _data,
}

def smtp_HELP(self, arg):
    try:
        func = self.smtp_help_impl[arg]
    except KeyError:
        self.push(self._default(arg))
    else:
        self.push(func(self))

This version of code has only one significant disadvantage: it's more verbose. Otherwise it's nicer because:

PEP 443 starts with defining what generic functions are. Basically, they are functions executing different code depending on the arguments passed. If they decide on a single argument, they're called single-dispatch generic functions.

Do you see that we refactored stmp_HELP() to a single-dispatch generic function dispatching on the value of arg?

Dispatching on the type

PEP 443 forms a framework that lets you dispatch on the type of the first argument provided to the function. You define a generic function by decorating it with @singledispatch:

>>> from functools import singledispatch
>>> @singledispatch
... def fun(arg, verbose=False):
...     if verbose:
...         print("Let me just say,", end=" ")
...     print(arg)

Then you add additional implementations by using the register() attribute of the generic functions, which is a handy decorator as well. Just pass the type parameter to it first:

>>> @fun.register(collections.MutableMapping)
... def _(arg, verbose=False):
...     if verbose:
...         print("The items in this mapping are:")
...     for k, v in arg.items():
...         print(k, '->', 'v')

When called, the generic function dispatches on the type:

>>> fun("Boom.", verbose=True)
Let me just say, Boom.
>>> fun(collections.defaultdict(lambda: 0, (('a', 'b'), ('c', 'd'))))
c -> v
a -> v
>>> fun(locals(), verbose=True)
The items in this mapping are:
__loader__ -> v
__package__ -> v
__builtins__ -> v
collections -> v
fun -> v
atexit -> v
_ -> v
__doc__ -> v
__cached__ -> v
__name__ -> v

For a long time the understanding of duck typing in Python was that you should not ask an object for its type. While valid use cases for type checking always existed [2], this was largely considered an anti-pattern in Python.

This is still true but with the advent of abstract base classes in Python 2.6, it's no longer perceived so poorly to use type information to decide which code to execute. We always considered an object to be a duck if it quacked like a duck. Raymond Hettinger famously said during his PyCon 2013 keynote that nowadays it's enough that "it said it was a duck". This is compatible with classical duck typing because you can override isinstance() and issubclass() checks with abstract base classes, which is a very powerful tool.

As the example above shows, PEP 443's @singledispatch embraces abstract base classes as well. As you can see, if there's no implementation found for a specific type, its method resolution order (including the abstract base classes) is consulted to find a more generic version. In other words, if you pass to a generic function an object of a Duck type which subclasses Animal, the dispatch will do as follows:

  • search for an implementation for Duck; failing that
  • search for an implementation for Animal; failing that
  • search for an implementation for object, which is by default the original function decorated with @singledispatch.

All details can be found in the PEP. Most importantly though, thanks to some smart caching involved, after the first costly type check, all other calls to the function are only a single hash lookup away. In other words, very fast.

A couple of use cases

The Django Admin uses a form of type dispatch, too. But if it was implemented using @singledispatch, you would be able to register an admin class for an abstract model and automatically get admin support for all models subclassing said abstract model.

Speaking of ORM models, anytime you use a custom form for a specific model, you could write the form getter using @singledispatch. This is especially useful because if the app you're writing is designed to be reusable, with @singledispatch you enable your users to register new implementations for models you never heard of.

The same thing can be said about UI widgets, serialization formats, protocol handlers, and many other constructs where the behaviour differs between supported types.

For the standard library, I'm planning to update pprint so that you'll be able to extend it with support for your own types. Stay tuned.

What about value dispatch?

Dispatching on the value of an argument is largely uninteresting from a design standpoint. All that's needed to support it is a dictionary and you're good to go. In fact, we did exactly that in the SMTP example. However, this is not generic enough to advertise it as the solution. For other use cases, it might be better to use an ordered data structure (take for instance Django's URL dispatching) or have custom ways of resolving more generic implementations if there's no exact match (for example transport adapters in requests use "the longest match available").

Providing a framework to support all those needs is complex and generally comes down to predicate-based dispatch, which is as generic as it gets. If you want with an engine for this kind of thing, try PEAK-Rules. Python might one day get a standard way to do that. This time, to dodge the bikesheds flying my way, I focused on the simplest variant that makes sense and is useful in a wide variety of use cases.

Type dispatch with a value override

What if we're happy with type dispatching in general but we need to special-case a single value? Well, I already hear the sirens of the abstraction police, so let me give you an example. But first, a slight detour.

Python just gained an enum type

I like enum types so I looked forward to that addition. Moreover, looks like the design which Barry, Eli and Ethan chose for the built-in implementation is ingenious. For all the nitty-gritty details I forward you directly to PEP 435 but let me just state the most important thing:

  • the enum type (say, Color) is a class;
  • the values of the enum (e.g. Color.green) are instances of this class.

When I first learned about this, I immediately got excited because one of the reasons I find enums to be so useful is that they can encapsulate value-specific information. For instance, if your website provides a couple of themes, you could hold the styling information along with the theme enum:

class Theme(Enum):
    blue = 1
    graphite = 2
    skeuomorphic = 3

    def css(self):
        """theme.css() -> (css_file, container_class)"""
        if self is Theme.blue:
            return ('aqua.css', 'default')
        if self is Theme.graphite:
            return ('graphite.css', 'default')
        if self is Theme.skeuomorphic:
            return ('aqua.css', 'skeuomorph')
        raise NotImplementedError("No css for {}".format(self))

This is nice because in actual use your code doesn't care about the value of the enum, just about the context information it provides:

def some_view(request):
    theme = Theme(request.session['theme'])
    context = {'css': theme.css()}
    return render(request, 'view.html', context)

Look, the user code requires no conditionals at all! If only we could skip the conditionals while defining the css() method...

Generic functions on enums

For instance, something like this would be nice:

class Theme(Enum):
    blue = 1
    graphite = 2
    skeuomorphic = 3

    @valuedispatch
    def css(self):
        """theme.css() -> (css_file, container_class)"""
        raise NotImplementedError("No css for {}".format(self))

    @css.register(blue)
    def _css_bl(self):
        return ('aqua.css', 'default')

    @css.register(graphite)
    def _css_gr(self):
        return ('graphite.css', 'default')

    @css.register(skeuomorphic)
    def _css_sk(self):
        return ('aqua.css', 'skeuomorph')

That looks better. First we have a generic function with a default value (this time we raise an error but your usage might be different). Next, we define implementations for specific values of the enum. It works because the first argument is self.

Interestingly, with an implementation like this we would also be able to A/B test our users with a new version of the theme. For the selected users, simply instantiate a Theme subclass instead of the Theme directly:

class BetaTheme(Theme):
    @valuedispatch
    def css(self):
        return super().css()

    @css.register(Theme.skeuomorphic)
    def _css_sk(self):
        return ('jony.css', 'default')

Show me the @valuedispatch code already

Okay, okay... Here it is:

def valuedispatch(func):
    _func = singledispatch(func)
    @_func.register(Enum)
    def _enumvalue_dispatch(*args, **kw):
        enum, value = args[0], args[0].value
        if value not in _func.registry:
            return _func.dispatch(object)(*args, **kw)
        dispatch = _func.registry[value]
        _func.register(enum, dispatch)
        return dispatch(*args, **kw)
    @wraps(func)
    def wrapper(*args, **kw):
        if args[0] in _func.registry:
            return _func.registry[args[0]](*args, **kw)
        return _func(*args, **kw)
    wrapper.register = _func.register
    wrapper.dispatch = _func.dispatch
    wrapper.registry = _func.registry
    return wrapper

While it might look somewhat convoluted, it's actually rather straightforward. This decorator creates a generic function using singledispatch, but returns a custom wrapper which checks for existence of the argument itself in the generic function registry.

The last lines simply ensure the API stays the same with the default wrapper provided by singledispatch. Finally, _enumvalue_dispatch() is necessary to enable registration of implementations for enum values inside the class definition. Note how it's registered on the generic function using its built-in functionality.

Generic methods

If all you ever wanted are single-dispatch generic methods, you surely noticed that in the case of PEP 443, dispatching on the first argument is a limitation. Instance and class methods treat the first argument in a special way, binding it to the current object or class, respectively. With a little bit of plumbing you can get around it. Just use bound methods.

Bound methods have their first argument occupied already, making the first thing passed by the caller to be the second argument on the method definition:

>>> class C:
...   def method(self, arg):
...     print(self)
...     print(arg)
>>> o = C()
>>> o.method('Argument')
<__main__.C object at 0x1025739d0>
Argument
>>> bound = o.method
>>> unbound = C.method
>>> bound('Argument')
<__main__.C object at 0x1025739d0>
Argument
>>> unbound('Argument')
Traceback (most recent call last):
...
TypeError: method() missing 1 required positional argument: 'arg'

In other words, to make a generic method, you have to register bound methods on it. To do that, move the wrapping of the method with singledispatch to the __init__() method of your class:

class State:
    def __init__(self):
        self.add = singledispatch(self.add)
        self.add.register(int, self.add_int)
        self.add.register(float, self.add_float)
        self.add.register(complex, self.add_complex)
        self.sum = 0

    def add(self, arg):
        raise TypeError("Not supported.")

    def add_int(self, arg):
        self.sum += arg

    def add_float(self, arg):
        self.sum += int(round(arg))

    def add_complex(self, arg):
        self.sum += int(round(arg.real))

This way the generic method behaves just like plain old instance methods:

>>> state = State()
>>> state2 = State()
>>> state.add(1)
>>> state.add(2.51)
>>> state.add(3.7+4j)
>>> state.sum
8
>>> state.add(2.50)
>>> state.sum
10
>>> state.add("string")
Traceback (most recent call last):
...
TypeError: Not supported.

Moreover, you can register additional implementations on an instance. Just remember to bind the function to the object, to make self work as expected:

>>> def add_decimal(self, arg):
...     self.sum += int(arg.to_integral_value())
...
>>> bound = add_decimal.__get__(state2, State)
>>> state2.add.register(Decimal, bound)
<bound method State.add_decimal of <state.State object at 0x102d1ee28>>
>>> state2.add(50)
>>> state2.add(Decimal("48.76"))
>>> state2.sum
99

Note that the first instance doesn't know how to support the newly registered type:

>>> state.add(Decimal("1"))
Traceback (most recent call last):
...
TypeError: Not supported.

To enable class-wide registration of new implementations on generic methods, the class needs to maintain a second registry where users assign implementations which are not yet bound. The class then binds those unbound functions as methods in __init__ and registers them on the instance-level singledispatch method.

This can be implemented as a second layer of singledispatch, which is nice because the user will not have to be aware that there's anything special going on:

class ExtendedState:
    def __init__(self):
        add_registry = self.add.registry
        self.add = singledispatch(add_registry[object])
        self.add.register(int, self.add_int)
        self.add.register(float, self.add_float)
        self.add.register(complex, self.add_complex)
        for typ, impl in add_registry.items():
            impl = impl.__get__(self, State)
            self.add.register(typ, impl)
        self.sum = 0

    @singledispatch
    def add(self, arg):
        raise TypeError("Not supported.")

    def add_int(self, arg):
        self.sum += arg

    def add_float(self, arg):
        self.sum += int(round(arg))

    def add_complex(self, arg):
        self.sum += int(round(arg.real))

The new code supports all the behaviour that was present in the previous version, but now we can also register implementations on classes as well:

>>> @ExtendedState.add.register(str)
... def add_str(self, arg):
...     self.sum += int(arg)
...
>>> state1 = ExtendedState()
>>> state2 = ExtendedState()
>>> state1.add(10)
>>> state2.add(20.4)
>>> state1.add("30")
>>> state2.add("40")
>>> state1.sum
40
>>> state2.sum
60

One caveat here is that since binding happens during object initialization, if you later register a new implementation on the class, already existing objects won't be able to handle those. Obviously, we could get around it with still more plumbing. But that's a thing for a different blog post.

Summing it up

I find generic functions to be a terrific complement to traditional object-oriented building blocks. Now that Python provides a clear way to define and extend generic functions, it opens some interesting refactoring possibilities for the standard library, as well as for your own code.

What PEP 443 brings to the table is a solid foundation which can be the basis for further exploration. I hope the examples I presented show that the mechanism is quite easily extensible to support an even wider range of use cases.

But Łukasz, I'm still on Python 2.7

Python 3.4 is just around the corner [3] and you're still on 2.x? Well, that's too bad, but don't fret. I made a backport just for you:

$ pip install singledispatch

You can run it on Python 2.6+ and 3.2+.

Footnotes

[1]Although the careful reader will notice that instance-level state might still be used to introduce side effects. This is in fact what the example does.
[2]The PEP gives examples of this in Python itself: len(), iter(), pprint.pprint(), copy.copy(), and most of the functions in the operator module.
[3]Well, not quite ;)

Entry tagged as long , PEP and python.

« Enter Music Festival 2013