If you know people interested in programming language theory, you will have heard of the monad. Especially in Haskell, they are used extensively. When trying to find out what they are, exactly, the average programmer will have a distinct Matrix feeling, though: You can not be told what a Monad is! Lots and lots of websites try to explain what a monad is, so many that you must wonder what an exceptionally awesome concept this is that it can not be explained in simple, understandable terms, but is still so important that people will not simply give up on using it.
After watching a Google Tech Talk presentation recently, I was enlightened as to how monads can be explained to normal programmers.
Update: You can also read the second part of this series.
Monads
The following is the normal person definition of a monad. I’ll explain how it maps to the type theory definition later.
A Monad is an object whose methods return monads.
Note that this is a recursive definition. Monads are objects where methods return monads which are objects where methods return monads which are objects …
This property is called chaining and is the distinct feature of monads with which you can identify them in languages where they are not explicitly named.
The second important property is the ability to branch an execution path. Because you can take any monad in the execution chain, and start multiple new chains from that monad, you can get an execution path tree.
These two properties are the main properties that make monads interesting.
Example 1: jQuery
jQuery is a JavaScript library for website manipulation. One of its main features is a monad. No one calls it that, though.
$("#element").find("ul").parent().hide()
This expression searches for an element with the id #element
, finds
unsorted lists (ul
) among the childs of that element, finds the
parents of those childs, and hides them.
Each of those expressions return an object representing a “set of matched elements” (I’m not even sure if jQuery defines a single name for that concept), and the same kind of methods can be applied to all such objects. This means you can chain such calls as shown above.
Branching looks very similar:
var lists = $("#element").find("ul");
var parents = lists.parent();
var list_items = lists.children("li");
In this example, we first split up the original query. The variable
lists
now contains an object representing the set of unsorted list
tags under the #element
tag. We can now re-use this set. First, we
find the parent elements of those unsorted lists. Then we find their
list items.
Because every one of those calls returns the same kind of object which provides the same methods which all return such objects, we were able to both simply chain multiple calls to each other, but also to sort of cache an intermediate result and split execution from that point.
Both are very powerful programming methods.
Example 2: Django Query Sets
With its queryset objects, Django provides a monad for database abstractions. A query set represents a partially-constructed SQL statement. By using method application, you can successively build the expression.
Book.objects.filter(author="Bernard Cornwell"
).exclude(series="The Sharpe stories")
Chaining has an extremely useful application in combination with conditions:
books = Book.objects.all()
if author is not None:
books = books.filter(author=author)
if year is not None:
books = books.filter(year=year)
In this case, the links of the chain are added conditionally depending on specified options, for example to a function. This is a very powerful implication of the ability to chain monadic types.
Branching works, too:
cornwell = Book.objects.filter(author="Bernard Cornwell")
older = cornwell.filter(published__lt="2000")
new = cornwell.filter(published__gt="2010")
One of the main drawbacks of SQL is that it’s so difficult to compose statements. Even just adding a simple added WHERE clause is complicated enough, successively adding queries to other tables and other more complex problems is almost impossible. Abstracting them using monadic types allows for very strong compositability.
Type Theory
So far, you have kindly just believed me that those examples actually are monads. Let me fulfill my promise from earlier and show that they actually are. And not only that, but also that all monads are like that. That is, that the following definition is equivalent to the type theoretical definition of monads:
A Monad is an object whose methods return monads.
The type theoretical definition of a monad is as follows:
- A type constructor that defines for every type t how to obtain a monadic type M t
- A unit function: t → (M t)
- A bind function: (M t) → (t → (M u)) → (M u)
The first point tells the type system how to deal with this type we are about to define. In other programming languages, the closest approximation is a class definition.
The second point defines a function which requires an argument of any type (t), and returns a monad based on that type (M t). In other programming languages, this is called an object constructor. It takes an argument of whatever complexity (multiple arguments can be emulated by passing a tuple), and returns an instance of the “Monad” class.
The third point is a bit more complex. If you are not used to reading Haskell type definitions, this might throw you off easily, as there is a lot of implicit information in there.
First of all, A → B → C is the type signature of a function of two arguments, A and B, that returns C. It’s written this way to make partial application explicit. It’s a function which takes an A and returns a function which takes a B which in turn returns C. I find it easier to read if I make it explicit that I am talking about a single function with two arguments, (A, B) → C. Then, for easier reading, let’s write (M t) as Mt. And finally, note that the second argument there—(t → (M u))—is a function itself. Let’s just call it f instead and worry about it later. With this simplification, the definition of bind becomes this:
bind :: (Mt, f) → Mu
So bind is a function which gets an object of type Mt and a function, and returns an object of type Mu. Now let’s look at f:
f :: t → Mu
Here, again, you have to remember that Haskell loves partial application. This function could be a function which has been partially applied to any amount of other arguments, as long as the last argument it expects now is this t which we encapsulated into monads.
It takes this enscapsulated object, and maps it to a new monad. Which is the monad that bind will return.
- bind :: (Mt, f) → Mu
- f :: (t, args) → Mu
That is, bind applies this function to the contents of the monad, and returns the monad this function returns.
Other programming languages call bind method application.
In the expression obj.method(args), the method application function . has the following type signature:
- . :: (obj, method) → result
- method :: (obj, args) → result
Look familiar? Haskell applies the method to the contents of the monad and not the monad itself because otherwise, it would not know how to access those contents.
The only added restriction by the type signatures compared to normal method application is that the result has to be another monad.
It’s important to note that the definition of monads does not specify what the object constructor or the methods actually do, only what they return. Different monads do different things in those functions. What unifies them, though, is that they all follow these three definitions for a monad:
- There’s a class for the monad.
- There’s a constructor for instances of this class.
- There’s a method application where methods return other instances of this class.
Or, more succintly:
A Monad is an object whose methods return monads.
Addendum: Monad Laws
As Edward Kmett correctly pointed out in the comments, the above is not sufficient to show that e.g. the Django ORM interface defines a monad. The existence of the functions shown is the first step; the second is showing that the monad laws hold for the specific type of monad.
There are three such laws. But to explain how they apply in this case, I need to talk a bit more about something I only mentioned in passing above, the object/value duality.
Strictly speaking, the bind function applies the function it is passed to the value stored within a monad. This is sufficiently generic, as you can store a tuple and thus any kind of complex structure you like. Method application applies the method-function to the object, not the value stored within the object.
This is important in Haskell as it allows to write generic functions that work on all monads regardless of the included type, while writing generic methods that work on all objects is usually not part of common programming languages. For all practical purposes, though, this distinction is irrelevant.
The first two monad laws are about identy, namely that bind and unit form left and right identities.
bind(unit(x), f) ≡ f(x)
Or, in our syntax, Monad(x).f() ≡ f(x)
.
Monad(x).f()
expands to f(Monad(x))
which is
correct if you keep the object/value duality from above in mind, so
this actually holds true for all method applications.
In the Django ORM, this would be Model.objects.filter() ≡
filter(Model.objects)
, which is trivially true as
filter()
is a method whose first argument is
self
, which is indeed the value of
Model.objects
.
The second identity law is as follows.
bind(m, unit) ≡ m
This is the casewhere the object/value duality really shows. What this does is to
apply an existing object to the constructor of the class. With the
strict definition of bind, it applies the constructor to the value
within the object. We do not care about the distinction between object
and the value stored within. If we split this up, we get
value.constructor() ≡ constructor(value) ≡ object
, which
is actually how objects are created.
For the Django ORM, there’s actually a QuerySet type underlying all query sets. The init function takes three arguments, model, query and using; that’s the value tuple hidden within that monad. So to fulfill this law, I should be able to do the following:
QuerySet(model=qs.model, query=qs.query, using=qs._db)
And that actually works.
The third law is the most complex.
(m >>= f) >>= g ≡ m >>= ( \x -> (f x >>= g) )
Or, to write it in our syntax:
m().f().g() ≡ m().fun()
fun(x) ::= f(x).g()
m().fun() is fun(m())
. Expanding that function gives us
f(m()).g()
, which in turn gives us
g(f(m()))
. Which actually is the same as
m().f().g()
.
Which means that this also holds for all method applications. This does not work this way in, say, Python or JavaScript because methods are not first class objects independent of their classes, and method application does not take any first class function to apply it to the object.
So, is jQuery or the Django ORM a monad? Strictly speaking, no. While the monad laws actually hold, they do so only in theory, but you can not actually use them in those languages as readily as you can in, say, Haskell. Methods get the object as the first (implicit, in JavaScript) argument, not the value(s) stored in the object. Methods are not first class objects independent from their classes. You can circumvent those restrictions by implementing some boiler code or, in Python, metaclasses that do some magic. What you get for doing that is a much easier time writing functions that work on all monadic classes, at the expense of making the whole concept more difficult to understand.
This article is meant to help people who do not care about mathematical laws understand what monads are useful for. And for all intents and purposes, outside of Haskell, the definition I provided encapsulates the concept of monads very well.