From the Archive: Mutable Algorithms in Immutable Languages, Part 2

This is a popular post from my blog in a previous incarnation—originally published 2014-07-13.

Last time we saw a way to implement Union/Find, an algorithm which depends critically on observable mutable memory, inside any monad implementing a particular abstract interface called Mem. Monads implementing Mem model mutable memory (say that 10 times fast), and we saw that's sufficient to recover Union/Find.

But we didn't actually see any implementations of Mem. So maybe we're hosed. Are there any interesting implementations of Mem we can run Union/Find in?

In case of emergency break ivory

Okay, I'm being dramatic. There is, of course, at least one such model. If we're willing to infect our program with IO then we can always use IORefs to model mutable memory. It's almost too simple, but we'll do it anyway for completeness and to explain some nits involved in working with the Mem class.

Essentially, we'd like to have, more or less, the following correspondences:

  • ref is newIORef
  • deref is readIORef
  • set is writeIORef

Hopefully, this also clears up why I called pointers "refs" back in part 1.
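
To keep this post self-contained, here's roughly the shape of the Mem class we'll be instantiating. This is a sketch reconstructed from how the methods get used below, so the version spelled out in part 1 may differ in small details.

{-# LANGUAGE TypeFamilies #-}

class Monad r => Mem r where
  data Ref r                       -- the type of references in r
  type Val r                       -- the (single) type of stored values
  ref   :: Val r -> r (Ref r)      -- allocate a fresh reference
  deref :: Ref r -> r (Val r)      -- read a reference
  set   :: Ref r -> Val r -> r ()  -- overwrite a reference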

Diving in

So, we could do exactly what was just suggested and make IO an instance of Mem (almost) directly. We'll need a newtype wrapper defining Ref IO, but it's pretty trivial.

instance Mem IO where
  newtype Ref IO = IORef' { unwrapIORef' :: IORef Int }
  type    Val IO = Int

  ref     = fmap IORef' . newIORef
  deref   = readIORef . unwrapIORef'
  set r v = writeIORef (unwrapIORef' r) v
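
Using it looks just like plain IORef code. Here's a minimal usage sketch (the demo name is only for illustration):

demo :: IO Int
demo = do
  r <- ref 1    -- allocate a cell holding 1
  set r 2       -- overwrite it
  deref r       -- read it back: 2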

Hopefully the major problem with this approach is obvious: Mem restricts us to storing only one type of thing in memory, and IORefs are more general than that. I had to pick Ints to store for this instance pretty arbitrarily.

So this clarifies a point about using Mem. It's always going to be an internal, hidden class where instances, probably created using newtype wrappers, are specific to both a particular use and a particular algorithm.

(Well, actually, we ought to be able to get around this by using GADTs and existential types, but I'll leave that for another day.)

We'll thus make an entirely new wrapper called, say, UfIO for Union/Find in IO, specializing the stored value to the Node_ wrapper required by the internal AbstractUnionFind interface.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies               #-}

newtype UfIO v a =
  UfIO { runUfIO :: IO a }
  deriving ( Functor, Applicative, Monad )

instance Mem (UfIO v) where
  newtype Ref (UfIO v) =
    UfIORef { getUfIORef :: IORef (Node_ (UfIO v) v) } deriving ( Eq )
  type Val (UfIO v) = Node_ (UfIO v) v

  ref   a = UfIO (UfIORef <$> newIORef a)
  deref r = UfIO (readIORef $ getUfIORef r)
  set r v = UfIO (writeIORef (getUfIORef r) v)

Just a little bit more code and some noise as we wrap and unwrap all of the newtypes (but note that all of that noise is free at runtime).

Using UfIO

Now that we have something which implements Mem with Val r ~ Node_ r v we can use it to run our Union/Find algorithm. All we need to do is construct a computation using the Union/Find interface and then run it with runUfIO.

exIO = runUfIO computation where
  computation :: UF r Int => r Bool
  computation = do
    n1 <- node 1
    n2 <- node 2
    link n1 n2
    connected n1 n2

>>> :t exIO
exIO :: IO Bool
>>> exIO
True

Phantom parameters

It's worth spending a few seconds focusing on one particular trick used above. In order to leave the type stored in the Union/Find graph unspecified we've used a phantom type parameter on the type of UfIO, the v.

The parameter v is called "phantom" because it is not used in the implementation of UfIO at all. Instead, the information carried by v is only used in the instantiation of Mem.
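
For a standalone illustration of the trick (hypothetical types, entirely separate from the Union/Find code): the unit parameter below never appears on the right-hand side of the definition, yet it lets the type checker distinguish values which are represented identically at runtime.

newtype Distance unit = Distance Double deriving ( Show )

data Meters
data Miles

-- The only way to change the phantom parameter is to go through the
-- constructor, so meters and miles can't be mixed up by accident.
metersToMiles :: Distance Meters -> Distance Miles
metersToMiles (Distance m) = Distance (m / 1609.344)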

This phantom parameter is almost entirely what allows us to avoid the problem we encountered above with the Mem instance of IO. The only way for the associated type Val (UfIO v) to change is if we use information about some extra parameter like v.

Phantom parameters will return in this series. Generally, their ability to constrain values and subsequently expose more information about them is a critical trick.

Recovering purity

Whew! With all that IO out of the way we can get back to solving real fake problems like how to emulate mutable memory.

We've seen that what we need to do is establish the ability to store objects of type Val r and produce references to them, Ref r, which can be read from or written to. Without beating around the bush too much, this behavior is very similar to storing and accessing values in a finite Map. Better, if we just let our Refs be integer identifiers we can use the efficient Data.IntMap.IntMap from the containers package.

(There's a question as to whether or not we should use the lazy form of IntMap, but I'm going to ignore it.)

If we wrap one of those maps up into a State monad, then monadic operations which edit the finite map using integer references are basically what Mem asks for. Better yet, we can just "snapshot" our memory at any point (using get) to escape back into purity.
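
Here's a tiny sketch of that idea with toy names (not the implementation we'll actually build below):

import qualified Data.IntMap as IM
import Control.Monad.State

-- A toy "memory": an IntMap carried around in State.
type ToyMem = State (IM.IntMap String)

toyWrite :: Int -> String -> ToyMem ()
toyWrite k v = modify (IM.insert k v)

-- get snapshots the whole memory, letting us escape back into purity.
toySnapshot :: IM.IntMap String
toySnapshot = evalState (toyWrite 0 "hello" >> get) IM.empty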

Sounds perfect. Let's build it!

Implementing purely mutable memory

As described, we're going to implement a Monad which is not a whole lot more than a State monad carrying around our IntMap representing memory.

newtype UfIntMap v a =
  UfIntMap { unUfIntMap :: State (Uf v) a }
  deriving ( Functor, Applicative, Monad )

and as long as we can generate some initial state (uf0) then we can "run" a UfIntMap computation purely, shedding its monadic layer.

runUfIntMap :: UfIntMap v a -> a
runUfIntMap = flip evalState uf0 . unUfIntMap

That state type Uf v isn't exactly what I may have led you to believe it would be. Instead of only holding an IntMap we must also keep a "source" of IDs to ensure we don't collide.

data Uf v =
  Uf { count :: Int
     , mem   :: IntMap (Node_ (UfIntMap v) v)
     }

uf0 :: Uf v
uf0 = Uf { count = 0, mem = IM.empty }

But that's really just a small technical detail. All that's left now is to implement the Mem instance. We use the same phantom variable trick from UfIO v, but now references are just wrappers over Ints in our map.

instance Mem (UfIntMap v) where
  newtype Ref (UfIntMap v) = UfIntMapRef { getId :: Int } deriving ( Eq )
  type    Val (UfIntMap v) = Node_ (UfIntMap v) v

The methods of pure Mem

We're close now. We can implement set very easily atop monadic operations in the State monad and IntMap operations using our Int references.

set r v = UfIntMap $ do
    modify (\s -> s { mem = IM.insert (getId r) v (mem s) })

Implementing ref is pretty easy as well, although we must be careful to update the count stored in Uf to ensure that we never cause collisions.

ref v = UfIntMap $ do
    c <- gets count
    modify (\s -> s { count = c + 1, mem = IM.insert c v (mem s) })
    return (UfIntMapRef c)

Finally, we implement deref which is pretty simple as well.

deref r = UfIntMap $ do
    Just v <- gets (IM.lookup (getId r) . mem)  -- WHOA!
    return v

Right?

Well, it's a little scary that we have an incomplete pattern match (marked by WHOA above). But we never delete anything from the IntMap so our references should always be valid. Our sanity test passes after all.

exPure = runUfIntMap computation where
  computation :: UF r Int => r Bool
  computation = do
    n1 <- node 1
    n2 <- node 2
    link n1 n2
    connected n1 n2

>>> :t exPure
exPure :: Bool
>>> exPure
True

So this is fine, right?

Well, no, otherwise I wouldn't be harping on it.

A subtle bug

Here's how we can use our Union/Find API to expose an error. If you're not familiar with Haskell: runtime errors are considered bad because you usually expect all failure conditions to be reified in types and values.

bug :: Bool
bug =
  let
    n1   = runUfIntMap $ do
      node ()
      node ()
      node ()
    conn = runUfIntMap $ do
      n2 <- node ()
      connected n1 n2
  in
   conn

>>> bug
*** Exception: Pattern match failure in do expression at /.../UnionFind/IntMap.hs:43:5-10

We can do even worse, too. Here's the same bug triggered in a way that silently and unpredictably provides the wrong answer without even throwing an error.

bug2 :: Bool
bug2 =
  let
    n1   = runUfIntMap (node ())
    conn = runUfIntMap $ do
      n2 <- node ()
      connected n1 n2    -- this can't possibly be True can it?
  in
   conn

>>> bug2
True

Ugh.

Border control problems

What we've got is a border control problem. Because running our UfIntMap computation can return values like Ref (UfIntMap v), we're able to carry the IDs of Vals outside of the context where they make any sense.

In the first, bug, we observe this problem when trying to dereference an id in a context where nothing has yet been stored at that location. In the second, bug2, we create a reference which, despite no longer being valid, collides with a value in a different context.

What this ought to tell you is that (a) there's a concept of a region of validity that's defined by "run" calls like runUfIntMap and (b) it's dangerous to export references from that region since they can then contaminate other regions.

This isn't really a problem in languages where mutability is allowed to happen everywhere. In some sense they only have a single region: the entire runtime. The only way to "execute" this region is to execute the program itself.

Can we do better?

So we've come quite far now. We have two separate implementations of Mem and both appear to work correctly for Union/Find. One of them, based on IO, is impure, and so using the results of Union/Find computations run in it will pollute code with IO. The other, based on a State monad containing an IntMap modeling mutable memory, can be executed purely... but when we do we run into the opportunity for subtle bugs to arise.

In some languages you might just have to shrug your shoulders and write a note in the documentation.

Never run a Union/Find computation which returns a Node as those
nodes will no longer be meaningful and can be used to violate
preconditions in other Union/Find computations.

But we're fighting this whole problem in order to preserve the strength of the type system. Can we make the type system pay its dues?

Turns out that, yes, we can. And will.

Commentary

There was some interesting commentary at Lobsters.