A Scrum Of One: Functions as Function Results

This post talks about a set of utilities provided in the BrightSword SwissKnife library (nuget, codeplex).

Learning to write performance-sensitive code is a journey. There are many programming tools and techniques which contribute to different aspects of improving code performance.

The most elegant of these are the fundamentally timeless classics – mathematical methods and proofs that are agnostic to technologies. However, understanding these techniques generally requires a rigorous understanding and application of logic and mathematical technique.

Memoizing is a commonly used mathematical technique to improve the performance of the act of calculations – historically, great care was taken to compute, verify and publish the values resulting from the calculation of formulae – logarithms, arithmetic and trigonometric functions, for example.

You will find a general-purpose Memoize function here, but its implementation is more advanced than is discussed by this post. We will cover the advanced issues in a future post which addresses recursion.

Introduction

We’ve seen that we can abstract away boilerplate code and parameterize logic using functions as function arguments.

We also had a stab at memoizing a function parameterized only by a single generic type parameter. We’re going to spend a little more time talking about memoizing functions now.

Consider the following snippet using the Math.Sqrt function, which computes the square root of a given positive real number.


var y = Math.Sqrt (Math.Sqrt (a)) + 2 * Math.Sqrt (b) * Math.Sqrt (a);

A function, such as Math.Sqrt, which always returns the same result for the same arguments without changing any other state is called a “pure” function. Such functions have a property known as referential transparency, which means that the result value of the function can replace its invocation in an expression. Because of this property, we do not need to evaluate Math.Sqrt(a) twice in the expression above. We could simply store the result against the argument somewhere and return it whenever the function is invoked with the same argument. Such a technique is called memoization.

Memoizing

We could take several approaches to memoize a function: the most naïve approach might be to use code such as:

public static ConcurrentDictionary<double, double> GlobalCacheForSqrt = new ConcurrentDictionary<double, double>();
public static double sqrt_memoized(double x)
{
    return GlobalCacheForSqrt.GetOrAdd(x,
                                       Math.Sqrt);
}

...

var y = sqrt_memoized(sqrt_memoized(a)) + 2 * sqrt_memoized(b) * sqrt_memoized(a);

This would work, but what price have we paid to make this happen?

We have created a global value cache to store the results of the Math.Sqrt function.
We have created a wrapper function to return the cached value and only invoke the Math.Sqrt function if we haven’t done it once before.

If we wanted to memoize another function, we would have to do the same two steps all over again, which is somewhat tedious. But somewhat more seriously, we are memoizing the results in mutable global state – making the system impure.

What we really want is to localize the cache somewhere local and safe, and also remove the tedium of writing the wrapping function.

Something that looks like this:


public class MemoizedSqrt
{
    private readonly ConcurrentDictionary<double, double> _cache = new ConcurrentDictionary<double, double>();

    public double Sqrt(double x)
    {
        return _cache.GetOrAdd(x,
                               Math.Sqrt);
    }
}

...

var sqrt_memoized = Memoize<double, double>(Math.Sqrt);
var y = sqrt_memoized(sqrt_memoized(a)) + 2 * sqrt_memoized(b) * sqrt_memoized(a);

This is much nicer – it has now hidden away the cache inside of a class, but we still have to write a different class for each function we want to memoize.

So let’s look at a different way of doing this – imagine we had a magic box which took in a function like Math.Sqrt, and returned the sqrt_memoized for us, automatically encapsulating a class with the cache in it.

Imagine no further – we can build such a box. It’s simply a higher-order function which takes a function as an argument and returns another function.


public static Func<TDomain, TRange> Memoize<TDomain, TRange>(Func<TDomain, TRange> f)
{
    private class M
    {
        private readonly ConcurrentDictionary<TDomain, TRange> _cache = new ConcurrentDictionary<TDomain, TRange>();
        private readonly Func<TDomain, TRange> _f;

        public M(Func<TDomain, TRange> f)
        {
             _f = f;
        }

        public TRange MemoizedF(TDomain x)
        {
             return _cache.GetOrAdd(x,
                                    _f);
        }
    }

    return new M<TDomain, TRange>(f).MemoizedF;
}

...

var sqrt_memoized = Memoize<double, double>(Math.Sqrt);
var y = sqrt_memoized(sqrt_memoized(a)) + 2 * sqrt_memoized(b) * sqrt_memoized(a);

And there you have it – one single function to memoize any single-argument function, all neatly packaged into a static method.

All done!

What? Howls of protest? This couldn’t work?

Of course it won’t work! There’s no way to embed the class M inside the Memoize function. But we have a nice, elegant solution and we don’t want to mess it up by making the class M public…what do we do?

Getting some Closure

The trick to getting that inner embedded class inside the function is to let the compiler put it there instead of writing the class ourselves.

Remember that we are writing a higher-order function, so we could just build and return the memoizing function in place.

Consider the following code:

public static Func<TDomain, TRange> Memoize<TDomain, TRange>(Func<TDomain, TRange> f)
{
    var _cache = new ConcurrentDictionary<TDomain, TRange>();
    return _ => _cache.GetOrAdd(_, f);
}

Let’s break this down – ignoring the type arguments:

Memoize is a function which takes in a function f and returns an anonymous function.
The function has a local variable _cache to cache appropriately typed results against arguments.
The result is a function which references the cache. The salient point to notice is that the _cache variable is outside the scope of the result function. We say that the result function closes over the _cache variable.
The function returned requires a proper reference to the _cache variable even after the Memoize function exits.
In order to keep the reference alive, the compiler creates an inner, anonymous class storing the closed variable and exposing a method representing the resultant function.
The compiler then instantiates the class and returns a reference to the method, which now references the closed variable legally.

In fact the anonymous class looks and behaves almost exactly the class we tried to sneak in.

This construct is called a closure, and is an excellent way to encapsulate state without creating classes and objects.

In fact, the last code snippet is a general purpose memoizing function for functions of one argument, and you will agree is much more elegant than any of the other alternatives!

Summary

SwissKnife provides a Memoize function to elegantly provide memoized versions of pure single-argument functions.
We can use the functional programming principle of higher-order functions to memoize pure functions to leverage Referential Transparency. Memoizing functions ensure that functions are evaluated only once per argument value and the cached value of the result can be used subsequently.
We use closures to encapsulate the state associated with the cache.

A Scrum Of One

Sunday, November 3, 2013

Functions as Function Results

Introduction

Memoizing

Getting some Closure

Summary

No comments:

Post a Comment

Pages