Sunday, November 3, 2013

Classes and Memoization - A Neat C# Trick

 

This post talks about a set of utilities provided in the BrightSword SwissKnife library (nuget, codeplex).

When programming in the real world (as opposed to programming for an academic exercise), we and (and should) combine concepts to synergistically improve our solution.

We have visited Memoization as a technique, and have seen the use of functional programming to provide more succinct, maintainable code in discussing the Reflection Utility.

In this post, we will combine those two concepts (and derive a programming rule-of-thumb) to provide a more efficient utility for Reflection.

You will find this utility here, and can access it as BrightSword.SwissKnife.TypeMemberDiscoverer<T>

The Simple Approach

The Reflection Utility extension method surface is simple to use and highly maintainable, because there is minimal code-duplication.

To re-cap, our extension method surface for properties looks like:

public static IEnumerable<PropertyInfo> GetAllProperties(this Type _this, BindingFlags bindingFlags = DefaultBindingFlags)
{
return _this.GetAllMembers((_type, _bindingFlags) => _type.GetProperties(_bindingFlags), bindingFlags);
}
which allows for usage like this:

var readonlyProperties = typeof (T).GetAllProperties().Where(_ => !_.CanWrite || _.GetSetMethod() == null);
One of the immediately evident drawbacks is that each invocation of the extension method involves reflection over the type’s members, and therefore we pay a linear cost for reflection.

One approach might be to consider Memoization to store and reuse the result of the reflection utilities, and pay the price for reflection just once. This post discusses the approach SwissKnife takes in order to do this.

The real problem with simple memoization is the BindingFlags parameter, which may select different sets of members each time. In order to optimally reduce the required computation, we could restrict the BindingFlags value to some reasonable default (such as ‘all public members’) and memoize the result-set of members by Type.

This line of thought leads to something like this:

public static class TypeMemberDiscoverer
{
private const BindingFlags DefaultBindingFlags = BindingFlags.Default | BindingFlags.Instance | BindingFlags.Public;

public static ConcurrentDictionary<Type, IEnumerable<PropertyInfo>> _propertySetCache =
new ConcurrentDictionary<Type, IEnumerable<PropertyInfo>>();

public static IEnumerable<PropertyInfo> GetAllProperties(this Type type)
{
return _propertySetCache.GetOrAdd(type,
_ => _.GetAllMembers((_type,
_bindingFlags) => _type.GetProperties(_bindingFlags),
DefaultBindingFlags));
}
}
This approach still provides an extension method, and now the default property set is cached and the GetAllMembers call is only performed once.

However, we can use a feature of the C# language to get a slightly more elegant solution, and simultaneously derive a programming rule-of-thumb.

 

Static Fields in Generic Types



Because C# generic classes are true classes, a static field defined in an open generic class will actually be defined as a separate static field in each corresponding closed generic class.

Consider the open generic type:

public class Foo<T>
{
private static int _bar;
}


Consider now, a Foo<int> and a Foo<string>.

Both of these closed generic types will have a static field of type int called _bar.

As expected, all instances of Foo<int> will share a static _bar field, and all instances of Foo<string> will share a static _bar field.

What you might not expect, however is that Foo<int> will not be the same as Foo<string>. In fact, Resharper™ and Microsoft Code Analysis both have warnings alerting you that this behaviour might be potentially unexpected – you might expect that the field is shared across all types that close the open generic type, but it won’t be!

However, in our case, this is exactly the behaviour we want! Because static fields in generic classes are actually only shared by instances of the corresponding closed generic type, we can use a generic class itself as a kind of memoization container, where we can cache something by Type!

So the general rule-of-thumb is: If you find yourself trying to cache against a type name for whatever reason, consider using a generic class as a cache container. The static initializers of a class are guaranteed thread-safe, and you can write more readable code and let the language provide the cache!

So we can write a generic type with one type parameter, as follows:

public static class TypeMemberDiscoverer<T>
{
private const BindingFlags DefaultBindingFlags = BindingFlags.Default | BindingFlags.Instance | BindingFlags.Public;

// ReSharper disable StaticFieldInGenericType
private static readonly IEnumerable<PropertyInfo> _properties =
typeof(T).GetAllMembers((_type,
_bindingFlags) => _type.GetProperties(_bindingFlags), DefaultBindingFlags);
// ReSharper restore StaticFieldInGenericType

public static IEnumerable<PropertyInfo> GetAllProperties()
{
return _properties;
}
}
The usage is:
var readonlyProperties = TypeMemberDiscoverer<int>.GetAllProperties()
.Where(_ => !_.CanWrite || _.GetSetMethod() == null);


In the class above, we pay the price to discover the properties exactly once – when the TypeMemberDiscoverer<int> type is initialized. We can use the Lazy<T> pattern to further defer the evaluation to when the GetAllProperties() call is first invoked.

public static class TypeMemberDiscoverer<T>
{
private const BindingFlags DefaultBindingFlags = BindingFlags.Default | BindingFlags.Instance | BindingFlags.Public;

// ReSharper disable StaticFieldInGenericType
private static readonly Lazy<IEnumerable<PropertyInfo>> _propertiesL =
new Lazy<IEnumerable<PropertyInfo>>(
() => typeof(T).GetAllMembers((_type,
_bindingFlags) => _type.GetProperties(_bindingFlags),
DefaultBindingFlags));
// ReSharper restore StaticFieldInGenericType

public static IEnumerable<PropertyInfo> GetAllProperties()
{
return _propertiesL.Value;
}
}


This is both efficient and easier-to-read, and it doesn't need a ConcurrentDictionary to work!

 

Summary





  • We can use the functional programming principle of memoization to cache the result of an expensive operation (recursive reflection). This eliminates repeated computation at the expense of memory.

  • We can do this elegantly by using a feature of the C# language which allows static fields in a generic type to act as a cache keyed by the type parameters.

  • We can further reduce wasted effort by using the Lazy pattern provided by the runtime, and only compute (and cache) results when first used

Functions as Function Results

 

This post talks about a set of utilities provided in the BrightSword SwissKnife library (nuget, codeplex).

Learning to write performance-sensitive code is a journey. There are many programming tools and techniques which contribute to different aspects of improving code performance.

The most elegant of these are the fundamentally timeless classics – mathematical methods and proofs that are agnostic to technologies. However, understanding these techniques generally requires a rigorous understanding and application of logic and mathematical technique.

Memoizing is a commonly used mathematical technique to improve the performance of the act of calculations – historically, great care was taken to compute, verify and publish the values resulting from the calculation of formulae – logarithms, arithmetic and trigonometric functions, for example.

You will find a general-purpose Memoize function here, but its implementation is more advanced than is discussed by this post. We will cover the advanced issues in a future post which addresses recursion.

 

Introduction

We’ve seen that we can abstract away boilerplate code and parameterize logic using functions as function arguments.

We also had a stab at memoizing a function parameterized only by a single generic type parameter. We’re going to spend a little more time talking about memoizing functions now.

Consider the following snippet using the Math.Sqrt function, which computes the square root of a given positive real number.


var y = Math.Sqrt (Math.Sqrt (a)) + 2 * Math.Sqrt (b) * Math.Sqrt (a);


A function, such as Math.Sqrt, which always returns the same result for the same arguments without changing any other state is called a “pure” function. Such functions have a property known as referential transparency, which means that the result value of the function can replace its invocation in an expression. Because of this property, we do not need to evaluate Math.Sqrt(a) twice in the expression above. We could simply store the result against the argument somewhere and return it whenever the function is invoked with the same argument. Such a technique is called memoization.

 

Memoizing



We could take several approaches to memoize a function: the most naïve approach might be to use code such as:

public static ConcurrentDictionary<double, double> GlobalCacheForSqrt = new ConcurrentDictionary<double, double>();
public static double sqrt_memoized(double x)
{
return GlobalCacheForSqrt.GetOrAdd(x,
Math.Sqrt);
}

...

var y = sqrt_memoized(sqrt_memoized(a)) + 2 * sqrt_memoized(b) * sqrt_memoized(a);


This would work, but what price have we paid to make this happen?



  • We have created a global value cache to store the results of the Math.Sqrt function.

  • We have created a wrapper function to return the cached value and only invoke the Math.Sqrt function if we haven’t done it once before.


If we wanted to memoize another function, we would have to do the same two steps all over again, which is somewhat tedious. But somewhat more seriously, we are memoizing the results in mutable global state – making the system impure.

What we really want is to localize the cache somewhere local and safe, and also remove the tedium of writing the wrapping function.

Something that looks like this:


public class MemoizedSqrt
{
private readonly ConcurrentDictionary<double, double> _cache = new ConcurrentDictionary<double, double>();

public double Sqrt(double x)
{
return _cache.GetOrAdd(x,
Math.Sqrt);
}
}

...

var sqrt_memoized = Memoize<double, double>(Math.Sqrt);
var y = sqrt_memoized(sqrt_memoized(a)) + 2 * sqrt_memoized(b) * sqrt_memoized(a);


 

This is much nicer – it has now hidden away the cache inside of a class, but we still have to write a different class for each function we want to memoize.

So let’s look at a different way of doing this – imagine we had a magic box which took in a function like Math.Sqrt, and returned the sqrt_memoized for us, automatically encapsulating a class with the cache in it.

Imagine no further – we can build such a box. It’s simply a higher-order function which takes a function as an argument and returns another function.


public static Func<TDomain, TRange> Memoize<TDomain, TRange>(Func<TDomain, TRange> f)
{
private class M
{
private readonly ConcurrentDictionary<TDomain, TRange> _cache = new ConcurrentDictionary<TDomain, TRange>();
private readonly Func<TDomain, TRange> _f;

public M(Func<TDomain, TRange> f)
{
_f = f;
}

public TRange MemoizedF(TDomain x)
{
return _cache.GetOrAdd(x,
_f);
}
}

return new M<TDomain, TRange>(f).MemoizedF;
}

...

var sqrt_memoized = Memoize<double, double>(Math.Sqrt);
var y = sqrt_memoized(sqrt_memoized(a)) + 2 * sqrt_memoized(b) * sqrt_memoized(a);


And there you have it – one single function to memoize any single-argument function, all neatly packaged into a static method.

All done!

What? Howls of protest? This couldn’t work?

Of course it won’t work! There’s no way to embed the class M inside the Memoize function. But we have a nice, elegant solution and we don’t want to mess it up by making the class M public…what do we do?

Getting some Closure



The trick to getting that inner embedded class inside the function is to let the compiler put it there instead of writing the class ourselves.

Remember that we are writing a higher-order function, so we could just build and return the memoizing function in place.

Consider the following code:

public static Func<TDomain, TRange> Memoize<TDomain, TRange>(Func<TDomain, TRange> f)
{
var _cache = new ConcurrentDictionary<TDomain, TRange>();
return _ => _cache.GetOrAdd(_, f);
}


Let’s break this down – ignoring the type arguments:



  • Memoize is a function which takes in a function f and returns an anonymous function.

  • The function has a local variable _cache to cache appropriately typed results against arguments.

  • The result is a function which references the cache. The salient point to notice is that the _cache variable is outside the scope of the result function. We say that the result function closes over the _cache variable.

  • The function returned requires a proper reference to the _cache variable even after the Memoize function exits.

  • In order to keep the reference alive, the compiler creates an inner, anonymous class storing the closed variable and exposing a method representing the resultant function.

  • The compiler then instantiates the class and returns a reference to the method, which now references the closed variable legally.


In fact the anonymous class looks and behaves almost exactly the class we tried to sneak in.

This construct is called a closure, and is an excellent way to encapsulate state without creating classes and objects.

In fact, the last code snippet is a general purpose memoizing function for functions of one argument, and you will agree is much more elegant than any of the other alternatives!

 

Summary





  • SwissKnife provides a Memoize function to elegantly provide memoized versions of pure single-argument functions.

  • We can use the functional programming principle of higher-order functions to memoize pure functions to leverage Referential Transparency. Memoizing functions ensure that functions are evaluated only once per argument value and the cached value of the result can be used subsequently.

  • We use closures to encapsulate the state associated with the cache.

Saturday, November 2, 2013

Functions as Function Arguments

 

This post talks about a set of utilities provided in the BrightSword SwissKnife library (nuget, codeplex).

Reflection is a commonly used meta-programming technique where run-time inspection of types allows for more elegant solutions in some cases. The .NET framework provides powerful primitives to use reflection, which have a nice orthogonal and consistent interface for the various type elements.

A necessary, but less desirable, consequence of an orthogonal interface is the ‘boilerplate’ nature of the code which uses different elements. There is typically a great deal of symmetry and similarity in the code, which generally leads to a cut-copy-paste kind of code-proliferation – and results in unmaintainable code.

In this post, we’ll talk about the software techniques we can use to mitigate this form of code-proliferation, and actively develop a lean and maintainable utility without sacrificing flexibility.

You will find these utilities here, where there are a set of extension methods on the Type type. 

 

Reflection Utilities

One of the fundamental principles of functional programming is the concept of a First-Class Function.

A language with a First Class function is one that allows functions to be treated exactly the same way as other first class entities such as variables and constants. This generally means that there is the concept of being able to pass in a function as an argument to another function (just like one can pass in a string as an argument), assign a function to a variable, or return the function as the result of a function.

This is a very powerful concept, as it allows us to parameterize logic just like we traditionally parameterize values.

GetAllProperties|GetAllMethods|GetAllEvents

Consider the problem of reflecting over a type and enumerating over its properties:

A naïve implementation of a function to enumerate a type’s properties might look like this:

public static IEnumerable<PropertyInfo> GetProperties(Type t) 
{
foreach (var item in t.GetProperties())
{
yield return item;
}
}



An analogous method to enumerate methods would then be:

public static IEnumerable<MethodInfo> GetMethods(Type t) 
{
foreach (var item in t.GetMethods())
{
yield return item;
}
}



The reason this is a naïve implementation is that interfaces do not inherit, in the traditional sense of the term, the members of their base interfaces – rather they inherit the requirement that the base interface is implemented as well. See my earlier post for the background to this problem.


The complete solution involves recursively walk up the inheritance chain for interfaces, whilst keep track of interfaces seen before, and enumerating the members of each interface encountered.


The code to do this is somewhat more involved:


public static IEnumerable<PropertyInfo> GetInterfaceProperties(this Type type,
ISet<Type> processedInterfaces = null)
{
Debug.Assert(type.IsInterface);

processedInterfaces = processedInterfaces ?? new HashSet();

if (processedInterfaces.Contains(type))
{
yield break;
}

foreach (var _pi in type.GetProperties())
{
yield return _pi;
}

foreach (var _pi in type.GetInterfaces()
.SelectMany(_ => _.GetInterfaceProperties(processedInterfaces)))
{
yield return _pi;
}

processedInterfaces.Add(type);
}
After taking a single glance at this code, one can imagine what the analogous code for Methods and Events might be – a lot of similar code with a few method calls and types changed.


This is an ideal situation to use the functional programming support provided by C#.


We can wrangle out the boilerplate code out of this to get:

public static IEnumerable<TMemberInfo> GetInterfaceMembers(this Type type, 
Func<Type, IEnumerable<TMemberInfo>> accessor,
ISet<Type> processedInterfaces = null)
where TMemberInfo : MemberInfo
{
Debug.Assert(type.IsInterface);

processedInterfaces = processedInterfaces ?? new HashSet();

if (processedInterfaces.Contains(type))
{
yield break;
}

foreach (var _pi in accessor(type))
{
yield return _pi;
}

foreach (var _pi in type.GetInterfaces()
.SelectMany(_ => _.GetInterfaceMembers(accessor, processedInterfaces)))
{
yield return _pi;
}

processedInterfaces.Add(type);
}



So by introducing the <TMember> type parameter to represent a MemberInfo generically, and providing a function argument accessor which provides a collection of members of the specified type, we can abstract out the boilerplate code and parameterize the accessor logic.


SwissKnife uses this approach to provide a set of extension methods on the Type type, which simply parameterize a common function:


public static IEnumerable<PropertyInfo> GetAllProperties(this Type _this, BindingFlags bindingFlags = DefaultBindingFlags) 
{
return _this.GetAllMembers((_type, _bindingFlags) => _type.GetProperties(_bindingFlags), bindingFlags);
}

public static IEnumerable<MethodInfo> GetAllMethods(this Type _this, BindingFlags bindingFlags = DefaultBindingFlags)
{
return _this.GetAllMembers((_type, _bindingFlags) => _type.GetMethods(_bindingFlags), bindingFlags);
}

public static IEnumerable<EventInfo> GetAllEvents(this Type _this, BindingFlags bindingFlags = DefaultBindingFlags)
{
return _this.GetAllMembers((_type, _bindingFlags) => _type.GetEvents(_bindingFlags), bindingFlags);
}
which allows for usage like this:
var readonlyProperties = typeof (T).GetAllProperties()
.Where(_ => !_.CanWrite || _.GetSetMethod() == null);



A functional approach to coding up these functions has reduced coding noise and resulted in simpler, easier-to-use, easier-to-understand and easier-to-maintain code.


Summary







  • SwissKnife provides a utility mechanism to simplify reflection over a type to enumerate different types of members. This is especially useful in the case of interface hierarchies.


  • We can use the functional programming principle of first-class functions to extract boilerplate code and parameterize logic in addition to values. This makes code easier to maintain.