Monday, June 27, 2011

A (Fast) Object Copier

 

A colleague of mine asked me today to help with the onerous task of “twisting-off” the data in an entity into a DTO. Something about serialization, he said…

He said that he didn’t mind if the copying was slow – just that he didn’t want to put in hundreds of lines of code each saying

destination.Id = source.Id;
destination.Name = source.Name;
destination.DateAndTimeOfBirth = source.DateAndTimeOfBirth;
destination.BirthWeightInKilograms = source.BirthWeightInKilograms;
destination.IsMultipleBirth = source.IsMultipleBirth;
destination.BirthOrder = source.BirthOrder;
...


So I sat down and knocked off the traditional reflection-based code:

internal static T Map<T>(T source, T destination) where T : class
{
    foreach (var pi in typeof (T).GetProperties())
    {
        pi.SetValue(destination,
                    pi.GetValue(source, BindingFlags.GetProperty | BindingFlags.Instance, null, null, null),
                    BindingFlags.SetProperty | BindingFlags.Instance,
                    null,
                    null,
                    null);
    }
    return destination;
}


(I’ve removed the error checking just to keep things simple for this post)
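
For illustration, here's how it might be invoked against a hypothetical DTO shaped after the hand-written assignments above – BabyDto and the calling code are my own sketch, not part of the original snippet:

// A hypothetical DTO mirroring the manual assignments at the top of the post.
public class BabyDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime DateAndTimeOfBirth { get; set; }
    public decimal BirthWeightInKilograms { get; set; }
    public bool IsMultipleBirth { get; set; }
    public int BirthOrder { get; set; }
}

// Map<T> as written requires source and destination to share the same type T,
// so this copies every public instance property from one BabyDto to another.
var copy = Map(existingBaby, new BabyDto());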



Neat and all, and does the job.



But I couldn’t help wondering if I could do better – especially since I knew that the performance would potentially be too terrible to bear…



Express(ion) yourself!



It would be nice to use the tricks we learned with expressions to build, for a given type, a set of assignment expressions – one per property – and then simply apply them one by one to the source and destination. That way, the cost of reflection is borne only once.



Fair enough. I got home, and after dinner, knocked off this piece of code:

public static class FastCopier
{
    public static T Copy<T>(T source, T destination) where T : class
    {
        return FastCopierInternal<T>.Copy(source, destination);
    }

    private static class FastCopierInternal<T>
    {
        private static readonly IDictionary<PropertyInfo, Action<T, T>> _copyFunctions =
            typeof (T).GetProperties().ToDictionary(_pi => _pi, GetCopyFunc);

        private static Action<T, T> GetCopyFunc(PropertyInfo propertyInfo)
        {
            Trace.WriteLine(String.Format("Creating mapping func for {0}.{1}", propertyInfo.DeclaringType.Name, propertyInfo.Name));

            var sourceParameterExpression = Expression.Parameter(typeof (T), "_source");
            var destinationParameterExpression = Expression.Parameter(typeof (T), "_destination");

            var sourceAccessExpression = Expression.Property(sourceParameterExpression, propertyInfo);
            var destinationAccessExpression = Expression.Property(destinationParameterExpression, propertyInfo);

            var sourceAssignmentExpression = Expression.Assign(destinationAccessExpression, sourceAccessExpression);

            var lambdaExpression = Expression.Lambda<Action<T, T>>(sourceAssignmentExpression,
                                                                   sourceParameterExpression,
                                                                   destinationParameterExpression);

            return lambdaExpression.Compile();
        }

        public static T Copy(T source, T destination)
        {
            foreach (var func in _copyFunctions.Values)
            {
                func(source, destination);
            }

            return destination;
        }
    }
}


Here we have a static class with a single public method to do the Copy for a given type T.



Internally, we have a static generic copier class, whose static state is initialized exactly once for each closed type T, and which builds up an associative array mapping each property of the type to the function required to copy that property from source to destination.



Here’s what the lambda expression looks like as it is built:



[Image: assignment_lambda]
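
For a single property such as Name on the hypothetical BabyDto, the compiled delegate ends up roughly equivalent to this hand-written lambda:

// Approximately what GetCopyFunc builds and compiles for the Name property.
Action<BabyDto, BabyDto> copyName =
    (_source, _destination) => _destination.Name = _source.Name;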



When the public Copy method on the outer class is called, it calls the Copy method on the (implicitly initialized) inner static class, which applies each of the copy functions in turn to copy the source to the destination one property at a time.
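
Using it is then a one-liner – shown here with the hypothetical BabyDto from earlier (and, as with Map, source and destination share the type T):

// All public instance properties are copied across by the pre-compiled delegates.
var populated = FastCopier.Copy(existingBaby, new BabyDto());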



Pretty cool. So how much faster are we?

Explicit copy of object 1000000 times took 204 ms
Fast copy of object 1000000 times took 790 ms
Slow copy of object 1000000 times took 23801 ms
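
Timings like these can be gathered with a simple Stopwatch loop – a minimal sketch using the hypothetical BabyDto, not the original benchmark harness:

// Rough timing sketch; swap in Map(...) or the explicit assignments to compare.
var source = new BabyDto { Id = 1, Name = "Sam", BirthOrder = 1 };
var destination = new BabyDto();
const int iterations = 1000000;

var stopwatch = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
    FastCopier.Copy(source, destination);
}
stopwatch.Stop();

Console.WriteLine("Fast copy of object {0} times took {1} ms", iterations, stopwatch.ElapsedMilliseconds);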


Copying the object a million times over, the fastest, as expected, is when we explicitly code the DTO initialization and let the compiler optimize.



The lambda version is just under 4x slower than the explicit code, but still pretty darn fast compared with the basic, slow, all-reflection version – which is two orders of magnitude slower!



Not a bad option to have, when the development and maintenance cost of creating the DTOs and keeping them up to date is factored in!

Sunday, June 5, 2011

Memoization Is Your Friend

 

I just came across some gems regarding memoization here, here, here and here. As a side note, I’d like to go on record and say that Wes Dyer’s derivation of fixed-point combinators is second to none! Even I could understand what he was saying! :) Another great discussion of fixed-point combinators is here. For more theoretical underpinnings, this paper discusses memoization of fixed-point functions.

We’re being plagued by performance issues in the project I’m currently working on, and it would be really nice to have a quick way to automatically memoize some functions to gain the benefits of laziness and good book-keeping.

So, here is my implementation of a Memoize() function, which uses the fixed-point combinator concepts explained in Wes’ blog.

public static Func<TArgument, TResult> MemoizeFix<TArgument, TResult>(Func<Func<TArgument, TResult>, Func<TArgument, TResult>> func)
{
    // ReSharper disable AccessToModifiedClosure
    Func<TArgument, TResult> funcMemoized = null;

    funcMemoized = func(_ => funcMemoized(_));
    funcMemoized = funcMemoized.Memoize();

    return funcMemoized;
    // ReSharper restore AccessToModifiedClosure
}

private static Func<TArgument, TResult> Memoize<TArgument, TResult>(this Func<TArgument, TResult> func)
{
    var _map = new ConcurrentDictionary<TArgument, TResult>();
    return _ => _map.GetOrAdd(_, func);
}

In MemoizeFix(), we initialize the funcMemoized variable to null to get around the definite-assignment problem. Because we defer the real assignment until after the memoization has been applied, we actually want access to the modified closure – which is why the ReSharper warning is deliberately suppressed.
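
To make the need for the combinator concrete: if we just memoized a plain recursive delegate, only the outermost call would hit the cache, because the inner recursive calls are still wired to the original, unmemoized delegate. A small sketch of the contrast (assuming Memoize() were exposed publicly):

// Plain recursive delegate: the self-calls go straight to slowFib,
// so wrapping it afterwards only caches top-level results.
Func<int, long> slowFib = null;
slowFib = n => n <= 0 ? 0 : (n <= 2 ? 1 : slowFib(n - 1) + slowFib(n - 2));
var cachedAtTopOnly = slowFib.Memoize();

// MemoizeFix routes the recursion back through the memoized delegate,
// so every intermediate fib(n) lands in the cache exactly once.
var fastFib = Functional.MemoizeFix<int, long>(
    fib => n => n <= 0 ? 0 : (n <= 2 ? 1 : fib(n - 1) + fib(n - 2)));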

The other nice thing with .NET 4.0 is the ability to use the ConcurrentDictionary as a thread-safe result cache, with the added bonus of a very terse call to GetOrAdd().


We can use the memoization routines this way:

private static readonly Func<int, long> FastFibonacci = Functional.MemoizeFix<int, long>(
    fib => n => (n <= 0 ? 0 : (n <= 2 ? 1 : fib(n - 1) + fib(n - 2)))
);

private static readonly Func<int, long> FastLucas = Functional.MemoizeFix<int, long>(
    luc => (n => FastFibonacci(n) + FastFibonacci(n + 2))
);

private static readonly Func<int, double> FastFactorial = Functional.MemoizeFix<int, double>(
    fact => n => n < 2 ? 1.0d : n*fact(n - 1)
);
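
Calling these is then just an ordinary delegate invocation, with repeated and overlapping calls served out of the cache:

Console.WriteLine(FastFibonacci(80));   // first call fills the cache up to 80
Console.WriteLine(FastFibonacci(80));   // second call is a single dictionary lookup
Console.WriteLine(FastLucas(10));       // reuses Fibonacci entries already cached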

For functions with more than one argument, the simplest extension is to use a Tuple<> of all the arguments as the cache key. Thusly:

private static Func<TArg1, TArg2, TResult> Memoize<TArg1, TArg2, TResult>(this Func<TArg1, TArg2, TResult> func)
{
    var _map = new ConcurrentDictionary<Tuple<TArg1, TArg2>, TResult>();
    return (_a1, _a2) => _map.GetOrAdd(new Tuple<TArg1, TArg2>(_a1, _a2), _ => func(_.Item1, _.Item2));
}

public static Func<TArg1, TArg2, TResult> MemoizeFix<TArg1, TArg2, TResult>(Func<Func<TArg1, TArg2, TResult>, Func<TArg1, TArg2, TResult>> func)
{
    // ReSharper disable AccessToModifiedClosure
    Func<TArg1, TArg2, TResult> funcMemoized = null;

    funcMemoized = func((_a1, _a2) => funcMemoized(_a1, _a2));
    funcMemoized = funcMemoized.Memoize();

    return funcMemoized;
    // ReSharper restore AccessToModifiedClosure
}
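
As an example, a two-argument recursive function memoizes just as easily – here a hypothetical binomial coefficient (my example, not part of the earlier code), cached per (n, k) pair via the Tuple key:

// Pascal's rule: C(n, k) = C(n - 1, k - 1) + C(n - 1, k); assumes 0 <= k <= n.
private static readonly Func<int, int, long> Choose = Functional.MemoizeFix<int, int, long>(
    choose => (n, k) => (k == 0 || k == n) ? 1L : choose(n - 1, k - 1) + choose(n - 1, k)
);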


Pretty slick :)