A Scrum Of One

Friday, August 26, 2011

Migrating a Web Application to Azure: Step 1

Setting up your Azure workspace

So we have a working web-application which we want to move to Azure.

The first thing to do is to get a Windows Live ID and line up one Windows Azure subscription and one SQL Azure subscription.

Sign into windows.azure.com with your Live ID

Click on the “Hosted Services, Storage Accounts and CDN” button on the bottom left panel, then on “Hosted Services” on the top left panel.

Right-Click on your Azure Subscription in the main panel and select “New Hosted Service”

Choose not to deploy at this stage!

Similarly create a Storage Account on the same subscription.

Now create a database server on the SQL Azure Subscription.

Ensure you set the firewall rules to include both your machine’s IP address and other Azure services, because your SQL Azure instance will need to be accessed from the Windows Azure instance.

Once you create the database server, you should make a note of the following pieces of information:

The subscription id
Your administrator login and password
The instance name, and the fully-qualified database server DNS name

Now we have Windows Azure and SQL Azure instances set up for use.

Next we’ll migrate our database to SQL Azure.

Thursday, August 25, 2011

Migrating a Web Application to Azure: Step 0

Creating a traditional, locally run web application

Let’s start with a simple, traditional web application.

This is a contrived application, built specifically for demonstrating the issues we want to address in the process of migrating to Azure.

Please do not consider this application as indicative of following best-practices for web application development!

Also, the purpose of this web application is not to wow anyone with a slick user interface! :)

Let’s imagine a site where someone logs on, creates a photo album or two, and uploads a few images to each album.

Prerequisites

You will need the following applications and tools to follow along with the process of migrating an application to Azure:

Visual Studio 2010 (SP1 strongly recommended)
SQL Server 2008 R2; you’ll be using SQL Server Management Studio
Windows Azure SDK 1.4

We’ll introduce a few more tools along the way, but we can get started with this.

Steps

Let’s create this application the simplest way possible:

First, create the ASPNETDB Application Services database on the SQL Server 2008 instance. You may need to run aspnet_regsql.exe from the .NET framework directory.

Then create a basic web application in Visual Studio:

Modify the Web.Config on the web application to point the Application Services Providers to our database.

Pressing F5 to run/debug the web application should fire up the default application, and allow you to create an account to authenticate against in future. The default home page after logging in should look like this:

To create the picture album site, we create a new database with this simple schema:

Note that for this demonstration, we will store the uploaded picture in an image column on the Picture table, in the database.

We’ll use Linq 2 SQL to create a data-model for our application.

When we’re done with adding the Linq2Sql classes, the connectionStrings collection in our Web.Config looks like:

Now we’ll actually develop the application, and let’s say we finally land up with:

Let’s also assume that the application contains, for the sake of this demonstration, the following code to extract the binary data from the database and cache the file on the file system, so that the images on the rendered page refer to the cached file:

var relativePathToDirectory = "~/.cache";

// ensure the directory exists
var absolutePathToDirectory = Server.MapPath(relativePathToDirectory);
if (!Directory.Exists(absolutePathToDirectory))
{
    Directory.CreateDirectory(absolutePathToDirectory);
}

// now ensure the file exists
var relativePathToFile = Path.Combine(relativePathToDirectory, picture.OriginalFilename);
var absolutePathToFile = Server.MapPath(relativePathToFile);

if (File.Exists(absolutePathToFile)) return absolutePathToFile;

using (var stream = new FileStream(
        absolutePathToFile,
        FileMode.OpenOrCreate,
        FileSystemRights.TakeOwnership | FileSystemRights.Write,
        FileShare.None,
        1024,
        FileOptions.None))
{
    stream.Write(picture.Image.ToArray(), 0, picture.Image.Length);
}

Conclusion

We have a rudimentary, but fully functional web application that runs on our local web server in the traditional manner.

We will need to consider the specific changes to make when we want to migrate this application to the cloud.

PS:

We’ll put the source code on GitHub and stick the link in here just after the talk.

Migrating a Web Application to Azure: Introduction

This is a series of posts relating to the details of the talk Mahesh Krishnan and I will be giving at TechEd Australia 2011. You can follow along in this series of posts for more details and gotchas that we can’t cover in the 60 minute talk at TechEd!

Here are the details of the talk:

We’ll be taking a contrived, but simple, traditional data-driven web application, and outlining the issues to consider, and steps to take, to move the application to Windows Azure/SQL Azure.

We’ll show you how to set up your Azure subscription, migrate your database to SQL Azure, manage session state with Azure’s Session State Manager, move file-system based actions to Windows Azure blob storage, and secure your application with Azure’s ACS 2.0 Federated Authentication.

We’ll show you what tools you need and how to use Visual Studio 2010 SP1 to simplify the process of migrating.

There’s some code, but mostly we’re going to be discussing issues, tips and tricks.

See you there!

Sunday, July 10, 2011

Functional Object Manipulation

In the last couple of posts, we’ve looked at ways to copy scalar properties from one object to another, with the high performance of explicit assignments and the low maintenance of reflection.

The approach of creating a lambda to operate on the object instance as a whole shows a lot of promise. For example, we may want to compare two object instances property-by-property, and the code for that would look quite similar.

public static class Comparer
{
    public static bool Compare<T>(this T source, T destination) { return CompareClosure<T>.Compare(source, destination); }

    private static class CompareClosure<T>
    {
        private static Func<T, T, bool> BuildComparerLambda(Func<PropertyInfo, bool> propertyFilter = null)
        {
            propertyFilter = propertyFilter ?? (_ => true);

            var sourceParameterExpression = Expression.Parameter(typeof(T), "_left");
            var destinationParameterExpression = Expression.Parameter(typeof(T), "_right");

            var properties = typeof(T).GetProperties().Where(propertyFilter);
            var expressions = properties.Select(_pi => Expression.Equal(Expression.Property(destinationParameterExpression, _pi), Expression.Property(sourceParameterExpression, _pi)));
            var conjoinedExpression = expressions.Aggregate<Expression, Expression>(Expression.Constant(true), Expression.AndAlso);

            var lambdaExpression = Expression.Lambda<Func<T, T, bool>>(conjoinedExpression, sourceParameterExpression, destinationParameterExpression);
            return lambdaExpression.Compile();
        }

        private static readonly Func<T, T, bool> _compare = BuildComparerLambda();
        public static bool Compare(T source, T destination) { return _compare(source, destination); }
    }
}

which gives us:

What we are doing in the code is mapping each property to a boolean value indicating whether the two instances have the same value for the property, and then folding that set of boolean values to a single boolean value by aggregating it with the && operator (and a seed value of ‘true’).
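To make the map-then-fold shape concrete, here is a minimal sketch of the same logic written with plain reflection and LINQ instead of expression trees. It is slow, and it is only meant to show what the compiled lambda computes; the SamplePoint type and the SlowCompare name are made up for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample type, used only for this illustration.
public class SamplePoint
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class MapFoldDemo
{
    // Map: each property becomes "are the two values equal?".
    // Fold: the booleans are combined with &&, starting from a seed of true.
    public static bool SlowCompare<T>(T left, T right)
    {
        return typeof(T).GetProperties()
            .Select(pi => object.Equals(pi.GetValue(left, null), pi.GetValue(right, null)))
            .Aggregate(true, (accumulated, next) => accumulated && next);
    }
}
```

The expression-tree version builds the same chain once per type and compiles it, so the reflection cost is not paid on every call.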

We can readily see a pattern emerging here:

There are two types of operations possible for a given arity. In the case of Compare and Copy where the operation arity is 2, we can create either an Action<T, T> or a Func<T, T, TResult>.

The Action<> case is a lambda that performs an operation on each property with reference to the given argument instances.

We can create a general operation builder by lifting a parameter which specifies the desired operation on a property, with reference to the lambda arguments.

public static Action<T, T> BuildOperationAction<T>(
    Func<PropertyInfo, ParameterExpression, ParameterExpression, Expression> operationExpressionFactory)
{
    var leftParameterExpression = Expression.Parameter(typeof (T), "_left");
    var rightParameterExpression = Expression.Parameter(typeof (T), "_right");

    var properties = typeof (T).GetProperties();
    var expressions = properties.Select(_ => operationExpressionFactory(_, leftParameterExpression, rightParameterExpression));

    var blockExpression = Expression.Block(expressions);
    var lambdaExpression = Expression.Lambda<Action<T, T>>(blockExpression, leftParameterExpression, rightParameterExpression);

    return lambdaExpression.Compile();
}

The Func<> case is a lambda that maps the set of properties to a set of values with reference to the argument instances, and then aggregates this set of values to a single value using an aggregator.

We can create a general operation builder by lifting:

a parameter which specifies the desired operation on a property, with reference to the lambda arguments
a parameter which specifies the aggregate operation to fold the set of computed values to a single value, and
a parameter specifying the initial value for the aggregator.

Thusly:

public static Func<T, T, TResult> BuildOperationFunc<T, TResult>(
    Func<PropertyInfo, ParameterExpression, ParameterExpression, Expression> operationExpressionFactory,
    Func<Expression, Expression, Expression> conjunction, Expression seed)
{
    var leftParameterExpression = Expression.Parameter(typeof (T), "_left");
    var rightParameterExpression = Expression.Parameter(typeof (T), "_right");

    var properties = typeof (T).GetProperties();
    var expressions = properties.Select(_ => operationExpressionFactory(_, leftParameterExpression, rightParameterExpression));

    var joinedExpression = expressions.Aggregate(seed, conjunction);
    var lambdaExpression = Expression.Lambda<Func<T, T, TResult>>(joinedExpression, leftParameterExpression, rightParameterExpression);

    return lambdaExpression.Compile();
}

This allows us to implement Copy() and Compare() as follows:

public static bool CompareWith<T>(this T source, T destination) { return CompareClosure<T>.Compare(source, destination); }
private static class CompareClosure<T>
{
    private static readonly Func<T, T, bool> _compare =
        BuildOperationFunc<T, bool>(
            (_pi, _src, _dest) => Expression.Equal(Expression.Property(_dest, _pi), Expression.Property(_src, _pi)),
            Expression.AndAlso,
            Expression.Constant(true));

    public static bool Compare(T source, T destination) { return _compare(source, destination); }
}

public static void CopyTo<T>(this T source, T destination) { CopyClosure<T>.Copy(source, destination); }
private static class CopyClosure<T>
{
    private static readonly Action<T, T> _copy =
        BuildOperationAction<T>(
            (_pi, _src, _dest) => Expression.Assign(Expression.Property(_dest, _pi), Expression.Property(_src, _pi)));

    public static void Copy(T source, T destination) { _copy(source, destination); }
}

Very neat.

We could develop more operations than Copy and Compare, building fast, automatically maintained mappers for data access (think ORMs) and UI field access, by simply specifying the equivalent expression. The development effort is simply in the development of the expression.

For example, here is a mapper which maps properties defined on type T, from a dynamic source object to backing fields on a target object of type T:

private static readonly Action<T, T> _copyDynamicToStaticBackingFields = 
	BuildOperationAction<T>(
	    (_pi, _src, _dest) =>
	    {
	        var sourceBinder = RuntimeBinder.GetMember(CSharpBinderFlags.InvokeSpecialName, _pi.Name, typeof (T), new [ ] { ThisArgument });
	        var sourceGetExpression = Expression.Convert(Expression.Dynamic(sourceBinder, typeof (object), _src), _pi.PropertyType);

	        var fieldName = String.Format("_{0}{1}", _pi.Name.Substring(0, 1).ToLower(), _pi.Name.Substring(1));

	        var fieldInfo = typeof (T).GetField(fieldName, BindingFlags.Instance | BindingFlags.NonPublic);
	        if (fieldInfo == null) return Expression.Default(typeof (void));

	        var destinationAccessExpression = Expression.Field(_dest, fieldInfo);
	        return Expression.Assign(destinationAccessExpression, sourceGetExpression);
	    });
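As another sketch of what "simply specifying the equivalent expression" can look like, here is a hypothetical operation that counts how many properties differ between two instances. The builder is repeated from the post so that the block compiles on its own; the Differ, CountDifferences and SamplePerson names are assumptions made for this example, and it assumes every property type supports the == operator in an expression tree (primitives, strings, reference types).

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample type, used only for this illustration.
public class SamplePerson
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Differ
{
    // Repeated from the post so the sketch is self-contained.
    public static Func<T, T, TResult> BuildOperationFunc<T, TResult>(
        Func<PropertyInfo, ParameterExpression, ParameterExpression, Expression> operationExpressionFactory,
        Func<Expression, Expression, Expression> conjunction, Expression seed)
    {
        var left = Expression.Parameter(typeof(T), "_left");
        var right = Expression.Parameter(typeof(T), "_right");
        var expressions = typeof(T).GetProperties().Select(pi => operationExpressionFactory(pi, left, right));
        var joined = expressions.Aggregate(seed, conjunction);
        return Expression.Lambda<Func<T, T, TResult>>(joined, left, right).Compile();
    }

    public static int CountDifferences<T>(this T left, T right) { return DiffClosure<T>.Count(left, right); }

    private static class DiffClosure<T>
    {
        // Map: each property becomes 0 if equal, 1 if different. Fold: sum with a seed of 0.
        private static readonly Func<T, T, int> _count =
            BuildOperationFunc<T, int>(
                (pi, l, r) => Expression.Condition(
                    Expression.Equal(Expression.Property(l, pi), Expression.Property(r, pi)),
                    Expression.Constant(0),
                    Expression.Constant(1)),
                Expression.Add,
                Expression.Constant(0));

        public static int Count(T left, T right) { return _count(left, right); }
    }
}
```

Only the per-property expression, the aggregator and the seed changed; the rest is the same machinery as Compare.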

We could further develop this approach by developing the BuildOperationXXX functions for Action<>s and Func<>s of arity 1, which would allow us to quickly develop fast, low-maintenance object printers and serializers.
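Here is a rough sketch of what an arity-1 builder and a simple object printer might look like. None of this is from the CodePlex project: BuildUnaryOperationFunc, Printer, Print and SampleItem are hypothetical names, and the output format is just an assumption chosen to keep the example short.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample type, used only for this illustration.
public class SampleItem
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class UnaryOperations
{
    // Hypothetical arity-1 analogue of BuildOperationFunc: one lambda argument instead of two.
    public static Func<T, TResult> BuildUnaryOperationFunc<T, TResult>(
        Func<PropertyInfo, ParameterExpression, Expression> operationExpressionFactory,
        Func<Expression, Expression, Expression> conjunction, Expression seed)
    {
        var parameter = Expression.Parameter(typeof(T), "_instance");
        var expressions = typeof(T).GetProperties().Select(pi => operationExpressionFactory(pi, parameter));
        var joined = expressions.Aggregate(seed, conjunction);
        return Expression.Lambda<Func<T, TResult>>(joined, parameter).Compile();
    }
}

public static class Printer
{
    private static readonly MethodInfo Concat =
        typeof(string).GetMethod("Concat", new[] { typeof(string), typeof(string) });
    private static readonly MethodInfo Format =
        typeof(string).GetMethod("Format", new[] { typeof(string), typeof(object) });

    public static string Print<T>(this T instance) { return PrintClosure<T>.Print(instance); }

    private static class PrintClosure<T>
    {
        // Map: each property becomes the text "Name=value;". Fold: concatenate, starting from "".
        private static readonly Func<T, string> _print =
            UnaryOperations.BuildUnaryOperationFunc<T, string>(
                (pi, p) => Expression.Call(
                    Format,
                    Expression.Constant(pi.Name + "={0};"),
                    Expression.Convert(Expression.Property(p, pi), typeof(object))),
                (accumulated, next) => Expression.Call(Concat, accumulated, next),
                Expression.Constant(string.Empty));

        public static string Print(T instance) { return _print(instance); }
    }
}
```

The order of the properties in the output follows whatever order GetProperties() returns, which is not something to rely on.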

I believe that this set of generating functions is likely to be more generally useful, so I’ve created a CodePlex project for this. Please visit it and let me know how you used it!

The Object Copier, revisited.

We saw in the previous post how we could improve the maintainability of reflection-based code by building lambdas using reflection, and then gain the performance benefits by applying the lambdas at execution time.

Taking the idea a step further, we can easily see that what we are really doing is generating a function to do the copy operation on the entire object and applying the copy function.

(T _source, T _destination) =>
    {
        _destination.Id = _source.Id;
        _destination.Name = _source.Name;
        ...
    }

Based on the approach we took last time, this is easily achievable by wrapping the individual assignments into a block statement, wiring through the parameter expressions from the block into the individual lambdas, and then compiling out a full copy. Thusly:

public static class Copier
{
    public static void CopyTo<T>(this T source, T destination) { CopyClosure<T>.Copy(source, destination); }

    private static class CopyClosure<T>
    {
        private static Action<T, T> BuildCopierLambda()
        {
            var sourceParameterExpression = Expression.Parameter(typeof(T), "_source");
            var destinationParameterExpression = Expression.Parameter(typeof(T), "_destination");

            var properties = typeof(T).GetProperties();
            var expressions = properties.Select(_pi => Expression.Assign(Expression.Property(destinationParameterExpression, _pi), Expression.Property(sourceParameterExpression, _pi)));

            var blockExpression = Expression.Block(expressions);
            var lambdaExpression = Expression.Lambda<Action<T, T>>(blockExpression, sourceParameterExpression, destinationParameterExpression);

            return lambdaExpression.Compile();
        }

        private static readonly Action<T, T> _copy = BuildCopierLambda();
        public static void Copy(T source, T destination) { _copy(source, destination); }
    }
}

There are a few things to note about the code:

The outer, public, static Copier class is non-generic so that CopyTo can be defined as an extension method on the generic type T (extension methods must live in a non-generic static class).

The runtime creates a separate closed CopyClosure<T> type for each type T, so the _copy Action is unique for each type T. Also, the _copy Action is initialized in a thread-safe manner, because the CLR guarantees that a type’s static field initializers run exactly once, even under concurrent access.
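A small sketch can make the per-type behaviour visible. The PerTypeCache and InitCounter names below are invented for this illustration; the counter simply records how many times the static initializer runs.

```csharp
using System;
using System.Threading;

public static class InitCounter
{
    // Counts how many times a PerTypeCache<T> initializer has run.
    public static int Count;
}

// Hypothetical stand-in for CopyClosure<T>: one static field per closed generic type.
public static class PerTypeCache<T>
{
    public static readonly string Name = Build();

    private static string Build()
    {
        Interlocked.Increment(ref InitCounter.Count);
        return typeof(T).Name;
    }
}
```

Reading PerTypeCache<int>.Name any number of times runs Build() once for int, and reading PerTypeCache<string>.Name runs it once more for string, so the counter ends up at 2 after touching both types.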

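This approach actually improves performance, because there is no run-time iteration over a collection of per-property delegates: a single compiled delegate performs all the assignments!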

Stay tuned for more improvements…

Monday, June 27, 2011

A (Fast) Object Copier

A colleague of mine asked me today to help with the onerous task of “twisting-off” the data in an entity into a DTO. Something about serialization, he said…

He said that he didn’t mind if the copying was slow – just that he didn’t want to put in hundreds of lines of code each saying

destination.Id = source.Id;
destination.Name = source.Name;
destination.DateAndTimeOfBirth = source.DateAndTimeOfBirth;
destination.BirthWeightInKilograms = source.BirthWeightInKilograms;
destination.IsMultipleBirth = source.IsMultipleBirth;
destination.BirthOrder = source.BirthOrder;
...

So I sat down and knocked off the traditional reflection-based code:

internal static T Map<T>(T source, T destination) where T : class
{
    foreach (var pi in typeof (T).GetProperties())
    {
        pi.SetValue(destination,
                    pi.GetValue(source, BindingFlags.GetProperty | BindingFlags.Instance, null, null, null),
                    BindingFlags.SetProperty | BindingFlags.Instance,
                    null,
                    null,
                    null);
    }
    return destination;
}

(I’ve removed the error checking just to keep things simple for this post)

Neat and all, and does the job.

But I couldn’t help wondering if I could do better, especially since I knew that the performance would potentially be too terrible to bear…

Express(ion) yourself!

It would be nice to use the tricks we learned with expressions to create, for a given type, an assignment expression for each of the properties, compile each one once, and then simply apply them one by one to the source and destination. That way the cost of reflection is borne only once per type.

Fair enough. I got home, and after dinner, knocked off this piece of code:

public static class FastCopier
{
    public static T Copy<T>(T source, T destination) where T : class
    {
        return FastCopierInternal<T>.Copy(source, destination);
    }

    private static class FastCopierInternal<T>
    {
        private static readonly IDictionary<PropertyInfo, Action<T, T>> _copyFunctions = typeof (T).GetProperties().ToDictionary(_pi => _pi, GetCopyFunc);
    
        private static Action<T, T> GetCopyFunc(PropertyInfo propertyInfo)
        {
            Trace.WriteLine(String.Format("Creating mapping func for {0}.{1}", propertyInfo.DeclaringType.Name, propertyInfo.Name));
            var sourceParameterExpression = Expression.Parameter(typeof (T), "_source");
            var destinationParameterExpression = Expression.Parameter(typeof (T), "_destination");
    
            var sourceAccessExpression = Expression.Property(sourceParameterExpression, propertyInfo);
            var destinationAccessExpression = Expression.Property(destinationParameterExpression, propertyInfo);
    
            var sourceAssignmentExpression = Expression.Assign(destinationAccessExpression, sourceAccessExpression);
    
            var lambdaExpression = Expression.Lambda<Action<T, T>>(sourceAssignmentExpression,
                                                                   sourceParameterExpression,
                                                                   destinationParameterExpression);
    
            return lambdaExpression.Compile();
        }
    
        public static T Copy(T source, T destination)
        {
            foreach (var func in _copyFunctions.Values)
            {
                func(source, destination);
            }
    
            return destination;
        }
    }
}

Here we have a static class with a single public method to do the Copy for a given type T.

Internally, we have a static generic copier class which only gets instantiated once for each type T, and which builds up an associative array of the properties of the type and the function required to copy each property from source to destination.

Here’s what the lambda expression looks like as it is built:

When the public Copy method on the outer class is called, it calls the Copy method on the (implicitly initialized) inner static class, which applies each of the copy functions in turn to copy the source to the destination one property at a time.

Pretty cool. So how much faster are we?

Explicit copy of object 1000000 times took 204 ms
Fast copy of object 1000000 times took 790 ms
Slow copy of object 1000000 times took 23801 ms

Running over a million objects, the fastest, as expected, is when we explicitly code the DTO initialization, and let the compiler optimize.

The lambda version is around 3.9x slower (790 ms against 204 ms), but still pretty darn fast compared with the basic, slow, all-reflection version, which is about 30x slower again and over two orders of magnitude slower than the explicit copy!

Not a bad option to have, when the development and maintenance cost of creating the DTOs and keeping them up to date is factored in!
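The timing harness is not shown in this post. As a rough sketch, numbers like these could be collected with a Stopwatch loop along the following lines. The Timing and Measure names are assumptions, and the warm-up call is a choice made for this sketch, not necessarily how the figures above were measured.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action `iterations` times and returns the elapsed time in milliseconds.
    public static long Measure(string label, int iterations, Action action)
    {
        // One warm-up call, so one-off costs (JIT, expression compilation) are not timed.
        action();

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        Console.WriteLine("{0} copy of object {1} times took {2} ms",
                          label, iterations, stopwatch.ElapsedMilliseconds);
        return stopwatch.ElapsedMilliseconds;
    }
}
```

It would be called with something like Timing.Measure("Fast", 1000000, () => FastCopier.Copy(source, destination)), where source and destination are instances of the entity being copied.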

Sunday, June 5, 2011

Memoization Is Your Friend

I just came across some gems regarding memoization here, here, here and here. As a side note, I’d like to go on record and say that Wes Dyer’s derivation of fixed-point combinators is second to none! Even I could understand what he was saying! :) Another great discussion of fixed-point combinators is here. For more theoretical underpinnings, this paper discusses memoization of fixed-point functions.

We’re being plagued by performance issues in the project I’m currently working on, and it would be really nice to have a quick way to automatically memoize some functions to gain the benefits of laziness and good book-keeping.

So, here is my implementation of a Memoize() function, which uses the fixed-point combinator concepts explained in Wes’ blog.

public static Func<TArgument, TResult> MemoizeFix<TArgument, TResult>(Func<Func<TArgument, TResult>, Func<TArgument, TResult>> func)
{
    // ReSharper disable AccessToModifiedClosure
    Func<TArgument, TResult> funcMemoized = null;

    funcMemoized = func(_ => funcMemoized(_));
    funcMemoized = funcMemoized.Memoize();

    return funcMemoized;
    // ReSharper restore AccessToModifiedClosure
}

private static Func<TArgument, TResult> Memoize<TArgument, TResult>(this Func<TArgument, TResult> func)
{
    var _map = new ConcurrentDictionary<TArgument, TResult>();
    return _ => _map.GetOrAdd(_, func);
}

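In MemoizeFix(), we initialize the funcMemoized variable to null to get around the definite assignment problem: the lambda passed to func refers to the variable before it holds a useful value. The lambda only reads the variable when it is invoked, by which time funcMemoized holds the memoized function, so every recursive call goes through the cache. That is why we actually want access to the modified closure, and why the ReSharper warning is suppressed.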

The other nice thing with .NET 4.0 is the ability to use the ConcurrentDictionary as a thread-safe result cache, with the added bonus of a very terse invocation to GetOrAdd().

We can use the memoization routines this way:

private static readonly Func<int, long> FastFibonacci = Functional.MemoizeFix<int, long>(
    fib => n => (n <= 0 ? 0 : (n <= 2 ? 1 : fib(n - 1) + fib(n - 2)))
);

// Lucas numbers via the identity L(n) = F(n - 1) + F(n + 1), valid for n >= 1
private static readonly Func<int, long> FastLucas = Functional.MemoizeFix<int, long>(
    luc => (n => FastFibonacci(n - 1) + FastFibonacci(n + 1))
);

private static readonly Func<int, double> FastFactorial = Functional.MemoizeFix<int, double>(
    fact => n => n < 2 ? 1.0d : n*fact(n - 1)
);
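To see the memoization at work, here is a small self-contained sketch that counts how often the function body actually runs. The MemoDemo class and its Evaluations counter are made up for this illustration, and MemoizeFix is restated in a compact form (same idea as above, with the cache inlined) so that the block compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class MemoDemo
{
    // Number of times the un-memoized function body has been executed.
    public static int Evaluations;

    // Compact restatement of MemoizeFix for a single argument.
    public static Func<TArg, TResult> MemoizeFix<TArg, TResult>(
        Func<Func<TArg, TResult>, Func<TArg, TResult>> func)
    {
        var map = new ConcurrentDictionary<TArg, TResult>();
        Func<TArg, TResult> memoized = null;

        // Recursive calls go through `memoized`, which is assigned on the next line.
        Func<TArg, TResult> raw = func(x => memoized(x));
        memoized = x => map.GetOrAdd(x, raw);

        return memoized;
    }

    public static readonly Func<int, long> Fib = MemoizeFix<int, long>(
        fib => n =>
        {
            Interlocked.Increment(ref Evaluations);
            return n <= 0 ? 0 : (n <= 2 ? 1 : fib(n - 1) + fib(n - 2));
        });
}
```

Called from a single thread, MemoDemo.Fib(30) runs the body once for each n from 1 to 30, and a second call with the same argument does not run it at all. Under concurrent access GetOrAdd may run the factory more than once for the same key, so the count is only a lower bound there.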

For functions with more than one argument, I’d create a Tuple<> with all the arguments as the cache key as the simplest extension. Thusly:

private static Func<TArg1, TArg2, TResult> Memoize<TArg1, TArg2, TResult>(this Func<TArg1, TArg2, TResult> func)
{
    var _map = new ConcurrentDictionary<Tuple<TArg1, TArg2>, TResult>();
    return (_a1, _a2) => _map.GetOrAdd(new Tuple<TArg1, TArg2>(_a1, _a2), _ => func(_.Item1, _.Item2));
}

public static Func<TArg1, TArg2, TResult> MemoizeFix<TArg1, TArg2, TResult>(Func<Func<TArg1, TArg2, TResult>, Func<TArg1, TArg2, TResult>> func)
{
    // ReSharper disable AccessToModifiedClosure
    Func<TArg1, TArg2, TResult> funcMemoized = null;

    funcMemoized = func((_a1, _a2) => funcMemoized(_a1, _a2));
    funcMemoized = funcMemoized.Memoize();

    return funcMemoized;
    // ReSharper restore AccessToModifiedClosure
}
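As a usage sketch for the two-argument version, here are binomial coefficients computed with Pascal’s rule. The Memo2 class and the Choose field are hypothetical names, the memoizer is restated compactly so that the block stands alone, and the function assumes 0 <= k <= n.

```csharp
using System;
using System.Collections.Concurrent;

public static class Memo2
{
    // Compact restatement of the two-argument MemoizeFix, keyed on a Tuple of both arguments.
    public static Func<TArg1, TArg2, TResult> MemoizeFix<TArg1, TArg2, TResult>(
        Func<Func<TArg1, TArg2, TResult>, Func<TArg1, TArg2, TResult>> func)
    {
        var map = new ConcurrentDictionary<Tuple<TArg1, TArg2>, TResult>();
        Func<TArg1, TArg2, TResult> memoized = null;

        var raw = func((a1, a2) => memoized(a1, a2));
        memoized = (a1, a2) => map.GetOrAdd(Tuple.Create(a1, a2), key => raw(key.Item1, key.Item2));

        return memoized;
    }

    // C(n, k) = C(n - 1, k - 1) + C(n - 1, k); assumes 0 <= k <= n.
    public static readonly Func<int, int, long> Choose = MemoizeFix<int, int, long>(
        choose => (n, k) => (k <= 0 || k >= n) ? 1L : choose(n - 1, k - 1) + choose(n - 1, k));
}
```

Tuple<> works as a cache key here because it implements structural equality and hashing over its items.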

Pretty slick :)