Guru Meditation: 2013

Wednesday, September 11, 2013

The "Calculator Kata" challenge

Calculator Kata

The calculator kata made by Roy Osherove is an exercise in test driven development and goes like this:

Try not to read ahead.
Do one task at a time. The trick is to learn to work incrementally.
Make sure you only test for correct inputs. there is no need to test for invalid inputs for this kata

Create a simple String calculator with a method int Add(string numbers)

The method can take 0, 1 or 2 numbers, and will return their sum (for an empty string it will return 0) for example “” or “1” or “1,2”
Start with the simplest test case of an empty string and move to 1 and two numbers
Remember to solve things as simply as possible so that you force yourself to write tests you did not think about
Remember to refactor after each passing test
Allow the Add method to handle an unknown amount of numbers
Allow the Add method to handle new lines between numbers (instead of commas).
the following input is ok: “1\n2,3” (will equal 6)
the following input is NOT ok: “1,\n” (not need to prove it - just clarifying)
Support different delimiters
to change a delimiter, the beginning of the string will contain a separate line that looks like this: “//[delimiter]\n[numbers…]” for example “//;\n1;2” should return three where the default delimiter is ‘;’ .
the first line is optional. all existing scenarios should still be supported
Calling Add with a negative number will throw an exception “negatives not allowed” - and the negative that was passed.if there are multiple negatives, show all of them in the exception message
stop here if you are a beginner. Continue if you can finish the steps so far in less than 30 minutes.
Numbers bigger than 1000 should be ignored, so adding 2 + 1001 = 2
Delimiters can be of any length with the following format: “//[delimiter]\n” for example: “//[]\n12*3” should return 6
Allow multiple delimiters like this: “//[delim1][delim2]\n” for example “//[][%]\n12%3” should return 6.
make sure you can also handle multiple delimiters with length longer than one char

With nothing better to do this afternoon I decided to to give it a try. This is what I came up with in the first version.

The tests:

   [TestMethod]
    public void Add_EmptyString_ReturnsZero()
    {
        var result = CreateCalculator().Add(string.Empty);

        Assert.AreEqual(0, result);
    }

    [TestMethod]
    public void Add_SingleNumber_ReturnsNumber()
    {
        var result = CreateCalculator().Add("1");

        Assert.AreEqual(1, result);
    }

    [TestMethod]
    public void Add_TwoNumbers_ReturnsSum()
    {
        var result = CreateCalculator().Add("12,2");

        Assert.AreEqual(14, result);
    }

    [TestMethod]
    public void Add_MultipleNumbers_ReturnsSum()
    {
        var result = CreateCalculator().Add("1,2,3");

        Assert.AreEqual(6, result);
    }

    [TestMethod]
    public void Add_UsingNewLineDeliminator_ReturnsSum()
    {
        var result = CreateCalculator().Add("1\n2,3");

        Assert.AreEqual(6, result);
    }

    [TestMethod]
    public void Add_CustomDeliminator_ReturnsSum()
    {
        var result = CreateCalculator().Add("//;\n1;2");

        Assert.AreEqual(3, result);
    }

    [TestMethod, ExpectedException(typeof(ArgumentOutOfRangeException))]
    public void Add_NegativeValue_ThrowsException()
    {
        CreateCalculator().Add("-1");            
    }

    [TestMethod]
    public void Add_NumbersLargerThan1000_ReturnsSum()
    {
        var result = CreateCalculator().Add("2,1001");

        Assert.AreEqual(2, result);
    }

    [TestMethod]
    public void Add_LongDelimiter_ReturnsSum()
    {
        var result = CreateCalculator().Add("//[***]\n1***2***3");

        Assert.AreEqual(6, result);
    }

    [TestMethod]
    public void Add_MultipleDelimiters_ReturnsSum()
    {
        var result = CreateCalculator().Add("//[*][%]\n1*2%3");

        Assert.AreEqual(6, result);
    }

    [TestMethod]
    public void Add_MultipleLongDelimiters_ReturnsSum()
    {
        var result = CreateCalculator().Add("//[***][%%%]\n1***2%%%3");

        Assert.AreEqual(6, result);
    }

And the implementation looked something like this:

public interface ICalculator
{
    int Add(string input);
}

public class Calculator : ICalculator
{
    public int Add(string input)
    {            
        if (HasCustomDelimiter(input))
        {
            string delimiters = StripOffNumbers(input);
            input = StripOffDelimiterMetadata(input);                
            return AddNumbers(input, GetDelimiters(delimiters));
        }

        return AddNumbers(input, GetDelimiters(string.Empty));
    }

    private string StripOffNumbers(string input)
    {
        return input.Substring(0, input.IndexOf('\n')).Substring(2);
    }

    private int AddNumbers(string input, string[] delimiters)
    {
        if (input.Length == 0)
        {
            return 0;
        }

        return SplitNumbers(input, delimiters).Select(ParseNumber).Where(n => n < 1000).Sum();
    }

    private static int ParseNumber(string s)
    {
        int result = int.Parse(s);
        if (result < 0)
        {
            throw new ArgumentOutOfRangeException();
        }

        return result;
    }

    private string[] SplitNumbers(string input, string[] delimiters)
    {
        return input.Split(delimiters, StringSplitOptions.None);
    }

    private string StripOffDelimiterMetadata(string input)
    {
        if (HasCustomDelimiter(input))
        {
            return input.Substring(input.IndexOf('\n') + 1);
        }

        return input;
    }

    private static bool HasCustomDelimiter(string input)
    {
        return input.StartsWith("//");
    }

    private string[] GetDelimiters(string customDelimiters)
    {
        var delimiters = new List<string> { ",", "\n" };
        if (string.IsNullOrEmpty(customDelimiters))
        {
            return delimiters.ToArray();
        }

        if (customDelimiters.StartsWith("["))
        {
            var longDelimiters = customDelimiters.Replace("[", "]").Split(new[] { ']' }, StringSplitOptions.RemoveEmptyEntries);
            delimiters.AddRange(longDelimiters);
            return delimiters.ToArray();
        }

        delimiters.Add(customDelimiters[0].ToString());
        return delimiters.ToArray();
    }

}

Reading through the code we can see that there are a lot of string searching,splitting and even replacements.

But it works and all tests passed. Great!!

The challenge

I presented this to a colleague of mine and he said that this is all fine, but he was still waiting to see the stream version. Stream version???...oh wait, you mean that I should read the input string character by character and never read the same character twice.

The description clearly states that no effort should be done to validate for invalid input apart from not accepting negative numbers.

The first thing to do is to forget all about finding custom delimiters and splitting strings, just follow this simple rule:

Read the string from start to end and ignore everything that is not a numeric value.

This is what I ended up with.

public class Calculator : ICalculator
{
    public int Add(string input)
    {
        double sum = 0;            
        double number = 0;
        foreach (var character in input)
        {                
            double numericValue = Char.GetNumericValue(character);                
            if (numericValue != -1.0)
            {                   
                if (number > 0)
                {
                    number = number * 10;
                }                                                
                number = number + numericValue;                   
            }                                
            else
            {                 
                sum = sum + (number >= 1000 ? 0 : number);
                if (character == '-')
                {
                    throw new ArgumentOutOfRangeException();
                }

                number = 0;
            }                
        }

        return (int)sum + (int)(number >= 1000 ? 0 : number);            
    }
}

All tests are still green.

Just goes to show that there are more than one approach to any given problem :)

Tuesday, September 10, 2013

OSVersion

MarkdownPad Document

Just recently I was involved in a discussion about logging and the need to include the name of the operation system and the service pack level in the application log file.

While you can get to this information in a number of ways, it is important that it will work as expected even for future versions of Windows.

The following code shows a console application that displays the name of the operating system along with the current service pack level.

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Operating System : {0} ({1} bit) ", GetOperationSystemName(), IntPtr.Size == 4 ? "32" : "64");
        Console.WriteLine("Service Pack : {0}", Environment.OSVersion.ServicePack);
    }

    private static string GetOperationSystemName()
    {
        var searcher = new ManagementObjectSearcher("SELECT Caption FROM Win32_OperatingSystem");
        var managementObject = searcher.Get().OfType<ManagementObject>().FirstOrDefault();
        return managementObject != null ? managementObject["Caption"].ToString() : "Unknown OS";
    }
}

Running the console application displays the following:

.\OSVersion.exe

Operating System : Microsoft Windows 7 Enterprise  (64 bit)
Service Pack : Service Pack 1

Thursday, August 15, 2013

The fallacy of the single composition root?

Single Composition Root

This article discusses DI (dependency injection) and the pattern of having one single unique composition root.

Composition roots

When we talk about composition roots, we usually talk about a single unique location where services are registered with the IoC container. This is then executed at the application entry level.

This ensures that only the composition root needs to have a reference to the IoC container and we can make the switch to another container without going through the whole application and its assemblies. Life is good and everyone is happy.

Now let me tell you a sad story about an application using this exact approach and how we ended up rethinking the whole single composition root idea.

As with most projects, they start out small and performance problems are something that only exists in galaxies far far away.

In order to save time during development, we decided that we could just scan all the assemblies and register services using some sort of convention instead of explicitly register each service. This worked out great and developers did not have to think about anything else than to ensure that their assembly was present in the bin folder. We also executed the same composition root during unit testing which meant that we did integration tests with the same configuration as in production (This is actually a good thing).
Man, those were happy times...for a while.

First sign of trouble

One day I got an e-mail from one of my colleagues complaining about increased startup time. It also took a long time just to execute a single unit test. The application was only half way completed and we were already faced with a serious performance problem. As it turned out, the application had grown from your typical "10 assemblies greenfield project", into something that now consisted of 100 assemblies where each of these assemblies represented services that could potentially be requested from the container. It now took 10 seconds to start the application and 10 seconds to start a single unit test. Happy times were going bye-bye.

Note: Not discussing whether the container should be used in unit tests. Think of integration tests.

What's the problem

It did not took us long to realize that the problem lied within the fact that we now scanned 100 assemblies during the startup of the application. Even for unit tests that only really needed 5 of those assemblies to run, we loaded all 100 assemblies into the application domain. Of course we needed to, we did not know which 5 of those 100 assemblies that was needed for the test.
As for the application that was a desktop application, we created a really fancy splash screen to amuse the end user while all this assembly loading was going on in the background.
It did not take long for the amusement to wear off among the users.

No more assembly scanning

Having identified the assembly scanning as the root of our problems, we decided that each service should be explicitly registered with the container. This required a little more work from the developers, but that was okay. After all, we needed to fix this problem before it really got out of hand.

container.ScanAllAssemblies()

to this

container.Register<IFoo, Foo>();
container.Register<IBar, Bar>();
--followed by hundreds more

We were now faced with another issue. The assembly that contained the composition root, now had to reference all the other 99 assemblies so that the composition root could register the services. The fact that we now needed the scrollbar just to browse the references list should have made us think twice. Sadly it did not.

The problem remains

So after a major refactoring process to explicitly register services and abandon the assembly scanning, we were still seeing a significant delay during startup although it had improved a little. The improvement was relative meaning that the startup time kept increasing as more assemblies got added to the application. The improvement came from not having to do reflection on all the types in each assembly.

So how could it be that we still saw this behavior?

To answer this, we need to look a little closer into how assembly loading work in the CLR (Common Language Runtime).

The CLR tries to minimize the number of assemblies that goes into the application domain by only loading the assemblies that are actually needed. So when does an assembly get loaded?

It gets loaded when code that uses types from that assembly is executed. To be even more precise, it gets loaded when the code is compiled by the Just-In-Time compiler.

This is actually a very simple and clever approach because the number of assemblies loaded is now dependent on the actual path of execution.

Now lets review the code from our composition root.

container.Register<IFoo, Foo>();
container.Register<IBar, Bar>();
--followed by hundreds more

What we essentially have here is a method that bypasses all this cleverness enforced by the CLR.

Because we now have a method that references types from all those 100 assemblies, we force the CLR to load all the assemblies into the application domain. As bad as this in release builds, this is even worse during debug since all the assemblies and their symbol files needs to be loaded too. Starting the application or running a unit tests still provided a nice opportunity to check the latest news on Facebook.
Happy times now seem ages ago.

Laziness is good

As opposed to life in general, in computer science, laziness is really a good thing. Doing things only when we really have to makes for highly optimized applications in terms of performance and memory footprint.

Being an DI framework author, I'm going to explain how to deal with this problem using the LightInject service container. Although the remaining examples will target this container, I'm confident that similar solutions could be applied using other DI frameworks.

In LightInject there is an interface called ICompositionRoot that serves the purpose of registering services with the service container.

The interface is rather simple and looks like this:

/// <summary>
/// Represents a class that acts as a composition root for an <see cref="IServiceRegistry"/> instance.
/// </summary>
internal interface ICompositionRoot
{
    /// <summary>
    /// Composes services by adding services to the <paramref name="serviceRegistry"/>.
    /// </summary>
    /// <param name="serviceRegistry">The target <see cref="IServiceRegistry"/>.</param>
    void Compose(IServiceRegistry serviceRegistry);
}

When an assembly is scanned, LightInject will check to see if the assembly contains implementations of this interface and execute the Compose method. If no composition root is found, services will be registered using a set of conventions (not covered here).

This implies that we are back to scanning assemblies and we already stated that did not work out to well, didn't we?

Well, it's all a matter of how and when.

Lazy Registration

HOW

How we scan an assembly is important with regards to performance when registering services. We have already said that LightInject will use the composition root if found within the target assembly. This is essentially done by inspecting all types and check to see if one or more types implements the ICompositionRoot interface. While that certainly works, it is far from optimal as this could also lead to unnecessary assemblies to be loaded into the application domain.

The solution is to provide a little metadata at the assembly level to help the assembly scanner in the right direction.

/// <summary>
/// Used at the assembly level to describe the composition root(s) for the target assembly.
/// </summary>
[AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
internal class CompositionRootTypeAttribute : Attribute
{
    /// <summary>
    /// Initializes a new instance of the <see cref="CompositionRootTypeAttribute"/> class.
    /// </summary>
    /// <param name="compositionRootType">A <see cref="Type"/> that implements the <see cref="ICompositionRoot"/> interface.</param>
    public CompositionRootTypeAttribute(Type compositionRootType)
    {
        CompositionRootType = compositionRootType;
    }

    /// <summary>
    /// Gets the <see cref="Type"/> that implements the <see cref="ICompositionRoot"/> interface.
    /// </summary>
    public Type CompositionRootType { get; private set; }
}

The assembly scanner will first check the assembly for this attribute and execute the composition roots according to the type information from the attribute.

This saves us from inspecting each and every type in the assembly and provides the most efficient way of locating the composition root.

To summarize how this works:

Check for CompositionRootTypeAttribute. If found, execute composition root(s).
Check all types for implementations of the ICompositionRoot interface. If found execute composition root(s).
Register types according to a set of conventions.

WHEN

When the container encounters a service that cannot be resolved, it will scan the assembly that contains the requested service. After the assembly has been scanned, the container will continue to try resolving the requested service. If the service is still not found, the container will throw an exception as usual.
This allows us to implement multiple lazy compositions roots and hence minimize the number of assemblies loaded into the application domain at any given time during the lifespan of the application.

Example

This example shows a simple console application that consists of the application itself and two additional assemblies.

FooAssembly

public interface IFoo {}

public class Foo : IFoo {}

public CompositionRoot : ICompositionRoot
{
    public void Compose(IServiceRegistry registry)
    {
        registery.Register<IFoo, Foo>();
    }
}

BarAssembly

public interface IBar {}

public class Bar : IBar {}

public CompositionRoot : ICompositionRoot
{
    public void Compose(IServiceRegistry registry)
    {
        registery.Register<IBar, Bar>();
    }
}

As we can see each assembly has its own composition root and takes care of registering services from the containing assembly.

Further both the FooAssembly and the BarAssembly will have the CompositionRootTypeAttribute defined at the assembly level.

[assembly: CompositionRootType(typeof(CompositionRoot)]

The console application now only references the FooAssembly and the composition root within the FooAssembly will be lazily executed once the IFoo instance is requested.

The Main method of our console application would look something like this.

static void Main(string[] args)
{
    var container = new ServiceContainer();                
    IFoo foo = container.GetInstance<IFoo>();                
}

The FooAssembly is loaded into the application domain when the Main method is executed (compiled by the J.I.T), while the BarAssembly is left unloaded until we execute code that uses types from this assembly.

What's the downside?

There is always at least two sides to every story and even this has some obvious downsides.

First and foremost it means that every assembly that contains a composition root, would also need to have a reference to the assembly that contains the ICompositionRoot interface. This is a container specific interface and should ideally only be referenced at the "true" composition root (application entry).

It's all a matter of give and take while this certainly creates more references to the service container library, it is still far from the downsides of implementing the service locator pattern which basically means references to the container everywhere in our code.

Secondly, we encourage developers to explicitly registers their services in the appropriate composition root where we in the past got away with plain old assembly scanning.

While this might push more plumbing work over to the developers, it also becomes less "magic" and might become easier to debug. If the service is unable to resolved, it is probable just missing that one line of code that registers the service with the container. It's as simple as that really.

Update 8/19/2013 2:12:56 PM

The article has been updated to emphasize the meaning of having "distributed" composition roots.

Feedback

I would appreciate any feedback whether you think this looks cool or the worst idea ever :)

mail: bernhard.richter@gmail.com

twitter: @bernhardrichter