Wednesday, April 8, 2009

Introducing Hiro, the World's Fastest IOC Container, Part I: Design Diary

Introduction

Have you ever had one of those moments where someone told you that your work sucked, and it inspired you to create something better? About a month ago, Alex Simkin sent me a message on the CodeProject forums for one of my articles saying that LinFu ranked second to last in performance among all the other containers, and that he was willing to show me the benchmark code that produced those numbers.

Eating the Humble Pie

Unfortunately, Alex was correct. Despite all of its features, LinFu landed a spot near the bottom of the pack, and needless to say, there had to be a better way to design a container so that it wouldn't have these bottlenecks.

"But...but...my container is dynamic and it's flexible!"

As an IOC container author myself, I've probably given that same excuse a dozen times over whenever someone complained that my framework was too slow. I never realized that the flexibility that I so touted in all my IOC articles was the very cause of all my performance headaches. Indeed, there had to be some way to improve these numbers, and at first, I thought adding more dynamic IL generation would solve the problem. After all, Lightweight Code Generation with DynamicMethod seems to be the trend nowadays among the other frameworks like Ninject, and that makes their code run faster, right?

Once again, I was wrong. DynamicMethods didn't make much of a performance impact because Ninject (which reportedly uses a lot of LCG in its code) was actually the slowest among all of the IOC containers tested in the benchmark (Sorry Nate). Of course, this doesn't mean that the DynamicMethod approach is the cause of the slowdown; what it does suggest, however, is that piling more and more reflection onto the speed problem is not the solution. In addition, there were other frameworks in that benchmark (such as Funq) that didn't use any reflection at all, and yet, they still were taking significant performance hits on that benchmark. In fact, even the fastest among all the other containers--StructureMap--was still running forty-four times slower than the Plain/No Dependency Injection use case!

So the principal question is this: "Where is this bottleneck coming from, and how do I eliminate it?"


The Real Problem


As it turns out, the answer was staring me in the face all along: "It's the configuration, stupid", I thought to myself. The problem is that every major IOC container at the time of this post (such as Ninject, StructureMap, Unity, AutoFac, Castle, LinFu, etc) essentially has to trawl through each one of its dependencies just to instantiate a single service instance on every call, and practically no amount of optimization will ever compensate for the fact that they still have to "rediscover" any given part of an application's configuration in order to instantiate that one service instance. Needless to say, this rediscovery process wastes a huge amount of resources because these containers are actually "rediscovering" a configuration that (for all practical purposes) will rarely change between two successive method calls.

In layman's terms, this is akin to stopping and asking for directions at every intersection every time you want to leave your home to go to some other destination. There has to be some way to see the "whole map" and plan the trip ahead of time without having to stop for directions at every intersection. If you could plan all the possible routes on that trip ahead of time, then all the time you would have wasted asking for directions immediately vanishes.

In essence, that is what I did with Hiro. Hiro is an IOC container framework that reads the dependencies in your application ahead of time and actually compiles a custom IOC container that knows how to create those dependencies from your application itself. It uses absolutely no runtime reflection or runtime code generation, and since all your dependencies are discovered at compile time (that is, when the Hiro compiler runs), Hiro suffers zero performance penalties at runtime when instantiating your types.
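
To make the contrast concrete, here is a rough sketch of the two approaches. It is written in Python for brevity rather than the C# that Hiro targets, it is not Hiro's code, and every name in it (`Logger`, `Repository`, `resolve_reflectively`, `compile_plan`) is made up for illustration. Hiro emits IL where this sketch builds closures, but the idea is the same: the first resolver inspects constructor signatures on every call, while the second walks the dependency graph once and hands back a plain factory.

```python
import inspect


class Logger:
    def __init__(self):
        self.lines = []


class Repository:
    def __init__(self, logger: Logger):
        self.logger = logger


def resolve_reflectively(service_type):
    """Rediscover the constructor's dependencies on every single call."""
    params = inspect.signature(service_type).parameters.values()
    return service_type(*(resolve_reflectively(p.annotation) for p in params))


def compile_plan(service_type):
    """Walk the dependency graph once and return a factory that no longer inspects anything."""
    params = inspect.signature(service_type).parameters.values()
    factories = [compile_plan(p.annotation) for p in params]
    return lambda: service_type(*(make() for make in factories))


# The "map" is planned once, ahead of time...
create_repository = compile_plan(Repository)

# ...so each later call just runs the prepared factory.
repo = create_repository()
```

Both functions build the same object graph; the difference is only in when the constructor signatures get inspected.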

Yes, you read that right: Hiro runs at 1:1 speed with a Plain/No DI configuration. Here are the results of the IOC container benchmark:




As you can see from the results above, the current crop of IOC containers (including LinFu) can only reach 2% of the speed of an application that does not use an IOC container. Now, let's take a look at Hiro's results:



If you don't believe it, then you can download and run the benchmarks yourself.

Like LinFu, Hiro is licensed under the terms of the LGPL, and you can preview the source code at this site. I'll also be starting a Hiro-contrib project, so if you want to add your own extensions, just email me at marttub@hotmail.com and I'll be more than happy to add anyone who is interested. Thanks! :)

11 comments:

  1. Looks like a great idea.
    I'm starting to go over your code and I've got a question:
    Why did you choose to use extension methods on your DependencyMap class, instead of regular methods?

  2. Hi Omar,

    I didn't want to pollute the DependencyMap class with multiple overloads of the same AddService method if it was really using an IDependency instance to add a particular IActivationPoint for a single dependency. Having a single AddService method on the DependencyMap class makes it easier to maintain, and at the same time, it keeps the code clean. HTH :)

  3. What's the difference between "polluting" the main class and the static extension class? You still need to test them both.
    Furthermore, since the extension methods are static methods, mocking them is out of the question (unless you use TypeMock).
    Where did you adopt that style? I'm trying to think whether I should use it myself in my classes and when.

    Another question:
    I couldn't find the generated assembly file. It looks as if the compilation was done completely in-memory, which is cool. Do you know how it's done? Is this some Cecil trickery?
    I had to use windbg to dump the module's memory into a file (in order to open in reflector and see your generated code).

    One last thing: consider (as an optimization) persisting your generated assemblies to files, and load them if they're still applicable (this can be checked by hashing the DependencyMap and comparing the hash with the hash of the persisted assembly's DependencyMap - which could reside as metadata of the assembly file or in the assembly's name). This will make your solution even faster as you won't have to compile the container every time you run.
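
    A minimal sketch of that caching idea, in Python rather than C#. The registration triples, the `dependency_map_hash` and `load_or_compile` names, and the in-memory `cache` dict (standing in for assembly files on disk) are all assumptions for illustration, not part of Hiro:

```python
import hashlib


def dependency_map_hash(registrations):
    """Stable digest of (service name, service type, implementation type) triples."""
    digest = hashlib.sha256()
    # Sort first so that registration order does not change the hash.
    for entry in sorted(registrations):
        digest.update("|".join(entry).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def load_or_compile(registrations, cache, compile_container):
    """Reuse a previously compiled container when the dependency map is unchanged."""
    key = dependency_map_hash(registrations)
    if key not in cache:
        cache[key] = compile_container(registrations)
    return cache[key]
```

    Whether this pays off depends on how long the compile step takes compared with hashing the map and loading the persisted file.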

    Omer (not Omar) Mor.

  4. Hi Omer,

    If you take a look at the extension methods, you'll notice that I'm actually extending the IDependencyMap interface rather than the DependencyMap class itself, and that makes it easy to test because you can mock out the IDependencyMap interface and see if the extension method is passing it the correct input.

    I adopted that style from LinFu when I started to notice that my classes had a lot of method overloads that had the same purpose but only turned out to be a bunch of helper methods that were attached to the class itself. In most cases, they were actually simple wrappers around the actual method call, and for me, it seemed more logical to extract these helper methods into extension methods so that 1) the end user would think that a class would have all these overloads and at the same time, 2) I could make the class easier to read since there were fewer methods that I would have to deal with when debugging the target class.

    Lastly, since I was extending an interface rather than a concrete type, the extension methods that I added applied to all interface instances of that type (or in this case, IDependencyMap), and it gave me the extensibility of polymorphism by using an interface, and at the same time, it made each interface easier to use since the extension methods acted as a facade around some of the boilerplate code that you would have otherwise had to write if the extension methods weren't there in the first place.

    About your other question: The in-memory compilation isn't a Cecil trick--I actually used some techniques from LinFu 2.0's IL library to write a Cecil assembly out to a bytestream and then used System.Reflection to load the bytes into a running assembly.

    As for hashing the output of each compilation, I'm not sure if that's entirely necessary, given that Hiro is already fast enough as it is. The only task for the Hiro compiler is to generate the IMicroContainer instance.

    At the same time, there's no way to predict what the output might be if no two dependency maps can be the same--it might not be the best approach to cache the output if you can't come up with a reliable hash for the input.

    In practical terms, caching the assembly output has no effect on the speed of each generated container instance, given that the container is completely independent from the Hiro compiler once it's out of the Cecil compiler.

    I think what would probably help you the most is if you could look at the generated assembly using Reflector. The reason why it's so fast is that the generated GetAllInstances() and GetInstance() methods are nothing but a chain of if statements in IL that generate a specific service instance depending on the input string and the given service type.
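
    As a rough picture of that shape, here is a hand-written stand-in in Python rather than IL or C#. It is not actual Hiro output: the `Greeter` and `Clock` services, the method names, and the choice to return `None` for an unknown service are all assumptions for illustration.

```python
class Greeter:
    def __init__(self, prefix):
        self.prefix = prefix


class Clock:
    pass


class GeneratedContainer:
    """Stand-in for a container emitted for one fixed configuration."""

    def get_instance(self, service_type, name=None):
        # One branch per registered (service type, service name) pair.
        if service_type is Clock and name is None:
            return Clock()
        if service_type is Greeter and name is None:
            return Greeter("Hello")
        if service_type is Greeter and name == "formal":
            return Greeter("Good day")
        return None  # assumption: unknown services simply yield nothing

    def get_all_instances(self, service_type):
        if service_type is Greeter:
            return [Greeter("Hello"), Greeter("Good day")]
        if service_type is Clock:
            return [Clock()]
        return []
```

    There is no lookup table and no reflection in either method, only comparisons and constructor calls.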

    In some ways, it resembles a lot of the "write an IOC container in 15 minutes" samples that have been shown on the web, but with one notable exception--each one of these MicroContainers that are generated by Hiro is customized for your configuration. In the end, the speed lies in the simplicity of the IL being generated, and it's certainly worth the study time if you want to come up with a few techniques of your own. Again, HTH :)

  5. Thanks for the elaborate answer.

    I already "grokked" the principle behind Hiro.
    My suggestion was to optimize the code-generation and compilation phase, not the container resolving logic (which is the same as writing my own custom container, and is a great and cool idea - this is code-generation at its best).

    In a project I once wrote I used DynamicProxy (Castle's) to generate proxies that implemented RPC. I persisted the generated assemblies to save code-generation & compilation time when I could.

    Another question/suggestion:
    Is there any benefit to generation using Cecil over LCG using Expression Trees? I guess it's possible to write your container logic using expression trees, compile them, and inject them as delegates to a "hollow" container. It could also be handy as a way to make your container more dynamic: when a new service is being added to the container at runtime, you could just re-generate the expression trees. Another good side-effect is that your code would be easier to read and maintain because no IL is needed.

    I once wrote a codeproject article that used DynamicMethod LCG, and a reader suggested I use expression trees. The result was a slight improvement in performance, and a huge improvement in readability. You can read the article here: http://www.codeproject.com/KB/cs/EnumComparer.aspx

    Omer Mor.

    P.S.
    The in-memory assembly trick is great!

    P.P.S.
    In the same project I mentioned I also used your LinFu Closure class to curry some delegates - it helped a lot and I never got to thank you - so thanks! :-)

  6. Looks like a fun intellectual experiment, but not of much use outside of the (severely) constrained scenario (not even plural) that it supports.

    If the purpose was to show that you can cut features until you can make the absolute fastest possible "container", then Hiro is a sure win.

    But I don't think it supports even the most crucial features required from a container such as container hierarchies, instance reuse and instance ownership (disposal) tracking. There's a reason why every other container supports those features: they are required in real world apps :)

  7. @kzu I haven't cut any features out of Hiro--I just got started this week :)

    It takes a long time to write a solid IOC container with lifetime management, so it would be unrealistic to expect something so quickly from a container that is only a week old.

    That being said, Hiro will implement some of the same features that you would expect from an IOC container, albeit with a statically compiled twist--for example, lifetime support and IDisposable tracking will be woven in on a per-service basis instead of relying on reflection.

    Anyway, if the concern here is that Hiro isn't practical for the "real world", no need to be concerned--those features are coming. :)

  8. Hi Omer,

    Working with expression trees makes things even more difficult from a performance standpoint because you have absolutely no control over the IL it generates. It does make it easier for users to generate dynamic IL, but Cecil is a tremendously powerful library, and using Expression trees doesn't even come close to what Cecil can really do.

    I usually take an "all or nothing" approach when it comes to generating IL--if you're going to go ahead and try to abstract yourself away from it, then there's no point in using it since the VB.NET/C# compiler already does a good job of generating that IL for you. If you embrace it, however, then there's really no limit to some of the things that you can do with it. :)

  9. "IDisposable tracking will be woven in on a per-service basis instead of relying on reflection."

    You don't need reflection to check for IDisposable implementations. A quick "instance is IDisposable" is enough and doesn't go through reflection.

    Sometimes I wonder how much optimization really matters at some point. Especially when it starts getting in the way of features (supporting modular apps is a big requirement in my experience, where modules provide services "dynamically").

    But I'm certainly looking forward to your explorations!

  10. @kzu

    I think the main problem with all these additional IOC container features is that people are starting to stray from the fundamentals here. There used to be a time when an IOC container was nothing but a generalization over the Factory Pattern. I suspect that one can add these features (such as IDisposable tracking/management) to a container provided that they apply the SOLID principles to the whole container library itself rather than modifying it ad infinitum with new features.

    The Open-Closed Principle and the YAGNI principle seem to apply well in this case. My goal with Hiro is to keep it ultra-lightweight, and the best way to do that is to make it so that it can do only the following things:

    -Constructor/Property Injection
    -Create New Object instances as Transient instances or Singleton instances.
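
    Roughly, and only as a sketch in Python rather than C#, those two features amount to something like the following. The `transient` and `singleton` helpers and the `Engine`/`Car` types are made up for illustration and are not Hiro's API; only constructor injection is shown, not property injection.

```python
class Engine:
    pass


class Car:
    def __init__(self, engine):
        # Constructor injection: the dependency is handed in, not created here.
        self.engine = engine


def transient(factory):
    """Every call produces a fresh instance."""
    return factory


def singleton(factory):
    """The first call creates the instance; later calls reuse it."""
    instances = []

    def get():
        if not instances:
            instances.append(factory())
        return instances[0]

    return get


get_engine = singleton(Engine)
get_car = transient(lambda: Car(get_engine()))
```

    Each call to `get_car()` builds a new `Car`, and all of them share the one `Engine`.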

    It might seem like a ridiculously tiny feature set, but I don't think there's any need to reinvent the whole wheel here when you've got 5-7 different IOC container frameworks that are doing a whole lot of other things like that IDisposable feature that you mentioned, and method interception.

    There are plenty of "fat" IOC containers out there, and I'm sure they're all capable of doing the same job well (including Funq, of course).

    My goal for Hiro is to target that <20% developer audience that only needs to use the bare minimum features to get a container up and running, and at the same time, Hiro is for developers who are not afraid to get their hands dirty if they need to use some feature X from some container and tie it in with Hiro.
    If they want to go pragmatic and mix Hiro with some other container for a speed boost, that's perfectly fine with me :)

    In the end, I think everyone wins with that scenario--the end user dev doesn't have to learn a wheel that I avoided reinventing, and at the same time, they can get that performance boost without leaving the confines of their favorite library (such as Funq, perhaps).

    Oh, and about Funq, Kzu--you might want to take a look at Hiro's DependencyMap implementation--it can do a full static analysis of a given assembly and tell you which types CAN be instantiated with the given dependencies. You might not make use of Hiro's compiler, but I'm sure that you will definitely find that class very, very useful in doing an automatic dependency analysis for Funq. In other words, you can use the same information to do AOT dependency resolution. Think about it :)
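
    To give a feel for that kind of ahead-of-time analysis, here is a minimal sketch in Python rather than C#. It is not how DependencyMap is implemented: the `instantiable_types` function is hypothetical, types are represented by plain strings, and each type is assumed to have a single constructor.

```python
def instantiable_types(constructors, available):
    """Return the types that can be built from the services already available.

    constructors: type name -> list of constructor parameter type names
    available: type names that are already registered as services
    """
    resolved = set(available)
    changed = True
    # Keep sweeping until a full pass adds nothing new (a fixed point).
    while changed:
        changed = False
        for type_name, params in constructors.items():
            if type_name not in resolved and all(p in resolved for p in params):
                resolved.add(type_name)
                changed = True
    return resolved - set(available)


constructors = {
    "Logger": [],
    "Repository": ["Logger", "Connection"],
    "Service": ["Repository"],
    "Mailer": ["SmtpClient"],  # SmtpClient is never provided
}
buildable = instantiable_types(constructors, {"Connection"})
```

    Here `Mailer` is left out of `buildable` because nothing supplies its `SmtpClient` dependency, and that gap is known before any instance is ever requested.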

  11. I think this comparison of DI frameworks might be of use here: http://blog.ashmind.com/index.php/2008/09/08/comparing-net-di-ioc-frameworks-part-2/

    YAGNI is a tough principle to apply when developing frameworks. You have to first and foremost think about possible and realistic usage scenarios, more than YAGNI up-front. Otherwise, you end up with infinite customization just to do the obvious things (think ObjectBuilder).

    I don't think container hierarchies and IDisposable tracking are YAGNIs or "esoteric" features. They are very much needed in every real-world app I worked on (think App-Session-Request-Thread hierarchy, all potentially using resources that require disposal; a container can make this a breeze).

    Note, btw, that I'm also on the minimalist side. Funq implements just those two features over your baseline. Although I have to do some real tests on webapp scenarios to see if that suffices. I think it will :).

    Regarding DependencyMap, unless I'm looking at the wrong class, it doesn't seem to be doing much ;) http://code.google.com/p/hiro/source/browse/branches/development/src/Core/DependencyMap.cs

    Although I might evaluate your compilation approach as an optional extension to Funq (not sure how much it's worth to optimize a quick dictionary lookup, though :S, given the additional complexity of the dynamic nature of the container hierarchy).

