Wednesday, April 10, 2013

Building your own System.Reflection API from scratch, Part I: Choosing Nemerle

Sometimes you have to reinvent a better light bulb to understand how it works.

Introduction 

About three years ago, I decided to take a break from working on LinFu. Although I was happy with some of the work I was doing with Cecil and IL rewriting, I wanted to understand the underlying abstractions that represent your everyday .NET assembly. Even though there was some good IL rewriting work being done by other bytecoders like Simon Cropp, they mostly focused on making small, surgical changes to assemblies, such as implementing the INotifyPropertyChanged interface or making all public methods on a POCO class virtual.

For me, being able to make small changes to the IL wasn't enough. I wanted to understand how to manipulate .NET assemblies so that I could make some "big" changes, such as:

  • Type cloning
  • Dead type elimination
  • Code migrations
  • Modifying signed .NET BCL assemblies at runtime

Unfortunately, as of this post, there are no assembly manipulation tools capable of doing those things, and even the author of Cecil says that it doesn't support type cloning:

    "There's no easy way to move one type from a module to another, as it involves making a lot of decisions about what to do with references. Also, Mono.Merge is completely dead."
Given that there are no tools that are capable of doing what I wanted to do with assemblies, and given that I wanted to master assembly manipulation, I decided to take the next logical step: I was going to build my own reflection API from scratch.

The Tao of Metaprogramming

In this series of small posts, I'll talk about some of the design decisions and share some of the design notes I've kept as I continue to build Tao, my own reflection and metaprogramming API.

Choosing Nemerle

When I first started this project over two years ago, I needed a language with built-in support for Design by Contract features, since I was essentially going to create a library that builds .NET assemblies from scratch. Starting with nothing, I needed a language that was fault-intolerant enough to tell me where I was failing, and why. Those were challenging days, because all I had was the CLR Metadata Specification as a reference, and there were no programs at all (not even PEVerify) that would tell me what my mistakes were or where I was making them.

Essentially, I was flying blind, and I relied heavily on Nemerle to state my assumptions explicitly as runtime assertions. For example, Nemerle's Design by Contract macros and non-nullable type macros let you write more reliable code: the [NotNull] macro makes NullReferenceExceptions all but impossible, and the Design by Contract syntax extensions ensure that the code is always in a valid state. Those extensions are invaluable when you need to build an API that has no room for mistakes. In reading or writing .NET executables, even a single byte in the wrong position can give you an invalid assembly. The world of compiling and decompiling can be a very cold and unforgiving world, and I needed the best tools I could find to make sure that my API was doing exactly what I intended it to do.
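To give a rough idea of what that looks like, here's a minimal sketch using Nemerle's Nemerle.Assertions macros. The class and method names are hypothetical, invented for illustration; only [NotNull], requires, and ensures are the actual Nemerle features being shown:

```nemerle
using System;
using System.IO;
using Nemerle.Assertions;

public class PEHeaderReader
{
  // [NotNull] makes a null argument fail fast at the call boundary,
  // while requires/ensures assert pre- and postconditions at runtime.
  // 0x5A4D is the "MZ" magic number that starts every PE file.
  public ReadMagic([NotNull] reader : BinaryReader) : ushort
    requires reader.BaseStream.CanRead otherwise throw ArgumentException("stream must be readable")
    ensures value == 0x5A4D otherwise throw BadImageFormatException()
  {
    reader.ReadUInt16()
  }
}
```

The point is that every assumption (the stream is readable, the argument is non-null, the first two bytes really are "MZ") is declared in the method signature and enforced on every call, so a mistake surfaces immediately instead of producing a silently corrupt assembly.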

Needless to say, building your own reflection API from scratch can be a very daunting task. Even today, as I look back on the work that I have already done with Tao, it's hard to imagine being able to get this far without the DbC language features that Nemerle has to offer, and in hindsight, I'm glad that I made that choice.

Coming up in the next post

In the next post, I'll talk about some of the challenges of reading a raw .NET portable executable and turning it into something meaningful that a program can understand. For example, what does the format of a .NET assembly look like? How is it different from, say, an unmanaged DLL/EXE file? More importantly, how do you actually write tests that ensure that the bytes you're reading into memory are exactly the same as the ones loaded from disk? Those questions were just some of the issues I had to solve, and in the next post, I'll tell you exactly how I solved them, as well as talk about some of the tools I had to (re)invent along the way. Meanwhile, stay tuned!



2 comments:

  1. What about IKVM reflection?
    http://weblog.ikvm.net/PermaLink.aspx?guid=d0dc2476-471b-45f3-96bf-a90bc2f5800b

    ReplyDelete
  2. Here's an email exchange I had with the author of IKVM:
    ----
    Hi Jeroen,

    Out of curiosity, how difficult would it be to modify IKVM.Reflection so that the end user can work directly with the raw metadata table rows and streams?

    Jeroen Frijters
    Feb 12

    to me
    Hi Philip,

    I've thought about it, but designing a good API is quite a bit of work and there is not much demand for it. Most people interested in the low level stuff use Cecil.

    I do try to provide higher level APIs that expose stuff that is missing from the Reflection APIs. I've written an ildasm clone (just the command line, not the GUI) and it can extract all the necessary information without going to the metadata tables directly.

    Regards,
    Jeroen

    ---

    TL;DR: It's a lot of work to modify IKVM to do something it wasn't intended to do in the first place.

    ReplyDelete
