This is how REAL programmers patch systems

I was investigating an OutOfMemoryException today that occurred in a production intranet system. Fortunately, with help from smart people like Joel Pobar, the cause didn’t stay a mystery for very long (and we didn’t have to resort to the usual vadump and ADPlus route). Luckily, the fix was as simple as changing a single Boolean parameter from true to false on one framework method call. We had a good repro in one of our test environments, but because of the vagaries of the build and deployment process, it looked like patching and re-deploying to verify the fix in the test environment was going to be a bigger deal than expected. “But it’s just ONE assembly,” I protested, “surely we could just ILDASM it, modify the IL and then ILASM it again.” So that is just what I did. It took a little longer than expected because I was disconcerted to find that the ILASM’d assembly differed in size from the original by a few KB. After reading this post by Kenny Kerr I felt relieved, and was ready to deploy my patched DLL for testing, which went off smoothly.
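For anyone curious what the "modify" step looks like, here is a minimal sketch rather than the exact edit I made. It assumes the assembly has already been dumped to text with something like `ildasm MyAssembly.dll /out=MyAssembly.il`, and that the Boolean is the last argument pushed before the call, so the `ldc.i4.1` (true) sits on the instruction line right above it. `MyAssembly` and `Example.Loader::Load` are hypothetical names for illustration only.

```python
import re

# In IL, `ldc.i4.1` pushes the integer 1 (true) and `ldc.i4.0` pushes 0 (false).
# Both are single-byte opcodes, so swapping them leaves the IL offsets unchanged.
TRUE_OPCODE = re.compile(r"\bldc\.i4\.1\b")


def flip_last_bool_argument(il_text, method_name):
    """Flip a `true` passed as the LAST argument of calls to `method_name`.

    Assumption: the Boolean is loaded by the instruction directly before
    the call/callvirt line. Returns (patched_text, number_of_sites_patched).
    """
    lines = il_text.splitlines()
    patched = 0
    for i, line in enumerate(lines):
        # "call" also matches "callvirt"
        if "call" not in line or method_name not in line:
            continue
        # Find the nearest non-blank line above the call.
        j = i - 1
        while j >= 0 and not lines[j].strip():
            j -= 1
        if j >= 0 and TRUE_OPCODE.search(lines[j]):
            lines[j] = TRUE_OPCODE.sub("ldc.i4.0", lines[j], count=1)
            patched += 1
    return "\n".join(lines), patched


# Hypothetical fragment in the style of ildasm output.
sample = """\
  IL_0000:  ldarg.0
  IL_0001:  ldstr      "report"
  IL_0006:  ldc.i4.1
  IL_0007:  callvirt   instance void Example.Loader::Load(string, bool)
  IL_000c:  ret"""

patched_il, count = flip_last_bool_argument(sample, "Example.Loader::Load")
print(patched_il)
```

The patched text would then go back through something like `ilasm /dll MyAssembly.il /output=MyAssembly.dll`. In practice a one-line change like this is just as easy to make by hand in a text editor; the script only makes the edit explicit and repeatable.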
Next came discussions of production: we didn’t have a scheduled outage for a few days in which a properly built and patched version could be deployed. Joel, possibly mildly impressed with my ILASM bravado, cooked up a proof-of-concept “zero downtime” approach involving WinDBG and modifying the JIT’d x86 code on the fly, to show me how a REAL programmer does it. Yup: attach WinDBG, trace through a few memory addresses, modify one memory location and you’re good to go. Just to be clear, we never actually DID this (the WinDBG stuff), not even in testing, but I think Joel has shown us the way forward for the next time one of our managers asks how much downtime is required to patch a system. Thank goodness for clusters and NLB, otherwise we all might have to actually know how to do this.

Comments

Steve Loughran
Apache Ant has the <ildasm> and <ilasm> tasks, so you can decompile, patch and recompile as part of your automated build process. Sometimes even if you do have your build under control, you can’t stop tlbimport from screwing up the import, so patching is all you have left…
4/06/2007 1:44:00 PM
Paul Glavich
If you just compiled straight to production, and didn’t worry about all that testing/UAT garbage, you wouldn’t have to resort to that black art voodoo … ;-)
5/06/2007 6:13:00 AM