Xml Serializer Redux

Technology

.NET Framework 1.1

.NET Framework 2.0 beta 1

Some time ago I wrote about some issues with the Xml Serializer, in particular the possibility of inadvertently leaking memory when using some of its constructors. People continue to be burned by this, so I thought it was worth drawing attention to it again. Also, with a new version of the .NET Framework on the horizon, I thought I would take a look at the changes to the serializer in the updated framework.

The dynamically generated Reader/Writer Pair

The Xml Serializer takes the type you wish to serialize as a constructor argument (different overloads accept extra parameters such as namespaces and additional types). The first time an Xml Serializer is created for a particular type, it uses the constructor argument to generate a specialized reader and writer for that type. This code is generated as C# source in C:\Documents and Settings\<user name>\Local Settings\Temp (assuming Documents and Settings is located on C:). The serializer then compiles the reader and writer into a temporary assembly, which it loads into memory so that your type can be serialized or deserialized, and finally deletes the temporary files. If you want to look at the generated files, you can disable their deletion by adding the following entry to your application config file [via cazzu http://weblogs.asp.net/cazzu/archive/2003/10/21/32822.aspx]:

<system.diagnostics>
  <switches>
    <add name="XmlSerialization.Compilation" value="4"/>
  </switches>
</system.diagnostics>

Leaking Memory with the Xml Serializer

Depending on which constructor overload you use, this generate-compile-load process happens either once per type or every time an Xml Serializer is created. Of the 6 public constructors, only the one that takes a type and the one that takes a type and a default namespace (string) avoid generating a new temporary assembly on every call. These two constructors use an internal cache, as we can see by inspecting the decompiled code for the Xml Serializer in .NET Framework 2.0:

public XmlSerializer(Type type, string defaultNamespace)
{
    this.events = new XmlDeserializationEvents();
    if (type == null)
    {
        throw new ArgumentNullException("type");
    }
    this.tempAssembly = XmlSerializer.cache[defaultNamespace, type];
    if (this.tempAssembly == null)
    {
        lock (XmlSerializer.cache)
        {
            this.tempAssembly = XmlSerializer.cache[defaultNamespace, type];
            // more implementation goes here

For all the other constructors a new temporary assembly is created each time, as can be seen in the code below.

public XmlSerializer(Type type, XmlAttributeOverrides overrides, Type[] extraTypes, XmlRootAttribute root, string defaultNamespace, string location, Evidence evidence)
{
    // some implementation details go here
    this.tempAssembly = XmlSerializer.GenerateTempAssembly(
        this.mapping, type, defaultNamespace, location, evidence);
}

Running my Xml Serializer test application (which can be found here), creating and serializing 100 fairly simple objects with one of these non-caching, memory-leaking constructors takes a few minutes. While it runs, my test app and csc.exe between them ensure there is NO system idle time, and the amount of memory used grows constantly. When the application has finished executing it is consuming approximately 6MB more system memory than before. The application doesn’t hold a reference to a single variable; this is memory leaked by repeatedly loading dynamically generated assemblies into the current app domain, where the GC cannot free them. Only when we close the application and the app domain is unloaded is that memory freed.

Compare this to the same application using one of the caching Xml Serializer constructor overloads. It executes in a few seconds, and although memory use increases marginally, the increase is not significant.
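
To make the comparison concrete, here is a minimal sketch of the two styles of construction side by side. The Person type and the method names are hypothetical (they are not taken from my test application), and the code uses more recent C# syntax than was available in Framework 1.1. Both methods should produce equivalent XML; the difference lies only in what the serializer does behind the scenes each time it is constructed.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type, used only for illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class SerializerComparison
{
    // XmlSerializer(Type): one of the two overloads backed by the
    // serializer's internal cache, so the generated reader/writer
    // assembly is created once per type and then reused.
    public static string SerializeWithCachingConstructor(Person person)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Person));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, person);
            return writer.ToString();
        }
    }

    // XmlSerializer(Type, XmlRootAttribute): one of the overloads that
    // bypasses the internal cache. On the framework versions discussed
    // in this post, each call generates and loads another temporary
    // assembly, so calling this in a loop is what leaks memory.
    public static string SerializeWithNonCachingConstructor(Person person)
    {
        XmlSerializer serializer =
            new XmlSerializer(typeof(Person), new XmlRootAttribute("Person"));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, person);
            return writer.ToString();
        }
    }
}
```

If you must use one of the non-caching overloads, create the serializer once and hold on to it rather than constructing it per call.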

This behavior is “by design” and will not be fixed in .NET Framework 2.0. The rationale is that it would be too complicated to cache the temporary serialization assemblies when they vary by the many parameters passed to some of the more complicated constructors. The XML MVP project has created another cache (http://weblogs.asp.net/cschittko/archive/2005/01/14/353435.aspx) that mimics the one used internally by the Xml Serializer; however, use of this cache is “explicit”: you have to program against it instead of the regular Xml Serializer.
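
To give a feel for what an explicit cache looks like, here is a minimal sketch. It is not the XML MVP implementation; the SerializerCache class and its Get method are hypothetical names, and the sketch only handles the case of a type plus a root element name. A real cache would need a key covering every constructor parameter that affects the generated code, which is exactly the complication mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical helper: caches serializers created through the
// non-caching (Type, XmlRootAttribute) constructor overload.
public static class SerializerCache
{
    private static readonly Dictionary<string, XmlSerializer> Cache =
        new Dictionary<string, XmlSerializer>();
    private static readonly object Gate = new object();

    public static XmlSerializer Get(Type type, string rootName)
    {
        if (type == null) throw new ArgumentNullException("type");
        if (rootName == null) throw new ArgumentNullException("rootName");

        // The key must capture everything that influences the generated
        // reader/writer; here that is just the type and the root name.
        string key = type.AssemblyQualifiedName + "|" + rootName;

        lock (Gate)
        {
            XmlSerializer serializer;
            if (!Cache.TryGetValue(key, out serializer))
            {
                // The expensive generate-compile-load step happens
                // only once for each distinct key.
                serializer = new XmlSerializer(type, new XmlRootAttribute(rootName));
                Cache.Add(key, serializer);
            }
            return serializer;
        }
    }
}
```

Callers then write SerializerCache.Get(typeof(Order), "Order") wherever they would otherwise have constructed a serializer directly, which is the sense in which such a cache is “explicit”.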

Pre-generation of Serialization Assemblies with sgen.exe

.NET Framework 2.0 includes a tool called sgen.exe that lets you pre-generate your serialization assemblies. This is useful because it reduces the up-front performance hit of generating the dynamic serialization assembly containing the reader and writer. It also avoids run-time security problems when the account you’re running as does not have permission to write to the directory in which the temporary assembly is generated. This problem never occurs with real logged-in users, but it can occur if ASP.NET is running with reduced privileges, at which point your web service stops working because ASP.NET can no longer do any Xml Serialization.

Sgen.exe gets around this problem by pre-generating the serialization assembly. You specify an assembly, and sgen.exe creates the reader/writer classes and compiles them into a library. You can also optionally specify a single type rather than have sgen.exe generate readers/writers for every type in the assembly you nominate.

Matt Tavis, Yasser Shohoud and Elliot Rapp in their MSDN article discussing Web Services enhancements in .NET 2.0 beta 1 describe the use of sgen.exe as follows:

This tool simply runs through all the types in an assembly and generates the necessary serialization code, which can be loaded at runtime. Once you are ready to deploy a Web service client, you can use this tool to generate a serialization assembly that can be shipped with your application’s assemblies to considerably reduce complex proxy instantiation times. Running this tool is as simple as:

C:\>sgen.exe MyProject.dll

This will generate MyProject.XmlSerializers.dll, which should be placed in the same directory with MyProject.dll. At runtime, when the XML serializer finds this assembly, it will use it for serialization of proxy types found in MyProject.dll.

This makes it sound very much as if the generated assembly just needs to be deployed alongside the assembly containing the types to be serialized, and the serializer will use it in preference to creating its own. In beta 1 at least, however, this is not the case, either when a serializer is created explicitly or when it is created implicitly by returning a type from a web service call. (If you know how to set it up differently so that it works, please let me know.) My demo for this is here. It will be interesting to see if/how this “association” between type and serializer is achieved. Even where serialization assemblies are pre-generated, some of the constructor overloads of the Xml Serializer will still generate, compile and load a reader/writer pair every time they are called, with the resulting potential to leak memory.

Presumably you could just REFERENCE the generated serialization assembly and explicitly create the readers/writers it contains. This is what the documentation for sgen.exe says here:

The Data.XmlSerializers.dll assembly can be referenced from code that needs to serialize and deserialize the types in Data.dll

Sgen.exe today?

Reading through the decompiled code for the Xml Serializer’s static LoadGeneratedAssembly method for framework 2.0 beta 1 I found the following very interesting piece of code.

object[] objArray1 = type.GetCustomAttributes(
    typeof(XmlSerializerAssemblyAttribute), false);

Judging by the way this attribute is used inside the Xml Serializer, it allows a type to specify the assembly (and optionally the codebase) where its serializer will be found. This is a .NET Framework 2.0 feature; interestingly, however, Microsoft also back-ported it to Framework 1.1 in late 2004/early 2005 as part of a hotfix, described here: http://support.microsoft.com/kb/872800 (the hotfix even has code for 2 different command-line pre-generation utilities). Does anybody know of any other types that have been back-ported from Framework 2.0 to 1.1?
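
As a rough illustration of how a type might declare its serializer assembly, here is a minimal sketch. The Order type and the helper class are hypothetical, and the assembly name is simply borrowed from the sgen.exe documentation quoted earlier. The helper repeats the same reflection lookup as the decompiled snippet; it only reads the attribute and makes no attempt to load the named assembly, so it says nothing about whether the serializer would actually pick that assembly up at run time.

```csharp
using System;
using System.Xml.Serialization;

// Hypothetical type that points the Xml Serializer at a pre-generated
// serialization assembly by name.
[XmlSerializerAssembly("Data.XmlSerializers")]
public class Order
{
    public int Id { get; set; }
}

public static class SerializerAssemblyLookup
{
    // Returns the assembly name declared through XmlSerializerAssemblyAttribute,
    // or null when the type does not carry the attribute.
    public static string GetDeclaredSerializerAssembly(Type type)
    {
        if (type == null) throw new ArgumentNullException("type");

        object[] attributes = type.GetCustomAttributes(
            typeof(XmlSerializerAssemblyAttribute), false);
        if (attributes.Length == 0)
        {
            return null;
        }
        return ((XmlSerializerAssemblyAttribute)attributes[0]).AssemblyName;
    }
}
```

The attribute also has a CodeBase property for the optional codebase mentioned above.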

For people after something more conventional, Daniel Cazzulino and the Xml MVPs have created a serialization pre-generator (http://weblogs.asp.net/cazzu/archive/2004/10/21/XGenToolRelease.aspx), which is in turn based on this older command-line utility (http://weblogs.asp.net/cazzu/archive/2004/08/02/SGen.aspx). The classes generated by this utility have to be explicitly referenced and used in place of the more generic XmlSerializer. No doubt there are other tools which do similar “development-time” code generation.

Where are we?

All of the constructor overloads of the Xml Serializer cause a new type to be generated and compiled when they are first called for that type, resulting in a small initial performance hit. Some constructor overloads cause a new type to be generated, compiled and loaded every time they are called. This severely slows down performance and consumes memory which cannot be freed except by shutting down the application. Avoid these constructors if possible. In .NET Framework 2.0 the sgen.exe utility will allow pre-generation of serialization assemblies which the Xml Serializer may just “pick up and use” in preference to generating its own (but I can’t get it to work :( ). Some overloads of the Xml Serializer will still generate temporary assemblies every time in Framework 2.0, so these overloads should still be avoided. It also seems that Microsoft are not averse to back-porting types from .NET Framework 2.0 to Framework 1.1. C’mon guys, I’m sure all those ASP.NET programmers out there would like a hotfix to enable WebParts or Masterpages. Surely generics doesn’t need a WHOLE CLR version rev.

Comments

Shaul Dar
Very nice coverage of XML Serialization. But … the major pain I’ve been struggling with is that this causes the 1st Web service invocation to be VERY slow, as apparently the proxy constructor creates serialization code for all methods. The solution requires (1) that we generate serialization code at design time, which we can do with the pre-generators described above (for Framework 1.1 we managed to do so with the MS hotfix 872800) and (2) that we make the proxy use the generated serialization DLLs instead of recreating them - this I don’t know how to do! Suggestions?

In 2.0 the proxy will look for these DLLs first, so that’s OK. In 1.1, using MS hotfix 872800, you must add a line to the client to change the behavior and make it use the pregenerated DLLs. Unfortunately hotfix 872800 also requires hotfix (SP 1 rollup) 890673, which (a) must be installed on each target computer, not only on the build machine, (b) has different versions for XP and 2003, (c) MS is making hard to get, though you can find a link on the Web (or call MS and wait…)
15/05/2005 5:07:00 AM