At work, I’ve been using Xsd.exe and XmlSerializer in V1.1 a lot lately. There are a number of things that aren’t satisfactory about both of them, but this post only talks about a few of them. Since .NET 2.0 was just released, I began trying out a few things to see if they fixed things. You’ll see that they’ve fixed some issues, but there is still a lot left that they can do.
One of the big things Microsoft has said about .NET, and it’s true, is that it has built-in support for XML. There is a lot of support, which is handy, because Microsoft’s marketing message when .NET came out was all: “XML! XML! XML! Use XML everywhere.” I detest working with XML, but there are tools that make it bearable, like the XmlSerializer. There are times when you have no choice but to use XML, but with the XmlSerializer, you can hide most of the XML and use normal classes. Likewise, Xsd.exe is pretty handy. A Swiss Army knife of an XML tool, it can take an assembly and produce a schema of the types; it can take an XML file and generate a schema based on that file; it can take a schema and generate C# or VB classes or a strongly-typed DataSet; give it a kitchen sink, it’ll do something.
I use it to generate class files from a schema that typically is beyond my control. It generates some truly heinous code for you, embarrassing code; if the code were a person, it’d wear jogging pants to a wedding, laugh at the worst jokes and have terrible teeth.
Suppose I have an XML schema that defines a log file. You can click here to view it. It defines for me XML files like so:
<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T20:22:35.75-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
      This is another entry taking advantage of the fact that I don’t need a timestamp
    </data>
  </logEntry>
</log>
Although the details of the schema aren’t important, there are two things I’d like to point out: both the category attributes (which the generated code maps to Int32, the .NET counterpart of xs:int; xs:long would map to Int64) and the timestamp elements are optional. Keyword: optional. You’ll see why in a second. Like I said earlier, I use Xsd.exe to generate class files from a schema. So if I pass that mother through the tool, I’ll get C# code on the other end.
Click here for the code generated by Xsd.exe V1.1.
Click here for the code generated by Xsd.exe V2.0.
You’ll see in the V1.1 file that what you get is quite appalling: public fields (aaaaaagggh), incorrect casing, etc. You should feel compelled to take what the tool generated and add to it so it doesn’t suck so much. In this contrived case, I’d probably save some browsing around in a command shell by writing the whole thing myself; however, once the schema gets large enough (i.e. lots of complex types), modifying what the tool gives you will save you some time. With tools like ReSharper, it’s pretty easy to add properties and constructors to make the type more usable.
Contrast that with the 2.0 version: properties are there now, but they still don’t take advantage of the XmlElementAttribute overload that will take a tag name. The classes are partial and liberally sprinkled with oodles of design-time attributes. These attributes are useless for my scenario, but may be used for some of the other scenarios that Xsd.exe supports. (I typically use the tool, keep the source files, and throw away the schema.)
However, note that in both files, there is a pattern for value types. This is what I really want to talk about. Remember that I said the schema defined the timestamp element and the category attribute as optional? In the generated class files, these values are represented by value types. And how do we represent value types that don’t have a value set? Not elegantly, for certain. So how does the Xsd.exe tool do this? Consider the category attribute; the tool generates this code (kinda, I had to make it better):
private int category;
[XmlAttribute("category")]
public int Category
{
    get { return this.category; }
    set { this.category = value; }
}
private bool categorySpecified;
[XmlIgnore]
public bool CategorySpecified
{
    get { return this.categorySpecified; }
    set { this.categorySpecified = value; }
}
In order for the XmlSerializer to know that this optional property has a value, there is an additional property, CategorySpecified, to tell the serializer that there is indeed a value. If it’s true, then there is a value; otherwise, there isn’t. The serializer uses this when both serializing and deserializing. When serializing, if the XxxSpecified value is false, then it won’t serialize the Xxx property. This is good, because if there are lots of optional elements, we want the XML to stay lean to save bandwidth. However, as a type author, I don’t want this: the type is harder to use, because now a user of my type will have to set two properties to set a value or read two properties to get a value. Then they’ll curse my name and my future children for putting them through such torture.
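To make the two-property dance concrete, here is a minimal sketch of what happens when a caller forgets the flag. The LogEntry class below is a hypothetical, cut-down stand-in for the generated type (only the optional attribute is modelled), and SpecifiedDemo and ToXml are names I made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated type; not the real Xsd.exe output.
public class LogEntry
{
    private int category;
    private bool categorySpecified;

    [XmlAttribute("category")]
    public int Category
    {
        get { return category; }
        set { category = value; }
    }

    // Consulted by XmlSerializer: false means "leave the attribute out".
    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return categorySpecified; }
        set { categorySpecified = value; }
    }
}

public static class SpecifiedDemo
{
    // Serializes an entry to a string so the two cases can be compared.
    public static string ToXml(LogEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(LogEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        LogEntry forgotten = new LogEntry();
        forgotten.Category = 1;             // flag left false: attribute is omitted
        Console.WriteLine(ToXml(forgotten));

        LogEntry complete = new LogEntry();
        complete.Category = 1;
        complete.CategorySpecified = true;  // both set: attribute is written
        Console.WriteLine(ToXml(complete));
    }
}
```

The first entry silently loses its category, which is exactly the kind of mistake a user of the type is likely to make.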
As a way to get around it, I change the property implementation like so:
public const int SentinelValue = -1;
private int category;
[XmlAttribute("category")]
public int Category
{
    get { return this.category; }
    set
    {
        this.category = value;
        this.categorySpecified = this.category != SentinelValue;
    }
}
private bool categorySpecified;
[XmlIgnore]
public bool CategorySpecified
{
    get { return this.categorySpecified; }
    set { this.categorySpecified = value; }
}
The bool is still there, because we need it for the XmlSerializer as mentioned above; now, however, programmers only have to set the Category property. They do have to know about a “no-value” value, but that can be documented. This method works even better if only a range of values is valid, which can be enforced through range checking and exceptions. If that is the case, the choice of “no-value” value is much easier.
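Here is a rough sketch of the range-checking variant. The valid range of 0 to 9 is purely an assumption for illustration (the schema doesn’t say), as are the MinCategory and MaxCategory names. I’ve also initialized the field to the sentinel so that a fresh instance starts out consistently “unset”, which the version above doesn’t do.

```csharp
using System;
using System.Xml.Serialization;

// Hypothetical variant of the Category property with range checking.
public class LogEntry
{
    public const int SentinelValue = -1;
    public const int MinCategory = 0;   // assumed lower bound
    public const int MaxCategory = 9;   // assumed upper bound

    private int category = SentinelValue;
    private bool categorySpecified;

    [XmlAttribute("category")]
    public int Category
    {
        get { return category; }
        set
        {
            // Anything that is neither the sentinel nor inside the range is rejected.
            if (value != SentinelValue && (value < MinCategory || value > MaxCategory))
            {
                throw new ArgumentOutOfRangeException("value", "Category must be 0..9 or SentinelValue.");
            }
            category = value;
            categorySpecified = value != SentinelValue;
        }
    }

    // Still needed by XmlSerializer, but callers never have to touch it.
    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return categorySpecified; }
        set { categorySpecified = value; }
    }
}

public static class SentinelDemo
{
    public static void Main()
    {
        LogEntry entry = new LogEntry();
        entry.Category = 3;
        Console.WriteLine(entry.CategorySpecified);
        entry.Category = LogEntry.SentinelValue;
        Console.WriteLine(entry.CategorySpecified);
    }
}
```

Because the sentinel sits outside the valid range, it can never collide with a real category, which is what makes the choice easy.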
With .NET 2.0, we get a host of new programming toys to play with. One of the less glamorous is nullable types. Nullable<T> is a generic value type that enables us programmers to express the “value type without a value” more succinctly. Nullable<T> will wrap the Xxx and XxxSpecified into one value type and you can check for null like a reference type. C# has some syntactic sugar to make them easier to use:
int? i = null;
Console.WriteLine(i == null); //prints true
Console.WriteLine(i.HasValue);//prints false
which is equivalent to saying:
Nullable<int> i = null;
Console.WriteLine(i == null);//prints true
Console.WriteLine(i.HasValue);//prints false
They’re slower than using the real value type, but that’s an implementation detail. I’m no database guy, but I think it is equivalent to DB NULL for a field (correct me if I’m wrong). So working with the XmlSerializer like I’ve been, and watching the new framework developments unfold, a couple questions popped into my mind: Would it be possible to remove those XxxSpecified properties and just use Nullable types instead? Would the XmlSerializer treat them as equivalent, since, semantically, they are? Well, let’s find out. First, we’ll remove the XxxSpecified properties, then we’ll change the file generated so that both the category attribute and the timestamp element are nullable types:
private int? category;
[XmlAttribute("category")]
public int? Category
{
    get { return this.category; }
    set { this.category = value; }
}
private System.DateTime? timestamp;
[XmlElement("timestamp")]
public System.DateTime? Timestamp
{
    get { return this.timestamp; }
    set { this.timestamp = value; }
}
If we try to serialize an instance of this, we get the following exception nested in like three InvalidOperationExceptions (a quirk of the XmlSerializer) courtesy of the totally unhandy Exception Assistant (seriously, that’s the next Clippy): Cannot serialize member ‘Category’ of type System.Nullable`1[System.Int32]. XmlAttribute/XmlText cannot be used to encode complex types.
Bummer.
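Since the useful message is buried under those nested InvalidOperationExceptions, a small helper that walks the InnerException chain saves some digging. This is only a sketch: NullableAttributeEntry, SerializerErrors, Innermost and TryCreateSerializer are hypothetical names, and I’m assuming the failure surfaces when the serializer is constructed, which is where I’d expect it on the runtimes I know of.

```csharp
using System;
using System.Xml.Serialization;

// Hypothetical type reproducing the failing shape: a nullable mapped to an attribute.
public class NullableAttributeEntry
{
    private int? category;

    [XmlAttribute("category")]
    public int? Category
    {
        get { return category; }
        set { category = value; }
    }
}

public static class SerializerErrors
{
    // Walks the InnerException chain down to the exception carrying the real cause.
    public static Exception Innermost(Exception error)
    {
        while (error.InnerException != null)
        {
            error = error.InnerException;
        }
        return error;
    }

    // Returns null if the serializer could be built, otherwise the innermost message.
    public static string TryCreateSerializer(Type type)
    {
        try
        {
            new XmlSerializer(type);
            return null;
        }
        catch (InvalidOperationException error)
        {
            return Innermost(error).Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryCreateSerializer(typeof(NullableAttributeEntry)));
    }
}
```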
Well let’s see if it will work with elements; XmlElementAttribute can handle complex types. Change the file so that Category is no longer a nullable, and try to serialize it. We get the following XML:
<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T22:37:26.140625-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
      This is another entry taking advantage of
      the fact that I don’t need a timestamp
    </data>
    <timestamp xsi:nil="true" />
  </logEntry>
</log>
Open this bad boy in VS 2005 and watch the XML validator complain that the timestamp element is invalid, that it cannot be empty.
Total bummer.
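Looks like my questions are answered in the negative. Nullable types can’t stand in for the XxxSpecified pattern in the XmlSerializer: on attributes they are rejected outright, and on elements a null is written as xsi:nil="true" instead of being left out, which my schema doesn’t allow because the element isn’t declared nillable. However, since they were a late addition and a change was made regarding them late in the game, I’ll forgive them.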
Besides, they should have something to do for .NET 3.0. 😉
Hi, what about something like this (.NET 2.0)?
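// Nullable backing field (implied by the use of lat.HasValue below).
private double? lat;
/// <summary>
/// Latitude.
/// </summary>
[XmlAttribute]
public double Lat
{
    // Throws InvalidOperationException when lat is null, so check LatSpecified first.
    get { return (double)lat; }
    set { lat = value; }
}
public bool LatSpecified
{
    get { return lat.HasValue; }
}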
Hi there. I would just like to let you know that nullable types can indeed be supported if you just tweak your pattern a little bit. Consider a design like this:
class MyClass
{
    // this is the nullable value container
    private int? number;
    // A regular integer property accessor.
    // If the private nullable container is not
    // defined, it will return the default value.
    [XmlAttribute("number")]
    public int Number
    {
        get { return number != null ? number.Value : default(int); }
        set { this.number = value; }
    }
    // This boolean property uses the nullable container
    // to check whether some value has been assigned
    // XmlSerializer and Xsd.exe will use this adequately.
    public bool NumberSpecified
    {
        get { return number != null; }
    }
}
In this way you can have full nullable semantics. The public object interface still does not support ‘int?’, but the interaction is semantically equivalent, since you can simply ask whether ‘NumberSpecified’ is true to know if the value has been assigned or not.
Or if you really need a nullable property, you can just expose that with an [XmlIgnore] attribute.
Best regards,
Gonçalo
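Putting the suggestions from the comments together, here is a minimal sketch of a type that exposes an int? to callers while the serializer still sees a plain int plus a read-only XxxSpecified flag. Entry, NullableNumber, NullableDemo, ToXml and FromXml are hypothetical names, and I’m assuming the serializer is happy with a Specified property that has no setter, as the comment above implies.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type combining a nullable backing field with the Specified pattern.
public class Entry
{
    private int? number;

    // What callers use; hidden from the serializer.
    [XmlIgnore]
    public int? NullableNumber
    {
        get { return number; }
        set { number = value; }
    }

    // What the serializer uses; falls back to 0 when no value is set.
    [XmlAttribute("number")]
    public int Number
    {
        get { return number.GetValueOrDefault(); }
        set { number = value; }
    }

    // Derived from the backing field, so it can never disagree with it.
    [XmlIgnore]
    public bool NumberSpecified
    {
        get { return number.HasValue; }
    }
}

public static class NullableDemo
{
    public static string ToXml(Entry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Entry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static Entry FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Entry));
        using (StringReader reader = new StringReader(xml))
        {
            return (Entry)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Entry empty = new Entry();
        Entry filled = new Entry();
        filled.NullableNumber = 5;

        Console.WriteLine(FromXml(ToXml(empty)).NullableNumber.HasValue);
        Console.WriteLine(FromXml(ToXml(filled)).NullableNumber);
    }
}
```

The attribute is simply left out when the value is null, so the XML stays valid against a schema that marks it optional, and callers only ever deal with one property.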