“I was gonna blog tonight.”

I don’t know how many times I said that to myself over the past month. But after a while I could sense my readers were egging me on.

“Do it.”

“Do it,” they were saying, by not commenting and by deleting their subscriptions from their aggregators. I couldn’t let them down. I mustered my will and determination. I pressed on. There were times when I wavered, when I almost clicked that New Post link, but I resisted. All for you, dear reader. I didn’t want to bother you with my thoughts during the busiest time of the year.

I have no posts for the month of December 2005. That month does not exist on this blog. It might as well never have occurred.

This is not my resolution – it’s more of a tentative commitment – but I’m going to try to blog more. Not for me, but for you, dear reader.

Wait, don’t go! It’s for me, alright.

Writing about nerd-only topics like implementing your own UriParser is just too nerdy most of the time, so I’ve decided to branch out and write about new things. I read some people’s blogs for the knowledge they pass on and others to make me laugh. That’s why I’m going to write about a couple of things that my ego feels I know something about. The first, in support of this year’s resolution, is getting back in shape (I used to be a geek trapped in a hot guy’s body). Seriously, I’ve been feeling like an old man (a really fat one), and I’m only 27. I don’t like it.

The second I’ll leave as a surprise (that’s called foreshadowing ;)) because I’m tired of typing.

Writing Your Own UriParser

I posted before about my displeasure with using System.Uri for advanced scenarios: it won’t parse mailto-style URIs, and extending it is just evil. Yet I’d still recommend you use System.Uri for your URI needs, even if it won’t parse your URI.

Why? Well, with each version of the framework, Microsoft will work to make it better (at least that’s what they promise us). If you follow their design guidelines, your classes should version well across framework versions. So if you’re using FxCop to check your code against the guidelines, I’d recommend fixing violations of the “Use System.Uri instead of string” rule.

It just so happens that they kinda worked around my two earlier complaints: they deprecated those heinous protected methods; currently you only get a compiler warning, but it’s a start. They also provided a mechanism to parse URI schemes that the framework doesn’t know about: extend UriParser.

Before I start with the nitty-gritty details of building your own UriParser, I’d like to point out something amusing (well, amusing to total nerds who’d rather write about parsing URIs than play video games or watch Arrested Development). Go to the MSDN2 documentation for UriParser and you’ll read this:

The UriParser class enables you to create parsers for new URI schemes. You can write these parsers in their entirety, or the parsers can be derived from well-known schemes (HTTP, FTP, and other schemes based on network protocols). If you want to create a completely new parser, inherit from GenericUriParser. If you want to create a parser that extends a well-known URI scheme, inherit from FtpStyleUriParser, HttpStyleUriParser, FileStyleUriParser, GopherStyleUriParser, or LdapStyleUriParser.

Notice something missing? That’s right, there is no fucking MailtoStyleUriParser!

Although I’m glad gopher is finally getting its due.

When the Product Feedback Center came out, I figured I’d give it a shot, so I entered a feature request to get Uri to parse sip, sips and tel URIs. Read the comment at the bottom by Mr. Daniel Roth; you’ll see that writing your own mailto-style URI parser is easy because mailto is a “so rudimentary” scheme. That kinda raises the question: if it’s so rudimentary, then WHY ISN’T THERE ONE IN THE FRAMEWORK?

To add further to the fire, here’s the second paragraph from the UriParser class description:

Microsoft strongly recommends that you use a parser shipped with the .NET Framework. Building your own parser increases the complexity of your application, and will not perform as well as the shipped parsers.

Hmph.

I’m not going to go into detail about inheriting from GenericUriParser in this post. Instead, let’s suppose I’ve already done that and look at how to let the framework know about your parser so that it will load it and use it. I feel obligated to tell you about this because the documentation for these methods is really fucking terrible. Suppose I have two types called PresUriParser and SipUriParser, both of which inherit from GenericUriParser. To tell the framework about these parsers, I call the UriParser.Register(uriParser, schemeName, defaultPort) method:

    UriParser.Register(new PresUriParser(), "pres", -1);
    UriParser.Register(new SipUriParser(), "sip", 5060);
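Neither PresUriParser nor SipUriParser is shown here, so here’s a minimal sketch of what the setup might look like. It assumes both are hypothetical, bare-bones subclasses that just pass GenericUriParserOptions.Default to the GenericUriParser constructor; a real sip or pres parser would need options (and probably overrides) that match its syntax. UriParser.IsKnownScheme is a quick way to check that the registration took:

```csharp
using System;

// Hypothetical parsers: the post assumes these exist but doesn't show them.
// GenericUriParserOptions.Default is a placeholder choice, not a recommendation.
public class PresUriParser : GenericUriParser
{
    public PresUriParser() : base(GenericUriParserOptions.Default) { }
}

public class SipUriParser : GenericUriParser
{
    public SipUriParser() : base(GenericUriParserOptions.Default) { }
}

public static class ParserRegistration
{
    // Registration is process-wide, so call this once, early (e.g. at startup).
    public static void RegisterAll()
    {
        UriParser.Register(new PresUriParser(), "pres", -1);  // -1: no default port
        UriParser.Register(new SipUriParser(), "sip", 5060);  // 5060: the usual SIP port
    }
}
```

Registration is one-shot, so something like RegisterAll belongs in startup code; calling it a second time throws, because the schemes are already taken.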

That browser of yours isn’t playing tricks: the first call has a port of -1, which means no default port. The terrible, terrible documentation doesn’t mention that. Nor does it mention that every argument is validated and that an exception is thrown if validation fails. For argument validation, the exception table should read as follows:

Type of Exception              Reason
ArgumentNullException          uriParser or schemeName is null
ArgumentOutOfRangeException    schemeName has length 1, or violates the rules for scheme names in any of myriad ways
ArgumentOutOfRangeException    defaultPort >= 65535, or defaultPort < 0 and defaultPort != -1
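To see that table in action, here’s a small probe. It’s a sketch built around a hypothetical throwaway parser (ProbeUriParser) and a helper that just reports which exception type, if any, UriParser.Register throws for each kind of bad argument:

```csharp
using System;

// Hypothetical do-nothing parser, used only to poke at Register's argument checks.
public class ProbeUriParser : GenericUriParser
{
    public ProbeUriParser() : base(GenericUriParserOptions.Default) { }
}

public static class RegisterValidation
{
    // Calls Register and returns the name of the exception type thrown,
    // or "none" if the registration succeeded.
    public static string Probe(UriParser parser, string scheme, int port)
    {
        try
        {
            UriParser.Register(parser, scheme, port);
            return "none";
        }
        catch (Exception ex)
        {
            // Catching everything is deliberate: we only want to report it.
            return ex.GetType().Name;
        }
    }
}
```

For example, Probe(new ProbeUriParser(), "x", -1) trips the length-1 rule, and a port of 65535 or -2 trips the port rule.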

But wait, there’s more! Even if you get past the argument validation, you’re not yet out of the exception danger zone. Suppose you had code like this:

    PresUriParser p = new PresUriParser();
    UriParser.Register(p, "pres", -1);
    UriParser.Register(p, "presence", 5060);

This will throw an InvalidOperationException: apparently, a single parser instance can be registered for only one scheme, even if your parser can parse several. This makes sense, especially in multi-threaded scenarios, but does the documentation say so? Hell, no!
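Here’s a sketch of that rule, again assuming a hypothetical, bare GenericUriParser subclass named PresUriParser. Reusing the instance for a second scheme throws; handing each scheme its own fresh instance works:

```csharp
using System;

// Hypothetical parser that could, in principle, handle "pres" and "presence".
public class PresUriParser : GenericUriParser
{
    public PresUriParser() : base(GenericUriParserOptions.Default) { }
}

public static class OneSchemePerInstance
{
    // Returns true if reusing one parser instance for a second scheme throws.
    public static bool ReuseThrows()
    {
        PresUriParser p = new PresUriParser();
        UriParser.Register(p, "pres", -1);
        try
        {
            UriParser.Register(p, "presence", 5060);  // same instance, second scheme
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // The workaround: a fresh instance for each scheme.
    public static void RegisterPresence()
    {
        UriParser.Register(new PresUriParser(), "presence", 5060);
    }
}
```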

So you get the argument validation and now a potentially invalid operation. All undocumented. But I have more for you: a free set of knives!

Suppose you wanted to parse http on your own, with code like this:

    public class HttpUriParser : HttpStyleUriParser
    {
        public HttpUriParser() { }
    }

And then you register your parser, like so:

    UriParser.Register(new HttpUriParser(), "http", 80);

Guess what? InvalidOperationException! That scheme is already registered.

Makes me wonder why there are all those parsers for known schemes if you can’t use them on the schemes that they parse.
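A quick sketch to confirm the behavior: registering the post’s HttpUriParser for “http” fails because the built-in parser already owns that scheme, while the same class registered under a brand-new scheme name is accepted. The myhttp scheme and port 8080 are made up for the example:

```csharp
using System;

// The post's HttpUriParser: an HttpStyleUriParser subclass that adds nothing.
public class HttpUriParser : HttpStyleUriParser
{
    public HttpUriParser() { }
}

public static class BuiltInSchemes
{
    // Returns true if the framework refuses to let us take over "http".
    public static bool ReplacingHttpThrows()
    {
        try
        {
            UriParser.Register(new HttpUriParser(), "http", 80);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;  // "http" is already registered with the built-in parser
        }
    }

    // Hypothetical "myhttp" scheme: HTTP-style syntax with its own default port.
    public static int DefaultPortForMyHttp()
    {
        UriParser.Register(new HttpUriParser(), "myhttp", 8080);
        return new Uri("myhttp://example.com/logs").Port;  // no port in the string
    }
}
```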

There was a way to do it through the config file, but that content has been retired according to MSDN2. There appear to be some issues with Uri and security, outlined here in the breaking changes list. Perhaps Microsoft didn’t want others screwing up the perfectly good parsing for the known schemes, like they can by overriding the methods I mentioned last time.

Whatever the case, hopefully this quick article helps with the documentation for System.UriParser. Next time, we’ll get our hands dirty overriding methods and abusing Console.WriteLine().

Item Templates in Visual Studio 2005

When Beta 1 of the VS Express Editions came out, I wrote an article about Item Templates for Visual Studio 2005, a topic I was excited about ’cause I hate typing more than I have to (if I had a quarter for every time I wrote [TestFixture]…). I always meant to update it for Beta 2.

Um, yeah: I didn’t do that, and when RTM came around, I promised myself I’d update it then. But I’m too lazy to do so, for three reasons: 1) most of you will create templates with the Export Template… menu item under the File menu (which didn’t work in any beta I tried), so knowing about the XML schema for vstemplate files won’t matter much; 2) if you really need to know about it, look at a few of the built-in templates: you’ll see a pattern, I reckon; and 3) David Hayden has already done a fantastic job of explaining how to modify the existing templates and how to create custom templates with the Export Template Wizard.

XmlSerializer, Xsd.exe, Nullable<T> and you.

At work, I’ve been using Xsd.exe and XmlSerializer in V1.1 a lot lately. There are a number of unsatisfactory things about both of them, but this post only talks about a few. Since .NET 2.0 was just released, I began trying a few things out to see whether they’d been fixed. You’ll see that some issues have been fixed, but there is still a lot left to do.

One of the big things Microsoft has said about .NET, and it’s true, is that it has built-in support for XML. There is a lot of support, which is handy, because Microsoft’s marketing message when .NET came out was all: “XML! XML! XML! Use XML everywhere.” I detest working with XML, but there are tools that make it bearable, like the XmlSerializer. There are times when you have no choice but to use XML, but with the XmlSerializer you can hide most of the XML and use normal classes. Likewise, Xsd.exe is pretty handy. A Swiss Army knife of an XML tool, it can take an assembly and produce a schema of its types; it can take an XML file and generate a schema based on that file; give it a schema and it will generate C# or VB classes or a strongly typed DataSet; give it a kitchen sink, and it’ll do something.

I use it to generate class files from a schema that typically is beyond my control. It generates some truly heinous code for you, embarrassing code; if the code were a person, it’d wear jogging pants to a wedding, laugh at the worst jokes and have terrible teeth.

Suppose I have an XML schema that defines a log file. You can click here to view it. It describes XML files like this one:

<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T20:22:35.75-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
       This is another entry taking advantage of the fact that I don’t need a timestamp
    </data>
  </logEntry>
</log>

Although the details of the schema aren’t important, there are two things I’d like to point out: both the category attributes (typed as xs:int, the XML Schema way of saying Int32) and the timestamp elements are optional. Keyword: optional. You’ll see why in a second. As I said earlier, I use Xsd.exe to generate class files from a schema, so if I pass that mother through the tool, I’ll get C# code on the other end.

Click here for the code generated by Xsd.exe V1.1.

Click here for the code generated by Xsd.exe V2.0.

You’ll see in the V1.1 file that what you get is quite appalling: public fields (aaaaaagggh), incorrect casing, etc. You should feel compelled to take what the tool generated and add to it so it doesn’t suck so much. In this contrived case, I’d probably save some browsing around in a command shell by writing the whole thing myself; however, once the schema gets large enough (i.e., lots of complex types), modifying what the tool gives you will save you some time. With tools like ReSharper, it’s pretty easy to add properties and constructors to make the type more usable.

Contrast that with the 2.0 version: properties are there now, but they still don’t take advantage of the XmlElementAttribute overload that takes a tag name. The classes are partial and liberally sprinkled with oodles of design-time attributes. These attributes are useless for my scenario but may be used by some of the other scenarios that Xsd.exe supports. (I typically run the tool, keep the source files, and throw away the schema.)

However, note that in both files there is a pattern for value types. This is what I really want to talk about. Remember that I said the schema defined the timestamp element and the category attribute as optional? In the generated class files, these values are represented by value types. And how do we represent a value type that doesn’t have a value set? Not elegantly, that’s for certain. So how does Xsd.exe do it? Consider the category attribute; the tool generates this code (roughly; I had to clean it up):

    private int category;

    [XmlAttribute("category")]
    public int Category
    {
        get { return this.category; }
        set { this.category = value; }
    }

    private bool categorySpecified;

    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return this.categorySpecified; }
        set { this.categorySpecified = value; }
    }

In order for the XmlSerializer to know that this optional property has a value, there is an additional property, CategorySpecified, that tells the serializer there is indeed a value. If it’s true, there is a value; otherwise, there isn’t. The serializer uses it when both serializing and deserializing. When serializing, if an XxxSpecified value is false, the serializer won’t write the corresponding Xxx property. This is good, because if there are lots of optional elements, we want the XML to stay lean to save bandwidth. However, as a type author, I don’t want this: the type is harder to use, because a user of my type now has to set two properties to set a value, or read two properties to get one. Then they’ll curse my name and my future children for putting them through such torture.
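To make the pattern concrete, here’s a round trip through the XmlSerializer. It’s a sketch that assumes a trimmed-down, hypothetical LogEntry type (just the category attribute and a data element), not the full class Xsd.exe generated:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical, trimmed-down stand-in for the generated logEntry type.
[XmlRoot("logEntry")]
public class LogEntry
{
    private int category;
    private bool categorySpecified;

    [XmlAttribute("category")]
    public int Category
    {
        get { return this.category; }
        set { this.category = value; }
    }

    // Companion flag: the serializer writes category only when this is true,
    // and sets it to true when it reads a category attribute back in.
    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return this.categorySpecified; }
        set { this.categorySpecified = value; }
    }

    [XmlElement("data")]
    public string Data { get; set; }
}

public static class SpecifiedDemo
{
    public static string Serialize(LogEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(LogEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static LogEntry Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(LogEntry));
        using (StringReader reader = new StringReader(xml))
        {
            return (LogEntry)serializer.Deserialize(reader);
        }
    }
}
```

Leave CategorySpecified false and the category attribute simply doesn’t appear in the output, whatever Category holds.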

As a way to get around it, I change the property implementation like so:

    public const int SentinelValue = -1;

    private int category;

    [XmlAttribute("category")]
    public int Category
    {
        get { return this.category; }
        set
        {
            this.category = value;
            this.categorySpecified = this.category != SentinelValue;
        }
    }

    private bool categorySpecified;

    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return this.categorySpecified; }
        set { this.categorySpecified = value; }
    }

The bool is still there because we need it for the XmlSerializer, as mentioned above; however, programmers now only have to set the Category property. They do have to know about a “no-value” value, but that can be documented. This approach works even better if only a range of values is valid, which can be enforced through range checking and exceptions. If that is the case, choosing the “no-value” value is much easier.
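Here’s a sketch of the sentinel approach with the range check folded in. It assumes, hypothetically, that real categories are never negative, which leaves -1 free to mean “no value”; any other negative number is rejected. The field also starts out at the sentinel so a fresh object is consistent:

```csharp
using System;
using System.Xml.Serialization;

public class LogEntry
{
    // Hypothetical "no value" marker: assumes real categories are never negative.
    public const int SentinelValue = -1;

    private int category = SentinelValue;
    private bool categorySpecified;

    [XmlAttribute("category")]
    public int Category
    {
        get { return this.category; }
        set
        {
            // Range check: 0 and up are real categories; SentinelValue means "none".
            if (value < 0 && value != SentinelValue)
            {
                throw new ArgumentOutOfRangeException("value", value,
                    "Category must be non-negative, or SentinelValue for no category.");
            }
            this.category = value;
            this.categorySpecified = value != SentinelValue;
        }
    }

    // Still needed by the XmlSerializer, but callers never have to touch it.
    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return this.categorySpecified; }
        set { this.categorySpecified = value; }
    }
}
```

Callers only ever touch Category; CategorySpecified is kept in step for the serializer’s benefit.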

With .NET 2.0, we get a host of new programming toys to play with. One of the less glamorous is nullable types. Nullable<T> is a generic value type that lets us programmers express a “value type without a value” more succinctly. Nullable<T> wraps the Xxx and XxxSpecified pair into one value, and you can check it for null like a reference type. C# has some syntactic sugar to make them easier to use:

    int? i = null;
    Console.WriteLine(i == null);   // prints True
    Console.WriteLine(i.HasValue);  // prints False

which is equivalent to:

    Nullable<int> i = null;
    Console.WriteLine(i == null);   // prints True
    Console.WriteLine(i.HasValue);  // prints False
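Since Nullable<T> is essentially the Xxx/XxxSpecified pair rolled into one value, converting between the two shapes is mechanical. Here’s a small sketch; the NullableBridge class is hypothetical, not something in the framework:

```csharp
// Hypothetical helper: converts between the Xsd.exe-style value/flag pair
// and a single Nullable<int>.
public static class NullableBridge
{
    // Pair -> nullable: if the flag is false, there is no value.
    public static int? FromPair(int value, bool specified)
    {
        return specified ? value : (int?)null;
    }

    // Nullable -> pair: HasValue becomes the flag; the value falls back to 0.
    public static void ToPair(int? source, out int value, out bool specified)
    {
        specified = source.HasValue;
        value = source.GetValueOrDefault();
    }
}
```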

They’re slower than using the plain value type, but that’s an implementation detail. I’m no database guy, but I think a null value here is equivalent to a DB NULL for a field (correct me if I’m wrong). So, working with the XmlSerializer like I’ve been and watching the new framework developments unfold, a couple of questions popped into my mind: Would it be possible to remove those XxxSpecified properties and just use nullable types instead? Would the XmlSerializer treat them as equivalent, since, semantically, they are? Well, let’s find out. First, we’ll remove the XxxSpecified properties; then we’ll change the generated file so that both the category attribute and the timestamp element are nullable types:

    private int? category;

    [XmlAttribute("category")]
    public int? Category
    {
        get { return this.category; }
        set { this.category = value; }
    }

    private System.DateTime? timestamp;

    [XmlElement("timestamp")]
    public System.DateTime? Timestamp
    {
        get { return this.timestamp; }
        set { this.timestamp = value; }
    }

If we try to serialize an instance of this, we get the following exception, nested inside about three InvalidOperationExceptions (a quirk of the XmlSerializer), courtesy of the totally unhandy Exception Assistant (seriously, that’s the next Clippy): Cannot serialize member ‘Category’ of type System.Nullable`1[System.Int32]. XmlAttribute/XmlText cannot be used to encode complex types.
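If you want to reproduce that, here’s a minimal sketch with a hypothetical one-property type. As far as I can tell, the exception surfaces as soon as the XmlSerializer is constructed for the type, before anything is written; the helper walks down the nested exceptions to the innermost message:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical one-property type: a nullable value mapped to an attribute.
public class NullableAttributeEntry
{
    [XmlAttribute("category")]
    public int? Category { get; set; }
}

public static class NullableAttributeDemo
{
    // Returns the innermost exception message, or null if serialization worked.
    public static string TrySerialize()
    {
        try
        {
            // The type is reflected (and rejected) here, in the constructor.
            XmlSerializer serializer = new XmlSerializer(typeof(NullableAttributeEntry));
            using (StringWriter writer = new StringWriter())
            {
                serializer.Serialize(writer, new NullableAttributeEntry { Category = 1 });
            }
            return null;
        }
        catch (InvalidOperationException ex)
        {
            // Walk the chain of nested InvalidOperationExceptions to the real cause.
            Exception inner = ex;
            while (inner.InnerException != null)
            {
                inner = inner.InnerException;
            }
            return inner.Message;
        }
    }
}
```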

Bummer.

Well, let’s see if it will work with elements; XmlElementAttribute can handle complex types. Change the file so that Category is no longer nullable, and try to serialize it. We get the following XML:

<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T22:37:26.140625-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
       This is another entry taking advantage of
       the fact that I don’t need a timestamp
    </data>
    <timestamp xsi:nil="true" />
  </logEntry>
</log>

Open this bad boy in VS 2005 and watch the XML validator complain that the timestamp element is invalid, that it cannot be empty.
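Here’s a sketch that reproduces the element case with a hypothetical, cut-down entry type: a null DateTime? comes out as an empty timestamp element carrying xsi:nil="true" instead of being left out, and reading that back gives you null again:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical cut-down logEntry with a nullable element instead of XxxSpecified.
[XmlRoot("logEntry")]
public class NullableElementEntry
{
    [XmlElement("data")]
    public string Data { get; set; }

    // Allowed on an element, but null is written as <timestamp xsi:nil="true" />
    // rather than being omitted.
    [XmlElement("timestamp")]
    public DateTime? Timestamp { get; set; }
}

public static class NullableElementDemo
{
    public static string Serialize(NullableElementEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(NullableElementEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static NullableElementEntry Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(NullableElementEntry));
        using (StringReader reader = new StringReader(xml))
        {
            return (NullableElementEntry)serializer.Deserialize(reader);
        }
    }
}
```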

Total bummer.

It looks like my questions are answered in the negative: nullable types can’t stand in for the XxxSpecified pattern with the XmlSerializer. However, since they were a late addition and a change was made regarding them late in the game, I’ll forgive them.

Besides, they should have something to do for .NET 3.0. 😉