Why System.Uri sucks, part 2

Look up irony in a dictionary; do you know what you’ll find?

The definition of the word irony.

That may not help you, so how about some examples of irony? Alanis Morissette’s song Ironic – none of the situations in the song are ironic, yet the name of the song is … Ironic. Another example of irony is the recommendation, in a book about public APIs, of a class that has a pretty bad public API: System.Uri.

I wrote last time about System.Uri’s inability to parse mailto-like URI schemes. This time I’m going to talk about what you have to go through if you want to remedy Uri’s problem: I’ll discuss the API exposed to inheritors.

Go to the documentation on MSDN about System.Uri and browse the members of Uri. It’ll show you one protected method in the list: the static EscapeString(). However, inherit from System.Uri, then type “override” followed by a space in Visual Studio 2003. You’ll see a number of methods that you can override: Canonicalize(), CheckSecurity(), Escape(), Parse(), and Unescape(). I believe the reason they are not documented in MSDN is that they never should have shipped. Too bad Abrams and Cwalina didn’t point this mistake out when they advised using this class, as they did throughout the book with other classes.

Documentation probably wouldn’t make much difference because these methods aren’t very useful. They take no arguments, and return nothing (with the exception of Unescape() which takes a string and returns a string). There is no protected property exposing the string given to the constructor, so overriding the methods won’t help you. However, there’s nothing to stop a malicious, or incompetent, coder from overriding them and passing an instance of their class to your API. Will it compromise your system? I’m no security expert, but I don’t think so: they won’t be able to steal passwords with it. But they could take the system down: override Unescape() to return null and you’ll get a NullReferenceException.

Another bit of suckiness in this API is that it violates one of the rules that Abrams and Cwalina recommend in the Framework Design Guidelines book: don’t call virtual methods from constructors. Run the code below and you’ll see the order of execution.

using System;

public class MyUri : Uri
{
   public MyUri(string uriString) : base(uriString)
   {
      // By the time we get here, the base constructor has already
      // called the virtual methods overridden below.
      Console.WriteLine("In ctor");
      string escaped = EscapeString(uriString);
      Console.WriteLine("escaped = {0}", escaped);
   }

   protected override void Canonicalize()
   {
      Console.WriteLine("In Canonicalize()");
   }

   protected override void CheckSecurity()
   {
      Console.WriteLine("In CheckSecurity()");
   }

   protected override void Escape()
   {
      Console.WriteLine("In Escape()");
   }

   protected override void Parse()
   {
      Console.WriteLine("In Parse()");
   }

   protected override string Unescape(string path)
   {
      Console.WriteLine("In Unescape(path = {0})", path);
      return path;
   }
}

class Program
{
   static void Main()
   {
      new MyUri("pres:jason@example.com;param= pvalue");
      Console.ReadLine();
   }
}

Yields the following output:

In Parse()
In Canonicalize()
In Escape()
In Unescape(path = ::-1pres:jason@example.com;param= pvalue)
In ctor
escaped = pres:jason@example.com;param=%20pvalue

Hmph. Not so hot.
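To see why the guideline exists, here is a minimal sketch that has nothing to do with Uri itself. The Base and Derived classes are hypothetical names made up for illustration; the only point is that the override runs before the derived constructor body has had a chance to set anything up.

```csharp
using System;

// Hypothetical base class that, like Uri in .NET 1.1, calls a
// virtual method from its constructor.
public class Base
{
    public readonly string SeenDuringConstruction;

    public Base()
    {
        // Virtual dispatch already targets the most-derived override,
        // even though the derived constructor body has not run yet.
        SeenDuringConstruction = Describe();
    }

    protected virtual string Describe()
    {
        return "base";
    }
}

public class Derived : Base
{
    private readonly string name;

    public Derived(string name)
    {
        this.name = name; // runs only after Base() has finished
    }

    protected override string Describe()
    {
        // name is still null when Base() calls this override.
        return name ?? "(name not set yet)";
    }
}

public static class VirtualCallDemo
{
    public static void Main()
    {
        Derived d = new Derived("jason");
        Console.WriteLine(d.SeenDuringConstruction); // prints (name not set yet)
    }
}
```

An inheritor of Uri is in the same position as Derived here: its overrides run against an object whose own state has not been initialized.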

They changed a lot in .NET 2.0, of course, so you can now override how to parse URIs that System.Uri doesn’t know about without extending System.Uri. Oh, and they deprecated the above methods, so you’ll get compiler warnings. I’ll talk more about how to override UriParser next time. (I won’t follow the same title scheme next time, just to keep it interesting. So pay attention! :))
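As a rough preview, registering a parser in .NET 2.0 might look something like the sketch below. UriParser.Register, UriParser.IsKnownScheme and GenericUriParser are real framework types, but the "pres" scheme, the RegisterPres helper and the choice of GenericUriParserOptions.Default are assumptions for illustration only; which options actually suit mailto-style URIs is left open, so the sketch prints whatever it gets rather than promising a result.

```csharp
using System;

public static class UriParserDemo
{
    // Hypothetical helper: registers a generic parser for the "pres"
    // scheme once. The options used here are an untuned assumption.
    public static void RegisterPres()
    {
        if (!UriParser.IsKnownScheme("pres"))
        {
            UriParser.Register(
                new GenericUriParser(GenericUriParserOptions.Default),
                "pres",
                -1); // -1 means the scheme has no default port
        }
    }

    public static void Main()
    {
        RegisterPres();
        Console.WriteLine("known scheme: {0}", UriParser.IsKnownScheme("pres"));

        // TryCreate reports failure instead of throwing, so this is
        // safe even if the chosen options reject the URI.
        Uri uri;
        if (Uri.TryCreate("pres:jason@example.com", UriKind.Absolute, out uri))
        {
            Console.WriteLine("Scheme   = {0}", uri.Scheme);
            Console.WriteLine("UserInfo = {0}", uri.UserInfo);
            Console.WriteLine("Host     = {0}", uri.Host);
        }
        else
        {
            Console.WriteLine("could not parse with these options");
        }
    }
}
```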

Item Templates in Visual Studio 2005

When Beta 1 of the VS Express Editions came out, I wrote an article about Item Templates for Visual Studio 2005; a topic I was excited about ’cause I hate typing more than I have to (if I had a quarter for every time I wrote [TestFixture]…). I always meant to update it for Beta 2.

Um, yeah: I didn’t do that, and when RTM came around, I promised myself I’d update it again. But I’m too lazy to do so, for three reasons: 1) Most of you will create templates with the Export Template… menu item under the File menu (which didn’t work in any beta I tried), so knowing about the XML Schema for vstemplate files won’t matter much; 2) If you really need to know about it, look at a few of the built-in templates: you’ll see a pattern, I reckon; and finally 3) David Hayden already did a fantastic job of explaining how to modify the existing templates and create custom templates with the Export Template Wizard.

Why System.Uri sucks, Part 1

I recently reviewed Framework Design Guidelines, the new book by Abrams and Cwalina about designing frameworks for .NET. In the review, I mentioned that I disagreed with their advice of using System.Uri (v.1.1) to represent some URIs. Here’s why.

 

The name System.Uri implies, to me, a URI, any URI. If your URI scheme follows the “rules” of the URI RFC (when .NET 1.1 came out, that would be RFC 2396), then System.Uri should be able to parse it, and you should be able to use the properties exposed by the type for information inside the URI: the host, the scheme, the user info, etc.

 

For the “big” URI schemes – http, ftp, file, mailto, news, and, of course, gopher – this is not a problem. They are parsed according to the rules and everything works as expected. Even if you decided to use your own, custom hierarchical URI scheme (http is an example of a hierarchical URI scheme), then System.Uri will parse it, as the code below demonstrates.

 

However, try giving System.Uri a mailto-style URI (i.e. something:user@example.com), and it’ll shit the bed. Then it’ll deny anything is wrong. You may be struggling to come up with an example, but, trust me, there are many; and they will quickly gain importance in our daily programmer lives: SIP, XMPP (kinda), and common presence and IM all represent users with mailto-style URIs. The only property that doesn’t return null with those URIs is Uri.Scheme. When I first found this out, I was shocked; I thought I was doing something wrong. One of the things that Brad and Krzysztof advise is not to surprise the user of your API, or go against their expectations.

 

Do you think System.Uri succeeds here?

 

using System;

class Program
{
   static void Main(string[] args)
   {
      Uri[] u = new Uri[7];

      // Hierarchical schemes, including two made-up ones
      u[0] = new Uri("http://www.jasonkemp.ca/Rss.aspx");
      u[1] = new Uri("feed://www.jasonkemp.ca/Rss.aspx");
      u[2] = new Uri("purplemonkeydishwasher://www.jasonkemp.ca/Rss.aspx");

      // mailto-style schemes
      u[3] = new Uri("mailto:jasonkemp@example.com");
      u[4] = new Uri("msn:jasonkemp@example.com");
      u[5] = new Uri("sip:jasonkemp@example.com");
      u[6] = new Uri("pres:jasonkemp@example.com");

      for (int i = 0; i < u.Length; i++)
      {
         PrintHost(u[i]);
      }

      // Only the mailto-style URIs carry user info
      for (int i = 3; i < u.Length; i++)
      {
         PrintUserInfo(u[i]);
      }
   }

   private static void PrintUserInfo(Uri uri)
   {
      Console.WriteLine("uri.UserInfo for {0} = {1}", uri, uri.UserInfo);
   }

   private static void PrintHost(Uri uri)
   {
      Console.WriteLine("uri.Host for {0} = {1}", uri, uri.Host);
   }
}
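If all you need is the user and host out of a simple mailto-style URI, you can always pull them apart by hand. The sketch below is a hypothetical helper (MailtoStyleUri.TrySplit is my own name, not a framework API) and it assumes the plain scheme:user@host shape; it makes no attempt to handle parameters, headers or escaping, so anything after the host ends up in the host string.

```csharp
using System;

public static class MailtoStyleUri
{
    // Splits "scheme:user@host" into its three parts.
    // Returns false if the input does not have that simple shape.
    public static bool TrySplit(string uri, out string scheme,
                                out string user, out string host)
    {
        scheme = null;
        user = null;
        host = null;
        if (uri == null) return false;

        int colon = uri.IndexOf(':');
        int at = uri.LastIndexOf('@');

        // Need a non-empty scheme, a non-empty user and a non-empty host.
        if (colon <= 0 || at <= colon + 1 || at == uri.Length - 1)
        {
            return false;
        }

        scheme = uri.Substring(0, colon);
        user = uri.Substring(colon + 1, at - colon - 1);
        host = uri.Substring(at + 1);
        return true;
    }

    public static void Main()
    {
        string scheme, user, host;
        if (TrySplit("sip:jasonkemp@example.com", out scheme, out user, out host))
        {
            // prints scheme=sip user=jasonkemp host=example.com
            Console.WriteLine("scheme={0} user={1} host={2}", scheme, user, host);
        }
    }
}
```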

 

In part 2, I’ll discuss why extending Uri to make up for the above deficiencies will cause you to shit the bed.

XmlSerializer, Xsd.exe, Nullable<T> and you.

At work, I’ve been using Xsd.exe and XmlSerializer in V1.1 a lot lately. There are a number of things that aren’t satisfactory about both of them, but this post only talks about a few of them.  Since .NET 2.0 was just released, I began trying out a few things to see if they fixed things. You’ll see that they’ve fixed some issues, but there is still a lot left that they can do.

 

One of the big things Microsoft has said about .NET, and it’s true, is that it has built-in support for XML. There is a lot of support, which is handy, because Microsoft’s marketing message, when .NET came out, was all: “XML! XML! XML! Use XML everywhere.” I detest working with XML, but there are tools that make it bearable, like the XmlSerializer. There are times when you have no choice but to use XML, but with the XmlSerializer, you can hide most of the XML and use normal classes. Likewise, Xsd.exe is pretty handy; a Swiss Army knife-like XML tool, it can take an assembly and produce a schema of the types; it can take an XML file and generate a schema based on that file; give it a schema to generate C# or VB classes or a strongly-typed DataSet; give it a kitchen sink, it’ll do something.

 

I use it to generate class files from a schema that typically is beyond my control. It generates some truly heinous code for you, embarrassing code; if the code were a person, it’d wear jogging pants to a wedding, laugh at the worst jokes and have terrible teeth.

 

Suppose I have an XML schema that defines a log file. You can click here to view it. It defines for me XML files like so:

 

<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T20:22:35.75-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
       This is another entry taking advantage of the fact that I don’t need a timestamp
    </data>
  </logEntry>
</log>

 

Although the details of the schema aren’t important, there are two things I’d like to point out: both category attributes (which are typed to int, the XML Schema way of saying Int32) and timestamp elements are optional. Keyword: optional. You’ll see why in a second. Like I said earlier, I use Xsd.exe to generate class files from a schema. So if I pass that mother through the tool, I’ll get C# code on the other end.

 

Click here for the code generated by Xsd.exe V1.1.

Click here for the code generated by Xsd.exe V2.0.

 

You’ll see in the V1.1 file that what you get is quite appalling: public fields (aaaaaagggh), incorrect casing, etc. You should feel compelled to take what the tool generated, and add to it so it doesn’t suck so much. In this contrived case, I’d probably save some browsing around in a command shell by writing the whole thing myself; however, once the schema gets large enough (i.e. lots of complex types), modifying what the tool gives you will save you some time. With tools like ReSharper, it’s pretty easy to add properties and constructors to make the type more usable.

 

Contrast that with the 2.0 version: properties are there now, but they still don’t take advantage of the XmlElementAttribute overload that will take a tag name. The classes are partial and liberally sprinkled with oodles of design-time attributes. These attributes are useless for my scenario, but may be used for some of the other scenarios that Xsd.exe supports. (I typically use the tool, keep the source files, and throw away the schema.)

 

However, note that in both files, there is a pattern for value types. This is what I really want to talk about. Remember that I said the schema defined the timestamp element and the category attribute as optional? In the generated class files, these values are represented by value types. And how do we represent value types that don’t have a value set? Not elegantly, for certain. So how does the Xsd.exe tool do this? Consider the category attribute; the tool generates this code (kinda, I had to make it better):

 

    private int category;

    [XmlAttribute("category")]
    public int Category
    {
       get { return this.category; }
       set { this.category = value; }
    }

    private bool categorySpecified;

    [XmlIgnore]
    public bool CategorySpecified
    {
       get { return this.categorySpecified; }
       set { this.categorySpecified = value; }
    }

 

In order for the XmlSerializer to know that this optional property has a value, there is an additional property, CategorySpecified, to tell the serializer that there is indeed a value. If it’s true, then there is a value; otherwise, there isn’t. The serializer uses this when both serializing and deserializing. When serializing, if the XxxSpecified value is false, then it won’t serialize the Xxx property. This is good, because if there are lots of optional elements, we want the XML to stay lean to save bandwidth. However, as a type author, I don’t want this: the type is harder to use, because now a user of my type will have to set two properties to set a value or read two properties to get a value. Then they’ll curse my name and my future children for putting them through such torture.
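If you want to watch the serializer honour the XxxSpecified convention, a small self-contained sketch follows. The SpecifiedLogEntry class and the SpecifiedDemo.ToXml helper are stand-ins I made up for this illustration; they are not the code Xsd.exe generates from the log schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated logEntry type.
[XmlRoot("logEntry")]
public class SpecifiedLogEntry
{
    [XmlAttribute("category")]
    public int Category { get; set; }

    // The serializer pairs this with Category purely by its name.
    [XmlIgnore]
    public bool CategorySpecified { get; set; }
}

public static class SpecifiedDemo
{
    // Serializes an entry to a string so the result can be inspected.
    public static string ToXml(SpecifiedLogEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SpecifiedLogEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        SpecifiedLogEntry entry = new SpecifiedLogEntry();
        entry.Category = 2;

        // CategorySpecified is still false: the attribute is left out.
        Console.WriteLine(ToXml(entry));

        // Set the flag as well and the attribute is written.
        entry.CategorySpecified = true;
        Console.WriteLine(ToXml(entry));
    }
}
```

Note that setting Category alone is not enough; forgetting the second property silently drops the value, which is exactly the usability problem described above.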

 

As a way to get around it, I change the property implementation like so:

 

    public const int SentinelValue = 1;

    private int category;

    [XmlAttribute("category")]
    public int Category
    {
       get { return this.category; }
       set
       {
          this.category = value;
          // Anything other than the sentinel counts as a real value
          this.categorySpecified = this.category != SentinelValue;
       }
    }

    private bool categorySpecified;

    [XmlIgnore]
    public bool CategorySpecified
    {
       get { return this.categorySpecified; }
       set { this.categorySpecified = value; }
    }

 

The bool is still there, because we need it for the XmlSerializer as mentioned above; now, however, programmers only have to set the Category property. They do have to know about a “no-value” value, but that can be documented. This method works even better if only a range of values is valid, which can be enforced through range checking and exceptions. If that is the case, the choice of “no-value” value is much easier.

 

With .NET 2.0, we get a host of new programming toys to play with. One of the less glamorous is nullable types. Nullable<T> is a generic value type that enables us programmers to express the “value type without a value” more succinctly. Nullable<T> will wrap the Xxx and XxxSpecified into one value type and you can check for null like a reference type. C# has some syntactic sugar to make them easier to use:

 

         int? i = null;
         Console.WriteLine(i == null);  // prints True
         Console.WriteLine(i.HasValue); // prints False

 

which is equivalent to saying:

         Nullable<int> i = null;
         Console.WriteLine(i == null);  // prints True
         Console.WriteLine(i.HasValue); // prints False
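To make the “Xxx and XxxSpecified in one value type” idea concrete, here is a short sketch of the members you would typically reach for. The NullableDemo class and its ValueOr helper are hypothetical names for illustration; HasValue, Value, GetValueOrDefault() and the ?? operator are the real thing.

```csharp
using System;

public static class NullableDemo
{
    // Returns the wrapped value, or the supplied fallback when there is none.
    public static int ValueOr(int? maybe, int fallback)
    {
        return maybe ?? fallback;
    }

    public static void Main()
    {
        int? category = null;
        Console.WriteLine(category.HasValue);            // prints False
        Console.WriteLine(category.GetValueOrDefault()); // prints 0
        Console.WriteLine(ValueOr(category, -1));        // prints -1

        category = 2;
        Console.WriteLine(category.HasValue);            // prints True
        Console.WriteLine(category.Value);               // prints 2
        Console.WriteLine(ValueOr(category, -1));        // prints 2
    }
}
```

HasValue plays the role of XxxSpecified and Value plays the role of Xxx, except that the two can no longer get out of sync.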

 

They’re slower than using the real value type, but that’s an implementation detail. I’m no database guy, but I think it is equivalent to DB NULL for a field (correct me if I’m wrong). So working with the XmlSerializer like I’ve been, and watching the new framework developments unfold, a couple questions popped into my mind: Would it be possible to remove those XxxSpecified properties and just use Nullable types instead? Would the XmlSerializer treat them as equivalent, since, semantically, they are? Well, let’s find out. First, we’ll remove the XxxSpecified properties, then we’ll change the file generated so that both the category attribute and the timestamp element are nullable types:

 

    private int? category;

    [XmlAttribute("category")]
    public int? Category
    {
       get { return this.category; }
       set { this.category = value; }
    }

    private System.DateTime? timestamp;

    [XmlElement("timestamp")]
    public System.DateTime? Timestamp
    {
       get { return this.timestamp; }
       set { this.timestamp = value; }
    }

 

If we try to serialize an instance of this, we get the following exception nested in like three InvalidOperationExceptions (a quirk of the XmlSerializer) courtesy of the totally unhandy Exception Assistant (seriously, that’s the next Clippy): Cannot serialize member ‘Category’ of type System.Nullable`1[System.Int32]. XmlAttribute/XmlText cannot be used to encode complex types.

 

Bummer.

 

Well let’s see if it will work with elements; XmlElementAttribute can handle complex types. Change the file so that Category is no longer a nullable, and try to serialize it. We get the following XML:

 

<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T22:37:26.140625-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
       This is another entry taking advantage of
       the fact that I don’t need a timestamp
    </data>
    <timestamp xsi:nil="true" />
  </logEntry>
</log>

 

Open this bad boy in VS 2005 and watch the XML validator complain that the timestamp element is invalid, that it cannot be empty.

 

Total bummer.

 

Looks like my questions are answered in the negative. Nullable types are not supported by the XmlSerializer. However, since they were a late addition and a change was made regarding them late in the game, I’ll forgive them.

 

Besides, they should have something to do for .NET 3.0. 😉
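In the meantime, one way to get a nullable-looking API without giving up the lean XML is to hide the Xxx/XxxSpecified pair behind a nullable property. This is only a sketch under my own assumptions: NullableLogEntry, CategoryValue and the NullableWorkaround helpers are made-up names, and the pattern still leans on the XxxSpecified convention rather than on any real nullable support in the serializer.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type: callers use Category, the serializer uses the proxy pair.
[XmlRoot("logEntry")]
public class NullableLogEntry
{
    // The property users of the type actually work with.
    [XmlIgnore]
    public int? Category { get; set; }

    // Proxy the serializer reads and writes as the attribute.
    [XmlAttribute("category")]
    public int CategoryValue
    {
        get { return Category.GetValueOrDefault(); }
        set { Category = value; }
    }

    // Derived from the nullable, so the two cannot disagree.
    [XmlIgnore]
    public bool CategoryValueSpecified
    {
        get { return Category.HasValue; }
        set { if (!value) Category = null; }
    }
}

public static class NullableWorkaround
{
    public static string ToXml(NullableLogEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(NullableLogEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static NullableLogEntry FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(NullableLogEntry));
        using (StringReader reader = new StringReader(xml))
        {
            return (NullableLogEntry)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        NullableLogEntry entry = new NullableLogEntry();
        Console.WriteLine(ToXml(entry)); // no category attribute

        entry.Category = 2;
        string xml = ToXml(entry);
        Console.WriteLine(xml);          // category attribute present

        Console.WriteLine(FromXml(xml).Category.HasValue);
    }
}
```

The cost is two extra public members that exist only for the serializer, but at least the people using the type only ever touch Category.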

Never, ever, ever, ever, ever ever ever, use the editor in .Text to write your posts

I just spent the last two hours writing an article for all of you about Nullable<T> and the XmlSerializer, but my blog engine prompted me with a login screen when I hit post and the post was subsequently lost. I may appear calm and civilized with my text here, but I’ve been swearing non-stop since it happened.

So here’s a note to myself: Print this out.

Dear Jason, 

Never, EVER use the editor in .Text to write a long post. You will lose it and the time you spent on it. Your 2 readers will lose out as well. They wait with bated breath for every post. We both know you don’t post enough to satisfy them, so you must – must – not waste the time you spend on posts and then lose them. Use Word. Save the document. Then paste it in, hit Post.

Thank you,

Jason