Writing Your Own UriParser

I posted before about my displeasure using System.Uri for advanced scenarios. It won’t parse mailto-style URIs and extending it is just evil. Yet, I’d still recommend you use System.Uri for your URI needs, even if it won’t parse your URI.

 

Why? Well, with each version of the framework, Microsoft will work to make it better (at least that’s what they promise us). Your classes should version well across framework versions if you use their guidelines. Therefore, if you’re using FxCop to check your code against the guidelines, I’d recommend fixing violations of the “Use System.Uri instead of string” rule.

 

It just so happens that they kinda worked around my two earlier complaints: they deprecated those heinous protected methods; currently you only get a compiler warning, but it’s a start. They also provided a mechanism to parse URI schemes that the framework doesn’t know about: extend UriParser.

 

Before I start with the nitty-gritty details of building your own UriParser, I’d like to point out something amusing (well, amusing to total nerds who’d rather write about parsing URIs than play video games or watch Arrested Development). Go to the documentation on MSDN2 for UriParser and you’ll read this:

 

The UriParser class enables you to create parsers for new URI schemes. You can write these parsers in their entirety, or the parsers can be derived from well-known schemes (HTTP, FTP, and other schemes based on network protocols). If you want to create a completely new parser, inherit from GenericUriParser. If you want to create a parser that extends a well-known URI scheme, inherit from FtpStyleUriParser, HttpStyleUriParser, FileStyleUriParser, GopherStyleUriParser, or LdapStyleUriParser.

Notice something missing? That’s right, there is no fucking MailtoStyleUriParser!

 

Although I’m glad gopher is finally getting its due.

 

When the product feedback center came out, I figured I’d give it a shot so I entered a new feature request to get Uri to parse sip, sips and tel URIs. Read the comment at the bottom by Mr. Daniel Roth; you’ll see that writing your own mailto-style URI parser is easy because mailto is a “so rudimentary” scheme. That kinda begs the question: if it’s so rudimentary, then WHY ISN’T THERE ONE IN THE FRAMEWORK?

 

To add further to the fire, here’s the second paragraph from the UriParser class description:

 

Microsoft strongly recommends that you use a parser shipped with the .NET Framework. Building your own parser increases the complexity of your application, and will not perform as well as the shipped parsers.

Hmph.

I’m not going to go into detail about inheriting from GenericUriParser in this post. Instead, let’s suppose I’ve already done that. I think I should show you how to let the framework know about your parser so that it will load it and use it. I feel obligated to tell you about it because the documentation for these methods is really fucking terrible. Suppose I have two types called PresUriParser and SipUriParser both of which inherit from GenericUriParser. To tell the framework about these parsers, I call the UriParser.Register(uriParser, schemeName, defaultPort) method:

 

      UriParser.Register(new PresUriParser(), "pres", -1);
      UriParser.Register(new SipUriParser(), "sip", 5060);

 

That browser of yours isn’t playing tricks: the first call has a port of -1; that means no default port. The terrible, terrible documentation doesn’t show that. Nor does it show that every argument passed is validated and an exception is thrown if validation fails. The exception table should be the following for argument validation:

 

Type of Exception              Reason
ArgumentNullException          uriParser or schemeName is null
ArgumentOutOfRangeException    schemeName has length 1 or violates the rules for scheme names in any of myriad different ways
ArgumentOutOfRangeException    defaultPort >= 65535 or (defaultPort < 0 and defaultPort != -1)

 

But wait, there’s more! Suppose you get past the argument validation; you’re not yet through the exception danger zone. Suppose you had code like this:

 

      PresUriParser p = new PresUriParser();
      UriParser.Register(p, "pres", -1);
      UriParser.Register(p, "presence", 5060);

 

This will throw an InvalidOperationException; apparently, you can only have one instance parse one scheme, even if your parser can parse multiple schemes. This makes sense, especially in multi-threaded scenarios, but does the documentation say this? Hell, no!

 

So you get the argument validation and now a potentially invalid operation. All undocumented. But I have more for you: a free set of knives!

 

Suppose you wanted to parse http on your own, with code like this:

 

  public class HttpUriParser : HttpStyleUriParser
  {
    public HttpUriParser() { }
  }

 

And then I register my parser, like so:

 

  UriParser.Register(new HttpUriParser(), "http", 80);

 

Guess what? InvalidOperationException! That scheme is already registered.

Makes me wonder why there are all those parsers for known schemes if you can’t use them on the schemes that they parse.
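Putting those rules together, here is a minimal sketch of a registration helper that sidesteps both InvalidOperationException cases. The ParserRegistration class and its TryRegister method are hypothetical names made up for illustration, and the PresUriParser body is only a stand-in (GenericUriParserOptions.Default is a placeholder choice, not a recommendation for pres URIs). UriParser.IsKnownScheme and UriParser.Register are the real framework methods. The sketch assumes you create one fresh parser instance per scheme.

```csharp
using System;

// Hypothetical stand-in for the parser discussed above; the options value
// is a placeholder, not a recommendation for pres URIs.
public class PresUriParser : GenericUriParser
{
    public PresUriParser() : base(GenericUriParserOptions.Default) { }
}

public static class ParserRegistration
{
    // Registers the parser unless the scheme is already taken.
    // Returns false rather than letting Register throw
    // InvalidOperationException for an already-registered scheme.
    // Argument validation exceptions still propagate to the caller.
    public static bool TryRegister(UriParser parser, string scheme, int defaultPort)
    {
        if (UriParser.IsKnownScheme(scheme))
        {
            return false;
        }
        UriParser.Register(parser, scheme, defaultPort);
        return true;
    }

    public static void Main()
    {
        // One instance per scheme; -1 means "no default port".
        Console.WriteLine(TryRegister(new PresUriParser(), "pres", -1));
        Console.WriteLine(TryRegister(new PresUriParser(), "presence", 5060));
        // http already belongs to the framework, so this call is skipped.
        Console.WriteLine(TryRegister(new HttpStyleUriParser(), "http", 80));
    }
}
```

The check and the registration are two separate calls, so this is not safe if several threads race to register the same scheme; doing all registration once at startup avoids that.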

 

There was a way to do it through the config file, but that content has been retired according to MSDN2. There appear to be some issues with Uri and security, outlined here in the breaking changes list. Perhaps Microsoft didn’t want others screwing up the perfectly good parsing for the known schemes, like they can by overriding the methods I mentioned last time.

 

Whatever the case, hopefully this quick article helps with the documentation for System.UriParser. Next time, we’ll get our hands dirty overriding methods and abusing Console.WriteLine().

XmlSerializer, Xsd.exe, Nullable<T> and you.

At work, I’ve been using Xsd.exe and XmlSerializer in V1.1 a lot lately. There are a number of things that aren’t satisfactory about both of them, but this post only talks about a few of them.  Since .NET 2.0 was just released, I began trying out a few things to see if they fixed things. You’ll see that they’ve fixed some issues, but there is still a lot left that they can do.

 

One of the big things Microsoft has said about .NET, and it’s true, is that it has built-in support for XML. There is a lot of support, which is handy, because Microsoft’s marketing message, when .NET came out, was all: “XML! XML! XML! Use XML everywhere.” I detest working with XML, but there are tools that make it bearable like the XmlSerializer. There are times when you have no choice but to use XML, but with the XmlSerializer, you can hide most of the XML and use normal classes. Likewise, Xsd.exe is pretty handy; a Swiss Army-knife like XML tool, it can take an assembly and produce a schema of the types; it can take an XML file and generate a schema based on that file; give it a schema to generate C# or VB classes or a strongly-typed DataSet; give it a kitchen sink, it’ll do something.

 

I use it to generate class files from a schema that typically is beyond my control. It generates some truly heinous code for you, embarrassing code; if the code were a person, it’d wear jogging pants to a wedding, laugh at the worst jokes and have terrible teeth.

 

Suppose I have an XML schema that defines a log file. You can click here to view it. It defines for me XML files like so:

 

<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T20:22:35.75-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
This is another entry taking advantage of the fact that I don’t need a timestamp
</data>
  </logEntry>
</log>

 

Although the details of the schema aren’t important, there are two things I’d like to point out: Both category attributes (which are typed to int, the XML Schema way of saying Int32) and timestamp elements are optional. Keyword: optional. You’ll see why in a second. Like I said earlier, I use Xsd.exe to generate class files from a schema. So if I pass that mother through the tool, I’ll get C# code on the other end.

 

Click here for the code generated by Xsd.exe V1.1.

Click here for the code generated by Xsd.exe V2.0.

 

You’ll see in the V1.1 file that what you get is quite appalling: public fields (aaaaaagggh), incorrect casing, etc. You should feel compelled to take what the tool generated, and add to it so it doesn’t suck so much. In this contrived case, I’d probably save some browsing around in a command shell by writing the whole thing myself; however, once the schema gets large enough (i.e. lots of complex types), modifying what the tool gives you will save you some time. With tools like ReSharper, it’s pretty easy to add properties and constructors to make the type more usable.

 

Contrast that with the 2.0 version: properties are there now, but they still don’t take advantage of the XmlElementAttribute overload that will take a tag name. The classes are partial and liberally sprinkled with oodles of design-time attributes. These attributes are useless for my scenario, but may be used for some of the other scenarios that Xsd.exe supports. (I typically use the tool, keep the source files, and throw away the schema.)

 

However, note that in both files, there is a pattern for value types. This is what I really want to talk about. Remember that I said the schema defined the timestamp element and the category attribute as optional? In the generated class files, these values are represented by value types. And how do we represent value types that don’t have a value set? Not elegantly, for certain. So how does the Xsd.exe tool do this? Consider the category attribute; the tool generates this code (kinda, I had to make it better):

 

    private int category;
    [XmlAttribute("category")]
    public int Category
    {
       get { return this.category; }
       set { this.category = value; }
    }

    private bool categorySpecified;
    [XmlIgnore]
    public bool CategorySpecified
    {
       get { return this.categorySpecified; }
       set { this.categorySpecified = value; }
    }

 

In order for the XmlSerializer to know that this optional property has a value, there is an additional property, CategorySpecified, to tell the serializer that there is indeed a value. If it’s true, then there is a value; otherwise, there isn’t. The serializer uses this when both serializing and deserializing. When serializing, if an XxxSpecified value is false, then it won’t serialize the Xxx property. This is good, because if there are lots of optional elements, we want the XML to stay lean to save bandwidth. However, as a type author, I don’t want this: the type is harder to use, because now a user of my type will have to set two properties to set a value or read two properties to get a value. Then they’ll curse my name and my future children for putting them through such torture.
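To see the XxxSpecified convention in action, here is a small, self-contained sketch. SpecifiedLogEntry and SpecifiedDemo are hypothetical names invented for this example (the real generated class has more members), and the output is written to a string rather than a log file.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical cut-down version of the generated log entry type.
[XmlRoot("logEntry")]
public class SpecifiedLogEntry
{
    private int category;
    private bool categorySpecified;

    [XmlAttribute("category")]
    public int Category
    {
        get { return category; }
        set { category = value; }
    }

    // The serializer pairs this with Category by naming convention.
    [XmlIgnore]
    public bool CategorySpecified
    {
        get { return categorySpecified; }
        set { categorySpecified = value; }
    }
}

public static class SpecifiedDemo
{
    // Serializes an entry to an XML string.
    public static string Serialize(SpecifiedLogEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SpecifiedLogEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        SpecifiedLogEntry entry = new SpecifiedLogEntry();
        entry.Category = 2;
        // CategorySpecified is still false, so the attribute is left out.
        Console.WriteLine(Serialize(entry));

        entry.CategorySpecified = true;
        // Now the category attribute shows up in the output.
        Console.WriteLine(Serialize(entry));
    }
}
```

Note that setting Category alone is not enough here, which is exactly the two-property annoyance described above.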

 

As a way to get around it, I change the property implementation like so:

 

    public const int SentinelValue = -1;
    private int category;
    [XmlAttribute("category")]
    public int Category
    {
       get { return this.category; }
       set
       {
          this.category = value;
          this.categorySpecified = this.category != SentinelValue;
       }
    }

    private bool categorySpecified;
    [XmlIgnore]
    public bool CategorySpecified
    {
       get { return this.categorySpecified; }
       set { this.categorySpecified = value; }
    }

 

The bool is still there, because we need it for the XmlSerializer as mentioned above; however, programmers now only have to set the Category property. They now have to know about a “no-value” value, but that can be documented. This method works even better if only a range of values is valid, which can be enforced through range checking and exceptions. If that is the case, the choice of “no-value” value is much easier.

 

With .NET 2.0, we get a host of new programming toys to play with. One of the less glamorous is nullable types. Nullable<T> is a generic value type that enables us programmers to express the “value type without a value” more succinctly. Nullable<T> will wrap the Xxx and XxxSpecified into one value type and you can check for null like a reference type. C# has some syntactic sugar to make them easier to use:

 

         int? i = null;
         Console.WriteLine(i == null); //prints true
         Console.WriteLine(i.HasValue);//prints false

 

which is equivalent to writing:

 

         Nullable<int> i = null;
         Console.WriteLine(i == null);//prints true
         Console.WriteLine(i.HasValue);//prints false

 

They’re slower than using the real value type, but that’s an implementation detail. I’m no database guy, but I think it is equivalent to DB NULL for a field (correct me if I’m wrong). So working with the XmlSerializer like I’ve been, and watching the new framework developments unfold, a couple questions popped into my mind: Would it be possible to remove those XxxSpecified properties and just use Nullable types instead? Would the XmlSerializer treat them as equivalent, since, semantically, they are? Well, let’s find out. First, we’ll remove the XxxSpecified properties, then we’ll change the file generated so that both the category attribute and the timestamp element are nullable types:

 

    private int? category;
    [XmlAttribute("category")]
    public int? Category
    {
       get { return this.category; }
       set { this.category = value; }
    }

    private System.DateTime? timestamp;
    [XmlElement("timestamp")]
    public System.DateTime? Timestamp
    {
       get { return this.timestamp; }
       set { this.timestamp = value; }
    }

 

If we try to serialize an instance of this, we get the following exception nested in like three InvalidOperationExceptions (a quirk of the XmlSerializer) courtesy of the totally unhandy Exception Assistant (seriously, that’s the next Clippy): Cannot serialize member ‘Category’ of type System.Nullable`1[System.Int32]. XmlAttribute/XmlText cannot be used to encode complex types.

 

Bummer.

 

Well let’s see if it will work with elements; XmlElementAttribute can handle complex types. Change the file so that Category is no longer a nullable, and try to serialize it. We get the following XML:

 

<?xml version="1.0" encoding="utf-8"?>
<log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="http://www.jasonkemp.ca/logentry.xsd"
     name="MyLog">
  <logEntry category="1">
    <data>This is an entry in my log.</data>
    <timestamp>2005-10-31T22:37:26.140625-08:00</timestamp>
  </logEntry>
  <logEntry category="2">
    <data>
       This is another entry taking advantage of
       the fact that I don’t need a timestamp
    </data>
    <timestamp xsi:nil="true" />
  </logEntry>
</log>

 

Open this bad boy in VS 2005 and watch the XML validator complain that the timestamp element is invalid, that it cannot be empty.

 

Total bummer.

 

Looks like my questions are answered in the negative. Nullable types are not supported by the XmlSerializer. However, since they were a late addition and a change was made regarding them late in the game, I’ll forgive them.

 

Besides, they should have something to do for .NET 3.0. 😉
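In the meantime, one possible workaround, sketched here under assumptions rather than taken from anything Xsd.exe generates: keep the nullable property for callers, hide it from the serializer with XmlIgnore, and expose a plain int surrogate guarded by a ShouldSerializeXxx method, a naming convention the XmlSerializer also honors. NullableLogEntry, CategoryValue and NullableDemo are hypothetical names invented for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical cut-down log entry type using a nullable for callers.
[XmlRoot("logEntry")]
public class NullableLogEntry
{
    private int? category;

    // What callers use: null means "no category".
    [XmlIgnore]
    public int? Category
    {
        get { return category; }
        set { category = value; }
    }

    // Surrogate that the serializer actually reads and writes.
    [XmlAttribute("category")]
    public int CategoryValue
    {
        get { return category.GetValueOrDefault(); }
        set { category = value; }
    }

    // ShouldSerializeXxx convention: write the attribute only when set.
    public bool ShouldSerializeCategoryValue()
    {
        return category.HasValue;
    }
}

public static class NullableDemo
{
    // Serializes an entry to an XML string.
    public static string Serialize(NullableLogEntry entry)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(NullableLogEntry));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, entry);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        NullableLogEntry entry = new NullableLogEntry();
        Console.WriteLine(Serialize(entry)); // no category attribute

        entry.Category = 3;
        Console.WriteLine(Serialize(entry)); // category attribute present
    }
}
```

The cost is an extra public member (CategoryValue) that exists only for the serializer, so it trades one kind of clutter for another.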

Let’s take back the fun.

Over the past few weeks, Microsoft has provided plenty of juicy, new software for us developers: Vista Beta 1, WinFX Beta 1 RC 1. They’ve also announced quite a few name changes. Longhorn became Windows Vista. Avalon became Windows Presentation Foundation. Indigo became Windows Communication Foundation.

I don’t know yet how I feel about the name Windows Vista, but it at least has some character to it. But Windows Presentation Foundation? Windows Communication Foundation? Who wants to use those products? Why did some of the most exciting new software to come out of the Borg hive have to get boring-ass names? Did developers complain that Avalon and Indigo didn’t sound professional?

It makes me wonder what Word, Excel or Outlook would have been called if they were released today: Microsoft Document Editor Framework, Microsoft Data Table Manager, and Microsoft Electronic Mail Personal Organizer. The whole thing would be sold as the Microsoft Knowledge Worker Productivity Suite, rather than just Office. Would it dominate the market so thoroughly if it had used my clunky suggestions back when there were actual competitors? Couldn’t PowerPoint be renamed to Windows Presentation Foundation?

Furthermore, what “market segment” are the Windows Presentation Foundation and Windows Communication Foundation aimed at that renaming them would be thought necessary? I would think developers would be the ones to use them. Developers use products with names like NetBeans, ReSharper and Watir; programming languages called Python, Perl, Java and Ruby. What would you rather use, dear reader: Windows Communication Foundation or Indigo? Think of all the time, typing and paper saved if we were to go back to Avalon and Indigo. All those RDs, MVPs, and speakers at the PDC could save valuable seconds of their lecture time, so they can tell us how great the technology is rather than waste time saying “Windows Presentation Foundation.” Those giant bricks that tech publishers call books would be that much thinner if the Windows Communication Foundation were changed back to Indigo.

We should know what’s really going to happen during those talks, and in those books. We may have Ruby and Perl, but we also have XML, SVG, WMI, ASP, VB and AJAX. So Windows Presentation Foundation and Windows Communication Foundation will become WPF and WCF, respectively. We don’t need more three-letter acronyms. In fact, Microsoft may not want those abbreviations. WPF and WCF: don’t they sound like organizations that dope-smoking granolas will throw rocks at cops for, like the WTO? Or maybe they sound like something the US Army is searching for in the far reaches of Iraq? Can a company like Microsoft afford those connotations?

Avalon and Indigo exist to make our developer lives easier. They allow us to write elegant code. Let’s make them keep the elegant names.

Who’s with me?

Update: Larry Osterman, venerable Microsoft old-timer, posts about Microsoft’s product naming. And I thought Windows Presentation Foundation was bad. Sheesh!

Update 2: Adam Nathan, creator of pinvoke.net, chimes in on the whole WPF/WCF debate. Check the first comment: someone’s copying me 🙂

Norton Antivirus is a terrible application.

This weekend, my father had a computer problem. Since there’s a rule about computer nerds being help desk support for everyone they know, he asked me to help him. (To be fair, though, I did build the machine for him. Nor did I complain: he makes me dinner just about every week, and he’s my dad.) Windows would stop loading after a little while, then the machine would reboot.

I’m not sure what happened, but running chkdsk a couple times fixed the disk errors. If there’s a problem with booting, that’s one of the first things I’ll do. I love chkdsk! It’s solved a number of problems for me in the past.

When was the last time you checked your disk for errors?

Disk problems are always traumatic: a CPU, a video card, and RAM are all replaceable if they break, but a disk isn’t; a disk has valuable data on it: pictures, video, documents, spreadsheets, music… I always get a little nervous dealing with disk issues. That and BIOS settings.

When was the last time you backed up your data?

Running chkdsk finally made it bootable, but it looked like some parts of the registry were reset, and some system libraries went missing. So then I did a System Restore. God bless System Restore! Just about everything went back to normal after System Restore. There were only a few things to fix after that: Outlook.pst was corrupt (no problem: there’s a tool to fix that, but let me know where the tool is in the damn dialog! I shouldn’t have to search my hard drive to find the tool. But it’s ok, Microsoft, you gave me chkdsk and System Restore, so you’re off the hook… this time.) and Norton Antivirus was going insane with stupid-ass dialogs every minute.

Only one app to reinstall: not bad. So I uninstall, I get an error. [Something I screwed up when I made the machine was keeping his old drive in the machine, so the new disk got assigned F:. I didn’t notice until Windows was totally installed, but by then it was too late. However, everything has worked just fine. Everything except for Norton Antivirus. I had to be called in to install the damn thing when he first bought it. I had to use subst.exe to do it.] I tell the uninstaller to ignore the error. I restart. I reinstall, using the hack discussed in the aside. I restart. Still gives me the stupid-ass dialogs. I give up trying to solve it myself and go to Symantec’s support page. The message was something to the effect of, “I can’t my Instant Messenger virus scanning service.” The first thing on their support page is to disable the service, so I open up “the Integrator“. Oh wait, something’s wrong: can’t even load the f’ing gui. (I’m guessing The Integrator is what the programmers call their main window, because that’s what shows up on the error dialog. And it’s an appropriate name: there’s like 50 executables to this piece of shit app.)

What’s their solution for that? You guessed it: uninstall. But I have to remove every Symantec product. So I start removing: 2 reboots to do that. Finally, we’re clean of Symantec products. I was tempted to leave it at that, but my father wants the peace of mind of an antivirus, so I carry on. I install Norton Antivirus…again. I reboot…again. I update it…again. I get the stupid-ass dialog…again. This time the Integrator loads, though, so I disable the shit-ass IM detection service and stop following the steps after that.

The really bad thing about this is Norton expects normal people to be capable of this. Granted: this is an extraordinary situation, but the uninstall should uninstall everything. And a reinstall should be able to fix the problem. Also, you should be able to install the damn thing on a non-standard drive letter. Bitches.

Raise your hand if you can’t wait for Microsoft to write an antivirus.

Norton Antivirus, if you were a person, I’d kick you in the nuts. Really hard. 

Windows Socket Version 2 API error code documentation

[Update: Looks like Mike, future PM of System.Net and all ’round genius, cleared things up in the comments. Thanks, Mike. He gets 20 jasonkemp.ca points! Save those up and you may get a toaster. Or a beer since I know you and you live in Victoria. Incidentally, before I checked for comments to this post, I opened up MSDN help, typed in “socket” without quotes and no filter, and found the error codes in a few links. Who knew? 🙂 Now we just have to correct that documentation for UdpClient.]

So I’m trying to play with UdpClient from the System.Net.Sockets namespace in good ol’ .NET 1.1.

I’m using the code from the Receive() example in MSDN Help. Almost exactly. Only in my version, the fucking thing always throws a SocketException. Luckily the SocketException has an ErrorCode property so that I can “refer to the Windows Socket Version 2 API error code documentation in MSDN for a detailed description of the error.” And yet, I still don’t know what’s going wrong because as far as my local MSDN Library, MSDN for the Internet, and Google are concerned there is no such fucking Windows Socket Version 2 API error code documentation. (In fact, if Google is worth its salt, this post will begin coming up when that string is searched.) All I can use to debug is the stack trace. And it’s not my code that’s being difficult so break points won’t work. Could it be XP SP2? Don’t know; ’cause I can’t for the life of me find that info on MSDN either. I know it’s not Windows Firewall because I tried this code with the Firewall turned off. I haven’t used the excellent Reflector on this either, but I don’t see how that’ll help right now. And I know how to use Google, in case you were wondering! Sometimes Microsoft really pisses me off.

     1:         [STAThread]
     2:         static void Main(string[] args)
     3:         {
     4:             UdpClient udpClient = new UdpClient();
     5:  
     6:             IPEndPoint endPoint = new IPEndPoint(IPAddress.Any,  0);
     7:             try
     8:             {
     9:                 byte[] received = udpClient.Receive( ref endPoint );
    10:  
    11:                 string recvMessage = Encoding.ASCII.GetString(received);
    12:  
    13:                 Console.WriteLine("recvMessage = {0}", recvMessage);
    14:             }
    15:             catch ( SocketException e)
    16:             {
    17:                 Console.WriteLine(e);
    18:             }
    19:             catch( Exception e )
    20:             {
    21:                 Console.Out.WriteLine("e = {0}", e);
    22:             }
    23:  
    24:             Console.ReadLine();
    25:         }

It always throws an exception on line 9. Can anyone tell me why this is failing? 10 jasonkemp.ca points, and my eternal gratitude, to anyone who can. 10 jasonkemp.ca points to anyone who can point to good Winsock 2 API documentation also.
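For what it’s worth, one plausible cause (an assumption; the post never records the actual error code): the parameterless UdpClient constructor doesn’t bind the socket to a local port, and calling Receive on an unbound socket fails. The sketch below binds the receiver first and sends itself a datagram over loopback so that Receive has something to return. UdpReceiveSketch and SendAndReceive are hypothetical names invented for this example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class UdpReceiveSketch
{
    // Sends a message to a locally bound UdpClient and returns what it received.
    public static string SendAndReceive(string message)
    {
        // Binding to port 0 lets the OS pick a free port; the important
        // part is that the receiver is bound before Receive is called.
        using (UdpClient receiver = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (UdpClient sender = new UdpClient())
        {
            int port = ((IPEndPoint)receiver.Client.LocalEndPoint).Port;

            byte[] payload = Encoding.ASCII.GetBytes(message);
            sender.Send(payload, payload.Length, new IPEndPoint(IPAddress.Loopback, port));

            // Receive fills in the remote endpoint the datagram came from.
            IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);
            byte[] received = receiver.Receive(ref remote);
            return Encoding.ASCII.GetString(received);
        }
    }

    public static void Main()
    {
        Console.WriteLine("recvMessage = {0}", SendAndReceive("hello"));
    }
}
```

In a real receiver you would bind to a fixed, known port (for example new UdpClient(somePort)) so that senders know where to aim; the OS-chosen port here just keeps the sketch from colliding with anything else on the machine.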