In my last article on UriParser, I explained how to register your custom parser and outlined the pitfalls you may encounter along the way. The documentation is sparse when it comes to UriParser and its descendants. This time I’ll start writing my own custom parser and explain the details of UriParser.GetComponents(), a virtual method that you will have to override for your custom parser.
Before I get into UriParser, it helps to know the terms for the parts of a URI, because there is some overlap with the terms and the Uri class uses them. Without looking up the RFC that defines URIs, they have the following format:
[scheme]://[userinfo]:[password]@[host|authority]:[port]/[path]?[query]#[fragment]
Something like that anyway; I’m probably missing something, but this will be good enough for the article. For http, the userinfo and password aren’t typically used; for email addresses, there is no path.
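To tie those terms to the Uri class, here is a small sketch that reads the matching properties off a built-in http URI. The class name UriPartsDemo and the sample address are made up for illustration; the properties themselves are the standard ones on System.Uri.

```csharp
using System;

// Illustrative only: UriPartsDemo and the sample URI are not from the article.
public static class UriPartsDemo
{
    // Returns one line per URI part, labelled with the Uri property it came from.
    public static string[] Describe(string text)
    {
        Uri u = new Uri(text);
        return new[]
        {
            "Scheme: " + u.Scheme,
            "UserInfo: " + u.UserInfo,      // userinfo and password together
            "Host: " + u.Host,
            "Port: " + u.Port,
            "AbsolutePath: " + u.AbsolutePath,
            "Query: " + u.Query,            // includes the leading '?'
            "Fragment: " + u.Fragment       // includes the leading '#'
        };
    }
}
```

Calling UriPartsDemo.Describe("http://user:secret@example.com:8080/docs/index.html?lang=en#top") yields http, user:secret, example.com, 8080, /docs/index.html, ?lang=en and #top, in that order.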
For my examples, I’ll again use SIP as the example URI scheme. It is described thus (in RFC 3261, Section 19.1.1):
The “sip:” and “sips:” schemes follow the guidelines in RFC 2396 [5]. They use a form similar to the mailto URL, allowing the specification of SIP request-header fields and the SIP message-body.
Essentially, it looks like a mailto URI except the scheme is sip. For example: sip:jason@example.com is a valid SIP URI.
Creating a Custom UriParser
To create a custom parser, the documentation recommends inheriting from one of the well-known URI scheme parsers: FtpStyleUriParser, HttpStyleUriParser, FileStyleUriParser, GopherStyleUriParser, or LdapStyleUriParser. However, if your scheme is mailto-like, then you have to inherit from either the base UriParser or GenericUriParser, because there is no MailtoStyleUriParser. So for this article, I’ll inherit from GenericUriParser. If I were doing this for Subversion URLs or feed URIs instead, I’d inherit from HttpStyleUriParser.
My custom SIP URI parser class will have the following declaration:
public class SipUriParser : GenericUriParser{}
GenericUriParser has one public constructor that takes GenericUriParserOptions flags to configure the parser. These parser options map to an internal enum, UriSyntaxFlags, which was used in V1.x. One thing I should warn you about is that trying to follow what’s going on with these classes will drive you insane. The reason it took so long for part 2 of my UriParser series? I was stuck in the corner of my bedroom, dried tears on my face, with my head and eyes making sharp, jerky movements left and right. I was still haunted by all the back and forth tracing one must do to figure out the relationship between UriParser and Uri. I think there’s a technical term for this kind of code. I realize re-writing working code is something that you shouldn’t do, but if you’re a giant software company with the best and brightest, as they claim, and you have five years to redesign a set of classes, then I think it’s fair to expect that you follow your own freakin’ pattern.
All the code for parsing URIs is still in the Uri class, as it was in V1.x. If you Reflector into the UriParser code, you’ll see that the base class has private const values for all XxxStyleUriParser sub-classes. It handles all the schemes that the framework handles by default (http, ftp, etc.) by calling internal methods on the Uri class. This is code you don’t want associating with your code. It’s the type of code that smokes in the bathroom and wears leather jackets in those high-school movies about the 50s. Sure, it may seem cool to work with code so dangerous, but I have no idea what I was talking about. Hey, Golden Girls is on!
Where was I?
You can find the definition for GenericUriParserOptions online at MSDN. Since SIP URIs look like mailto URIs, I want the following GenericUriParserOptions: NoFragment, GenericAuthority and Default (For the example, it could be anything; make sure you study the rules for any URI scheme you have to implement). So, my SipUriParser consists of the following, right now:
using System;
using Options = System.GenericUriParserOptions;

public class SipUriParser : GenericUriParser
{
    // SIP URIs are mailto-like: no fragment, generic authority.
    public SipUriParser()
        : base(Options.NoFragment | Options.GenericAuthority | Options.Default)
    {
    }
}
If I were to register my parser as it is now and create a Uri instance that contained a SIP URI, as in the code below, the constructor would fail with a UriFormatException:
class Program
{
    static void Main(string[] args)
    {
        UriParser.Register(new SipUriParser(), "sip", 5060);
        Uri u = new Uri("sip:jason@jasonkemp.ca"); //fails here
        PrintUri(u);
        Console.ReadLine();
    }
}
That’s because my parser is using the base method for InitializeAndValidate(). Every time a Uri is created, it calls into the InitializeAndValidate method of the parser. The default method ends up in one of those labyrinthine calls through Uri, which results in a UriFormatException being thrown. The InitializeAndValidate method is the first chance to see if the string passed to the Uri constructor is a valid URI for that scheme. This is scheme-specific, and since the myriad details of valid SIP URIs aren’t important to this article, I’ll just pretend all values passed are valid. I’ll add the following method to my parser:
protected override void InitializeAndValidate(Uri uri,
    out UriFormatException parsingError)
{
    Console.WriteLine("In InitializeAndValidate()");
    // Treat every string as valid for now; null means "no error".
    parsingError = null;
}
Note that I don’t call the base method. Calling the base methods for any of the virtual methods is not only not recommended, it’s downright forbidden: you’ll get an InvalidOperationException if you call the base, with a message saying, don’t do that! They probably should have made them abstract.
To access the string that was passed to the Uri constructor in the InitializeAndValidate() method, you should access it through the uri.OriginalString property. Any other property will cause GetComponents() to be called.
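If you do want a little real validation rather than pretending everything is valid, a sketch along these lines could work. SipValidationSketch is a hypothetical helper, and its two rules (a sip: or sips: prefix, and a non-empty host) are a deliberately loose stand-in for what RFC 3261 actually requires.

```csharp
using System;

// Hypothetical helper, not part of the framework; the checks are far looser than RFC 3261.
public static class SipValidationSketch
{
    // Returns null when the text passes, mirroring the parsingError out parameter.
    public static UriFormatException Check(string original)
    {
        bool sip = original.StartsWith("sip:", StringComparison.OrdinalIgnoreCase);
        bool sips = original.StartsWith("sips:", StringComparison.OrdinalIgnoreCase);
        if (!sip && !sips)
            return new UriFormatException("Expected a sip: or sips: scheme.");

        // Everything after the scheme, then everything after the optional user part.
        string rest = original.Substring(original.IndexOf(':') + 1);
        string hostPart = rest.Substring(rest.IndexOf('@') + 1);
        if (hostPart.Length == 0)
            return new UriFormatException("A SIP URI needs a host.");

        return null;
    }
}
```

Inside InitializeAndValidate(), the body would then shrink to parsingError = SipValidationSketch.Check(uri.OriginalString), which only touches OriginalString and so should not trigger GetComponents().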
Overriding GetComponents()
In V1.1 of the framework, entering a valid URI string as an argument to the Uri constructor that Uri did not know how to parse would result in null values for any Uri property called. Microsoft’s 2.0-solution to that problem was the UriParser. The UriParser.GetComponents() method is where all the magic happens.
Suppose I created the following Uri instance with my parser registered:
Uri u = new Uri(“sip:jason@jasonkemp.ca”);
If I wanted to know the user info portion of the URI, I’d call u.UserInfo. This would, in turn, call SipUriParser.GetComponents() with the Uri instance, a UriComponents enum value and a UriFormat enum value as arguments. With those three things, the parser is responsible for returning the requested component of the URI. In my example, the UriComponents.UserInfo enum value would be passed. The UriFormat enum is used for character escaping.
The UriComponents enum is marked with the FlagsAttribute; thus, the values can be OR’ed together. In fact, there are some that you’ll never see in isolation, and some you’ll see often, sometimes isolated, sometimes not. Just for you, dear reader, I’ve compiled a table of which property on Uri causes a call to UriParser.GetComponents() with which UriComponents enum values. The | character denotes that they are OR’ed together.
Uri Property    UriComponents passed to GetComponents
AbsoluteUri     AbsoluteUri
AbsolutePath    Path | KeepDelimiter
Authority       Host | Port
DnsSafeHost     Two calls: 1. Host 2. StrongPort
Fragment        Fragment | KeepDelimiter
Host            Host
PathAndQuery    PathAndQuery
Port            Two calls: 1. Host 2. StrongPort
Query           Query | KeepDelimiter
UserInfo        UserInfo
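To make the flag handling concrete, here is a minimal sketch of the kind of logic a GetComponents() override might delegate to. SipComponentSketch and its Extract method are hypothetical names, the hand-rolled splitting ignores SIP parameters and IPv6 hosts, and treating 5060 as the default port is an assumption taken from the Register call earlier. It is kept as a plain static helper so it can be tried without registering a parser.

```csharp
using System;

// Hypothetical helper; a real parser would need the full RFC 3261 grammar.
public static class SipComponentSketch
{
    private const string DefaultPort = "5060"; // assumed default, as in the Register call

    public static string Extract(string original, UriComponents components)
    {
        // KeepDelimiter is a modifier, so peel it off before deciding what was asked for.
        bool keepDelimiter = (components & UriComponents.KeepDelimiter) != 0;
        UriComponents requested = components & ~UriComponents.KeepDelimiter;

        // Split "sip:user@host:port?headers" by hand.
        string rest = original.Substring(original.IndexOf(':') + 1);
        int q = rest.IndexOf('?');
        string query = q >= 0 ? rest.Substring(q + 1) : string.Empty;
        if (q >= 0) rest = rest.Substring(0, q);

        int at = rest.IndexOf('@');
        string userInfo = at >= 0 ? rest.Substring(0, at) : string.Empty;
        string hostPort = rest.Substring(at + 1);
        int colon = hostPort.LastIndexOf(':');
        string host = colon >= 0 ? hostPort.Substring(0, colon) : hostPort;
        string port = colon >= 0 ? hostPort.Substring(colon + 1) : DefaultPort;

        switch (requested)
        {
            case UriComponents.UserInfo:
                return userInfo;
            case UriComponents.Host:
                return host;
            case UriComponents.StrongPort:
                return port; // always a value, even when it is the default
            case UriComponents.Host | UriComponents.Port:
                // Authority: show the port only when it differs from the default.
                return port == DefaultPort ? host : host + ":" + port;
            case UriComponents.Query:
                return keepDelimiter && query.Length > 0 ? "?" + query : query;
            default:
                throw new ArgumentOutOfRangeException(nameof(components), components,
                    "Combination not handled by this sketch.");
        }
    }
}
```

With this in place, Extract("sip:jason@jasonkemp.ca", UriComponents.UserInfo) gives jason, and asking for Host | Port on sip:jason@jasonkemp.ca:5070 gives jasonkemp.ca:5070. The default branch is where unsupported combinations get rejected.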
But that’s not all: there is also a public Uri.GetComponents() method. This allows you to grab practically any combination of UriComponents enum values OR’ed together. Great design! Sure, the documentation says that invalid combinations will cause an exception to be thrown, but they don’t tell you which ones. Here at jasonkemp.ca, we go the extra mile for our 4 readers. Looking through Reflector, again, it shows that anything OR’ed with UriComponents.SerializationInfoString will throw an ArgumentOutOfRangeException before it reaches your parser; everything else is the responsibility of your parser to validate. So, even though asking for Scheme and Query at the same time is invalid for …oh… every URI scheme ever, you’ll have to put that argument checking into your parser, for every parser you write. Thanks for the reuse, Microsoft.
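For a feel of the public method, the sketch below calls Uri.GetComponents() on a built-in http URI, so no custom parser is involved. GetComponentsDemo is an illustrative name, and the SerializationInfoString check simply exercises the behaviour described above as seen through Reflector; treat it as something to verify on your framework version.

```csharp
using System;

// Illustrative only: exercises the public Uri.GetComponents() on a built-in scheme.
public static class GetComponentsDemo
{
    // Asks the Uri for an arbitrary combination of components, unescaped.
    public static string Combine(string text, UriComponents components)
    {
        return new Uri(text).GetComponents(components, UriFormat.Unescaped);
    }

    // True when mixing SerializationInfoString with another flag is rejected up front.
    public static bool RejectsMixedSerializationInfo(string text)
    {
        try
        {
            Combine(text, UriComponents.SerializationInfoString | UriComponents.Host);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

For http://example.com:8080/a/b?x=1, Combine with UriComponents.SchemeAndServer returns http://example.com:8080 and with UriComponents.PathAndQuery returns /a/b?x=1. Those are predefined combinations; anything stranger is what a custom parser has to police itself.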
The idea and design of the UriParser/Uri relationship is basically a good one; it’s too bad the implementation is so butt ugly that I’d rather watch Golden Girls than write about it. The GetComponents method is the key to your custom parser. This article has uncovered some of the landmines you may encounter, hopefully before you get started on implementing your own URI parser.