For years, I thought I had the one true answer to Equals() from seeing something in an MSDN article a long, long time ago – 2002 or 2003, maybe. Or maybe I saw it on Brad’s or Krystof’s blog and freaked out because I wasn’t doing it. Whatever the case, I’d make sure to point out the “proper” way to do Equals() to my colleagues. And I always made sure to do it the same way for every type that needed Equals overridden. Then I decided to measure it.
So, what did I think the best way to do Equals was? Consider this type:
class MyClass
{
    public int NumberValue;
    public string StringValue;
}
If I were to write Equals the way I used to, I would write it the following way:
public override bool Equals(object obj)
{
    if (obj == null || obj.GetType() != GetType()) return false;
    if (ReferenceEquals(obj, this)) return true;
    MyClass other = (MyClass)obj;
    return other.NumberValue == this.NumberValue
        && other.StringValue == this.StringValue;
}
Note that the above implementation satisfies the conditions for a robust Equals. The important part of Equals() is that it covers all of the following cases:
- it returns false if obj is null;
- it returns false if obj is not the same type as this;
- it returns true if the references are the same;
- it doesn’t throw Exceptions; and
- it doesn’t allocate memory.
The actual evaluation of equality is specific to each class. In the example above, it is the return statement comparing the string and the int of the two MyClass instances. The list of conditions above, however, is boilerplate and should be met by every Equals method you write.
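One piece of related boilerplate worth remembering: any type that overrides Equals() should also override GetHashCode() so that equal instances produce the same hash (collections like Hashtable and Dictionary depend on it). A minimal sketch for the MyClass example above – the XOR combination is just one illustrative choice, not the only one:

```csharp
class MyClass
{
    public int NumberValue;
    public string StringValue;

    public override bool Equals(object obj)
    {
        if (obj == null || obj.GetType() != GetType()) return false;
        if (ReferenceEquals(obj, this)) return true;
        MyClass other = (MyClass)obj;
        return other.NumberValue == this.NumberValue
            && other.StringValue == this.StringValue;
    }

    public override int GetHashCode()
    {
        // Combine the fields; guard the string, which may be null.
        int hash = NumberValue;
        if (StringValue != null) hash ^= StringValue.GetHashCode();
        return hash;
    }
}
```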
So what’s the problem? My Equals method does everything in that list just fine. Right?
Two of the conditions are trivial to meet: the check for null and the check for reference equality. The hard one to meet, perhaps because there are so many ways of doing it, is checking for the right type. In my method above, I check the type by comparing the GetType() of both obj and this; if they aren’t equal, I return false. That turns out to be about 5 times slower than the other two ways of doing it: the is and as operators.
The .NET Design Guidelines recommend you use the as operator to check the type rather than the is operator because it does the type check and assignment all at once. So let’s re-write the Equals method to use the as operator:
public override bool Equals(object obj)
{
    if (ReferenceEquals(obj, this)) return true;
    MyClass other = obj as MyClass;
    if (other != null)
        return other.NumberValue == this.NumberValue
            && other.StringValue == this.StringValue;
    return false;
}
This method meets all the conditions of a good Equals, and has the advantage of being pretty fast – faster than the first way I did it, anyway. Since the gurus in Redmond recommend the as operator, you’d think it would be the fastest: wrong! Check it:
public override bool Equals(object obj)
{
    if (ReferenceEquals(obj, this)) return true;
    if (obj is MyClass)
    {
        MyClass other = (MyClass)obj;
        return other.NumberValue == this.NumberValue
            && other.StringValue == this.StringValue;
    }
    return false;
}
Equals with the is operator followed by a cast is actually the fastest of them all (by about 10% compared to the as operator). All three methods meet the conditions of a good Equals method (strictly speaking, the is and as versions will also match derived types, so the exact-type condition only holds if the class is sealed or you add a GetType() comparison), but the least intuitive one – to me, at least – has the advantage of being the fastest. And it’s cheap speed, too: you get it just by implementing Equals the same way every time for every type. You generally want Equals to be fast because it shows up a lot in loops and operations on collections.
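The collections point is worth making concrete: List&lt;T&gt;.Contains, for instance, walks the list calling Equals() on each element until it finds a match, so Equals sits on the hot path of ordinary lookups. A small sketch – the class and values are just illustrative:

```csharp
using System;
using System.Collections.Generic;

class MyClass
{
    public int NumberValue;
    public string StringValue;

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(obj, this)) return true;
        if (obj is MyClass)
        {
            MyClass other = (MyClass)obj;
            return other.NumberValue == this.NumberValue
                && other.StringValue == this.StringValue;
        }
        return false;
    }

    public override int GetHashCode()
    {
        return NumberValue ^ (StringValue == null ? 0 : StringValue.GetHashCode());
    }
}

class Program
{
    static void Main()
    {
        var list = new List<MyClass>
        {
            new MyClass { NumberValue = 1, StringValue = "a" },
            new MyClass { NumberValue = 2, StringValue = "b" }
        };
        // Contains calls Equals() on each element until one matches.
        bool found = list.Contains(new MyClass { NumberValue = 2, StringValue = "b" });
        Console.WriteLine(found); // True, via the overridden Equals
    }
}
```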
My point? Always measure – don’t assume you’re doing things right. It’s good to go back and think about the fundamentals once in a while.
So what is the best way to measure? I came across some crappy code in our codebase that seemed to always throw exceptions which were caught with an empty handler. The intent was to use the default behaviour for exceptions.
I found that a lower level object was null quite a lot, so instead of the exception handling, I added a test for null. I tried adding some time measurements, but they always returned 0 milliseconds.
I used the TimeSpan class and was looking at TotalMilliseconds. Both approaches seemed equally fast, yet I seem to remember reading somewhere that exceptions like this take longer than checking for null (makes sense to me, anyway).
Thoughts?
First, empty catch blocks are a scourge. They should just be deleted and the consequences dealt with. Unfortunately, the coder who deleted the useless catch block will be assigned the bug, not the lazy idiot who put it there in the first place. I ran into that today, so it’s fresh in my mind. 🙂
Checking for null is probably much faster when you think about it: you’re comparing references, whereas throwing anything requires gathering stack info, creating an exception instance, throwing it, and then popping the call stack until it’s caught, running all the finally blocks along the way.
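A rough sketch of that comparison – the console harness, loop count, and names are mine, not the original code, and 0 ms results usually just mean the loop count is too low for the clock’s resolution:

```csharp
using System;
using System.Diagnostics;

class NullVsThrow
{
    static void Main()
    {
        const int iterations = 100000;
        object target = null;

        var sw = Stopwatch.StartNew();
        int misses = 0;
        for (int i = 0; i < iterations; i++)
            if (target == null) misses++;        // plain reference comparison
        sw.Stop();
        Console.WriteLine("null checks: {0} ticks", sw.ElapsedTicks);

        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            try { target.ToString(); }           // always throws
            catch (NullReferenceException) { }   // swallowed, like the empty handler
        }
        sw.Stop();
        Console.WriteLine("exceptions:  {0} ticks", sw.ElapsedTicks);
    }
}
```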
Also, checking for null is such a common occurrence, I suspect MS did something deep down in the CLR to optimize that, not that it needs that much optimizing, it’s just checking addresses.
I got tired of having to keep all strings non-null in our app, so I did some digging and found that checking an empty string for null and empty is faster than checking it for just empty. Weird, eh?
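One way to read that: string.IsNullOrEmpty tests the reference first and short-circuits, so a null never even reaches the length comparison. A quick sketch (the timing claim above is the commenter’s; this snippet only shows the safety difference):

```csharp
using System;

class Demo
{
    static void Main()
    {
        string s = null;
        // The null test runs first and short-circuits, so this is safe:
        bool safe = string.IsNullOrEmpty(s);
        // A bare empty-only comparison would have to dereference s:
        // bool boom = (s.Length == 0);   // NullReferenceException
        Console.WriteLine(safe);          // True
    }
}
```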
Which leads me to the second thing you’re asking about: System.Diagnostics.Stopwatch is what I used to measure Equals(). I ran each class through a loop a million times to get meaningful numbers. You need to run the test enough times for the differences to show up, but the details of all that testing are beyond the scope of a comment – though not of a blog post…
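A measurement loop like the one described might look roughly like this – the class, values, and iteration count are illustrative, and for fair comparisons you’d run each Equals variant through the same loop:

```csharp
using System;
using System.Diagnostics;

class MyClass
{
    public int NumberValue;
    public string StringValue;

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(obj, this)) return true;
        if (obj is MyClass)
        {
            MyClass other = (MyClass)obj;
            return other.NumberValue == this.NumberValue
                && other.StringValue == this.StringValue;
        }
        return false;
    }

    public override int GetHashCode()
    {
        return NumberValue;
    }
}

class EqualsBenchmark
{
    static void Main()
    {
        var a = new MyClass { NumberValue = 1, StringValue = "x" };
        var b = new MyClass { NumberValue = 1, StringValue = "x" };
        const int iterations = 1000000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            a.Equals(b);                 // the call under test
        sw.Stop();

        Console.WriteLine("{0:N0} Equals() calls took {1} ms",
            iterations, sw.ElapsedMilliseconds);
    }
}
```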