C# Equality and order comparisons (Part 2)

Category : C#

Introduction

This is a continuation of my previous post about equality in C#.

After looking at the == operator and the three Equals methods (the static object.Equals, the virtual object.Equals and the IEquatable<T> Equals), let’s continue with the GetHashCode() method, why it should always be implemented whenever we implement the Equals method, and the remaining ways to check for equality in C#.

HashCodes

Both the Dictionary and Hashtable collections depend on hashes to find their keys quickly. They are fast because a lookup primarily compares hash codes, which in C# are 32-bit integers, and only calls the Equals method for the few candidates whose hash codes match.

The return value of object.GetHashCode must be the same for two objects for which the Equals method returns true. When we have referential equality, the hash code is derived from the identity of the instance (an internal value the runtime associates with each object), not from its contents. That is the default behaviour for our classes, but when we implement the Equals method to have value equality, we also have to find a way to calculate the hash code, so that it is the same number for two objects that are considered equal.

Let’s remember the previous example where we implemented the IEquatable<T> interface in our EnemyClass.

public class EnemyClass : IEquatable<EnemyClass>
{
   public int Level { get; init; }
   public int HitPoints { get; set; }
   private readonly float _gold;

   public EnemyClass(float gold) => _gold = gold;
   
   public bool Equals(EnemyClass? other)
   {
      return other != null && GetType() == other.GetType() && Level == other.Level && HitPoints == other.HitPoints;
   }

   public override bool Equals(object? obj)
   {
      return Equals(obj as EnemyClass);
   }
}
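To see why this class is not finished yet, here is a minimal sketch of what can go wrong when Equals is overridden but GetHashCode is not. The Enemy class is a simplified, hypothetical stand-in for EnemyClass (the _gold field is left out), and the pragma only silences the compiler warning about the missing override.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var stored = new Enemy { Level = 3, HitPoints = 50 };
var lookup = new Enemy { Level = 3, HitPoints = 50 };

var enemies = new HashSet<Enemy> { stored };

// Value equality works as implemented.
Console.WriteLine(stored.Equals(lookup)); // True

// The set asks for the hashcode first. Without an override, each instance
// reports its own identity-based hashcode, so this is usually False.
Console.WriteLine(enemies.Contains(lookup));

// Simplified stand-in for EnemyClass: Equals is overridden, GetHashCode is not.
#pragma warning disable CS0659
public class Enemy : IEquatable<Enemy>
{
    public int Level { get; init; }
    public int HitPoints { get; set; }

    public bool Equals(Enemy? other) =>
        other != null && GetType() == other.GetType()
        && Level == other.Level && HitPoints == other.HitPoints;

    public override bool Equals(object? obj) => Equals(obj as Enemy);
}
```

The two objects are equal according to Equals, but the collection will normally never get as far as calling it, because it looks in the wrong bucket.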

Now we have to find a way to implement GetHashCode() so that it depends only on the Level and HitPoints properties. Fortunately, .NET has some methods that can help with that.

The first thing we have to remember is that the hashcode has to be the same value for two objects that are considered equal, or for the same object if it hasn’t changed. That doesn’t mean that two different objects cannot have the same hashcode.

The hashcode is a 32-bit integer, which means that there are only 2^32 different values. Obviously, we may have more than 2^32 different objects. For example, the number of possible strings is practically unlimited.

For that reason, although we should try to implement the GetHashCode() method in a way that gives different values for different objects, the collections that depend on the hashcode also do an equality check between two objects that have the same hashcode. That is why it is important to have as few collisions as possible: the performance of these collections suffers when they also have to call the Equals method for objects that aren’t equal but happen to have the same hashcode.
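As a small illustration of that last point, the sketch below uses a hypothetical Tag class whose GetHashCode deliberately returns the same number for every instance. The dictionary still gives correct answers, because it falls back to Equals, but every lookup has to compare against all the colliding keys.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var names = new Dictionary<Tag, string>
{
    [new Tag("orc")] = "first",
    [new Tag("elf")] = "second",
};

// Both keys collide on purpose.
Console.WriteLine(new Tag("orc").GetHashCode() == new Tag("elf").GetHashCode()); // True

// The dictionary still tells them apart by calling Equals.
Console.WriteLine(names[new Tag("orc")]); // first
Console.WriteLine(names[new Tag("elf")]); // second
Console.WriteLine(names.Count);           // 2

// Hypothetical type: correct Equals, worst possible (but still legal) hashcode.
public sealed class Tag : IEquatable<Tag>
{
    public string Name { get; }

    public Tag(string name) => Name = name;

    public bool Equals(Tag? other) => other != null && Name == other.Name;

    public override bool Equals(object? obj) => Equals(obj as Tag);

    public override int GetHashCode() => 1; // every instance collides
}
```

A constant hashcode is valid but turns every lookup into a linear search, which is exactly the performance cost described above.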

The second thing is that the hashcode of an object is not guaranteed to be the same in every run of our program. Although an object keeps the same hashcode within a single run, it can have a different one in the next run. For example, on .NET Core and later the hashcode of a string is randomized per process, so it will differ from run to run. Saving hashcodes in a file is therefore not a good solution for keeping track of our objects across multiple runs of our program.

Finally, the last thing we have to remember is that the data from which we calculate our hashcode should be immutable, or at least must not change while the object is inside a hash-based collection. If that data changes, the hashcode changes as well, the collection looks for our object in the wrong place, and the object effectively becomes inaccessible.

After all that, let’s see the implementation:

public class EnemyClass : IEquatable<EnemyClass>
{
   public int Level { get; init; }
   public int HitPoints { get; set; }
   private readonly float _gold;

   public EnemyClass(float gold) => _gold = gold;
   
   public bool Equals(EnemyClass? other)
   {
      return other != null && GetType() == other.GetType() && Level == other.Level && HitPoints == other.HitPoints;
   }

   public override bool Equals(object? obj)
   {
      return Equals(obj as EnemyClass);
   }

   public override int GetHashCode()
   {
      return Level.GetHashCode();
   }
}

Here are some things worth noticing:

  • Here we calculate the hashcode only from the Level property. The reason is that the HitPoints property is mutable: if we changed it after adding our object to a collection that uses hashcodes (for example as a key in a dictionary), the hashcode would also change and our object would become inaccessible.
  • This means, though, that the search for our object will not be as efficient, because all the objects with the same level share the same hashcode and the Equals method then has to run for each of them.
  • If our HitPoints property were immutable, for example public int HitPoints { get; init; }, then we could use the Combine method of the HashCode struct: return HashCode.Combine(Level, HitPoints);
  • Finally, instead of the Combine method we can also create a HashCode instance and add the values one by one, like this:
HashCode hash = new HashCode();
hash.Add(Level);
hash.Add(HitPoints);
return hash.ToHashCode();

Again, this is only safe if HitPoints is immutable, or if we make sure that it will not change for as long as our object is inside a collection that depends on hashcodes.
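The danger of a mutable value inside the hashcode can be sketched as follows. MutableEnemy is a hypothetical class that includes HitPoints in its hashcode; a simple hand-written formula is used instead of HashCode.Combine only so that the result is deterministic, the effect is the same with either.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var enemy = new MutableEnemy { Level = 1, HitPoints = 100 };
var enemies = new HashSet<MutableEnemy> { enemy };

Console.WriteLine(enemies.Contains(enemy)); // True

// Changing HitPoints changes the hashcode while the set still remembers the old one.
enemy.HitPoints = 50;
Console.WriteLine(enemies.Contains(enemy)); // False

// The object never left the set; restoring the value makes it reachable again.
enemy.HitPoints = 100;
Console.WriteLine(enemies.Contains(enemy)); // True
Console.WriteLine(enemies.Count);           // 1

// Hypothetical type whose hashcode depends on a mutable property.
public sealed class MutableEnemy : IEquatable<MutableEnemy>
{
    public int Level { get; init; }
    public int HitPoints { get; set; }

    public bool Equals(MutableEnemy? other) =>
        other != null && Level == other.Level && HitPoints == other.HitPoints;

    public override bool Equals(object? obj) => Equals(obj as MutableEnemy);

    // Deterministic stand-in for HashCode.Combine(Level, HitPoints).
    public override int GetHashCode() => unchecked((Level * 397) ^ HitPoints);
}
```

The object is still stored in the set the whole time, which is what makes this kind of bug hard to spot.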

ReferenceEquals

The static object.ReferenceEquals method checks for referential equality. It exists so that, even if the == operator is overloaded and the Equals method is overridden, we still have a way to be sure that we are comparing references.

Effectively, it has the same result as when we cast our objects to the type object and then use the == operator.
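A short sketch of the difference, using a hypothetical Point record (records get value-based Equals and == from the compiler, which makes them a convenient test subject):

```csharp
using System;

var a = new Point(1, 2);
var b = new Point(1, 2);

// Value equality, generated by the compiler for records.
Console.WriteLine(a == b);      // True
Console.WriteLine(a.Equals(b)); // True

// Referential equality ignores the overloaded operator and the overridden method.
Console.WriteLine(object.ReferenceEquals(a, b)); // False
Console.WriteLine((object)a == (object)b);       // False
Console.WriteLine(object.ReferenceEquals(a, a)); // True

// Hypothetical type used only for this illustration.
public record Point(int X, int Y);
```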

Plug-in protocols

Sometimes we might not want the equality methods an object already has to be the ones used inside a collection.

For example, we don’t want the strings "uppercase" and "UpperCase" to be considered equal in general, but we do want them to be considered equal when we use them as keys to a dictionary.
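For strings specifically, this case can be sketched with one of the comparers that .NET already provides. The example assumes that StringComparer.OrdinalIgnoreCase is the case-insensitive behaviour we want; a culture-aware comparer would be passed in the same way.

```csharp
using System;
using System.Collections.Generic;

// The comparer passed to the constructor replaces string's own equality
// for the keys of this dictionary only.
var scores = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["uppercase"] = 1,
};

Console.WriteLine("uppercase" == "UpperCase");      // False
Console.WriteLine(scores.ContainsKey("UpperCase")); // True

// Writing through the differently cased key updates the existing entry.
scores["UpperCase"] = 2;
Console.WriteLine(scores.Count);        // 1
Console.WriteLine(scores["uppercase"]); // 2
```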

For that reason, C# has some plug-in protocols:

The IEqualityComparer and IEqualityComparer<T> interfaces

These two interfaces do the same job: one is generic and the other is a non-generic version that works with the object type. Each has two methods, Equals and GetHashCode, which behave the same way as before but take effect only when we pass an instance of the class that implements them as a parameter to our collection. For example:

public class EnemyClass
{
   public int Level { get; init; }
   public int HitPoints { get; set; }
   private readonly float _gold;

   public EnemyClass(float gold) => _gold = gold;
}

public class PluggedInEquality : IEqualityComparer<EnemyClass>
{
   public bool Equals(EnemyClass? enemy1, EnemyClass? enemy2)
   {
      if (ReferenceEquals(enemy1, enemy2)) return true;
      if (ReferenceEquals(enemy1, null)) return false;
      if (ReferenceEquals(enemy2, null)) return false;
      if (enemy1.GetType() != enemy2.GetType()) return false;
      return enemy1.Level == enemy2.Level && enemy1.HitPoints == enemy2.HitPoints;
   }

   public int GetHashCode(EnemyClass obj)
   {
      return HashCode.Combine(obj.Level, obj.HitPoints);
   }
}

Then we can use it like this:

PluggedInEquality pluggedInEquality = new PluggedInEquality();
Dictionary<EnemyClass, string> dictionary = new Dictionary<EnemyClass, string>(pluggedInEquality);

As before, if we change the HitPoints while an object of the EnemyClass is being used as a key in the dictionary, the hashcode will change and our object will become inaccessible.
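Put together, the behaviour can be sketched like this. The two classes are simplified copies of the ones above (the _gold field and constructor are left out, which is an assumption made only to keep the example short).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var names = new Dictionary<EnemyClass, string>(new PluggedInEquality());
names[new EnemyClass { Level = 2, HitPoints = 30 }] = "goblin";

// A different instance with the same values finds the entry.
var sameValues = new EnemyClass { Level = 2, HitPoints = 30 };
Console.WriteLine(names.ContainsKey(sameValues)); // True
Console.WriteLine(names[sameValues]);             // goblin

// Outside the dictionary the class still uses referential equality.
Console.WriteLine(sameValues.Equals(new EnemyClass { Level = 2, HitPoints = 30 })); // False

// Simplified copy of EnemyClass, without any equality members of its own.
public class EnemyClass
{
    public int Level { get; init; }
    public int HitPoints { get; set; }
}

public class PluggedInEquality : IEqualityComparer<EnemyClass>
{
    public bool Equals(EnemyClass? enemy1, EnemyClass? enemy2)
    {
        if (ReferenceEquals(enemy1, enemy2)) return true;
        if (enemy1 is null || enemy2 is null) return false;
        if (enemy1.GetType() != enemy2.GetType()) return false;
        return enemy1.Level == enemy2.Level && enemy1.HitPoints == enemy2.HitPoints;
    }

    public int GetHashCode(EnemyClass obj) => HashCode.Combine(obj.Level, obj.HitPoints);
}
```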

If we don’t want to implement both of those interfaces ourselves, there is also the EqualityComparer<T> abstract class, which implements both of them. If we derive from it, we only have to override one Equals and one GetHashCode method.
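A minimal sketch of that approach follows. LevelComparer and Enemy are hypothetical names, and comparing only by Level is just a choice made for the illustration.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

var comparer = new LevelComparer();
var a = new Enemy { Level = 5, HitPoints = 10 };
var b = new Enemy { Level = 5, HitPoints = 99 };

// The same object can be used through both interfaces.
IEqualityComparer<Enemy> generic = comparer;
IEqualityComparer nonGeneric = comparer;

Console.WriteLine(generic.Equals(a, b));    // True
Console.WriteLine(nonGeneric.Equals(a, b)); // True
Console.WriteLine(generic.GetHashCode(a) == generic.GetHashCode(b)); // True

public class Enemy
{
    public int Level { get; init; }
    public int HitPoints { get; set; }
}

// Only the two generic methods are overridden; the base class supplies
// the non-generic IEqualityComparer implementation.
public sealed class LevelComparer : EqualityComparer<Enemy>
{
    public override bool Equals(Enemy? x, Enemy? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Level == y.Level;
    }

    public override int GetHashCode(Enemy obj) => obj.Level.GetHashCode();
}
```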

Finally, there is also the EqualityComparer<T>.Default property. It returns a comparer that behaves like the static object.Equals method, but it first checks whether the type implements the IEquatable<T> interface. If it does, that Equals method is used, which avoids boxing; if not, it falls back to the virtual object.Equals method.
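This is mostly useful in generic code, where we do not know anything about T. A small sketch, with a hypothetical Same helper and a hypothetical Money struct that implements IEquatable<Money>:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

Console.WriteLine(Same(3, 3));                       // True
Console.WriteLine(Same("a", "b"));                   // False
Console.WriteLine(Same<string?>(null, null));        // True
Console.WriteLine(Same(new Money(5), new Money(5))); // True

// Works for any T, null included, without knowing how T defines equality.
static bool Same<T>(T left, T right) => EqualityComparer<T>.Default.Equals(left, right);

// Hypothetical struct: the default comparer picks up IEquatable<Money>.Equals.
public readonly struct Money : IEquatable<Money>
{
    public Money(int amount) => Amount = amount;

    public int Amount { get; }

    public bool Equals(Money other) => Amount == other.Amount;

    public override bool Equals(object? obj) => obj is Money other && Equals(other);

    public override int GetHashCode() => Amount.GetHashCode();
}
```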

ReferenceEqualityComparer.Instance

The ReferenceEqualityComparer.Instance property (available since .NET 5) returns a plug-in comparer that always performs referential equality, even for types that override Equals and GetHashCode.
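As a quick sketch, here are two sets holding the same two objects, one with the default comparer and one with the reference comparer. Point is a hypothetical record, chosen because records have value equality built in.

```csharp
using System;
using System.Collections.Generic;

var a = new Point(1, 2);
var b = new Point(1, 2);

// Default comparer: the two records are equal by value, so only one is kept.
var byValue = new HashSet<object> { a, b };

// Reference comparer: two distinct instances, so both are kept.
var byReference = new HashSet<object>(ReferenceEqualityComparer.Instance) { a, b };

Console.WriteLine(byValue.Count);     // 1
Console.WriteLine(byReference.Count); // 2

// Hypothetical type used only for this illustration.
public record Point(int X, int Y);
```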

The IStructuralEquatable interface

In the first part, we saw that structs perform structural equality by default. Sometimes we may need structural value equality for other types as well.

For example, we may need a way to check whether two arrays contain the same values. Arrays (and tuples) implement the IStructuralEquatable interface, whose Equals method takes an IEqualityComparer as a second parameter and uses it to compare the elements one by one. We can write our own comparer class for that parameter, or we can use an equality comparer that already exists, for example:

string name1 = "John Smith";
string name2 = "JOHN SMITH";
string[] name1Array = name1.Split();
string[] name2Array = name2.Split();
IStructuralEquatable name1ArrayStatic = name1Array;


Console.WriteLine(name1.Equals(name2)); // false
Console.WriteLine(name1ArrayStatic.Equals(name2Array, StringComparer.CurrentCultureIgnoreCase)); // true
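When the elements should simply be compared with their own Equals method, .NET also ships a ready-made comparer for this, StructuralComparisons.StructuralEqualityComparer. A minimal sketch with integer arrays:

```csharp
using System;
using System.Collections;

int[] first = { 1, 2, 3 };
int[] second = { 1, 2, 3 };
int[] third = { 1, 2, 4 };

// Arrays are reference types, so the plain Equals compares references.
Console.WriteLine(first.Equals(second)); // False

// The structural comparer walks the elements instead.
IEqualityComparer structural = StructuralComparisons.StructuralEqualityComparer;
Console.WriteLine(structural.Equals(first, second)); // True
Console.WriteLine(structural.Equals(first, third));  // False

// Equal contents also give equal structural hashcodes.
Console.WriteLine(structural.GetHashCode(first) == structural.GetHashCode(second)); // True
```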

Conclusion

And that’s it for equality in C#. It is a big subject, but hopefully these two posts gather in one place all the different ways we can check for equality. In part 3, we will see the different ways of comparing objects for order.

Go to part 3 here

Thank you for reading. If you think I forgot something, or if you have any questions or comments, you can use the comments section or contact me directly via the contact form or by email. Also, if you don’t want to miss any new blog posts, you can always subscribe to my newsletter or the RSS feed.


About Giannis Akritidis

Hi, I am Giannis Akritidis. Programmer and Unity developer.
